[Soekris] sc1100 TSC bug and scx200_hrt
Jim Cromie
jim.cromie at gmail.com
Sun Oct 8 02:24:12 UTC 2006
David Zelinsky wrote:
> Jim Cromie <jim.cromie at gmail.com> writes:
>
>
>> hi Alexander, David, Ted,
>>
>> thanks for your help.
>> The fix is in 18-mmX (probably temporarily, until it's added to 19-rc1),
>> and has been forwarded to -stable for consideration/inclusion into 18.1
>>
>
> Thanks, Jim, for finding and fixing this bug. I installed the patch
> and it seems to work, as did the suggestion of passing mhz27=1 as
> argument to the scx200_hrt module.
>
>
actually, you found it (and reported it). I had mhz27=1 set in
/etc/modprobe.d/ and forgot about it, so I hadn't
tested the default in a while :-}
>> here's a brief explanation of things (in case it helps you isolate any
>> more bugs ;-)
>>
>
> [-snip-]
>
>
>> I don't at present have any current NTP numbers to share here, though
>> David posted these, which I re-send.
>> If you get better numbers with scx200_hrt, please post, along with uptime.
>> Also, send same for pit, tsc. I suspect tsc's slowness might be
>> evident in the last 4 numbers below.
>>
>
> The posted numbers were with scx200_hrt as clocksource. Does TSC's
> slowness still affect it?
>
>
no - as long as the tsc is detected as 'running slowly', it cannot foul
timekeeping.
>> % ntpq -pcrv
>>      remote           refid      st t when poll reach   delay   offset  jitter
>> ==============================================================================
>> *portico.dedekin 198.6.255.249    2 u   64  128   377   1.369   -4.837   0.778
>>  LOCAL(0)        LOCAL(0)        13 l   25   64   377   0.000    0.000   0.004
>>
>
>
> This was with scx200_hrt as clocksource without the new patch, but
> with the module called with mhz27=1, which made it run at normal speed
> rather than 27 times too fast.
>
>
>
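To put a number on the "27 times too fast" figure, here is a rough sketch of the arithmetic, assuming the counter really ticks at 27 MHz while the ticks are converted as if it ran at 1 MHz. The function and constant names are made up for illustration and are not taken from the scx200_hrt source.

```python
# Illustrative arithmetic only; names are hypothetical, not from the driver.
MHZ_1 = 1_000_000     # rate assumed when mhz27 is not set
MHZ_27 = 27_000_000   # rate the counter actually runs at in this scenario


def apparent_seconds(real_seconds, actual_hz, assumed_hz):
    """Elapsed time reported when ticks are converted using assumed_hz."""
    ticks = real_seconds * actual_hz   # ticks the hardware really produced
    return ticks / assumed_hz          # same ticks, converted at the assumed rate


# Mismatched rates: one real second shows up as 27 seconds.
wrong = apparent_seconds(1, MHZ_27, MHZ_1)
# Matching rates (the mhz27=1 case): one real second stays one second.
right = apparent_seconds(1, MHZ_27, MHZ_27)
```

With the rates matched, the clock advances at normal speed, which is what David saw with mhz27=1.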
> With the TSC clocksource, the jitter is so bad it can't keep ntp sync
> at all:
>
> % ntpq -pcrv
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  portico.dedekin 198.6.255.249    2 u   64  256   377  47.200  -454738 304606.
> *LOCAL(0)        LOCAL(0)        13 l   52   64   377   0.000    0.000   0.015
>
>
That makes sense - it's consistent with what Ted Phelps saw, graphed, and
reported here (way back when).
With the inclusion of GTS, it became simple to fix (John Stultz did the
hard parts).
>
> With the PIT clocksource I got numbers like these:
>
> % ntpq -pcrv
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *portico.dedekin 198.6.255.249    2 u   12   64   377   1.297   58.347  15.520
>  LOCAL(0)        LOCAL(0)        13 l   10   64   377   0.000    0.000   0.015
>
>
>
I suspect that PIT works better than this indicates - your poll is still
only 64 secs; a fully settled NTP would be polling
at 1024 secs. The PIT is expensive to read (several ISA bus cycles -
*very* slow compared to a single rdtsc instruction),
but I can't quite swallow that it could cause 15 ms of jitter. ICBW, and
it doesn't matter - scx200_hrt works now ;-)
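
For reference, ntpd keeps the poll interval as a power of two, so the poll=10 in the rv output further down is the same thing as the 1024 in the peers table, and 64 secs corresponds to an exponent of 6. A tiny sketch of the conversion (the helper name is mine):

```python
def poll_seconds(exponent):
    """Convert an ntpd poll exponent (log2 of seconds) to seconds."""
    return 2 ** exponent


short_poll = poll_seconds(6)     # interval seen in the PIT run: 64 s
settled_poll = poll_seconds(10)  # interval once ntpd has settled: 1024 s
```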
> Currently, with the newly patched scx200_hrt module, I see this:
>
> % uptime
> 20:35:33 up 21:37, 1 user, load average: 0.03, 0.05, 0.01
>
> % cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> scx200_hrt
>
> % ntpq -pcrv
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *portico.dedekin 198.6.255.249    2 u  251 1024   377   1.814    0.300   0.125
>  LOCAL(0)        LOCAL(0)        13 l   55   64   377   0.000    0.000   0.004
> assID=0 status=0664 leap_none, sync_ntp, 6 events, event_peer/strat_chg,
> version="ntpd 4.2.0a@1:4.2.0a+stable-2-r Fri Aug 26 10:30:12 UTC 2005 (1)"?,
> processor="i586", system="Linux/2.6.18-soekris-patched-1", leap=00,
> stratum=3, precision=-18, rootdelay=36.268, rootdispersion=69.574,
> peer=58612, refid=192.168.0.10,
> reftime=c8ced0eb.a71a9fbe Wed, Oct 4 2006 20:31:39.652, poll=10,
> clock=0xc8ced1e6.8ef5fd47, state=4, offset=0.300, frequency=153.532,
> noise=0.190, jitter=0.125, stability=1.048
>
>
>
yay. thanks.
One of these months, I'll get around to graphing those variables over a
day of operation for each clocksource. This would presumably make the
characteristics crystal clear, but it's not a huge priority for me atm.
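
If anyone wants to collect those numbers in the meantime, here is a minimal sketch for pulling poll, delay, offset, and jitter out of the peer rows of ntpq -p, plus a helper that reads the active clocksource from the sysfs path quoted above. It assumes the usual ten-column peers layout; PeerSample, parse_peer_line, and current_clocksource are names I made up, and scheduling the sampling and plotting are left out.

```python
from pathlib import Path
from typing import NamedTuple, Optional

# sysfs path as shown in the session quoted above.
CLOCKSOURCE_PATH = "/sys/devices/system/clocksource/clocksource0/current_clocksource"


class PeerSample(NamedTuple):
    remote: str
    poll: int      # seconds
    delay: float   # milliseconds
    offset: float  # milliseconds
    jitter: float  # milliseconds


def current_clocksource(path: str = CLOCKSOURCE_PATH) -> str:
    """Return the active clocksource name, e.g. 'scx200_hrt'."""
    return Path(path).read_text().strip()


def parse_peer_line(line: str) -> Optional[PeerSample]:
    """Parse one peer row of `ntpq -p`; return None for anything else."""
    fields = line.split()
    if len(fields) != 10:
        return None  # separator, blank line, or rv output
    remote = fields[0]
    if remote[0] in "*+-#":
        remote = remote[1:]  # drop the tally code (only these four are handled)
    try:
        return PeerSample(remote, int(fields[5]),
                          float(fields[7]), float(fields[8]), float(fields[9]))
    except ValueError:
        return None  # header row or a non-numeric column


# One row from the patched scx200_hrt run.
sample = parse_peer_line(
    "*portico.dedekin 198.6.255.249 2 u 251 1024 377 1.814 0.300 0.125")
```

Logging one such sample per poll, tagged with the clocksource name, would be enough raw data for a per-clocksource graph.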
BTW, the fix is in 2.6.19-rc1, and will probably get into 2.6.18.1 when
it's released (if I'm paying attention when it's opened).