Marking clocksource 'tsc' as unstable, because the skew is too large.


Dave001


Hi, not sure where to post this, so please move it if there is a better spot.

 

I've been trying to track down random lockups of my Unraid server for the last few months. The system becomes unresponsive (no WebUI, Telnet, FTP or file access) for several minutes, then everything goes back to normal. It always leaves the same error in the logs:

 

Feb 24 03:45:57 Server kernel: timekeeping watchdog: Marking clocksource 'tsc' as unstable, because the skew is too large:

Feb 24 03:45:57 Server kernel: 'hpet' wd_now: 9c965a15 wd_last: 9c5bb7c6 mask: ffffffff

Feb 24 03:45:57 Server kernel: 'tsc' cs_now: 30da64fb569e cs_last: 2ff9f2884ef6 mask: ffffffffffffffff

Feb 24 03:45:57 Server kernel: Hangcheck: hangcheck value past margin!

Feb 24 03:45:57 Server kernel: Switched to clocksource hpet
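For anyone trying to read the `wd_now`/`wd_last` and `cs_now`/`cs_last` values above, here is a rough Python sketch that turns them into elapsed ticks and approximate seconds. The helper name `masked_delta` is made up for illustration, and both frequencies are assumptions that do not appear in the log: 14.318 MHz is a common HPET rate, and 3.2 GHz is the nominal clock of a Phenom II X4 955.

```python
# Hypothetical helper for reading the watchdog lines above; not a kernel tool.
HPET_HZ = 14_318_180      # assumption: common HPET rate, not shown in the log
TSC_HZ = 3_200_000_000    # assumption: nominal Phenom II X4 955 clock


def masked_delta(now_hex, last_hex, mask_hex):
    """Ticks elapsed between two counter readings, modulo the counter width."""
    now, last, mask = (int(h, 16) for h in (now_hex, last_hex, mask_hex))
    return (now - last) & mask


# Values copied from the Feb 24 log lines.
hpet_ticks = masked_delta("9c965a15", "9c5bb7c6", "ffffffff")
tsc_ticks = masked_delta("30da64fb569e", "2ff9f2884ef6", "ffffffffffffffff")

hpet_seconds = hpet_ticks / HPET_HZ
tsc_seconds = tsc_ticks / TSC_HZ
# A 32-bit counter wraps after 2**32 ticks.
hpet_wrap_seconds = 2**32 / HPET_HZ

print(f"hpet advanced {hpet_ticks} ticks, about {hpet_seconds:.2f} s")
print(f"tsc  advanced {tsc_ticks} ticks, about {tsc_seconds:.0f} s")
print(f"a 32-bit hpet wraps about every {hpet_wrap_seconds:.0f} s")
```

Under those assumed frequencies, the TSC interval comes out at roughly 301 s while the HPET interval is only about 0.27 s, and a 32-bit HPET wraps roughly every 300 s. That would be consistent with the watchdog check itself having been delayed for about five minutes (long enough for the HPET counter to wrap once), which matches the length of the lockups described, but it is only one possible reading of the numbers.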

 

I managed to track the problem down to having the Emby docker running: if I disable the docker, the server runs flawlessly. I set up Plex, and the system ran perfectly for a week. I re-enabled the Emby docker, and the system crashed again within a few hours. I have nuked the Emby docker and started again, and have also tried two different repositories.

 

Running Unraid 6.1.8 Pro, no extra plugins, other than the Emby docker.

System is:

AMD Phenom II X4 955, ASUS M5A97 EVO, 4 GB RAM, 120 GB SSD cache (docker and appdata on the SSD), 11 HDDs in a 22 TB array, all disks formatted in XFS.

 

Diagnostic info is attached.

server-diagnostics-20160224-0740.zip

Emby_Log.txt


Does your server have the correct time?

 

Yes, server time is correct. I just changed the NTP servers to see if that helps.

 

One suggestion: check for a newer BIOS. If there is none, you may need a new motherboard.

 

The timers all look good at boot, but apparently aren't up to the task under load. That would point to either a defective timer chipset or buggy timer routines.

 

Hadn't even considered the BIOS being a problem. There was a newer BIOS available, so I've flashed it and will see how it goes.

 

Thanks for the replies.


Replaced mainboard, still getting lockups.  >:(

 

Decided to swap in the CPU from another system, just because it was about the only bit I hadn't replaced so far. The problem seems to be gone: it ran flawlessly overnight.

Probably should have tried that before ordering a new mainboard hey.  :)

 

Thanks again for the replies.

 

 


Wow, never heard of a CPU replacement fixing something like that; it's not something I would have ever suggested. But glad you found the problem. Are you sure the system wasn't overclocked?

  • 4 years later...

I'm having the same issue, but mine crashes during basic tasks such as copying files to the array. The curious thing is that I have the same setup as this one:

 

AMD Phenom II X4 955, ASUS M5A97 EVO, 8 GB RAM, 120 GB SSD cache

I've been trying to troubleshoot this for the past month with no success. I've run memtest for 48 hours with no issues and disabled all the containers, and I still get this issue. The server hangs for about 10 minutes, which causes the transfers to fail.

 

Does anyone have any ideas other than replacing hardware? The idea was to reuse this old PC I have lying around.

 

Last log has this as well:

Jan 23 16:25:05 SevenServer kernel: Hangcheck: hangcheck value past margin!
Jan 23 16:25:05 SevenServer kernel: clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
Jan 23 16:25:05 SevenServer kernel: clocksource: 'hpet' wd_now: 25c5bf97 wd_last: 25c1ca69 mask: ffffffff
Jan 23 16:25:05 SevenServer kernel: clocksource: 'tsc' cs_now: 49b1c186cb3 cs_last: 3badb33d9f3 mask: ffffffffffffffff
Jan 23 16:25:05 SevenServer kernel: tsc: Marking TSC unstable due to clocksource watchdog
Jan 23 16:25:05 SevenServer kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Jan 23 16:25:05 SevenServer kernel: sched_clock: Marking unstable (1544152593065, 7185071)<-(1544252253552, -90945093)
Jan 23 16:25:05 SevenServer kernel: clocksource: Switched to clocksource hpet
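After a "Switched to clocksource hpet" message like the one above, the clocksource currently in use can be confirmed from sysfs. The following is a minimal Python sketch, assuming the standard Linux sysfs files `current_clocksource` and `available_clocksource` are present; the function name `read_clocksources` is made up for illustration.

```python
from pathlib import Path

# Standard Linux sysfs location for clocksource information.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(sysfs_dir=SYSFS_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base = Path(sysfs_dir)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs is not mounted where expected.
        return None, []
    return current, available


current, available = read_clocksources()
if current is None:
    print("clocksource information not available on this system")
else:
    print(f"current clocksource: {current}")
    print(f"available clocksources: {', '.join(available)}")
```

If the kernel has already fallen back as in the log above, the current clocksource would be expected to read `hpet` rather than `tsc`.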


Leaving my solution here in case someone is having the same issue I was.

 

I deactivated all the power-saving options in my BIOS and re-enabled them one by one until I got a crash. The culprit seems to be the C1E option which, if I understood properly, lets the CPU save power when it is not under full load. This took me a month to pinpoint, so I hope it is useful to someone.

