Server unresponsive, now multiple errors



I have had a few problems recently with the server failing, i.e. I cannot access it from the network, although the console is still responsive. A reboot or shutdown will not work, though: it says the system is going into shutdown but never completes.

I have been trying to diagnose this, but it happened again today; I had to hard reset and a parity check started. However, 7 drives are now showing errors and the check has stopped. I have been running "Fix Common Problems" in troubleshooting mode and I attach the diagnostics.

Could someone please advise what my next step should be, without losing everything?

tower-diagnostics-20170421-1808.zip

Edited by ridley

Bump

Any ideas?

After reseating the cables, I restarted the server and started a read check. It is showing millions of errors on 7 drives (maybe 8 if parity was included). The number of errors on each drive is nearly identical, so I cannot see it being all of the drives. Could it be the RAID card?
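
The reasoning here (near-identical error counts on many drives point at something they share, not at the drives themselves) can be sketched in a few lines of Python. This is only an illustration: the function name, the 5% tolerance and the sample counts are all assumptions made up for the example, not values from the attached diagnostics.

```python
def looks_like_shared_fault(error_counts, tolerance=0.05):
    """Return True when every drive's error count lies within `tolerance`
    (as a fraction) of the median count, hinting at a shared cause such
    as a controller or cable rather than independent disk failures."""
    counts = sorted(error_counts.values())
    if len(counts) < 2:
        return False  # one drive alone says nothing about a shared cause
    median = counts[len(counts) // 2]
    if median == 0:
        return False  # no meaningful errors to compare
    return all(abs(c - median) / median <= tolerance for c in counts)


# Hypothetical counts for illustration only, NOT read from the diagnostics.
example = {"disk1": 2_000_100, "disk2": 2_000_350, "disk3": 1_999_900}
print(looks_like_shared_fault(example))
```

A check like this cannot say which shared component is at fault; it only flags that the pattern is unlikely to be several disks failing independently.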


All disks but one look fine. Disk5 shows a single pending sector, possibly a false positive and unrelated to the latest errors, but you should run an extended SMART test to confirm.

Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA2084939

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
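
If you want to pull that value out of the SMART report programmatically, a rough Python sketch follows. It assumes the usual ten-column layout of a `smartctl -A` attribute row (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value); the function name `parse_smart_attribute` is made up for this example. The extended test itself is normally started with `smartctl -t long` against the disk's device node.

```python
def parse_smart_attribute(line):
    """Split one `smartctl -A` attribute row into named fields.

    Returns None for lines that are not attribute rows
    (headers, device info, blank lines)."""
    fields = line.split()
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flag": fields[2],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "updated": fields[7],
        "when_failed": fields[8],
        # The raw value can contain spaces on some attributes, so rejoin it.
        "raw_value": " ".join(fields[9:]),
    }


# The row quoted above for disk5.
row = "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1"
attr = parse_smart_attribute(row)
if attr and attr["name"] == "Current_Pending_Sector" and attr["raw_value"] != "0":
    print(f"{attr['name']}: raw value {attr['raw_value']}, worth an extended SMART test")
```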

Parity getting disabled and the multiple disk errors were the result of one of the SASLPs crashing (the one with 8 disks connected). This is a somewhat common issue with both the SASLP and the SAS2LP, although it only affects a small number of users.

Some things that can help with that are disabling VT-d if it is not needed, updating the board BIOS, or using the controller in a different slot if one is available.

You can cancel the read check you're doing and might as well start a parity sync.

47 minutes ago, ridley said:

Are you sure this is necessary?

Can't be sure it will fix your problem, but it's without a doubt the #1 cause for an unresponsive server on v6.

Also, reiserfs is on the way out: it's badly supported by the maintainers and there have been a few serious issues lately. Besides, performance can be terrible in certain situations, so I would recommend converting even if you didn't have problems.

The strange thing is that the server doesn't become fully unresponsive: you can still use the console or telnet in. It just will not shut down or reboot.

I have a syslog from the last time it became unresponsive; I can upload that if it helps.

OK, I checked the motherboard BIOS and there is no VT-d option. I have also checked the BIOSes on the SASLPs: both have INT13 disabled and are running 3.1.0.15n.

I have no more PCIe slots, so I could only swap the cards around.

What to do now? It appears that the SASLP card keeps crashing and I lose connection to multiple (8?) drives.

Suggestions please.

4 minutes ago, ridley said:

OK Checked motherboard BIOS and there is no VT-D option.

Pretty sure it has to exist; check the manual, though it may be disabled by default. You can check the status on unRAID's main page: click on Info at the top right and check whether IOMMU is enabled or disabled.
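
If you would rather check from the console than from the web page, here is a rough Python sketch. It relies on general Linux behaviour (an active IOMMU populates `/sys/kernel/iommu_groups` with numbered directories) and is not an unRAID-specific interface; the function name `iommu_enabled` is made up for this example, and it assumes a Python interpreter is available on the box.

```python
import os


def iommu_enabled(groups_dir="/sys/kernel/iommu_groups"):
    """Return True if the kernel has created at least one IOMMU group.

    An empty or missing directory means the IOMMU is disabled in the
    BIOS, disabled on the kernel command line, or not supported."""
    try:
        return any(entry.is_dir() for entry in os.scandir(groups_dir))
    except OSError:
        return False


print("IOMMU enabled" if iommu_enabled() else "IOMMU disabled or not supported")
```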

 

5 minutes ago, ridley said:

What to do now?

Get another controller; at the moment the LSI ones work best.

LSI Model number?

1 minute ago, johnnie.black said:

The most popular are the 9210-8i and 9211-8i, or clones such as the Dell H200/H310 and IBM M1015. The clones need to be crossflashed to IT mode to work with unRAID but are generally a bit cheaper.

Not the SAS 8308ELP then?
