Vova, posted June 19, 2017: Hi. After a vacation with my unRAID box running 24/7, today I noticed that one of the data HDDs had been put in a disabled state. There were 187 errors on the Main tab of the UI. What also surprised me is that the write count on this drive was huge: a very big number like 18,000,000,000,000,000 or even more zeroes. Unfortunately, I stopped/started the array without taking a screenshot first, and the counters were zeroed out. My question: why did I have such a big number of writes on this HDD? I assume the drive failed because of this huge number of writes. Attaching the diagnostics file; please help me identify the root cause. Attachment: tower-diagnostics-20170619-1516.zip
JorgeB, posted June 19, 2017, replying to Vova ("why did I have such a big number of writes for this HDD?"): That huge number is the result of the disk dropping offline, not the cause. Unfortunately, your syslog is filled with:
Jun 18 10:40:02 Tower kernel: aacraid 0000:08:0e.0: AAC0:aac_check_health: Host adapter dead -1
Jun 18 10:40:03 Tower kernel: aacraid 0000:08:0e.0: AAC0:aac_check_health: Host adapter dead -1
so we can't see what happened. SMART looks fine, though, so check/replace the cables and rebuild to the same disk.
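Because the aacraid health-check messages were drowning out everything else, one way to see what the log actually recorded around the time the disk dropped is to filter those lines out. This is a minimal Python sketch, assuming the syslog from the diagnostics zip has been extracted to a plain-text file with one message per line; the function names `drop_noise` and `filter_file` are hypothetical, and the noise pattern is simply taken from the log lines quoted above.

```python
import re
from typing import Iterable, Iterator, List

# Matches the repeated aacraid health-check lines quoted above, e.g.
# "aacraid 0000:08:0e.0: AAC0:aac_check_health: Host adapter dead -1"
AACRAID_NOISE = re.compile(r"aacraid .*aac_check_health: Host adapter dead")


def drop_noise(lines: Iterable[str], noise: "re.Pattern[str]" = AACRAID_NOISE) -> Iterator[str]:
    """Yield only the syslog lines that do NOT match the noise pattern."""
    for line in lines:
        if not noise.search(line):
            yield line


def filter_file(path: str) -> List[str]:
    """Read a plain-text syslog file and return the lines that remain after filtering."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return list(drop_noise(f))
```

For example, `print("".join(filter_file("syslog.txt")))` (file name hypothetical) prints whatever the kernel logged besides the aacraid spam, which is where disk or link errors would show up.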
Vova, posted June 19, 2017: This adapter is empty; it's not connected to any drive. Can you please point me to instructions on how to rebuild to the same disk? Is this what I need to do? https://wiki.lime-technology.com/Replacing_a_Data_Drive
JorgeB, posted June 19, 2017: https://wiki.lime-technology.com/Troubleshooting#Re-enable_the_drive
Vova, posted June 21, 2017: Thank you. I've removed the Adaptec controller that was flooding the logs. After that I ran the rebuild procedure, and it seems the drive is flapping. I'm attaching the log; please give me some clues here. Should I replace the cable/backplane? I've seen very similar errors on completely different drives (2TB WD SEs) in this MicroServer G8 before, under CentOS 7. Attachment: tower-diagnostics-20170621-1421.zip
JorgeB, posted June 21, 2017: It looks to me more like a disk problem than a cable/backplane issue, but swap backplane slots with another disk; if it fails again, it's the disk.
Vova, posted June 22, 2017: Johnnie, given that I've previously seen the same errors under CentOS on completely different disks, does that give any reason to reconsider the verdict?
JorgeB, posted June 22, 2017: It's easy to confirm by using the disk in a different backplane slot, unless there's a general problem with the server/controller.
Vova, posted June 26, 2017 (edited), replying to johnnie.black's post of June 22, 2017: So, I attached this drive to another port of the same controller, not via the backplane but via a standard SATA cable. The B120i has 6 SATA ports (https://www.hpe.com/h20195/v2/gethtml.aspx?docname=c04168333); 4 of them are connected to the backplane, and 1 is available separately. After this I started a rebuild of the failed drive, and the rebuild succeeded. Should I assume that the backplane is faulty? Is there a solid way to verify this assumption? Thanks for the time you spend helping us here!
JorgeB, posted June 26, 2017: Test for a few more days. Intermittent issues can be hard to confirm; time and multiple tests help.
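Since an intermittent fault only shows itself over time, one low-effort way to follow the multi-day test is to count, per day, how many syslog lines mention the suspect disk. This is a sketch under stated assumptions: syslog lines begin with the classic "Mon DD HH:MM:SS" timestamp seen in the log excerpt earlier in the thread, and the device identifier passed as `needle` (for example a hypothetical `ata5`) is whatever name the kernel uses for that disk on your system. The function name `daily_counts` is also hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Classic syslog prefix, e.g. "Jun 18 10:40:02 Tower kernel: ..."
# (single-digit days are padded with an extra space, e.g. "Jun  9").
SYSLOG_DATE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s")


def daily_counts(lines: Iterable[str], needle: str) -> Counter:
    """Count lines containing `needle`, keyed by a normalized 'Mon D' date."""
    counts: Counter = Counter()
    for line in lines:
        if needle not in line:
            continue
        m = SYSLOG_DATE.match(line)
        if m:
            # int() strips the padding so "Jun  9" and "Jun 9" map to the same key
            counts[f"{m.group(1)} {int(m.group(2))}"] += 1
    return counts
```

Running this against each day's syslog during the test period gives a simple trend: a disk that keeps accumulating messages after moving to the standalone SATA port points back at the disk, while a clean count supports the backplane theory.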
Vova, posted June 27, 2017: Thanks Johnnie, I'll leave it as it is for a month, then.