kno Posted September 14, 2017 (edited)
I have shrunk my Unraid setup and removed an old drive with errors. Since I have added new, bigger hard drives, my storage needs are covered and I do not need to replace it with a new disk, so removing it seemed like the best option. I followed the instructions. Now a new parity sync has just started, and after only a few minutes I have had errors on three drives (two new drives and one that is a couple of months old). Obviously I am concerned. Should I be, or is it normal that new drives have minor issues in a sector or two? I have just recovered from drive issues with my old HDDs, which I solved with your good help. Now I am concerned again.
Also, is it safe to run SMART short and extended tests during a parity rebuild?
Edit: Diagnostics deleted.
Edited September 27, 2017 by kno
JorgeB Posted September 14, 2017
1 hour ago, kno said:
Should I be or is it normal that new drives have minor issues in a sector or two?

No, run an extended test on those disks.
kno Posted September 14, 2017 Author Share Posted September 14, 2017 5 hours ago, johnnie.black said: No, run an extended test on those disks. Can I run an extended test on the drives while parity is being rebuilt? Is that safe? The three drives in question are not the parity drive, but they are being read to write the parity. Quote Link to comment
JorgeB Posted September 14, 2017
No, but the parity is going to be corrupt, so there is no point in letting it finish.
kno Posted September 14, 2017 Author
Just now, johnnie.black said:
No, but the parity is going to be corrupt, no point in letting it finish.

Ok, so cancelling the parity rebuild and running an extended SMART test is the correct way forward?
JorgeB Posted September 14, 2017
Yes, to see if the problem is with the disks or some other issue.
kno Posted September 14, 2017 Author
Can I run an extended SMART test on multiple drives at the same time, or should I wait until one has finished before I start the next one?
JorgeB Posted September 14, 2017
You can run all at the same time.
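
For readers who prefer the command line, the advice above can be sketched roughly as follows. This is not from the thread: it assumes the `smartctl` tool from smartmontools is available, and the device names are hypothetical placeholders. The sketch only builds and prints one `smartctl -t long` command per drive; it does not run anything.

```python
def extended_test_commands(devices):
    """Build one `smartctl -t long` (extended self-test) command per device."""
    return [["smartctl", "-t", "long", dev] for dev in devices]


# Hypothetical device names; substitute the drives that reported errors.
devices = ["/dev/sdb", "/dev/sdc", "/dev/sdf"]

for argv in extended_test_commands(devices):
    # Print the command instead of executing it, so the sketch is safe to run.
    print(" ".join(argv))
```

Each test runs inside the drive itself once started, which is why starting several at once is workable. The results can be read later with `smartctl -l selftest` for each device.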
kno Posted September 15, 2017 Author (edited)
Here are the SMART reports:
Edit: Report deleted.
Edited September 27, 2017 by kno
JorgeB Posted September 15, 2017
All disks passed the extended test, so the issue is elsewhere. That is kind of expected, since it's rare to have multiple failures at the same time, but it's good to rule the disks out. The problem is most likely cable related, but it could also be a bad/failing power supply, controller, etc. Start by checking/replacing all cables and then start a new parity sync.
kno Posted September 25, 2017 Author (edited)
I was out travelling for a week, so I could not do any work on the Unraid server. Yesterday evening I swapped out the cables for the disks in question (1, 2 and 5) with new 6 Gb/s cables with locking clips (they came with a new Asus motherboard). The old cables were only marked "serial ATA" with no bandwidth rating (standard red/orange cables that came with the old motherboard), so they are probably lower bandwidth. The controllers in the Unraid server do not support 6 Gb/s, so the cables "should" not be the problem, but there might be issues, especially since it is the new drives that are giving me trouble.
I started the parity rebuild. No errors occurred during the first hours; last time, errors occurred within 30-45 minutes. This morning one of the drives did report 128 errors, but this is probably not in the same position as last time.
I have attached the diagnostics. The read errors are at 04:40 in the log. I can see a lot of other errors as well. These are related to the drive that has been unassigned from the array, which is still plugged into the SATA controller. What do these errors mean?
Based on this new information, what do you think can be the problem now? What should be my new action?
-Abort parity rebuild (I guess the parity will still be corrupt due to the read error on disk 5)?
-Move disk 5 to another SATA port?
-Change out PSU? I think I have another in storage. What wattage should the PSU ideally be for an 11-disk server?
-Change out the SATA controllers (this is probably an expensive option, so hopefully to be avoided)?
-Other suggestions?
Edit: Diagnostics deleted.
Edited September 27, 2017 by kno
JorgeB Posted September 25, 2017
2 hours ago, kno said:
Based on this new information, what do you think can be the problem now? What should be my new action?
-Abort parity rebuild (I guess the parity will still be corrupt due to read error on disk 5)?
-Move disk 5 to another SATA port?
-Change out PSU? I think I have another in storage. What wattage should the PSU ideally be for an 11-disk server?
-Change out the SATA controllers (this is probably an expensive option, so hopefully to be avoided)?
-Other suggestions?

-disconnect the unassigned bad disk
-move/swap disk5 to a port on another controller
-could be the PSU; for 11 disks I'd say a 500/550W Corsair/Seasonic would be good
-an LSI controller to replace the Sil3124 would also be good, though in the previous run one of the disks affected was on the onboard controller, so it may not help with the errors; it would help with performance
-there's a theory still under testing that 6TB WD Reds model WD60EFRX-68L0BN1 have a firmware issue that causes errors during heavy activity; these errors appear mainly on these disks, but the issue can also cause errors on different model disks when one of these is in use
Vr2Io Posted September 25, 2017
First, please enable SMART monitoring for counter "199"; then you will notice as soon as something abnormal happens.
PS: those counters can't be cleared.
kno Posted September 25, 2017 Author
1 hour ago, Benson said:
1st you pls enable SMART monitoring the counter "199", then you counld ASAP notice abnormal happen. PS : those counter can't clear.

I am not sure what that means. 199 is one of the SMART error logging attributes, right? Isn't this being logged already? Under attributes I can see:
199 UDMA CRC error count 0x0032 200 200 000 Old age Always Never 521
I opened the computer case and removed my old disk 4. I also noticed that one of the two black ground wires on the SATA power cable that goes to disk 5 was loose. I exchanged the cable for a new one. I do not know if this can have caused the error. I also moved disk 5 to another SATA port on another controller. Now I am going to try a new parity rebuild.
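
As an illustration of how to read that attribute row, here is a minimal sketch that is not from the thread. It assumes the row is in the ten-column layout printed by `smartctl -A` (ID, name, flag, value, worst, threshold, type, updated, when failed, raw value); the underscores and the `-` in the sample row are that assumed raw form of the row quoted above, which the Unraid GUI displays with spaces and "Never".

```python
def parse_smart_row(row):
    """Extract the attribute id, name, and raw value from one attribute row."""
    fields = row.split()
    # Column 10 (index 9) holds the raw value; for attribute 199 it is a plain count.
    return {"id": int(fields[0]), "name": fields[1], "raw": int(fields[9])}


# Assumed smartctl-style form of the row quoted in the post above.
row = "199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 521"

attr = parse_smart_row(row)
print(attr["id"], attr["name"], attr["raw"])
```

The raw value (521 here) is the running count of CRC errors; the normalized value of 200 and the threshold of 000 do not show it directly.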
JorgeB Posted September 25, 2017
He means you should add 199 to the monitored attributes:
Settings -> Disk Settings -> Global SMART Settings -> Default SMART attribute notifications
If this attribute increases by 2 or more in a short period of time, it usually means there's a bad SATA cable. If there are old errors it will show them, as it never resets; you can acknowledge it on the dashboard.
kno Posted September 25, 2017 Author
Ok. 199 warning added. From what I understand, this checks whether data has been transferred correctly or whether the data needs to be resent to the HDD? I guess a few UDMA errors should be expected, right? This is not really a problem, as the data will be resent. However, if many errors occur, this indicates continuous bad transfers, so something is wrong with the transmission, such as cable bending/breakage, EMI/RFI interference, damage to the cable causing reduction in bandwidth, etc. Correct?
JorgeB Posted September 25, 2017
10 minutes ago, kno said:
I guess a few UDMA errors should be expected, right?

Not really. Most of my disks, including several with years of use, have 0 errors. 1 error once in a while is OK; any more than that and there's a problem, usually the SATA cable.
Vr2Io Posted September 25, 2017
Thanks @johnnie.black for the explanation.
12 minutes ago, kno said:
I guess a few UDMA errors should be expected, right?

No, even 1 error means something is wrong. Mine have also had zero errors for 2+ years. But I did have a problem in the past due to a controller issue; I changed the controller and the problem went away.
JorgeB Posted September 25, 2017
I'd say 1 once in a while is OK since no cable is perfect, but not more.
kno Posted September 25, 2017 Author
Ok, very interesting. I did a check of all my disks. All but three have 0 errors.
Disk 5: 521
Disk 7: 17
Disk 8: 16
This indicates that I have had some trouble earlier. I cannot explain disk 7 or 8. Disk 5 was the drive that had the most errors before I changed it last month. The new disk 5 is almost new, so there must have been some serious problem with the cable or controller. I will keep an eye on these values. I am trying to build new parity now. It will take a while.
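
Since attribute 199 never resets, "keeping an eye on these values" means comparing against a baseline, not against zero. A minimal sketch of that comparison follows; it is not from the thread. The baseline counts are the ones listed in the post above, the "later" reading is invented for illustration, and the threshold of 2 follows the rule of thumb given earlier in the thread.

```python
def crc_increases(baseline, current, threshold=2):
    """Return {disk: increase} for disks whose CRC count rose by at least `threshold`."""
    return {
        disk: current[disk] - old
        for disk, old in baseline.items()
        if disk in current and current[disk] - old >= threshold
    }


# Counts reported in the post above.
baseline = {"disk5": 521, "disk7": 17, "disk8": 16}
# Hypothetical later reading, made up for this example.
later = {"disk5": 521, "disk7": 19, "disk8": 16}

print(crc_increases(baseline, later))
```

With these made-up numbers only disk 7 is flagged, because its count rose by 2 while the others stayed at their old values.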
kno Posted September 27, 2017 Author
Success: parity was rebuilt without any errors on the disks. My best guess is that it was a cable problem, or a combination of SATA cable, SATA port and power cable problems. Either way, it seems to work now. Thank you for the help.