TODDLT Posted August 20, 2016
I was copying files off Disk 10 on my server to start the process of updating the drive format. I glanced at the unRAID main page and found Drive 3 was disabled with a ridiculous number of drive writes. See below image. I couldn't get a SMART report and the drive wouldn't spin up. So I stopped the array and found the drive was not even showing up in the dropdown as being attached.
OK, probably a loose cable. I shut down and opened the case. The cables for drives 1 and 2 felt loose (they don't latch) but everything else felt solid, so I put it back together and restarted. Drive 3 was still showing "Disabled" status, but this time I was able to get a short SMART test run (attached). Nothing caught my eye. The drive looks good but I can't seem to get it to "enable" with the array started.
So here is where I probably screwed up. I unassigned the drive and started the array without thinking. After stopping the array, it now sees it as a new drive, but it is showing up fine. With the array started I can access the drive (emulated) and played a couple of files, which doesn't really tell me much.
I have no idea what all those writes were from or what triggered them. I'm in the middle of an extended self test on the drive and will post the results here shortly.
Should I just re-assign the drive and see if it will rebuild? Is there any way to recover the old configuration, or would you not trust the data on it to run a parity check? Thoughts on the best way forward?
FYI, if I lost data, this is the one drive in the box I would not lose sleep over. One thought is to use one of my warm spares, rebuild the data and then preclear this drive before re-using it. Thoughts? Thanks in advance.
Attachment: ST3000DM001-1CH166_W1F29KYY-20160819-2328.txt
Squid Posted August 20, 2016
What you did is actually the only way to re-enable a disabled disk. Because a write to the drive failed (which is why it got disabled), the contents of that drive are out of sync as far as a parity check operation is concerned. You have to rebuild that drive.
TODDLT Posted August 20, 2016
Would you trust the drive without a new preclear if the SMART test looks OK? Or use a new drive and preclear that one again?
Would simply having a network drive mapped to that location and booting up the computer cause a write operation? Otherwise I can't think of any reason it would have started a write. I'm assuming the really high write count is rewrite attempts?
Squid Posted August 20, 2016
SMART looks good except for the 25k CRC errors, which were cable related.
TODDLT Posted August 20, 2016
Results after the extended test are attached. What do these numbers mean and why are they so high?
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       212179552
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       7693004
So you would trust this drive and go ahead and rebuild it? The cables that were loose were to drives 1 and 2. Drive 3 is the one with errors, so I'm still puzzled over what caused this. All I did was press on the cables to drive 3, which already felt perfectly seated.
Attachment: ST3000DM001-1CH166_W1F29KYY-20160820-0031.txt
Squid Posted August 20, 2016
The raw values of Raw Read Error Rate and Seek Error Rate are meaningless to anyone other than Seagate. In this case, the value versus the threshold is what you look at, and both of them are nowhere near the threshold.
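The value-versus-threshold comparison can be sketched in Python using the two attribute rows posted earlier in the thread. This is only an illustration: the names `SmartAttr`, `parse_attr`, `margin` and `is_failing` are made up for this example, the column order is assumed to match the rows as pasted (ID, name, flag, value, worst, threshold, type, updated, when failed, raw), and the "failing" rule is assumed to be "normalized value at or below a non-zero threshold".

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded so far
    thresh: int  # vendor failure threshold
    raw: str     # vendor-specific raw value, kept as text


def parse_attr(line: str) -> SmartAttr:
    """Parse one attribute row in the layout pasted in this thread."""
    f = line.split()
    # Assumed columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), " ".join(f[9:]))


def margin(attr: SmartAttr) -> int:
    """Headroom between the worst normalized value seen and the threshold."""
    return attr.worst - attr.thresh


def is_failing(attr: SmartAttr) -> bool:
    """Assumed rule: failing once the normalized value reaches a non-zero threshold."""
    return attr.thresh > 0 and attr.value <= attr.thresh


rows = [
    "1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 212179552",
    "7 Seek_Error_Rate 0x000f 069 060 030 Pre-fail Always - 7693004",
]
attrs = [parse_attr(r) for r in rows]
for a in attrs:
    print(a.name, "value", a.value, "worst", a.worst, "thresh", a.thresh,
          "margin", margin(a), "failing", is_failing(a))
```

The large raw numbers are carried along as text and never compared, which mirrors the advice above: for these two attributes only the normalized columns are used to judge health.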
TODDLT Posted August 21, 2016
Thanks, recovery is complete and all seems fine. I'm still a little puzzled about the origin of the trouble that took the drive offline, as the cables felt seated. I'm assuming it must have been a cable issue though.
TODDLT Posted June 11, 2017
Resurrecting this old thread. 10 months later, this same drive red-balls again. I methodically tested all the cables and am certain it is not a cable issue. After all my testing, I could not get the drive to run a SMART test or show any results. I was attempting this from inside the drive window (clicking on the drive from the main page). So finally I went ahead and rebuilt the drive using my warm spare in the box.
This morning, with the rebuild complete, I again have a blue-square new drive showing up as an assigned drive. I went into the drive page, which shows no SMART test history, but again was not able to get the drive to actually run a SMART test. However, I was able to get a preclear cycle to start, so I'm curious what it will turn up. The problem (if any) should show up in the second part of the operation, correct? During the write-zeros operation? It is still connected to the same power/SATA cable it was connected to when the trouble occurred.
Any thoughts? Why would it not run a SMART test, but would run a preclear?
Edited June 11, 2017 by TODDLT
TODDLT Posted June 13, 2017
So the preclear completed successfully. However, I still can't seem to get a SMART report. When I go to the drive page, I can't get any action from clicking on the button to start a short test. Any help? This is all I can get:
########################################################################
# unRAID Server Preclear of disk W1F29KYY
# Cycle 1 of 1, partition start on sector 64.
#
# Step 1 of 5 - Pre-read verification:                  [5:11:09 @ 160 MB/s] SUCCESS
# Step 2 of 5 - Zeroing the disk:                       [5:10:52 @ 160 MB/s] SUCCESS
# Step 3 of 5 - Writing unRAID's Preclear signature:                         SUCCESS
# Step 4 of 5 - Verifying unRAID's Preclear signature:                       SUCCESS
# Step 5 of 5 - Post-Read verification:                 [5:10:38 @ 161 MB/s] SUCCESS
########################################################################
# Cycle elapsed time: 15:32:42 | Total elapsed time: 15:32:42
########################################################################
--> RESULT: Preclear Finished Successfully!.
root@TODD-Svr:/usr/local/emhttp#
JorgeB Posted June 13, 2017
Try to get a SMART report on the console.
TODDLT Posted June 13, 2017
10 minutes ago, johnnie.black said: "Try to get a SMART report on the console"
This is the only screen I know how to get a SMART report on. Note that at the bottom it says "Smart disabled"... how do you turn that back on?
Fireball3 Posted June 13, 2017
Telnet into your server, or log into the console if you have a screen attached. Then type:
smartctl -s on /dev/sdX
where /dev/sdX is your drive's device name. For more options type smartctl -h or see here.
JorgeB Posted June 13, 2017
Like Fireball3 posted, you just need to enable SMART. Some disks come with SMART disabled; unRAID enables it when the disk is part of the array, but not for unassigned disks.
TODDLT Posted June 13, 2017
The command that finally worked:
smartctl --smart=on /dev/sdd
The -s option was reported as invalid (possibly a new version?).
Log attached. Nothing stands out here, but I'm no expert. I did just tell it to run an extended test; not sure if it changes much.
Why would this drive red-ball again, with no apparent reason? Thanks again, all.
Attachment: ST3000DM001-1CH166_W1F29KYY-20170613-0358.txt
Edited June 13, 2017 by TODDLT
JorgeB Posted June 13, 2017
This is not a very good sign:
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
This means there is (or there was) a very bad SATA cable:
199 UDMA_CRC_Error_Count    0x003e   200   195   000    Old_age   Always       -       22663
Fireball3 Posted June 13, 2017
Did you try replacing the SATA cable?
SSD Posted June 13, 2017
That UDMA error means that the drive is not well connected to the server. It normally means that a hand has entered the delicate wiring and knocked something loose. An actual bad cable is also possible. Drive cages and locking cables (where they are supported) are recommended.
TODDLT Posted June 13, 2017
3 hours ago, johnnie.black said: "This is not a very good sign: 183 Runtime_Bad_Block 0x0032 097 097 000 Old_age Always - 3. This means there is (or there was) a very bad SATA cable: 199 UDMA_CRC_Error_Count 0x003e 200 195 000 Old_age Always - 22663"
Both of those numbers were exactly the same in the SMART report attached at the top of this thread, from August of last year when this occurred the first time. So that would suggest this was not a SATA cable issue at least, or the number would have been higher after a second occurrence, correct? What about the Runtime Bad Block error? That is not part of a cabling issue, correct? Should I be worried about that?
3 hours ago, Fireball3 said: "Did you try replacing the SATA cable?"
I can tonight if we really think that is the culprit. However, based on the CRC error count not going up, it seems this is not a cable issue. I'm still faced with the same question about whether to trust this drive back in circulation (or as a spare).
1 hour ago, bjp999 said: "Drive cages and locking cables (where they are supported) are recommended."
The drives are in the original Antec 1200 drive cages. I do use locking cables. However, the two SSDs are in expansion slots that don't take locking cables, and the stacked design of the motherboard ports means the lower cable's latch is essentially depressed by the upper cable, so it doesn't latch very effectively on that end. The top cable latches fine (bad design). The cables all latch on the drive end.
Any thoughts?
Edited June 13, 2017 by TODDLT
JorgeB Posted June 13, 2017
26 minutes ago, TODDLT said: "So that would suggest this was not a SATA cable issue at least, or the number would have been higher from a 2nd occurrence, correct?"
Correct.
26 minutes ago, TODDLT said: "What about the Runtime Bad Block error? That is not part of a cabling issue, correct? Should I be worried about that?"
Not cable related. On Seagates these are usually bad blocks and not a good sign, but if the count is stable the disk may still have some life in it.
Edited June 13, 2017 by johnnie.black
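The reasoning in this exchange, that an unchanged CRC count between two reports points away from a new cable problem, can be sketched in Python as a comparison of raw counters. This is a hedged illustration: `raw_deltas` is a made-up helper, the 2017 values come from the attribute rows quoted in the thread, and the 2016 values are assumed to be identical only because the poster says so (the attached report itself is not reproduced here).

```python
def raw_deltas(old: dict, new: dict) -> dict:
    """Change in raw counter for every attribute present in both reports."""
    return {name: new[name] - old[name] for name in new if name in old}


# Raw values from the June 2017 report quoted in the thread.
report_2017 = {"Runtime_Bad_Block": 3, "UDMA_CRC_Error_Count": 22663}

# Assumption: the poster states the August 2016 report showed the same numbers.
report_2016 = dict(report_2017)

deltas = raw_deltas(report_2016, report_2017)
unchanged = [name for name, delta in deltas.items() if delta == 0]

for name, delta in deltas.items():
    print(name, "changed by", delta)
```

This kind of subtraction only makes sense for attributes whose raw value is a plain event count, such as the two above. It should not be applied to Seagate's Raw_Read_Error_Rate or Seek_Error_Rate, whose raw values were described earlier in the thread as meaningless outside Seagate.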
TODDLT Posted June 13, 2017
2 minutes ago, johnnie.black said: "Not cable related. On Seagates these are usually bad blocks and not a good sign, but if the count is stable the disk may still have some life in it."
Well, maybe stable from the standpoint that the bad blocks haven't increased in the past 10 months. However, red-balling twice now isn't that stable.
It is one of the newer drives in the array. When these drives first came out, the ratings were initially high on NewEgg; they eventually went down, with people complaining of higher failure rates.
On that note, maybe I should go ahead and take this one out of service. One of my parity drives is this model and I still have two other data drives of the same model. I have a couple of Toshibas and now use the VN series instead of the DM series from Seagate.
JorgeB Posted June 13, 2017
13 minutes ago, TODDLT said: "On that note maybe I should go ahead and take this one out of service."
Probably a good idea.
TODDLT Posted June 13, 2017
54 minutes ago, johnnie.black said: "Probably a good idea."
I'm going to run extended self tests on all the DM model drives tonight and see if anything shows up.
SSD Posted June 14, 2017
Sometimes drives like this, which have some SMART issues but are still in pretty good shape, can be used as a backup device. That is better than nothing, and if the drive is not running 24x7 it might work very well for limited use.
Also, if it is not that old, remember that many credit cards offer an extended warranty (an extra 1-2 years).