Disk Error Help



I was copying files off Disk 10 on my server to start the process of updating the drive format.  I glanced at the unRAID main page and found Drive 3 was disabled with a ridiculous number of drive writes.  See the image below.

 

I couldn't get a SMART report and the drive wouldn't spin up.  So I stopped the array and found the drive was not even showing up in the dropdown as being attached.  OK, probably a loose cable.

 

I shut down and opened the case.  The cables for drives 1 and 2 felt loose (they don't latch) but everything else felt solid, so I put it back together and restarted.

 

Drive 3 was still showing "Disabled" status, but this time I was able to run a short SMART test (attached).  Nothing caught my eye.

 

The drive looks good but I can't seem to get it to "enable" with the array started.  So here is where I probably screwed up.  I unassigned the drive and started the array without thinking.  After stopping the array, it now sees it as a new drive, but it is showing up fine.

 

With the array started I can access the drive (emulated) and played a couple of files, which doesn't really tell me much.  I have no idea what all those writes were from or what triggered them.  I'm in the middle of an extended self test on the drive and will post the results here shortly.

 

Should I just re-assign the drive and see if it will re-build?  Is there any way to recover the old configuration or would you not trust the data on it to run a parity check?

 

Thoughts on the best way forward?  FYI, if I lost data, this is the one drive in the box I would not lose sleep over.

 

One thought is to use one of my warm spares, rebuild the data and then preclear this drive before re-using it. 

 

Thoughts?  Thanks in advance.

 

Disk_3.JPG.ad673689046d90b59edcadc625141244.JPG

ST3000DM001-1CH166_W1F29KYY-20160819-2328.txt

Link to comment

What you did is actually the only way to re-enable a disabled disk.  Because a write to the drive failed (which is why it got disabled), the contents of that drive are out of sync as far as a parity check operation is concerned.  You have to rebuild that drive.

 


 

 

Link to comment


 

Would you trust the drive without a new preclear if the SMART test looks OK?  Or use a new drive and preclear that one again?

 

Would simply having a network drive mapped to that location and booting up the computer cause a write operation?  Otherwise I can't think of any reason it would have started a write.  I'm assuming the really high write count is re-write attempts?

Link to comment

Results after extended test attached.

 

What do these numbers mean and why are they so high?

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       212179552
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       7693004

 

So you would trust this drive and go ahead and rebuild it? 

 

The cables that were loose were to drives 1 and 2.  Drive 3 is the one with errors, so I'm still puzzled over what caused this.  All I did was press on the cables to drive 3, which felt perfectly seated.

ST3000DM001-1CH166_W1F29KYY-20160820-0031.txt

Link to comment

The raw values of Raw_Read_Error_Rate and Seek_Error_Rate are meaningless to anyone other than Seagate.

 

In this case, what you look at is the normalized value (the VALUE and WORST columns) versus the threshold, and on both attributes they are nowhere near the threshold.
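For anyone reading along, here is a minimal Python sketch of that value-versus-threshold check, using the two attribute lines posted above. It assumes the usual smartctl column order (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw). The names `parse_attr` and `headroom` are made up for illustration and are not part of smartctl.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # current normalized value
    worst: int   # lowest normalized value recorded so far
    thresh: int  # vendor failure threshold
    raw: str     # vendor-specific raw value, kept as text


def parse_attr(line: str) -> SmartAttr:
    """Parse one row of the smartctl attribute table (assumed column order)."""
    f = line.split()
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), " ".join(f[9:]))


def headroom(attr: SmartAttr) -> int:
    """Distance between the lowest normalized reading and the threshold.

    An attribute counts as failed once its normalized value is at or
    below the threshold, so a larger number here means more margin.
    """
    return min(attr.value, attr.worst) - attr.thresh


# The two rows quoted earlier in this thread.
LINES = [
    "  1 Raw_Read_Error_Rate    0x000f  119  099  006    Pre-fail  Always      -      212179552",
    "  7 Seek_Error_Rate        0x000f  069  060  030    Pre-fail  Always      -      7693004",
]

for line in LINES:
    attr = parse_attr(line)
    print(attr.name, "value", attr.value, "worst", attr.worst,
          "thresh", attr.thresh, "headroom", headroom(attr))
```

The raw column is deliberately left as text and never compared, since only the normalized columns are meant to be read against the threshold.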

 

Thanks, recovery complete and all seems fine.  Still a little puzzled about the origin of the trouble that took the drive offline, as the cables felt seated.  I'm assuming it must have been a cable issue though.

 

 

Link to comment
  • 9 months later...

Resurrecting this old thread.  

 

10 months later, this same drive red-balls again.  I methodically tested all the cables and am certain it is not a cable issue.  After all my testing, I could not get the drive to run a SMART test or show any results.  I was attempting this from inside the drive window (clicking on the drive from the main page).

 

So finally I went ahead and rebuilt the drive using my warm spare in the box.  This morning, with the rebuild complete, I again have a blue-square new drive showing up as an unassigned drive.  I went into the drive page, which shows no SMART test history, but again was not able to get the drive to actually run a SMART test.  However, I was able to get a preclear cycle to start, so I'm curious what it will turn up.  The problem (if any) should show up in the second part of the operation, correct?  During the write-zeros operation?

 

It is still connected to the same power/SATA cable it was connected to when the trouble occurred.

 

Any thoughts?  Why would it not run a Smart test, but would run a pre-clear?  

Edited by TODDLT
Link to comment

 

So the preclear completed successfully.   However, I still can't seem to get a smart report.  When I go to the drive page, I can't get any action from clicking on the button to start a short test.

 

Any help?

 

This is all I can get:

 

############################################################################################################################
#                                                                                                                          #
#                                         unRAID Server Preclear of disk W1F29KYY                                          #
#                                       Cycle 1 of 1, partition start on sector 64.                                        #
#                                                                                                                          #
#                                                                                                                          #
#   Step 1 of 5 - Pre-read verification:                                                   [5:11:09 @ 160 MB/s] SUCCESS    #
#   Step 2 of 5 - Zeroing the disk:                                                        [5:10:52 @ 160 MB/s] SUCCESS    #
#   Step 3 of 5 - Writing unRAID's Preclear signature:                                                          SUCCESS    #
#   Step 4 of 5 - Verifying unRAID's Preclear signature:                                                        SUCCESS    #
#   Step 5 of 5 - Post-Read verification:                                                  [5:10:38 @ 161 MB/s] SUCCESS    #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
############################################################################################################################
#                              Cycle elapsed time: 15:32:42 | Total elapsed time: 15:32:42                                 #
############################################################################################################################

--> RESULT: Preclear Finished Successfully!.


root@TODD-Svr:/usr/local/emhttp#
Link to comment

The command that finally worked:

smartctl --smart=on /dev/sdd

-s was invalid (possibly a new version?)
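For reference, a small Python sketch of how the same call could be scripted. It only uses long options that appear in smartctl's documentation (`--smart=on`, `--test=short`, `--test=long`, `--all`); the short form of the first one is documented as taking a value (`-s on`), so a bare `-s` would be rejected. The action names and both function names are hypothetical, and `/dev/sdd` is simply the device from this post.

```python
import subprocess
from typing import List

# Hypothetical action names mapped to documented smartctl long options.
ACTIONS = {
    "enable": "--smart=on",   # turn SMART on (the command from this post)
    "short": "--test=short",  # start a short self-test
    "long": "--test=long",    # start an extended self-test
    "report": "--all",        # print all SMART information for the device
}


def smartctl_args(action: str, device: str) -> List[str]:
    """Build the argument list without running anything."""
    return ["smartctl", ACTIONS[action], device]


def run_smartctl(action: str, device: str) -> str:
    """Run smartctl and return its text output (needs root and a real drive)."""
    # check=False: smartctl's exit status is a bitmask and can be
    # non-zero even when a report was printed successfully.
    result = subprocess.run(
        smartctl_args(action, device),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# Only build and show the command here; nothing touches a drive.
print(" ".join(smartctl_args("enable", "/dev/sdd")))
```

Calling `run_smartctl("short", "/dev/sdd")` followed later by `run_smartctl("report", "/dev/sdd")` would be one way to start a test and then read the result from the console instead of the web page.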

 

Log attached.   Nothing stands out here but I'm no expert. I did just tell it to run an extended test, not sure if it changes much.

 

Why would this drive redball, again with no apparent reason?

 

Thanks again all.

ST3000DM001-1CH166_W1F29KYY-20170613-0358.txt

Edited by TODDLT
Link to comment

That UDMA error means that the drive is not well connected to the server.  It normally means that a hand has entered the delicate wiring and knocked something loose.  An actual bad cable is also possible.

 

Drive cages and locking cables (where they are supported) are recommended.

Link to comment
3 hours ago, johnnie.black said:

This is not a very good sign:


183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3

These mean there is (or there was) a very bad SATA cable:


199 UDMA_CRC_Error_Count    0x003e   200   195   000    Old_age   Always       -       22663

 

 

Both of those numbers were exactly the same in the SMART report attached to the first post of this thread, from August of last year when this occurred the first time.  So that would suggest this was not a SATA cable issue at least, or the number would have been higher from a second occurrence, correct?

 

What about the Runtime Bad Block error?  That is not part of a cabling issue, correct?  Should I be worried about that?

 

3 hours ago, Fireball3 said:

Did you try replacing the SATA cable?

 

I can tonight if we really think that is the culprit.  However based on the CRC Error count not going up, it seems this is not a cable issue.  I'm still faced with the same question about trusting this drive back in circulation or not. (or as a spare)

 

1 hour ago, bjp999 said:

Drive cages and locking cables (where they are supported) are recommended.

 

The drives are in the original Antec 1200 drive cages.  I do use locking cables.  However, the two SSDs are in expansion slots that don't take locking cables, and the stacked design of the MB ports means the lower cable latch is essentially depressed by the upper cable, so it doesn't latch very effectively on that end.  The top cable latches fine.  (Bad design.)  The cables all latch on the drive end.

 

 

Any thoughts?  

 

Edited by TODDLT
Link to comment
26 minutes ago, TODDLT said:

So that would suggest this was not a SATA cable issue at least, or the number would have been higher from a 2nd occurrence, correct?

 

Correct

 

26 minutes ago, TODDLT said:

What about the Runtime Bad Block error?  that is not part of a cabling issue correct?  Should I be worried about that?

 

Not cable related.  On Seagates these are usually bad blocks and not a good sign, but if the count is stable the disk may still have some life in it.
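A rough Python sketch of the comparison discussed above: take the raw counters from an older report and a newer one and list only the ones that went up. The two attribute rows are the ones quoted in this thread, and the older report is assumed to carry the same rows, going by the statement that both numbers were unchanged since August. `raw_values` and `increases` are illustrative names, not an existing tool.

```python
from typing import Dict


def raw_values(report: str) -> Dict[str, int]:
    """Collect purely numeric RAW_VALUE fields from smartctl attribute rows."""
    values = {}
    for line in report.splitlines():
        f = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(f) >= 10 and f[0].isdigit() and f[9].isdigit():
            values[f[1]] = int(f[9])
    return values


def increases(earlier: Dict[str, int], later: Dict[str, int]) -> Dict[str, int]:
    """Return how much each counter grew between the two reports."""
    return {
        name: later[name] - earlier[name]
        for name in later
        if name in earlier and later[name] > earlier[name]
    }


# Rows as quoted in this thread (June 2017 report).
JUNE_2017 = """\
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x003e   200   195   000    Old_age   Always       -       22663
"""

# Assumption: the August 2016 report showed the same two raw values.
AUGUST_2016 = JUNE_2017

print(increases(raw_values(AUGUST_2016), raw_values(JUNE_2017)))
```

An empty result means neither counter moved between the two reports, which is the basis for ruling out new CRC errors as the cause of the second red-ball.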

Edited by johnnie.black
Link to comment

 

Well maybe stable from the standpoint that the bad blocks haven't increased in the past 10 months.  However, red-balling twice now isn't that stable.

 

It is one of the newer drives in the array.  When these drives first came out the ratings were initially high on NewEgg, they eventually went down with people complaining of higher failure rates.

 

On that note, maybe I should go ahead and take this one out of service.  One of my parity drives is this model and I still have two other data drives of the same model.  I have a couple of Toshibas and now use the VN series instead of DM from Seagate.

Link to comment

Sometimes drives like this, which have some SMART issues but are still in pretty good shape, can be used as a backup device.  Better than nothing, and if it is not running 24x7, it might work very well for limited use.

 

Also, if it is not that old, remember many credit cards offer an extended warranty (extra 1-2 years).

Link to comment
