Older version of Unraid - What to do with Disk Errors



I've been a dedicated Unraid user for years; that said, I am running unRAID Server Pro version 5.0-rc16c (my motto: if it ain't broke, don't fix it :))

 

I have a 3TB parity drive and four 3TB data drives. Today I noticed 55 errors on Disk 3. From the log, it appears this happened on 2/18. Here is an excerpt from the syslog:

 

Feb 18 20:25:47 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 20:25:47 Tower kernel: ata4.00: BMDMA stat 0x25
Feb 18 20:25:47 Tower kernel: ata4.00: failed command: READ DMA EXT
Feb 18 20:25:47 Tower kernel: ata4.00: cmd 25/00:00:68:ca:04/00:04:5d:01:00/e0 tag 0 dma 524288 in
Feb 18 20:25:47 Tower kernel:          res 51/40:af:b0:cc:04/40:01:5d:01:00/e0 Emask 0x9 (media error)
Feb 18 20:25:47 Tower kernel: ata4.00: status: { DRDY ERR }
Feb 18 20:25:47 Tower kernel: ata4.00: error: { UNC }
Feb 18 20:25:47 Tower kernel: ata4.00: configured for UDMA/133
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf] Unhandled sense code
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf]  
Feb 18 20:25:47 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf]  
Feb 18 20:25:47 Tower kernel: Sense Key : 0x3 [current] [descriptor]
Feb 18 20:25:47 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Feb 18 20:25:47 Tower kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01 
Feb 18 20:25:47 Tower kernel:         5d 04 cc b0 
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf]  
Feb 18 20:25:47 Tower kernel: ASC=0x11 ASCQ=0x4
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf] CDB: 
Feb 18 20:25:47 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 01 5d 04 ca 68 00 00 04 00 00 00
Feb 18 20:25:47 Tower kernel: end_request: I/O error, dev sdf, sector 5855562928
Feb 18 20:25:47 Tower kernel: ata4: EH complete
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562864
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562872
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562880
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562888
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562896
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562904
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562912
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562920
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562928
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562936
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562944
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562952
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562960
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562968
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562976
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562984
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562992
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563000
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563008
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563016
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563024
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563032
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563040
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563048
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563056
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563064
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563072
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563080
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563088
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563096
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563104
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563112
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563120
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563128
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563136
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563144
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563152
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563160
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563168
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563176
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563184
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563192
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563200
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563208
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563216
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563224
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563232
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563240
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563248
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563256
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563264
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563272
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563280
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563288
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563296
Feb 18 21:41:31 Tower kernel: mdcmd (108): spindown 0
Feb 18 22:28:02 Tower kernel: mdcmd (109): spindown 2
Feb 18 22:29:23 Tower kernel: mdcmd (110): spindown 1
Feb 18 22:29:33 Tower kernel: mdcmd (111): spindown 4
Feb 19 00:37:15 Tower kernel: mdcmd (112): spindown 3
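For anyone trying to read a log like this, the raw kernel lines can be decoded by hand. The minimal Python sketch below is illustrative only (the function names and regex are hypothetical, not part of unRAID or any tool) and assumes the standard SCSI layouts: a READ(16) CDB (opcode 0x88, 8-byte LBA, 4-byte length) and descriptor-format sense data with an Information descriptor. Applied to the excerpt, it shows that the drive itself reported sense key 0x3 (MEDIUM ERROR) with ASC/ASCQ 0x11/0x04 (unrecovered read error, auto reallocate failed) at LBA 5855562928, inside a 1024-sector read starting at LBA 5855562344, and that md then logged 55 read errors on disk3, stepping by 8 sectors from 5855562864 to 5855563296. The 64-sector gap between md's first sector and the sdf sector is consistent with md counting from a data partition that starts at sector 64; that is an inference, not something the log states.

```python
import re

# unRAID md driver lines look like: "md: disk3 read error, sector=5855562864"
MD_READ_ERROR = re.compile(r"md: disk(\d+) read error, sector=(\d+)")

# A few SCSI sense keys; 0x3 is the one reported in the log above.
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}


def summarize_md_errors(lines):
    """Return {disk_number: (error_count, first_sector, last_sector)}."""
    sectors = {}
    for line in lines:
        m = MD_READ_ERROR.search(line)
        if m:
            sectors.setdefault(int(m.group(1)), []).append(int(m.group(2)))
    return {disk: (len(s), min(s), max(s)) for disk, s in sectors.items()}


def decode_read16(cdb_hex):
    """Decode a SCSI READ(16) CDB into (starting LBA, transfer length in blocks)."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    length = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: number of blocks
    return lba, length


def decode_descriptor_sense(sense_hex):
    """Decode descriptor-format sense data into (key, ASC, ASCQ, information field)."""
    sense = bytes.fromhex(sense_hex)
    if sense[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    key, asc, ascq = sense[1] & 0x0F, sense[2], sense[3]
    info = None
    pos, end = 8, 8 + sense[7]  # byte 7 = additional sense length
    while pos + 1 < end:
        dtype, dlen = sense[pos], sense[pos + 1]
        if dtype == 0x00:  # Information descriptor: 8-byte field at offset 4
            info = int.from_bytes(sense[pos + 4:pos + 12], "big")
        pos += 2 + dlen
    return key, asc, ascq, info


if __name__ == "__main__":
    key, asc, ascq, info = decode_descriptor_sense(
        "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01 5d 04 cc b0")
    print(SENSE_KEYS.get(key, hex(key)), hex(asc), hex(ascq), info)
    # MEDIUM ERROR 0x11 0x4 5855562928
    print(decode_read16("88 00 00 00 00 01 5d 04 ca 68 00 00 04 00 00 00"))
    # (5855562344, 1024)
```

To count the md errors on a live system, summarize_md_errors can be fed the lines of the syslog file (on unRAID, typically /var/log/syslog).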

 

Not being all that familiar with what to do when things go wrong, what would be my next steps other than panic?

1) Copy all data to another drive, on or off the array?

2) Run a parity check? (The last parity check was about a year ago.)

3) Run it with "Correct any Parity-Check errors by writing the Parity disk with corrected parity" checked?

4) Replace the drive and rebuild?

 

Thanks!


Should the replacement drive be precleared? I thought I read a post stating that drives only need to be precleared when expanding the array. What if I am replacing a currently installed drive because it is showing signs of failure? Is the preclear necessary, or will it happen automatically? (I am running 5.0-rc16c.)

 

My plan is to preclear the new drive in a spare PC I have (booted from the unRAID USB stick), then swap out the bad drive for the precleared good one.

 

(I am trying to minimize array downtime, as I have visitors this weekend.)


In this case the only purpose of preclear would be to test the disk. There are other ways to test the disk before use, including the manufacturer's diagnostics. And a successful rebuild followed by a successful noncorrecting parity check would be a pretty good test itself.


I purchased a WD Red to replace the failing WD Green. I used the WD tools to do a fast check, then a full check, then wrote to the full disk, then did another fast check; everything tested OK. I followed the instructions to replace the drive, but when reassigning it I get a red bubble with "disk 3, Wrong." and Array Status shows "Stopped. Replacement disk is too small."

 

unRAID is running on an Intel Atom D525 with a Supermicro MBD-X7SPE-HF-D525-O motherboard.

 

I'm reading the forums, trying to figure out where to go from here without messing up my data.

 


Following advice from the forum:

Running hdparm -N /dev/sdf gives:

 

/dev/sdf:

max sectors = 5860531055/5860533168, HPA is enabled

 

On all the other disks, HPA is disabled.



It looks like this would solve the problem:



hdparm -N p5860533168 /dev/sdf

 

Or is there a safer way?
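The two numbers in the hdparm -N output are the sectors currently visible to the OS and the drive's native maximum; the difference is what the Host Protected Area (HPA) hides. The minimal Python sketch below uses hypothetical helper names and is written only against the output line shown above; it assumes 512-byte logical sectors. It works out that the Red is hiding 2113 sectors (about 1 MiB), which is enough for unRAID to call it too small, and that the native count 5860533168 is exactly the IDEMA standard LBA count for a 3 TB drive, so p5860533168 restores the full standard capacity rather than an arbitrary size. Per the hdparm man page, the -N value is a sector count, and the leading p makes the change permanent (non-volatile) instead of lasting only until the next reset; the man page also warns that changing it is dangerous to data on the drive. Since sdX names are not guaranteed to stay the same between boots, it is worth confirming that /dev/sdf really is the new Red before running it.

```python
import re

# Matches the line printed by "hdparm -N /dev/sdX", e.g.
#  " max sectors   = 5860531055/5860533168, HPA is enabled"
MAX_SECTORS = re.compile(r"max sectors\s*=\s*(\d+)/(\d+),\s*HPA is (enabled|disabled)")

SECTOR_BYTES = 512  # assumed: counts are in 512-byte logical sectors


def parse_hdparm_n(output):
    """Return (visible_sectors, native_sectors, hpa_enabled)."""
    m = MAX_SECTORS.search(output)
    if m is None:
        raise ValueError("no 'max sectors' line in hdparm output")
    return int(m.group(1)), int(m.group(2)), m.group(3) == "enabled"


def hpa_hidden(output):
    """Sectors and bytes hidden by the HPA (native minus visible)."""
    visible, native, _ = parse_hdparm_n(output)
    hidden = native - visible
    return hidden, hidden * SECTOR_BYTES


def idema_lba_count(gigabytes):
    """IDEMA LBA1-03 sector count for a drive of the given decimal-GB capacity."""
    return 97_696_368 + 1_953_504 * (gigabytes - 50)


if __name__ == "__main__":
    sample = "/dev/sdf:\n max sectors   = 5860531055/5860533168, HPA is enabled\n"
    print(hpa_hidden(sample))                                  # (2113, 1081856)
    print(idema_lba_count(3000) == parse_hdparm_n(sample)[1])  # True
```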

 

 

Quoting WannaTheater's post above about the "Replacement disk is too small" error:

You must use a replacement disk at least as large as the original.
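"At least as large" is a comparison in sectors, not in the "3TB" on the label, so a few thousand sectors hidden by an HPA are enough to fail it. One way to check a drive before assigning it is to read the size Linux reports in sysfs, where /sys/block/<dev>/size is the device size in 512-byte sectors. The sketch below is a hypothetical helper, not an unRAID tool, and it does not claim to reproduce how unRAID itself performs the check; it also assumes the old WD Green exposed the full 5860533168 sectors, which the thread does not state.

```python
from pathlib import Path


def device_sectors(dev, sys_block=Path("/sys/block")):
    """Size of block device `dev` (e.g. "sdf") in 512-byte sectors, read from sysfs."""
    return int((sys_block / dev / "size").read_text().strip())


def can_replace(original_sectors, replacement_sectors):
    """A replacement must be at least as large as the disk it replaces."""
    return replacement_sectors >= original_sectors


if __name__ == "__main__":
    # Assumption: the old WD Green exposed the full 3 TB count.
    original = 5860533168
    print(can_replace(original, 5860531055))  # False: HPA still enabled
    print(can_replace(original, 5860533168))  # True: after removing the HPA
```

On the server itself, can_replace(original, device_sectors("sdf")) would check the live device, assuming sdf is the replacement drive.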


The command completed successfully.

I unassigned the device (it showed as missing).

Then I reassigned it: still a red bubble, and "replacement disk too small".

I unassigned it again, powered down, and powered back on.

I assigned the new drive.

The red bubble still says WRONG, but the array is stopped and now it looks like I can move forward.

It is currently rebuilding.

 

Thank you!

 

 
