Pending sector. Device disabled. What to do?

rjbaat · March 28, 2017

Hi All,

I have a problem with a 2TB WD Red disk. It has suddenly become unmountable.

This has never happened to me before, so I hope I can get some advice on what to do next without losing data.

The SMART report says there is a pending sector. The disk is disabled in the array.

What can I do to get rid of the pending sector?

tower-diagnostics-20170328-2206.zip

trurl · March 28, 2017

Looks like you rebooted before taking the diagnostics, so we can't see why disk1 was disabled. It looks like you also have filesystem corruption on disk1, possibly related.

Do you have a tested spare you can rebuild to?

JorgeB · March 28, 2017

You can start by checking the file system on the emulated disk1 (md1):

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS

Do you have a spare for the rebuild?

rjbaat · March 28, 2017

Yes, I rebooted to check if it would come back up again :(.

But it didn't help. I just ordered a new 3TB WD Red that will be delivered tomorrow.

So then I can rebuild to that disk. Should I preclear the new one first?

rjbaat · March 28, 2017

2 minutes ago, johnnie.black said:

You can start by checking the file system on the emulated disk1 (md1):

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS

Do you have a spare for the rebuild?

What would be preferred: to rebuild to a new disk tomorrow and shut down the server for now?

I don't want to mess with it more than needed at this point.

trurl · March 28, 2017

Just now, rjbaat said:

What would be preferred: to rebuild to a new disk tomorrow and shut down the server for now?

I don't want to mess with it more than needed at this point.

If you can shut down until you get the new disk, then you could get your array protected again before starting on the filesystem repair. You can use the drive manufacturer's diagnostic software to test the new disk on another computer rather than preclearing it, since it is not necessary to have a clear disk for a rebuild.

Do you have backups of any irreplaceable data?

JorgeB · March 28, 2017

I would fix the file system first; then, if you don't need the server, leave it shut down until the spare is ready.

Also keep the old disk intact until the rebuild is done.

rjbaat · March 28, 2017

1 minute ago, trurl said:

If you can shut down until you get the new disk, then you could get your array protected again before starting on the filesystem repair. You can use the drive manufacturer's diagnostic software to test the new disk on another computer rather than preclearing it, since it is not necessary to have a clear disk for a rebuild.

Do you have backups of any irreplaceable data?

Yes, I have everything backed up to Crashplan.

1 minute ago, johnnie.black said:

I would fix the file system first; then, if you don't need the server, leave it shut down until the spare is ready.

Also keep the old disk intact until the rebuild is done.

OK, so it would do no harm to run the xfs_repair -v /dev/md1 command now? Would that mean the disk's pending sector will be repaired?

JorgeB · March 28, 2017

You'd be running xfs_repair on the emulated disk, not the old disk1. I would do it before the rebuild: xfs_repair should fix it with no issues, but there would be no point in rebuilding a disk with a corrupt file system that can't be repaired.
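
To illustrate what "emulated disk" means here: with single parity, the contents of a disabled disk can be computed from the parity disk plus all the surviving data disks, so a repair on md1 works on that computed data and does not read or write the old physical disk (which is also why it would not clear the pending sector). The following is only a toy sketch, assuming plain XOR parity; the function names and byte values are made up for illustration and are not Unraid's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def emulate_missing(parity, surviving_disks):
    """Rebuild the missing disk's block from parity plus every surviving disk."""
    return xor_blocks([parity, *surviving_disks])

# Toy data: three "disks" of four bytes each (made-up values).
disk1 = b"\x12\x34\x56\x78"
disk2 = b"\xab\xcd\xef\x00"
disk3 = b"\x0f\xf0\x55\xaa"

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# disk1 is disabled: its contents are computed rather than read from the drive.
emulated_disk1 = emulate_missing(parity, [disk2, disk3])
print(emulated_disk1 == disk1)
```

The same computation is what a rebuild writes onto the replacement disk, which is why a file system that is corrupt on the emulated disk would also be corrupt after the rebuild.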

rjbaat · March 28, 2017

Aha, OK. I understand. I will do this tomorrow. For now I will shut it down, and before I put in the new disk tomorrow I will run the xfs_repair. Thanks for the quick replies!

rjbaat · March 29, 2017

19 hours ago, johnnie.black said:

You'd be running xfs_repair on the emulated disk, not the old disk1. I would do it before the rebuild: xfs_repair should fix it with no issues, but there would be no point in rebuilding a disk with a corrupt file system that can't be repaired.

OK, I tried the command and this is the result:

Phase 1 - find and verify superblock...
        - block cache size set to 1085048 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 228480 tail block 228474
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

What is best to do next?

JorgeB · March 29, 2017

Use -L; usually there's no data loss.
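
The xfs_repair message quoted above describes an order of operations: try a mount to replay the log first, and fall back to -L only if the mount fails. As a rough sketch of that decision, the hypothetical helper below (the function name and the returned strings are invented for illustration) looks for the "needs to be replayed" wording from the quoted message and suggests the next step.

```python
from typing import Optional

def next_step(xfs_repair_output: str, mount_succeeded: Optional[bool] = None) -> str:
    """Suggest a next step from xfs_repair output (hypothetical helper).

    mount_succeeded: None if no mount has been attempted yet,
    otherwise whether mounting the filesystem worked.
    """
    # The message wraps across lines, so collapse whitespace before matching.
    text = " ".join(xfs_repair_output.split()).lower()
    if "needs to be replayed" not in text:
        return "no dirty log reported"
    if mount_succeeded is None:
        return "mount to replay the log, unmount, then re-run xfs_repair"
    if mount_succeeded:
        return "unmount, then re-run xfs_repair"
    # Last resort: -L destroys the log, so recent metadata changes may be lost.
    return "re-run xfs_repair with -L"

# Wording copied from the output posted in this thread.
message = """ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair."""

print(next_step(message))
print(next_step(message, mount_succeeded=False))
```

In this thread the disk was already unmountable, which corresponds to the last branch.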

rjbaat · March 29, 2017

9 minutes ago, johnnie.black said:

Use -L; usually there's no data loss.

OK, I have done this. Attached is the log.

Can you see anything good or bad in it?

log

JorgeB · March 29, 2017

Looks like there was some corruption. Look at the lost+found folder; any files there may be corrupt.

If there are corrupt files you should be able to copy most of them from the old disk if needed, but only do that after the rebuild.

rjbaat · March 29, 2017

2 minutes ago, johnnie.black said:

Looks like there was some corruption. Look at the lost+found folder; any files there may be corrupt.

If there are corrupt files you should be able to copy most of them from the old disk if needed, but only do that after the rebuild.

OK, I have restarted the array, looked at Disk1, and I do indeed see a lost+found folder. There are a lot of files I don't recognise. They could be incoming Crashplan backup data or something. There are some large files as well.

Now I can swap the disk for the new one, right? Or do I need to preclear it first?

JorgeB · March 29, 2017

Preclear is not required to replace an existing disk. Some do it to check that the disk is OK, but in this case it means remaining unprotected for longer, so it's up to you.

rjbaat · March 29, 2017

Yes, indeed. I can do that later. First I will rebuild and keep the existing disk as it is. When the rebuild is done, I can see whether the old 2TB disk can be repaired and maybe put it back in again.

rjbaat · April 3, 2017

I just tried to start a preclear on the 2TB disk with the pending sector.

But after 1 minute in the pre-read it disappeared. I downloaded the diagnostics and attached them. Can you see what happened?

tower-diagnostics-20170403-1817.zip

JorgeB · April 3, 2017

Many errors:

Apr  3 18:16:09 Tower kernel: ata13.00: exception Emask 0x1 SAct 0x1f00 SErr 0x0 action 0x6 frozen
Apr  3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr  3 18:16:09 Tower kernel: ata13.00: cmd 60/38:00:c8:05:81/02:00:00:00:00/40 tag 8 ncq dma 290816 in
Apr  3 18:16:09 Tower kernel:         res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x5 (timeout)
Apr  3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr  3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr  3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr  3 18:16:09 Tower kernel: ata13.00: cmd 60/08:00:00:08:81/04:00:00:00:00/40 tag 9 ncq dma 528384 in
Apr  3 18:16:09 Tower kernel:         res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x5 (timeout)
Apr  3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr  3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr  3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr  3 18:16:09 Tower kernel: ata13.00: cmd 60/f8:00:08:0c:81/03:00:00:00:00/40 tag 10 ncq dma 520192 in
Apr  3 18:16:09 Tower kernel:         res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x5 (timeout)
Apr  3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr  3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr  3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr  3 18:16:09 Tower kernel: ata13.00: cmd 60/01:00:00:00:00/00:00:00:00:00/40 tag 11 ncq dma 512 in
Apr  3 18:16:09 Tower kernel:         res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x1 (device error)
Apr  3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr  3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr  3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr  3 18:16:09 Tower kernel: ata13.00: cmd 60/01:00:40:64:0d/00:00:47:00:00/40 tag 12 ncq dma 512 in
Apr  3 18:16:09 Tower kernel:         res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x1 (device error)
Apr  3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr  3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr  3 18:16:09 Tower kernel: ata13: hard resetting link

And finally it dropped offline:

Apr  3 18:16:29 Tower kernel: ata13.00: disabled

Edited April 3, 2017 by johnnie.black
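
For anyone wanting to pick this pattern out of a longer syslog, here is a minimal sketch that counts the failed commands per device, tallies the reasons given in the "res ... Emask" lines, and notes which devices were disabled. The regular expressions are assumptions based only on the line shapes shown in the excerpt above, and the function name is invented for illustration; other kernels or controllers may log differently.

```python
import re
from collections import Counter

# Patterns modelled on the excerpt in this thread (assumed, not exhaustive).
FAILED = re.compile(r"kernel: (ata\d+\.\d+): failed command: (.+)$")
RESULT = re.compile(r"Emask 0x[0-9a-f]+ \(([^)]+)\)")
DISABLED = re.compile(r"kernel: (ata\d+\.\d+): disabled$")

def summarize(lines):
    """Return (failed commands per device, result reasons, disabled devices)."""
    failed, reasons, disabled = Counter(), Counter(), set()
    for line in lines:
        line = line.rstrip()
        if m := FAILED.search(line):
            failed[(m.group(1), m.group(2))] += 1
        elif m := RESULT.search(line):
            reasons[m.group(1)] += 1
        elif m := DISABLED.search(line):
            disabled.add(m.group(1))
    return failed, reasons, disabled

# A few lines copied from the log above.
sample = [
    "Apr  3 18:16:09 Tower kernel: ata13.00: exception Emask 0x1 SAct 0x1f00 SErr 0x0 action 0x6 frozen",
    "Apr  3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED",
    "Apr  3 18:16:09 Tower kernel:         res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x5 (timeout)",
    "Apr  3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED",
    "Apr  3 18:16:09 Tower kernel:         res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x1 (device error)",
    "Apr  3 18:16:29 Tower kernel: ata13.00: disabled",
]

failed, reasons, disabled = summarize(sample)
print(failed, reasons, disabled)
```

Repeated failed reads followed by a "disabled" line for the same device is the sequence seen here.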

rjbaat · April 3, 2017

Is there a specific reason for the errors? I connected the disk to my Mac and it formatted fine.

I will try again to preclear it in Unraid.

JorgeB · April 3, 2017

Probably the pending sector.

Formatting only reads/writes a few sectors on the disk; preclear writes all sectors.
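
A pending sector is generally only re-evaluated by the drive when that particular sector is rewritten, so one way to see whether a full write made a difference is to compare SMART attribute 197 (Current_Pending_Sector) before and after. A minimal sketch, assuming the text comes from `smartctl -A` in its usual ten-column table layout; the function name is hypothetical and the sample row is illustrative, not taken from this disk's diagnostics.

```python
def pending_sectors(smartctl_attr_output: str):
    """Return the raw value of attribute 197, or None if it is not listed."""
    for line in smartctl_attr_output.splitlines():
        fields = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw_value. The raw value is the tenth column.
        if len(fields) >= 10 and fields[0] == "197":
            return int(fields[9])
    return None

# Illustrative row only (made-up values in the usual smartctl layout).
SAMPLE = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
"""

print(pending_sectors(SAMPLE))
```

If the count is unchanged after every sector has been written, the write to that sector most likely did not succeed.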

rjbaat · April 3, 2017

OK, thanks! I will skip the pre-read and preclear directly. Hopefully the pending sector will be repaired while writing zeros.

JorgeB · April 3, 2017

57 minutes ago, rjbaat said:

I will skip the pre-read and preclear directly

You should definitely skip it. I hadn't considered that you were doing a pre-read; it's normal for a disk with pending sectors to error out during reads.

rjbaat · April 4, 2017

OK, well, it did complete the preclear and I think it started the post-read. But after that it disappeared again, I think.

I restarted the server and it now says: status precleared. I then formatted the disk with the Unassigned Devices plugin and mounted it. This did work.

But when I check the SMART status, the pending sector is still there.

tower-diagnostics-20170404-0007.zip

Edit 1:

At this moment I am running an erase and clear cycle to see if that fixes it.

Edit 2:

No, it didn't. The cycle aborted and the disk disappeared.

tower-diagnostics-20170404-1502.zip

Edited April 4, 2017 by rjbaat
