rjbaat Posted March 28, 2017
Hi all, I have a problem with a 2TB WD Red disk. It suddenly became unmountable. This has never happened to me before, so I hope I can get some advice on what to do next without losing data. The SMART result shows a pending sector, and the disk is disabled in the array. What can I do to get rid of the pending sector?
Attachment: tower-diagnostics-20170328-2206.zip
trurl Posted March 28, 2017
Looks like you rebooted before taking the diagnostics, so we can't see why disk1 was disabled. It looks like you also have filesystem corruption on disk1, possibly related. Do you have a tested spare you can rebuild to?
JorgeB Posted March 28, 2017
You can start by checking the file system on the emulated disk1 (md1): https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS
Do you have a spare for the rebuild?
rjbaat Posted March 28, 2017
Yes, I rebooted to check if it would come back up again :(. But it didn't help. I just ordered a new 3TB WD Red that will be delivered tomorrow, so then I can rebuild to that disk. Should I preclear the new one first?
rjbaat Posted March 28, 2017
2 minutes ago, johnnie.black said:
You can start by checking the file system on the emulated disk1 (md1): https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS Do you have a spare for the rebuild?

What would be preferred? To shut down the server for now and rebuild to a new disk tomorrow? I don't want to mess with it more than needed at this point.
trurl Posted March 28, 2017
Just now, rjbaat said:
What would be preferred? To shut down the server for now and rebuild to a new disk tomorrow? I don't want to mess with it more than needed at this point.

If you can shut down until you get the new disk, then you could get your array protected again before starting on the filesystem repair. You can use the drive manufacturer's diagnostic software to test the new disk on another computer rather than preclearing it, since a rebuild does not require a clear disk. Do you have backups of any irreplaceable data?
JorgeB Posted March 28, 2017
I would fix the file system first; then, if you don't need the server, leave it shut down until the spare is ready. Also keep the old disk intact until the rebuild is done.
rjbaat Posted March 28, 2017
1 minute ago, trurl said:
If you can shut down until you get the new disk, then you could get your array protected again before starting on the filesystem repair. You can use the drive manufacturer's diagnostic software to test the new disk on another computer rather than preclearing it, since a rebuild does not require a clear disk. Do you have backups of any irreplaceable data?

Yes, I have everything backed up to CrashPlan.

1 minute ago, johnnie.black said:
I would fix the file system first; then, if you don't need the server, leave it shut down until the spare is ready. Also keep the old disk intact until the rebuild is done.

Okay, so it would do no harm to run the xfs_repair -v /dev/md1 command now? Would that mean the disk's pending sector will be repaired?
JorgeB Posted March 28, 2017
You'd be running xfs_repair on the emulated disk, not the old disk1. I would do it before the rebuild because, although xfs_repair should fix it with no issues, there would be no point in rebuilding a disk with a corrupt file system that can't be repaired.
rjbaat Posted March 28, 2017
Aha, okay. I understand. I will do this tomorrow. For now I will shut it down, and before I put in the new disk tomorrow I will run xfs_repair. Thanks for the quick replies!
rjbaat Posted March 29, 2017
19 hours ago, johnnie.black said:
You'd be running xfs_repair on the emulated disk, not the old disk1. I would do it before the rebuild because, although xfs_repair should fix it with no issues, there would be no point in rebuilding a disk with a corrupt file system that can't be repaired.

Okay, I tried the command and this is the result:

Phase 1 - find and verify superblock...
        - block cache size set to 1085048 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 228480 tail block 228474
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

What is best to do next?
JorgeB Posted March 29, 2017
Use -L; usually there's no data loss.
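The decision in the last two posts can be sketched in a few lines of Python. This is only an illustration of the reasoning in the xfs_repair message itself (try a mount first, fall back to -L if mounting is impossible); the function names `has_dirty_log_error` and `suggest_next_step` are hypothetical helpers, not part of xfs_repair or Unraid. In this thread the disk was already unmountable, which is why -L was the next step.

```python
# Hypothetical helpers: they only inspect pasted xfs_repair output as text.
DIRTY_LOG_MARKER = "valuable metadata changes in a log"


def has_dirty_log_error(xfs_repair_output: str) -> bool:
    """Return True if the output contains the dirty-log error quoted above."""
    # Collapse whitespace so the check survives line wrapping in pasted output.
    normalized = " ".join(xfs_repair_output.split())
    return DIRTY_LOG_MARKER in normalized


def suggest_next_step(xfs_repair_output: str, mount_attempt_failed: bool) -> str:
    """Mirror the advice printed by xfs_repair; this does not run any command."""
    if not has_dirty_log_error(xfs_repair_output):
        return "no dirty-log error found; read the rest of the output"
    if mount_attempt_failed:
        return "rerun xfs_repair with -L (destroys the log, may cause corruption)"
    return "mount the filesystem to replay the log, unmount, then rerun xfs_repair"


# Error text as posted in this thread.
sample = """ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair."""

print(suggest_next_step(sample, mount_attempt_failed=True))
```

Keeping the check as plain text matching means it can be tried on saved output without touching the array.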
rjbaat Posted March 29, 2017
9 minutes ago, johnnie.black said:
Use -L; usually there's no data loss.

Okay, I have done this. The log is attached. Can you see anything good or bad in it?
Attachment: log
JorgeB Posted March 29, 2017
Looks like there was some corruption. Look at the lost+found folder; any files there may be corrupt. If there are corrupt files, you should be able to copy most of them from the old disk if needed, but only do that after the rebuild.
rjbaat Posted March 29, 2017
2 minutes ago, johnnie.black said:
Looks like there was some corruption. Look at the lost+found folder; any files there may be corrupt. If there are corrupt files, you should be able to copy most of them from the old disk if needed, but only do that after the rebuild.

Okay, I have restarted the array, looked on disk1, and I do indeed see a lost+found folder. There are a lot of files I don't recognise. They could be CrashPlan incoming backup data or something. There are some large files as well. Now I can swap the disk for the new one, right? Or do I need to preclear it first?
JorgeB Posted March 29, 2017
Preclear is not required to replace an existing disk. Some do it to check that the disk is OK; in this case, since the array would remain unprotected for longer, it's up to you.
rjbaat Posted March 29, 2017
Yes indeed. I can do that later. First I will rebuild. I'll keep the existing disk as it is, and when the rebuild is done I can see whether the old 2TB disk can be repaired and maybe put it back in again.
rjbaat Posted April 3, 2017
I just wanted to start a preclear on the 2TB disk with the pending sector, but one minute into the pre-read the disk disappeared. I downloaded the diagnostics and attached them. Can you see what happened?
Attachment: tower-diagnostics-20170403-1817.zip
JorgeB Posted April 3, 2017
Many errors:

Apr 3 18:16:09 Tower kernel: ata13.00: exception Emask 0x1 SAct 0x1f00 SErr 0x0 action 0x6 frozen
Apr 3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr 3 18:16:09 Tower kernel: ata13.00: cmd 60/38:00:c8:05:81/02:00:00:00:00/40 tag 8 ncq dma 290816 in
Apr 3 18:16:09 Tower kernel: res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x5 (timeout)
Apr 3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr 3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr 3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr 3 18:16:09 Tower kernel: ata13.00: cmd 60/08:00:00:08:81/04:00:00:00:00/40 tag 9 ncq dma 528384 in
Apr 3 18:16:09 Tower kernel: res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x5 (timeout)
Apr 3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr 3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr 3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr 3 18:16:09 Tower kernel: ata13.00: cmd 60/f8:00:08:0c:81/03:00:00:00:00/40 tag 10 ncq dma 520192 in
Apr 3 18:16:09 Tower kernel: res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x5 (timeout)
Apr 3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr 3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr 3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr 3 18:16:09 Tower kernel: ata13.00: cmd 60/01:00:00:00:00/00:00:00:00:00/40 tag 11 ncq dma 512 in
Apr 3 18:16:09 Tower kernel: res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x1 (device error)
Apr 3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr 3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr 3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED
Apr 3 18:16:09 Tower kernel: ata13.00: cmd 60/01:00:40:64:0d/00:00:47:00:00/40 tag 12 ncq dma 512 in
Apr 3 18:16:09 Tower kernel: res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x1 (device error)
Apr 3 18:16:09 Tower kernel: ata13.00: status: { DRDY ERR }
Apr 3 18:16:09 Tower kernel: ata13.00: error: { ABRT }
Apr 3 18:16:09 Tower kernel: ata13: hard resetting link

And finally the disk dropped offline:

Apr 3 18:16:29 Tower kernel: ata13.00: disabled

Edited April 3, 2017 by johnnie.black
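When a syslog contains many of these entries, it can help to tally them instead of reading line by line. The following is a minimal sketch, assuming log lines shaped like the ones quoted above; `summarize_ata_errors` and the three regular expressions are hypothetical names written for this illustration, not part of any Unraid or kernel tooling.

```python
import re
from collections import Counter

# Patterns written against the line shapes quoted in this thread (an assumption).
FAILED_CMD = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")
RESULT = re.compile(r"kernel:\s+res \S+ Emask 0x[0-9a-f]+ \((.+)\)$")
DISABLED = re.compile(r"kernel: (ata\d+\.\d+): disabled$")


def summarize_ata_errors(lines):
    """Count failed commands per port, result reasons, and disabled devices."""
    failed = Counter()
    reasons = Counter()
    disabled = []
    for line in lines:
        line = line.rstrip()
        if (m := FAILED_CMD.search(line)):
            failed[(m.group(1), m.group(2))] += 1
        elif (m := RESULT.search(line)):
            reasons[m.group(1)] += 1
        elif (m := DISABLED.search(line)):
            disabled.append(m.group(1))
    return failed, reasons, disabled


# A few lines copied from the log above.
sample = [
    "Apr 3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED",
    "Apr 3 18:16:09 Tower kernel: res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x5 (timeout)",
    "Apr 3 18:16:09 Tower kernel: ata13.00: failed command: READ FPDMA QUEUED",
    "Apr 3 18:16:09 Tower kernel: res 41/04:17:e0:06:81/00:01:00:00:00/40 Emask 0x1 (device error)",
    "Apr 3 18:16:29 Tower kernel: ata13.00: disabled",
]

failed, reasons, disabled = summarize_ata_errors(sample)
print(failed, reasons, disabled)
```

A tally like this only summarizes what the kernel reported; it does not say whether the disk, the cable, or the controller is at fault.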
rjbaat Posted April 3, 2017
21 minutes ago, johnnie.black said:
Many errors: [...] And finally dropped offline: Apr 3 18:16:29 Tower kernel: ata13.00: disabled

Is there a specific reason for the errors? I connected the disk to my Mac and formatted it fine. I will try again to preclear it in Unraid.
JorgeB Posted April 3, 2017
Probably the pending sector. Formatting only reads and writes a few sectors on the disk; preclear writes all sectors.
rjbaat Posted April 3, 2017
Okay, thanks! I will skip the pre-read and preclear directly. Hopefully the pending sector will be repaired while writing zeros.
JorgeB Posted April 3, 2017
57 minutes ago, rjbaat said:
I will skip the pre-read and preclear directly

You should definitely skip it. I didn't consider that you were doing that; it's normal for a disk with pending sectors to error out during reads.
rjbaat Posted April 4, 2017
Okay, well, it did complete the preclear and I think it started the post-read. But after that I think the disk disappeared again. I restarted the server and it now says: status precleared. I then formatted the disk with the Unassigned Devices plugin and mounted it. This worked. But when I check the SMART status, the pending sector is still there.
Attachment: tower-diagnostics-20170404-0007.zip

Edit 1: At this moment I am running an erase and clear cycle to see if that fixes it.
Edit 2: No, it didn't. The cycle aborted and the disk disappeared.
Attachment: tower-diagnostics-20170404-1502.zip

Edited April 4, 2017 by rjbaat
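Checking whether the pending sector count changed after each attempt can be done by reading the Current_Pending_Sector row of a SMART attribute table. Below is a minimal sketch that parses text in the usual ten-column `smartctl -A` layout; the function name `pending_sectors` is hypothetical, and the sample row is illustrative only, since the actual values are inside the attached diagnostics and not shown in this thread.

```python
def pending_sectors(smart_attribute_table: str):
    """Return the raw Current_Pending_Sector value, or None if the row is absent.

    Assumes the ten-column attribute layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smart_attribute_table.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Current_Pending_Sector":
            return int(fields[9])
    return None


# Illustrative rows only, NOT taken from the diagnostics in this thread.
sample = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
"""

print(pending_sectors(sample))
```

Comparing the value before and after a full write pass shows whether the drive managed to rewrite or reallocate the sector; in this thread the count stayed the same and the disk kept dropping offline.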