jtech007 Posted February 25, 2016

Had a bad 4TB drive that I replaced with the same model 4TB. Pulled the old drive, set it aside, pre-cleared the new drive, installed it, and started up unRAID. It detected the new drive as the replacement for the old one and gave the option to start the array and rebuild the drive. Did that, and all seemed to go well, but I have a whole share that has no files in it. Of course it's all of my photos, which I cannot replace. The total TB used and unused is exactly the same as it was before I replaced the drive. Is there anything I can do with the old drive to get the data off of it? Or what did I do wrong with the new drive during the rebuild to cause it to not rebuild fully?
itimpi Posted February 25, 2016

Have you tried running a file system check on the rebuilt drive? What file system type is it using? Chances are the data is all there but there is some other issue that needs resolving. I hope that at no point did you use the format option in unRAID (you did not mention it, so I hope not), as that effectively wipes data from the selected drive. It might be a good idea to post the diagnostics file (Tools->Diagnostics) here so we can get more insight into your setup and what is happening.
danioj Posted February 25, 2016

Quote (jtech007): Had a bad 4TB drive that I replaced with the same model 4TB. [...] what did I do wrong with the new drive during the re-build to cause it to not re-build the drive fully?

Have you checked that the disk is on the included / excluded list in Global Share Settings and / or Specific Share Settings?
trurl Posted February 25, 2016

Others have already given good advice, but I just wanted to comment on this bit:

Quote (jtech007): ... Of course it's all of my photos that I cannot replace...

You should never have only one copy of irreplaceable files. Make a backup plan.
jtech007 Posted February 25, 2016

Quote (itimpi): Have you tried running a file system check on the rebuilt drive?

I will research how to do that and give it a try.

Quote (itimpi): What file system type is it using?

XFS, same as the last drive and all of my drives.

Quote (itimpi): Chances are that the data is all there but there is some other issue that needs resolving? I hope that at no point did you use the format option in unRAID (you did not mention it so I hope not) as that effectively wipes data from the selected drive.

I did not format the drive and didn't see an option to do so. I just clicked the option to start the rebuild.

Quote (itimpi): It might be a good idea to post the diagnostics file (Tools->Diagnostics) here so we can get more of an insight into your setup and what is happening.

Thank you for your help; hopefully the diagnostics file will reveal what is happening.

Attachment: tower-diagnostics-20160225-0803.zip
jtech007 Posted February 25, 2016

Quote (danioj): Have you checked that the disk is on the included / excluded list in Global Share Settings and / or Specific Share Settings?

Global Share Settings: Included disks = All / Excluded disks = None
Specific Share Settings: Included disks = All / Excluded disks = None
jtech007 Posted February 25, 2016

Quote (trurl): You should never have only one copy of irreplaceable files. Make a backup plan.

I was actually working on making that happen when the drive issue came up. I have some of the photos on other drives on a local machine, but not all of them.
JorgeB Posted February 25, 2016

The diagnostics are missing several SMART reports, including disk 3 (HGST_HMS5C4040ALE640_PL2331LAH0B9MJ), but this looks like file system corruption on disk3:

Feb 25 07:59:06 TOWER shfs/user: shfs_readdir: readdir_r: /mnt/disk3/Photos (5) Input/output error
Feb 25 07:59:06 TOWER kernel: XFS (md3): metadata I/O error: block 0x508b5d40 ("xfs_trans_read_buf_map") error 5 numblks 8

I would like to see a SMART report before recommending a file system check.
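A quick way to triage a syslog like the one quoted above is to grep it for I/O and XFS errors; the md number in the XFS messages identifies which array disk is affected. This is just a sketch: the sample file below is made up so the commands can run anywhere, and on a real system you would point the grep at the syslog inside the diagnostics zip instead.

```shell
# Hypothetical syslog excerpt, standing in for the syslog inside the
# diagnostics zip (the file path here is illustrative).
cat > /tmp/syslog_sample.txt <<'EOF'
Feb 25 07:59:06 TOWER shfs/user: shfs_readdir: readdir_r: /mnt/disk3/Photos (5) Input/output error
Feb 25 07:59:06 TOWER kernel: XFS (md3): metadata I/O error: block 0x508b5d40 ("xfs_trans_read_buf_map") error 5 numblks 8
EOF

# Pull out I/O errors and XFS metadata errors; "md3" means unRAID disk 3.
grep -E 'Input/output error|XFS \(md[0-9]+\)' /tmp/syslog_sample.txt
```

Both sample lines match here, which is exactly the pattern JorgeB is pointing at: errors confined to one md device suggest corruption (or a bad path) on that one disk rather than an array-wide problem.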
jtech007 Posted February 25, 2016

Quote (JorgeB): ...this looks like fs corruption on disk3... Would like to see a SMART report before recommending to check file system.

Is that something that would have carried over from the old drive? Or maybe I have a bad cable or controller card that is causing the issue? I can post a report when I get home tonight.
JorgeB Posted February 25, 2016

If there were fs issues before the rebuild, they'll also be present after. If the new disk is OK, checking the file system should solve the issue.
jtech007 Posted February 25, 2016

Quote (JorgeB): If there were fs issues before the rebuild, they'll also be present after. If the new disk is OK checking filesystem should solve the issue.

Good to know. I will get the report later today and hope for the best. Thanks again for everyone's help!
jtech007 Posted February 26, 2016

So I have a new problem. I went to pull a SMART report for the new drive, but the report was empty and it said the drive did not exist / was not found. Upon reboot (to see if that would get the drive to show up), three of the five disks in the array are not being found by the motherboard when unRAID boots. Only the parity drive and one of my 4TB drives are being recognized. This leads me to believe that I have an issue with the M1015 or the backplane on my Supermicro chassis. I also now wonder if that was the cause of the original drive showing up as bad: it was not being recognized, or had a bad connection, which caused the errors the new disk is experiencing. I have another M1015 and also have a lot of SATA ports on the MB, so I can remove the card if need be to see if that fixes the problem. So, new questions:

1: Once I figure out the connectivity issues, should I attempt to put the old drive back in the array and see if it will work? I imagine this might cause an issue with parity, but wanted to see if it was a possibility.

2: Should I try to put the old drive in my test server and see if it will come up as good and working?

My goal at this point is to see if the old drive was indeed good after all and the issue was caused by other hardware and not a failed drive.
JorgeB Posted February 26, 2016

I'd try it first on the test server without parity, see if you can access the files and grab a SMART report.
jtech007 Posted February 27, 2016

unRAID is back up now and all disks are recognized. I think I might have an issue with my backplane, so for now the drives are connected to the SATA ports on the MB. Now disk 3 (the one with the issue on the old drive and the new one) is reporting "Device is disabled, contents emulated". This was the same error that I had with the first drive that I replaced.

I also wanted to note the history of this problem in case there was a step that I missed or was not noted previously: the original 4TB drive showed as Disabled/Emulated. I thought the drive was going bad, so I bought a new 4TB drive, pre-cleared it, removed the old drive, and put the new one in its place. I started up unRAID and it gave me the option to rebuild the disk. I did NOT format the drive; the drive was rebuilt and the rebuild reportedly finished with no errors. I then ran a parity check to make sure nothing was going to show up missing. The new drive had been in the server about a week when I noticed the Photos share had no files at all, completely empty. That's when I started this thread. Now, after a temp fix for connectivity, I am back where I started, with the new disk saying Disabled/Emulated.

I have attached the SMART report for the drive with the issue. What would be the best course of action at this point?

Attachment: Smart_Report.txt
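When reading a SMART report in a situation like this, the attributes that usually separate a genuinely failing disk from a cabling or controller problem are the reallocated, pending, and offline-uncorrectable sector counts. A minimal sketch follows, using a made-up excerpt in smartctl's attribute-table layout; the values are illustrative and are NOT taken from the attached Smart_Report.txt.

```shell
# Hypothetical excerpt of a smartctl attribute table (stand-in for the
# attached Smart_Report.txt; these values are made up).
cat > /tmp/smart_sample.txt <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
EOF

# Print the attribute names with their raw values (last column). Raw values
# of 0 on all three point away from media failure and toward cabling or
# controller trouble, which matches the direction this thread is heading.
awk '/Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable/ {print $2, $NF}' /tmp/smart_sample.txt
```

On the real server the report would come from something like smartctl -a against the drive's device node; the awk filter above works the same on a saved report file.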
trurl Posted February 27, 2016

Post a new diagnostic.
jtech007 Posted February 27, 2016

Quote (trurl): Post a new diagnostic

Attached: tower-diagnostics-20160226-2036.zip
JorgeB Posted February 27, 2016

All disks look fine. Did you try the old disk on a test server? Either way, you can try to fix disk3: https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS
jtech007 Posted February 27, 2016

Quote (JorgeB): All disks look fine, did you try the old disk on a test server?

If I put it in the test server and it comes up with the same error as my new disk3, can I fix that drive as well with the disk check and put it back in the array? I think this would cause an issue with parity, but I'm just covering all my bases.
JorgeB Posted February 27, 2016

You can try either one first, and results should be similar, but to re-use the test-server disk in the array you would have to do a new config and recalculate parity.
jtech007 Posted February 27, 2016

I put the old drive in my test server, alone, with no other drives. I assigned it as a new drive for Disk 1 and started in maintenance mode first to see what it would say. The drive shows green with normal operation. I mounted the disk, still have green/normal, and can see the drive as it was before the errors occurred. I now believe that whatever connectivity issue I have (still haven't 100% sorted it out) caused the old disk to red-ball, and the new one as well. I guess I will check the new disk for errors and see if the data shows up after repairs are made. Thank you again for your help; hopefully this will fix the issue!
jtech007 Posted February 27, 2016

Here is a copy of the repair session:

TOWER login: root
Linux 4.1.7-unRAID.
root@TOWER:~# xfs_repair -v /dev/md3
Phase 1 - find and verify superblock...
        - block cache size set to 3019880 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 162112 tail block 162108
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

I mounted the file system and then put it back to maintenance mode:

root@TOWER:~# xfs_repair -v /dev/md3
Phase 1 - find and verify superblock...
        - block cache size set to 3019880 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 162116 tail block 162116
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

XFS_REPAIR Summary    Sat Feb 27 11:32:24 2016

Phase           Start           End             Duration
Phase 1:        02/27 11:31:57  02/27 11:31:58  1 second
Phase 2:        02/27 11:31:58  02/27 11:32:16  18 seconds
Phase 3:        02/27 11:32:16  02/27 11:32:23  7 seconds
Phase 4:        02/27 11:32:23  02/27 11:32:23
Phase 5:        02/27 11:32:23  02/27 11:32:23
Phase 6:        02/27 11:32:23  02/27 11:32:23
Phase 7:        02/27 11:32:23  02/27 11:32:23

Total run time: 26 seconds
done
root@TOWER:~#

The drive still shows red at this point; not sure where to go from here. I ran the repair on what is disk 3 in unRAID, but would the file system see it as a different number since I have a parity drive? I am not sure how it orders them.
JorgeB Posted February 27, 2016

Repairing the file system won't enable a disabled (emulated) disk; you still have to rebuild it. Can you access all the files on disk3 after running xfs_repair? If yes, and you think you sorted out your connectivity issues, you can rebuild on the same disk. For that:

stop array
unassign disk3 (select "no device")
start array
stop array
re-assign disk3
start array to begin rebuild
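Before re-assigning the disk and kicking off the rebuild, it is worth confirming the emulated disk3 really is readable end to end, since the rebuild will faithfully reproduce whatever the emulated disk contains. A rough spot-check sketch follows; /tmp/disk3_demo is a stand-in directory created here so the commands can run anywhere, and on the server you would point find and cat at /mnt/disk3/Photos instead.

```shell
# Stand-in directory (hypothetical); on the server use /mnt/disk3/Photos.
mkdir -p /tmp/disk3_demo/Photos
printf 'demo' > /tmp/disk3_demo/Photos/img001.jpg

# Count the files present, then read a sample file end to end to make sure
# it does not throw the Input/output errors seen in the syslog earlier.
find /tmp/disk3_demo/Photos -type f | wc -l
cat /tmp/disk3_demo/Photos/img001.jpg > /dev/null && echo readable
```

If a read fails partway through on the real share, that is a sign the file system repair did not fully recover the data and rebuilding onto the same disk would only preserve the damage.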
JorgeB Posted February 27, 2016

Quote (jtech007): I put the old drive in my test server, alone, with no other drives. [...] can see the drive as it was before the errors occurred.

Also, can you access the problem folder on the old disk?
jtech007 Posted February 27, 2016

Quote (JorgeB): Repairing the filesystem wont enable a disabled (emulated) disk, you still have to rebuild it.

Forgot about that part!

Quote (JorgeB): Also, can you access the problem folder on the old disk?

Yes, the files are there now on the Photos share on the new disk and can also be seen on the old disk. I would assume that all of this is due to connectivity causing errors, so I will need to sort that out soon.
JorgeB Posted February 27, 2016

Yes, probably both the red balls and fs corruption were caused by that.