Data not present after replacing bad drive


jtech007

Had a bad 4TB drive that I replaced with the same model 4TB. Pulled the old drive, set it aside, pre-cleared the new drive, installed it and started up unRaid.

 

It detected the new drive as the replacement for the old one and gave me the option to start the array and rebuild the drive. I did that and all seemed to go well, but a whole share now has no files in it. Of course it's all of my photos, which I cannot replace. The total TB used and unused is exactly the same as it was before I replaced the drive.

 

Is there anything I can do with the old drive to get the data off of it? Or what did I do wrong with the new drive during the rebuild that kept it from rebuilding fully?

Link to comment

Have you tried running a file system check on the rebuilt drive? What file system type is it using? Chances are that the data is all there but there is some other issue that needs resolving. I hope that at no point did you use the format option in unRAID (you did not mention it, so I hope not), as that effectively wipes the data from the selected drive.

 

It might be a good idea to post the diagnostics file (Tools->Diagnostics) here so we can get more of an insight into your setup and what is happening.
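
For reference, a read-only check on an XFS disk could look roughly like the sketch below. It assumes the array is started in maintenance mode and that you pass the array device for the affected disk (unRAID exposes array disks as /dev/mdN, and /dev/md3 is the device used later in this thread). The `check_xfs` function name and the `XFS_REPAIR` override variable are made up for illustration; `-n` is xfs_repair's no-modify mode, which reports problems without changing anything.

```shell
#!/bin/sh
# Sketch only: run a read-only XFS check and summarise the result.
# XFS_REPAIR is a hypothetical override so a different binary can be
# substituted; by default the real xfs_repair is used.

check_xfs() {
    device=$1
    # With -n, xfs_repair exits 0 when no corruption was detected and
    # non-zero when it found corruption or could not complete the check.
    if "${XFS_REPAIR:-xfs_repair}" -n "$device"; then
        echo "no corruption reported on $device"
    else
        echo "xfs_repair -n reported problems on $device (or could not run)"
    fi
}

# Example (array in maintenance mode):
# check_xfs /dev/md3
```

Pointing the check at the md device rather than the raw /dev/sdX device matters if a real repair follows, because writes through the md device keep parity in sync.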

Link to comment

Have you checked that the disk is on the included / excluded list in Global Share Settings and / or Specific Share Settings?

Link to comment

Have you tried running a file system check on the rebuilt drive?

 

I will research how to do that and give it a try.

 

What file system type is it using?

 

XFS, same as the last drive and all of my drives.

 

Chances are that the data is all there but there is some other issue that needs resolving. I hope that at no point did you use the format option in unRAID (you did not mention it, so I hope not), as that effectively wipes the data from the selected drive.

 

I did not format the drive and didn't see an option to do so. I just clicked the option to start the rebuild.

 

It might be a good idea to post the diagnostics file (Tools->Diagnostics) here so we can get more of an insight into your setup and what is happening.

 

Thank you for your help, hopefully the diagnostics file will reveal what is happening.

tower-diagnostics-20160225-0803.zip

Link to comment

Have you checked that the disk is on the included / excluded list in Global Share Settings and / or Specific Share Settings?

 

Global Share Settings: Included disks=All / Excluded Disks = None

 

Specific Share settings: Included disks=All / Excluded Disks = None

Link to comment

Others have already given good advice, but I just wanted to comment on this bit:

... Of course it's all of my photos that I cannot replace...

You should never have only one copy of irreplaceable files. Make a backup plan.

 

I was actually working on making that happen when the drive issue came up. I have some of the photos on other drives on a local machine, but not all of them.

Link to comment

The diagnostics are missing several SMART reports, including the one for disk3 (HGST_HMS5C4040ALE640_PL2331LAH0B9MJ), but this looks like filesystem corruption on disk3:

 

Feb 25 07:59:06 TOWER shfs/user: shfs_readdir: readdir_r: /mnt/disk3/Photos (5) Input/output error
Feb 25 07:59:06 TOWER kernel: XFS (md3): metadata I/O error: block 0x508b5d40 ("xfs_trans_read_buf_map") error 5 numblks 8

 

I would like to see a SMART report before recommending a file system check.
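
If you want to look for the same kind of entries yourself, something like the sketch below should pull them out of a syslog. The `find_fs_errors` name is made up for illustration, the pattern is based only on the two log lines quoted above, and /var/log/syslog is an assumption about where the live log sits on your server.

```shell
#!/bin/sh
# Sketch only: print syslog lines that look like the XFS metadata I/O
# errors or shfs I/O errors quoted above. Reads the files given as
# arguments, or standard input when none are given.

find_fs_errors() {
    grep -E 'XFS \([^)]*\): metadata I/O error|shfs.*Input/output error' "$@"
}

# Example:
# find_fs_errors /var/log/syslog
```

The device name in parentheses (md3 here) tells you which array disk the kernel was reading when the error occurred.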

 

 

Link to comment

Is that something that would have carried over from the old drive? Or maybe I have a bad cable or controller card that is causing the issue?

 

I can post a report when I get home tonight.

Link to comment

If there were fs issues before the rebuild, they’ll also be present after.

 

If the new disk is OK, checking the filesystem should solve the issue.

 

 

So I have a new problem. I went to pull a SMART report for the new drive, but the report was empty and it said the drive did not exist/was not found. Upon reboot (to see if that would get the drive to show up), three of the five disks in the array were not found by the motherboard when unRaid booted. Only the parity drive and one of my 4TB drives are being recognized. This leads me to believe that I have an issue with the M1015 or the backplane on my Supermicro chassis. I also now wonder if that was the reason the original drive showed up as bad: it was not being recognized or had a bad connection, which caused the same errors the new disk is experiencing. I have another M1015 and also plenty of SATA ports on the MB, so I can remove the card if need be to see if that fixes the problem.

 

So new questions:

 

1: Once I figure out the connectivity issues, should I attempt to put the old drive back in the array and see if it will work? I imagine this might cause an issue with the parity, but wanted to see if it was a possibility.

 

2: Should I try to put the old drive in my test server and see if it will come up as good and working?

 

My goal at this point is to see if the old drive was indeed good after all and the issue was caused by other hardware and not a failed drive.

Link to comment

unRaid is back up now and all disks are recognized. I think I might have an issue with my backplane, so for now the drives are connected to the SATA ports on the MB. Now disk 3 (the slot that had the issue with both the old drive and the new one) is reporting "Device is disabled, contents emulated". This is the same error I had with the first drive, the one I replaced.

 

Also wanted to note the history of this problem in case there was a step that I missed or was not noted previously:

 

The original 4TB drive showed as Disabled/Emulated. I thought the drive was going bad, so I bought a new 4TB drive, pre-cleared it, removed the old drive and put the new one in its place. Started up unRaid. It gave me the option to rebuild the disk. I did NOT format the drive. The drive was rebuilt and unRaid reported that the rebuild went fine with no errors. I then ran a parity check again to make sure nothing was going to show up missing. The new drive had been in the server about a week when I noticed the Photos share did not have any files at all, completely empty. That's when I started this thread. Now, after a temporary fix for connectivity, I am back where I started, with the new disk saying Disabled/Emulated. I have attached the SMART report for the drive with the issue.

 

What would be the best course of action at this point?

Smart_Report.txt

Link to comment

All disks look fine. Did you try the old disk on a test server?

 

If I put it in the test server and it comes up with the same error as my new disk3, can I fix that drive as well with the check disk and put it back in the array? I think this would cause an issue with the parity, but just covering all my bases.

Link to comment

I put the old drive in my test server, alone, with no other drives. I assigned it as a new drive for Disk 1 and started in maintenance mode first to see what it would say. The drive shows green with normal operation. I mounted the disk and still have green/normal, and I can see the drive as it was before the errors occurred. I now believe that whatever connectivity issue I have (still haven't 100% sorted it out) caused the old disk to red ball, and the new one as well. I guess I will check the new disk for errors and see if the data shows up after repairs are made. Thank you again for your help, hopefully this will fix the issue!

Link to comment

Here is a copy of the repair session:

 

 

TOWER login: root
Linux 4.1.7-unRAID.
root@TOWER:~# xfs_repair -v /dev/md3
Phase 1 - find and verify superblock...
        - block cache size set to 3019880 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 162112 tail block 162108
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

I mounted the file system and then put it back to maintenance mode
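
For anyone doing this from a plain Linux command line, the mount/unmount step that the error message asks for could be sketched roughly as below. This is only an illustration: the `run` and `replay_and_repair` names, the `DRY_RUN` switch and the /mnt/tmp mount point are all hypothetical, and by default the script just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: replay a dirty XFS log by mounting and unmounting the
# filesystem, then re-run xfs_repair, as the error message suggests.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

replay_and_repair() {
    device=$1        # e.g. /dev/md3, the device repaired in this thread
    mountpoint=$2    # hypothetical scratch directory, e.g. /mnt/tmp
    # Mounting the filesystem replays the log; it must be unmounted
    # again before xfs_repair will work on it.
    run mkdir -p "$mountpoint" &&
    run mount -t xfs "$device" "$mountpoint" &&
    run umount "$mountpoint" &&
    run xfs_repair -v "$device"
}

# Example (prints the four commands without running them):
# replay_and_repair /dev/md3 /mnt/tmp
```

In this thread the same effect came from starting the array normally and then switching back to maintenance mode, which is why the second xfs_repair run gets past the log check.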

 

root@TOWER:~# xfs_repair -v /dev/md3
Phase 1 - find and verify superblock...
        - block cache size set to 3019880 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 162116 tail block 162116
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Sat Feb 27 11:32:24 2016

Phase          Start          End            Duration
Phase 1:        02/27 11:31:57  02/27 11:31:58  1 second
Phase 2:        02/27 11:31:58  02/27 11:32:16  18 seconds
Phase 3:        02/27 11:32:16  02/27 11:32:23  7 seconds
Phase 4:        02/27 11:32:23  02/27 11:32:23
Phase 5:        02/27 11:32:23  02/27 11:32:23
Phase 6:        02/27 11:32:23  02/27 11:32:23
Phase 7:        02/27 11:32:23  02/27 11:32:23

Total run time: 26 seconds
done
root@TOWER:~#

 

The drive still shows red at this point, and I'm not sure where to go from here. I ran the repair on /dev/md3, which is disk #3 in unRaid. Would the file system see it as a different number because I have a parity drive? I am not sure how it orders them.

 

Link to comment

Repairing the filesystem won't enable a disabled (emulated) disk; you still have to rebuild it.

 

Can you access all the files on disk3 after running xfs_repair?

 

If yes, and you think you have sorted out your connectivity issues, you can rebuild onto the same disk. To do that:

 

stop array

unassign disk3 (select "no device")

start array

stop array

re-assign disk3

start array to begin rebuild

Link to comment

Also, can you access the problem folder on the old disk?

Link to comment

Forgot about that part!

 

Also, can you access the problem folder on the old disk?

 

Yes, the files are there now on the Photos share on the new disk and can also be seen on the old disk. I assume all of this was due to connectivity problems causing errors, so I will need to sort that out soon.

Link to comment
