Potential Large Data Loss!


Hello

I went to copy some files to my Unraid server today and noticed that many subfolders, containing multiple terabytes of files, were missing! I have no idea what could have happened. I'm using Unraid Server Plus version 6.3.5.

I have been having a problem for quite a while now where my server becomes unresponsive and I need to do a hard shutdown. Today it was unresponsive, and before doing a hard shutdown I tried stopping the array using the web GUI. It got stuck saying "retrying to unmount shares", and I eventually did a hard shutdown. I googled that issue, and one forum suggested unplugging the USB stick with the OS on it, plugging it into a Windows computer, and letting it check for errors. I did this and, sure enough, the Windows computer said there were errors, so I let it scan and fix them.

After starting up the server again, I noticed that all of these files were missing. One particular subfolder used to have pages of further subfolders, but now it has less than one page, and the web GUI says the total size of the files on the server is several terabytes less than it was before. I believe the files went missing when I started the server after fixing the errors on the USB stick, but I am not 100% sure.

I have a few questions that hopefully someone more experienced can help me with (thanks in advance if that's you!):

1) Is it safe to run a parity check? Would such a check replace missing files or simply update the parity so that it matches the new state of the array without those files?

2) Is there a log or something that would give a clue as to what happened?

3) Is there any way to check whether recoverable files are still hidden on the array? (The disk(s) with the missing files are formatted ReiserFS.)

4) Could this be a symptom of one of my drives starting to fail (the missing subfolders seem random and most likely were on a different drive than the ones that remain)? Is there a way to test the drives to see if they are failing?
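On question 1, a toy model may help. Unraid's single parity disk holds the XOR of the corresponding bytes on every data disk. Parity can therefore rebuild a failed disk, but it always reflects the array's current contents. The sketch below is a minimal illustration, not Unraid's code. It models disks as short byte strings and models a deletion crudely as overwriting bytes (a real filesystem deletion mostly changes metadata). The function names are illustrative only. Because the deletion is an ordinary write, parity is updated along with it, so a rebuild reproduces the post-deletion state rather than the lost files.

```python
from functools import reduce
from typing import List


def xor_parity(disks: List[bytes]) -> bytes:
    """Toy single parity: byte i of parity is the XOR of byte i on every disk."""
    length = max(len(d) for d in disks)
    # Shorter "disks" are padded with zero bytes, which leave the XOR unchanged.
    padded = [d.ljust(length, b"\x00") for d in disks]
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*padded))


def rebuild_disk(surviving: List[bytes], parity: bytes) -> bytes:
    """Recover one missing disk by XOR-ing the parity with every surviving disk."""
    return xor_parity(surviving + [parity])


disk1, disk2, disk3 = b"movies", b"photos", b"music!"
parity = xor_parity([disk1, disk2, disk3])
print(rebuild_disk([disk1, disk3], parity))  # b'photos': a failed disk2 is recoverable

# "Delete" disk2's data; the write updates parity at the same time.
disk2_after = bytes(len(disk2))
parity = xor_parity([disk1, disk2_after, disk3])
print(rebuild_disk([disk1, disk3], parity))  # all zero bytes: the old data is gone
```

In other words, parity protects against a disk failing, not against files being deleted or moved. A correcting parity check only brings parity back in line with whatever is currently on the data disks.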

I would appreciate any guidance or suggestions you can offer. Please note that I am familiar with monitoring my server using the web GUI, but to do anything else I would need step-by-step instructions (I know almost nothing about Linux). Thanks!
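On question 4, drive health is usually judged from SMART data, read with smartctl from smartmontools. The following is a minimal sketch under several assumptions. It assumes smartctl is installed and run as root. It also assumes the drive is an ATA/SATA disk that prints the "SMART overall-health self-assessment test result" summary line, since other drive types report health differently. /dev/sdb is only a placeholder device name.

```python
import re
import subprocess
from typing import Optional

# Summary line that `smartctl -H` prints for ATA/SATA drives, e.g.
# "SMART overall-health self-assessment test result: PASSED"
HEALTH_RE = re.compile(r"overall-health self-assessment test result:\s*(\S+)")


def parse_smart_health(output: str) -> Optional[str]:
    """Return the verdict word after the summary line, or None if the line is absent."""
    match = HEALTH_RE.search(output)
    return match.group(1) if match else None


def check_drive(device: str) -> Optional[str]:
    """Run `smartctl -H` on a device such as /dev/sdb (placeholder name; needs root)."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True, check=False
    )
    # smartctl's exit status is a bit mask that also flags warnings, so parse the text.
    return parse_smart_health(result.stdout)
```

Usage would be check_drive("/dev/sdb"), repeated for each array device. A PASSED verdict is only a coarse check. The detailed attributes from smartctl -a, such as reallocated and pending sector counts, say more about a drive that is starting to fail.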

I recently added disks 6 through 10, and they only have about 50 GB of data on each (and there doesn't appear to be any data loss on them).

The more I think about it, the more it seems like one or more of disks 1-5 are missing all of the data from one subfolder. This would explain why the data missing from the subfolder seems random (I have Unraid's split level set so that data can span multiple drives down to two folder levels; below that, all data is kept on the same drive).
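Each array disk is also mounted on its own, at /mnt/disk1, /mnt/disk2, and so on, with the combined view under /mnt/user. One way to test this theory is therefore to check which disks actually hold a given subfolder. The following is a minimal sketch assuming that /mnt/diskN layout.

```python
import os
import re
from typing import List


def disks_containing(relative_path: str, mount_root: str = "/mnt") -> List[str]:
    """Return the diskN mount names under mount_root that contain relative_path.

    Assumes Unraid's layout of one mount point per array disk: /mnt/disk1, /mnt/disk2, ...
    """
    hits = []
    for name in os.listdir(mount_root):
        match = re.fullmatch(r"disk(\d+)", name)  # skips /mnt/user, /mnt/cache, etc.
        if match and os.path.exists(os.path.join(mount_root, name, relative_path)):
            hits.append((int(match.group(1)), name))
    # Sort by disk number so disk10 comes after disk9 rather than after disk1.
    return [name for _, name in sorted(hits)]
```

For example, disks_containing("Media/TV") lists every disk holding a Media/TV folder. The share and folder names in that call are hypothetical. A subfolder that should be spread across disks 1-5 but shows up on only some of them narrows down where to look.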

I don't know what reiserfsck is or how to run it. Is there a plugin or something?

It's possible that the folder is only missing from one disk; it's hard to say. I don't use Docker (in fact, I didn't start having issues with my Unraid server until I tried using Docker and then uninstalled it), and I consider an accidental deletion by a user highly unlikely.

In any case, I will try running the check on the drives in question. Is there such a thing as a file recovery program that can scan the array for deleted files and potentially restore them? Just in case something happened that caused the folder to be deleted from one of the disks.

22 minutes ago, johnnie.black said:

Back up the disk first if you decide to use it.

This!

In fact, since you have all those mostly empty XFS disks, I would copy what is currently on your ReiserFS disks over to the XFS disks, then you can try the drastic recovery options without jeopardizing the data that is currently intact.

Just finished running the reiserfsck on the drives (just the standard check). Drives 1-4 had "no corruptions found." Drive 5 returned the following:

reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/md5
Will put log info to 'stdout'
###########
reiserfsck --check started at Mon Aug 14 13:50:42 2017
###########
Replaying journal: 
Replaying journal: Done.
Reiserfs journal '/dev/md5' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..  finished
Comparing bitmaps..Checking Semantic tree:
finished
3 found corruptions can be fixed when running with --fix-fixable
###########
reiserfsck finished at Mon Aug 14 15:35:14 2017
###########
vpf-10640: The on-disk and the correct bitmaps differs.
vpf-10680: The file [8263 8733] has the wrong block count in the StatData (32776), should be (32768)

Strangely, that is the only drive that still contains files in the folder with the missing subfolders. I presume I should run the check again with the "--fix-fixable" parameter as instructed?
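If you are checking several disks, the summary line can be pulled out of each report mechanically. The following is a minimal sketch with a narrow basis. The pattern is built only from the summary message in the log above and the "no corruptions found" result reported for disks 1-4. It captures whichever option the message names rather than hard-coding one. Other reiserfsck messages are not recognised.

```python
import re
from typing import Optional, Tuple

# Built from the summary line in the log above; the option is captured from the message.
FIXABLE_RE = re.compile(r"(\d+) found corruptions can be fixed when running with (--[\w-]+)")
CLEAN_RE = re.compile(r"no corruptions found", re.IGNORECASE)


def summarize_check(log: str) -> Optional[Tuple[int, Optional[str]]]:
    """Return (count, suggested_option), (0, None) if clean, or None if unrecognised."""
    match = FIXABLE_RE.search(log)
    if match:
        return int(match.group(1)), match.group(2)
    if CLEAN_RE.search(log):
        return 0, None
    return None  # neither known summary line was found; read the log by hand
```

Applied to the disk 5 log above, this would report three corruptions and the --fix-fixable option. It only reads text; it does not run reiserfsck.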

That doesn't really sound like it will fix my problem, though. If I continue with the attempted file recovery suggested above, do I simply copy everything from each drive with potentially missing data to the nearly empty drives, run the checks on those drives (with the --rebuild-tree --scan-whole-partition parameters), and then make sure nothing is lost before deleting the backups from the previously empty drives? This won't screw up my array, will it? (I would be copying files manually between specific disks in the array, rather than copying files to the array and letting the server allocate them as I have always done before.)
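For the "make sure nothing is lost" step, comparing checksums of every file in the source and in the copy is more reliable than comparing folder sizes. The following is a minimal sketch that hashes both trees with SHA-256. It lists files that are missing from the copy or whose contents differ.

```python
import hashlib
import os
from typing import Dict, List


def tree_digests(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so multi-GB files never have to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def missing_or_changed(source: str, backup: str) -> List[str]:
    """Relative paths present in source that are absent from, or differ in, backup."""
    src, dst = tree_digests(source), tree_digests(backup)
    return sorted(path for path, digest in src.items() if dst.get(path) != digest)
```

A call might look like missing_or_changed("/mnt/disk5/Media", "/mnt/disk6/backup_disk5/Media"). Both paths are hypothetical, and in keeping with the advice in this thread they point at disk mounts, not /mnt/user. An empty list means every source file has an identical copy. Reading multiple terabytes twice will take many hours.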

Thanks again for the help!

You mention that you sometimes try to stop the array and have trouble shutting down. That usually means that some file is open, an SSH session is open with its current directory on the array, or something else is holding a resource on the array open. You can go to a command line and enter the command powerdown. (Not sure if that's part of a plugin or installed by default, but @WeeboTech invented this wonderful tool to kill whatever processes are in the way so the array can stop cleanly, and then reboot the server.)
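To see what is keeping the array busy before reaching for a hard shutdown, you can look for processes whose working directory or open files are under /mnt. The following is a minimal, Linux-only sketch that reads /proc directly. Run it as root to see every process. Standard tools such as lsof or fuser do the same job if they are installed.

```python
import os
from typing import List


def pids_using(prefix: str) -> List[int]:
    """PIDs whose working directory or open files lie under prefix (Linux /proc only)."""
    prefix = os.path.realpath(prefix)
    if not os.path.isdir("/proc"):
        return []
    found = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        targets = []
        try:
            targets.append(os.readlink(f"/proc/{pid}/cwd"))
            fd_dir = f"/proc/{pid}/fd"
            targets.extend(os.readlink(os.path.join(fd_dir, fd)) for fd in os.listdir(fd_dir))
        except OSError:
            # The process exited or is not readable by us; keep whatever was collected.
            pass
        if any(t == prefix or t.startswith(prefix + os.sep) for t in targets):
            found.add(pid)
    return sorted(found)
```

pids_using("/mnt") only reports PIDs; it does not kill anything. Closing the SSH session or application it points to is usually enough to let the array stop.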

I absolutely do not recommend the hard shutdown method! Once in a blue moon, you have to. But do your level best to avoid it! 

20 minutes ago, bleejean said:

I presume I should run the check again with the "--fix-fixable" parameter as instructed?

Yes

20 minutes ago, bleejean said:

That doesn't really sound like it will fix my problem though. If I continue on with the attempted file recovery as suggested above, do I simply copy everything from each of the drives with potentially missing data to the nearly empty drives, run the checks on the drives (with --rebuild-tree --scan-whole-partition parameters), then ensure nothing is lost before deleting the back ups on the previously empty drives? This won't screw up my array (since I am copying files between specific disks in the array manually rather than just copying files to the array and allowing the server to allocate them as I have always done before).

Sounds good, just make sure you copy the data from disk to disk, don't use user shares.

28 minutes ago, johnnie.black said:

Sounds good, just make sure you copy the data from disk to disk, don't use user shares.

@bleejean, I would go a step further: I recommend turning OFF user shares for the duration of this exercise and turning on disk shares. That way you know exactly which files are where and can audit them more easily. Just keep in mind that ANY folder in the root of a disk share will be exported as a user share when you turn user shares back on, so any temporary folders you create for recovery and auditing purposes need to be cleaned up or moved into root folders that you want to keep as user shares.
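Every top-level folder on a disk becomes a user share. A quick audit before turning user shares back on is therefore to list the top-level folders on each disk and flag any you do not intend to keep. The following is a minimal sketch assuming the /mnt/diskN layout.

```python
import os
import re
from typing import Dict, Iterable, List


def top_level_folders(mount_root: str = "/mnt") -> Dict[str, List[str]]:
    """Map each top-level folder on the array disks to the diskN mounts that hold it.

    Assumes Unraid's /mnt/diskN layout; each of these folders is exported as a user share.
    """
    disks = []
    for name in os.listdir(mount_root):
        match = re.fullmatch(r"disk(\d+)", name)
        if match:
            disks.append((int(match.group(1)), name))
    shares: Dict[str, List[str]] = {}
    for _, disk in sorted(disks):  # numeric order: disk2 before disk10
        disk_path = os.path.join(mount_root, disk)
        for entry in sorted(os.listdir(disk_path)):
            if os.path.isdir(os.path.join(disk_path, entry)):
                shares.setdefault(entry, []).append(disk)
    return shares


def unexpected_shares(expected: Iterable[str], mount_root: str = "/mnt") -> Dict[str, List[str]]:
    """Top-level folders that are not among the shares you mean to keep."""
    keep = set(expected)
    return {s: d for s, d in top_level_folders(mount_root).items() if s not in keep}
```

For example, unexpected_shares(["Media", "Backups"]) would flag a leftover recovery folder and the disk it sits on. The share names in that call are hypothetical.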

Thanks for all the help everyone! I have a happy result to report.

I was just starting large file copies to back up my affected drives onto my newly added drives. Luckily, I was paying attention and noticed that some of the first files being copied were of the type that had gone missing. I did some poking around and discovered that the folder with the missing files had somehow been moved inside another folder (a neighboring folder it must have been accidentally dragged into)! Strangely, this only happened on 4 of the 5 disks, which is why I didn't think to check for this type of mistake before.
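A misplaced folder like this can be found by searching every array disk for a directory with that name at any depth, not only where you expect it to be. The following is a minimal sketch assuming the /mnt/diskN layout.

```python
import os
import re
from typing import List


def find_folder(folder_name: str, mount_root: str = "/mnt") -> List[str]:
    """Return every path (relative to mount_root) of a directory named folder_name
    on the array disks, grouped by disk in numeric order."""
    disks = []
    for name in os.listdir(mount_root):
        match = re.fullmatch(r"disk(\d+)", name)  # ignore /mnt/user so hits aren't doubled
        if match:
            disks.append((int(match.group(1)), name))
    hits = []
    for _, disk in sorted(disks):
        disk_hits = []
        for dirpath, dirnames, _filenames in os.walk(os.path.join(mount_root, disk)):
            if folder_name in dirnames:
                disk_hits.append(os.path.relpath(os.path.join(dirpath, folder_name), mount_root))
        hits.extend(sorted(disk_hits))
    return hits
```

A call such as find_folder("SomeShow") uses a hypothetical folder name. A result like disk1/Media/Other/SomeShow next to disk5/Media/SomeShow would expose exactly this kind of accidental move. Walking whole disks can take a while on a large array.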

Anyway, I cancelled the file copy and moved the folders back to their proper places. Apparently I also misremembered the amount of used space on the server, since it appears everything is accounted for.

Sorry for wasting your time on a dumb mistake but in any case, I appreciate the help in troubleshooting my issue. Thanks again!