[SOLVED] Cache drives offline after forced shutdown



Hey Guys,

 

A few days ago, unRAID knocked an HDD offline, and its behavior indicated a possible physical failure, so I pulled the drive and waited for a warranty replacement. Meanwhile, something hiccuped with a cache SSD, which I'm hoping was just a bad data cable.

 

 

While waiting for the replacement HDD, another one went offline. I suspected there was nothing wrong with it, so I moved it to an empty slot on a different HBA and power/data cable and got it rebuilt. I then attempted a shutdown to replace the data cable on the SSD, and here's where another issue arose: unRAID locked up on the shutdown, which I suspect had something to do with the cache SSD.

 

After waiting 15 minutes or so, I manually turned off the system, replaced the SSD data cable in question, and restarted unRAID. It then reported on the main page "Unclean shutdown detected" & "Start will bring the array on-line and start Parity-Check." I didn't want to rebuild parity with a missing drive, so I waited until my warranty replacement arrived later that afternoon.

 

Once the warranty HDD arrived and was installed, unRAID rebuilt the contents of the missing HDD, and now everything seems to be OK ---Except--- the cache SSDs are detected but will not mount. There is an option to format the cache pool, but I don't want to do that because there is appdata and other data on the disk that I don't want to lose.

 

Attached are the diags. Can anyone help me recover the data?

 

Thanks,

 

~j

 

 

tower-diagnostics-20170803-1054.zip

Edited by Joseph
5 hours ago, johnnie.black said:

The UUID is missing. You can try to change it, but I would prefer doing that after trying the recovery options, as I have never tried it. See if options #1 or #2 work to recover data:

 

UPDATE:

 

Tried Step 1; Results:

        CLI Msg: "mount: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error"

 

Tried Step 2; Results:

        After numerous "Trying another mirror" messages, there is content in the restore directory.

 

So I have a few questions before moving forward:

  • Any thoughts on how to get Step 1 to work?
  • How reliable is the data integrity? IOW, is it possible to have corrupted files in the recovery directory?
  • Would you suggest performing Step 3 (or any other sort of data recovery) just in case there is some data corruption?
  • Are there any special steps I need to take prior to bringing the cache pool back online?

Thanks!!

 

10 minutes ago, Joseph said:

How reliable is the data integrity? IOW, is it possible to have corrupted files in the recovery directory?

 

If option 1 doesn't work with those options, I don't know how to get it working, hence option 2. AFAIK btrfs restore won't restore files that fail checksum, and it will output a warning if any checksum errors are found (this part I'm sure of). So all restored data should be fine, and it is fine if there weren't any checksum errors. Check a few random files to confirm.

 

After the recovered data is checked, the best way forward is to reformat the pool and restore the data.
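
For the "check a few random files" step, here is a minimal Python sketch of one way to do a spot check. It assumes the restored files sit under a directory such as /mnt/disk2/restore, which is a hypothetical path (use whatever target was given to btrfs restore). The function names are made up for illustration. It only confirms that each sampled file can be read end to end and prints a SHA-256 for comparison against a known-good copy, if one exists; a clean read by itself does not prove the content is correct.

```python
import hashlib
import os
import random


def list_files(root):
    """Return every regular (non-symlink) file path under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                paths.append(full)
    return paths


def spot_check(root, sample_size=20, seed=None):
    """Read a random sample of files in full.

    Returns a list of (path, bytes_read, sha256_hex); sha256_hex is None
    when the file could not be read completely.
    """
    files = sorted(list_files(root))
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    results = []
    for path in sample:
        digest = hashlib.sha256()
        size = 0
        try:
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: fh.read(1024 * 1024), b""):
                    digest.update(chunk)
                    size += len(chunk)
            results.append((path, size, digest.hexdigest()))
        except OSError as err:
            print(f"READ ERROR {path}: {err}")
            results.append((path, size, None))
    return results


if __name__ == "__main__":
    # Hypothetical restore target; replace with the real directory.
    for path, size, sha in spot_check("/mnt/disk2/restore", sample_size=20):
        status = "ok" if sha else "FAILED"
        print(f"{status:6} {size:>12} {sha or '-'} {path}")
```

Opening a handful of the sampled files in their usual applications (a config file, a database, a picture) is still worthwhile, since that is the only check here that looks at the content itself.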

5 hours ago, johnnie.black said:

After the recovered data is checked, the best way forward is to reformat the pool and restore the data.

 

Thanks johnnie.black,

 

In other news, I see this error again (perhaps it never went away):

Aug  2 15:07:40 Tower kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
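
To see whether this exception keeps coming back, and on which port, the syslog can be filtered for it. Below is a minimal Python sketch, assuming the kernel lines follow the same format as the one above and that the log lives at /var/log/syslog (an assumption; adjust the path, or point it at the syslog inside a diagnostics zip). The function name is made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the one quoted above, e.g.
# "... kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen"
ATA_EXCEPTION = re.compile(
    r"kernel: (?P<port>ata\d+(?:\.\d+)?): exception "
    r"Emask (?P<emask>0x[0-9a-fA-F]+) .*?SErr (?P<serr>0x[0-9a-fA-F]+)"
)


def count_ata_exceptions(lines):
    """Count ATA exception lines per port (ata4, ata4.00, ...)."""
    counts = Counter()
    for line in lines:
        match = ATA_EXCEPTION.search(line)
        if match:
            counts[match.group("port")] += 1
    return counts


if __name__ == "__main__":
    # Assumed log location; change it to match the system being checked.
    try:
        with open("/var/log/syslog", errors="replace") as fh:
            counts = count_ata_exceptions(fh)
    except OSError as err:
        print(f"could not read syslog: {err}")
    else:
        for port, n in counts.most_common():
            print(f"{port}: {n} exception(s)")
```

Running it before and after reseating or replacing a cable gives a quick way to tell whether the count for a given port is still growing.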


 

I'm going to shut everything down and double check cables--AGAIN!! before bringing the cache pool back online.

 

*HEAD-DESK!*


UPDATE:

 

I think I have everything back in order. It's unconfirmed, but I believe all errors, on both the protected array and the cache, were a result of one or both bad data cables on the cache drive. I've attached a photo so others can see that just the slightest bit of warping on the connector may have been the culprit, which I suspect was caused by heat over time. I've replaced both cables with locking ones. That said, I suspect they too are not immune to warping.

 

Additionally, I've installed and set up the CA Appdata Backup/Restore plugin by Andrew Zawadzki to protect the appdata on the cache drive, the VM XML files and the USB flash data.

Thank you again for all your help!!

File Aug 03, 6 04 02 PM.jpeg

Edited by Joseph