(SOLVED) Replacing 2 failed disks + upgrading parity (parity swap)



Hi, 2 days ago I wanted to replace my smallest array disk (2TB) with a new 5TB disk. I wanted to pre-clear it first, so I connected it to the running server (live) using an "iStarUSA 5in3 drive cage" (which is supposed to support live hot-swapping of disks...). When I did that, 2 of my other drives (4.5-year-old WD Red 3TB) went offline and can't be detected anymore! At first I thought it might be the drive cage, so I ejected them and connected them directly to the SATA and power cables (NOT live, the server was powered off). They never came back online, so I assume they are in fact dead (and not under warranty). I have 2 parity drives, so I was able to restart the array (unprotected). My plan has changed from the initial one (replacing the 2TB with the 5TB) to replacing one of the dead 3TB drives with the 5TB and removing the other one from the array.

 

So this is what I have done so far:

 

  1. Completed the pre-clear of the new 5TB; everything is good and the drive is ready.
  2. Copied the data off 1 of the 2 failed drives to the other drives in the array (the data was still usable from the emulated "virtual" disk that is created when you have a failed drive).

 

My next steps would be these (PLEASE, could SOMEONE WHO KNOWS HOW TO DEAL WITH THIS CONFIRM IT'S OK):

 

  1. Stop the array.
  2. Shut down the server.
  3. Try an old hard disk in one of the 2 bays of the drive cages where the failed drives were (I just want to be sure the cage won't kill a new drive!).
  4. Repeat to test the 2nd bay of the 2nd drive cage.
  5. If everything is fine, shut down the server.
  6. Install the new 5TB in the drive cage.
  7. Power up the server.

 

Now, how do I achieve this? (What course of action is needed?) I think I need to replace the failed drive BEFORE removing the other drive? I don't want to lose the data on the 2nd failed drive before it is rebuilt onto the 5TB drive:

 

  1. Rebuild the drive that still has data onto the 5TB.
  2. Remove the other drive from the array (and not replace it).

 


Are you sure you don't want the data on disk6? With dual parity, it should be possible to rebuild it also.
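The reason dual parity can bring back two missing disks at once is that it gives two independent equations per byte. As a rough illustration, here is a minimal Python sketch of the generic RAID-6-style math: P is the XOR of all data disks, and Q is a weighted XOR over GF(2^8) using powers of the generator 2. This is an assumption about the general technique, not Unraid's actual implementation, and the function names (`syndromes`, `recover_two`) are hypothetical.

```python
# Minimal RAID-6-style dual parity sketch: P is a plain XOR, Q is a
# Reed-Solomon syndrome over GF(2^8). ASSUMPTION: generic RAID-6 math,
# not Unraid's actual code; "disks" are tiny equal-length byte strings.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # The multiplicative group has 255 elements, so a^254 is a's inverse (a != 0).
    return gf_pow(a, 254)


def syndromes(disks, length, skip=()):
    """Return (P, Q) over the disks, ignoring the indices listed in skip."""
    p, q = bytearray(length), bytearray(length)
    for i, disk in enumerate(disks):
        if i in skip:
            continue
        coef = gf_pow(2, i)  # generator 2 raised to the disk index
        for k, byte in enumerate(disk):
            p[k] ^= byte
            q[k] ^= gf_mul(coef, byte)
    return p, q


def recover_two(disks, x, y, p, q):
    """Rebuild failed data disks x and y (x != y) from P, Q and the survivors."""
    pxy, qxy = syndromes(disks, len(p), skip=(x, y))
    for k in range(len(p)):
        pxy[k] ^= p[k]  # now Dx ^ Dy
        qxy[k] ^= q[k]  # now g^x*Dx ^ g^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    # Solve the 2x2 system: Dy = (Qxy ^ g^x*Pxy) / (g^x ^ g^y), then Dx = Pxy ^ Dy.
    dy = bytes(gf_mul(qxy[k] ^ gf_mul(gx, pxy[k]), inv) for k in range(len(p)))
    dx = bytes(pxy[k] ^ dy[k] for k in range(len(p)))
    return dx, dy


# Usage: four equal-sized data disks, then disks 1 and 3 "fail".
disks = [b"disk0 data", b"disk1 data", b"disk2 data", b"disk3 data"]
p, q = syndromes(disks, len(disks[0]))
survivors = [disks[0], None, disks[2], None]
d1, d3 = recover_two(survivors, 1, 3, p, q)
print(d1, d3)
```

With only one parity disk there would be a single equation per byte, which is enough to rebuild one missing disk but not two.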

 

And it wasn't really necessary to copy files from the emulated disks to other disks in the array. You could have just rebuilt to save that data. Which is riskier: writing to other drives in the array while it's unprotected, or just reading the other drives in the unprotected array to rebuild a disk? I'm not sure.

 

In the end, though, you are going to rebuild anyway, so you will have done both. I would have considered copying the data to drives outside the array (or to another system entirely) instead of writing to drives in the unprotected array.

 

 

2 hours ago, johnnie.black said:

 

And technically the array would be protected after one of the failed drives is rebuilt, so if you plan to replace the other drive in the near future you could leave it like that for a while.

 

Question: I can get a 6TB drive, but I have dual parity drives (2 x 5TB). Can I use the 6TB as one of the parity drives and then use the older 5TB as a data drive? Or do I need to upgrade both parity drives to 6TB?

8 hours ago, johnnie.black said:

 

You can, and you can use the parity swap procedure (with parity1 only), so the array will remain protected.
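The constraint behind this answer is that each parity disk must be at least as large as the largest data disk. A small hypothetical Python check of that rule, using illustrative disk sizes in TB (the data disk sizes below are assumptions, not the poster's actual array):

```python
# Hypothetical helper: every parity disk must be >= the largest data disk.
# Sizes are in TB and purely illustrative.

def parity_layout_ok(parity_sizes, data_sizes):
    """Return True if every parity disk is at least as large as the largest data disk."""
    largest_data = max(data_sizes, default=0)
    return all(p >= largest_data for p in parity_sizes)


# Proposed layout: new 6TB as parity1, existing 5TB parity2, old 5TB parity reused as data.
print(parity_layout_ok([6, 5], [5, 3, 3, 2]))  # True
# A 6TB *data* disk later on would also require parity2 to be upgraded to 6TB.
print(parity_layout_ok([6, 5], [6, 5, 3, 2]))  # False
```

So one 6TB parity disk is fine as long as no data disk is larger than the smaller (5TB) parity disk.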

 

Ok. My dead drive is already empty. Can I stop my array, remove the drive, start the array, and then do the parity swap? Or should I do the parity swap first and then, when it's done, use the old 5TB parity drive to replace the empty bad drive? Which would be fastest?

On 2017-11-11 at 11:22 AM, johnnie.black said:

Do the parity swap procedure. There are two steps: first, parity is copied to the new parity disk, then the disabled disk is rebuilt using the old parity disk. Instructions here:

 

https://wiki.lime-technology.com/The_parity_swap_procedure

 

 

 

Ok, I followed the process. Now it has been at "Copying 100%" for a while (1 hour at 100%), and hovering over the 100% says that it is not running. I have a "Cancel" button.

 

The disk no longer displays a blue icon, and I haven't regained control to be able to start the rebuild.

 

Also, the array is displayed as « off-line » at the top, but the status bar says « array started » in green, and the drive (the old parity that is now in the bad disk's slot) still says its content is emulated...

 

So, do I have to click « Cancel » and then start the array to start the rebuild, even if it still says it is copying?

 

The new parity 1 is a 6TB and the old one was a 5TB; maybe the 100% complete was the first 5TB and now it is doing something with the remaining 1TB?

 

Please advise: what should I do now?

 

3 hours ago, johnnie.black said:

After the copy is done it will zero the rest of the new parity disk, the array will stop when it's done.
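Why the remainder has to be zeroed can be illustrated with single (XOR) parity: past the end of the largest data disk there is no data to XOR, so those parity sectors must read as zero. A minimal Python sketch, assuming parity1 is a plain XOR and using scaled-down byte strings in place of disks (the helper `xor_parity` is hypothetical):

```python
# Illustration, assuming single XOR parity (parity1): each parity byte is the XOR
# of the bytes at the same offset on every data disk; offsets past the end of a
# smaller disk contribute nothing (i.e. count as zero).

def xor_parity(disks, parity_len):
    parity = bytearray(parity_len)
    for disk in disks:
        for k, byte in enumerate(disk):
            parity[k] ^= byte
    return bytes(parity)


# Toy scale: the largest data disk is 5 "units", the new parity disk is 6.
data = [b"\x01\x02\x03\x04\x05", b"\x10\x20\x30"]
old_parity = xor_parity(data, 5)
new_parity = xor_parity(data, 6)

# Copying the old parity covers the first 5 units; the extra unit must be zero.
print(new_parity[:5] == old_parity, new_parity[5:] == b"\x00")
```

In the real procedure the first part is the 5TB copy and the second part is the zeroing of the last 1TB of the new parity disk.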

 

How long could it take? If the whole copy took 13 hours for 5TB, should I estimate 1/5 of 13 hours for the remaining 1TB?

 

I checked and it has still shown "Copying, 100% completed" since 12:15pm (about 5 hours ago). Since there is no progress shown for this part, is there a way to check how long it will take? I suspect it shouldn't take that long to just zero 1TB...
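The proportional estimate works out as follows, assuming the zeroing pass runs at the same average speed as the copy. That is only an assumption: the last TB of a 6TB drive sits on its slower inner tracks, so it may take somewhat longer.

```python
# Back-of-the-envelope estimate; ASSUMPTION: zeroing runs at the copy's average rate.
copy_hours = 13.0    # reported time to copy 5TB of parity
copied_tb = 5.0
remaining_tb = 1.0   # 6TB new parity minus the 5TB already copied

hours_per_tb = copy_hours / copied_tb
estimate_hours = remaining_tb * hours_per_tb
print(f"~{hours_per_tb:.1f} h per TB, so roughly {estimate_hours:.1f} h for the last {remaining_tb:.0f}TB")
```

That gives roughly 2.6 hours, well short of the 5 hours it had already been sitting at 100%.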

 

I tried "Cancel" and it did nothing. At this point, I don't know what to do! I need the array online before 7pm, when it will be needed!

 

@johnnie.black, any instructions for me? You seem to be the pro at these kinds of processes...

 

I added pictures of what I can see.

[Screenshot: copy sitting at 100% completed for 4 hours]

[Screenshot: state of the drive and array is off-line]

1 hour ago, johnnie.black said:

I found it strange that the previous attempt was stuck at 100% for so long, so I did some testing: there is a bug in the newer RCs with the parity swap procedure when using a cleared disk as the new parity, so it will get stuck again. I recommend you cancel the procedure, downgrade to v6.3.5 and start over.

 

Ok, I tried to downgrade to 6.3.5. Now the server doesn't boot at all!!

 

Here is the screenshot:

 

 

[Screenshot taken 2017-11-17 at 21.35.39]

Finally, I had to remove the EFI folder from the flash drive, and the server is now back on 6.3.5. I have started the parity swap procedure for the 3rd time. I HOPE to get up tomorrow and be greeted by the stopped array, ready to start the rebuild of the failed drive onto the ex-parity drive :)

 

Thanks a lot for the support.   

 

Going to bed now, I'll update the thread tomorrow.
