EMPTY and Remove a Drive Without Losing Parity


NAS

Recommended Posts

I'm sorry if this has already been said... I have a basic understanding of how parity works, but I don't understand all the details discussed. Anyway... if one drive is removed, wouldn't the parity for that section simply be reversed from its current state? So rather than rewriting that entire section, why not just have a 'flag' that indicates that that section of parity should be treated in reverse? Sorry if that makes no sense... it's kinda hard to explain what I'm thinking!

 

NO !!  The impact that removing a drive has on parity can only be determined by reading the entire drive and the entire parity drive and adjusting the parity drive to reflect the impact of that drive not being in the array.    The ONLY way it won't have an impact is if the drive contains all zeroes -- since it would then not have caused any changes in the parity bits.    That's why it would have to be completely zeroed while part of the array (thus resulting in the appropriate parity updates) before it could simply be removed.

 

But as we've discussed at length, this whole process isn't needed => it's very simple to just be sure the array is "clean";  then do a New Config and run at risk while parity is computed for the newly configured array.

 

Link to comment

Why are we discussing this as a "new" feature?

- We were able to do it in v4.7;

- Somehow we're not able to do it in v5.0.

Just tell us the steps how we can do it again.

 

(After that, I don't much care if it turns into a WebGUI feature or not.)

 

No, you could NOT "... do it in v4.7"

 

The instructions in v4.7 were to (a) copy any information off the drive back to your array so you don't lose any data; and then (b) reset parity -- which you did by choosing to "reset array configuration."

 

That's exactly the same thing as "New Config" ... the wording is just a bit different.

 

In other words, the instructions now are:

 

(1)  Copy any information off the drive back to your array so you don't lose any data;

 

(2)  Do a "New Config" from the Utils tab and Start the array.

 

Link to comment

Why are we discussing this as a "new" feature?

- We were able to do it in v4.7;

- Somehow we're not able to do it in v5.0.

Just tell us the steps how we can do it again.

 

(After that, I don't much care if it turns into a WebGUI feature or not.)

 

No, you could NOT "... do it in v4.7"

I have done it in 4.7, following this guide. As that post outlines, it only works up to the 4.7 release; newer releases require different directions, or it may not be possible at all.

 

It definitely worked in 4.7; I've done it myself.

 

Theoretically the same sequence could be followed in the current 5.0.x releases, by using New Config followed by the "parity is already valid" procedure instead of the set invalidslot command, but it would need to be tested to be sure.

 

It's complicated, and prone to errors, as filling the wrong drive with zeros and irretrievably erasing data is easy to do if you don't know Linux well. This whole discussion is about making a currently difficult and risky procedure easier and less risky.

Link to comment

The first statement in the thread that describes how to do this in v4.7 says "... unRAID does not (currently) provide a feature to remove a drive from the array without losing parity integrity. "

 

And it never did.    It was never a "feature" of v4.7 => i.e. there was no "remove drive" button or capability.    The workaround outlined in the referenced thread does EXACTLY what has already been discussed in this thread ... writing zeroes to a drive;  then simply forcing parity to be recognized.    In fact, v5.0 makes this MUCH simpler with the "Trust Parity" option, which does not require "fooling" it into doing a parity check instead of a parity sync ... v5 would simply consider parity valid, period.

 

v4.7 did NOT have a built-in "Remove Drive" function ... nor does v5

 

As for writing zeroes to a drive to do this ... I tend to agree with Tom:

 

To have a utility to write zeros to a drive you intend to remove in order to maintain parity seems very risky to me and I wouldn't want to do that.

 

Link to comment

Theoretically the same sequence could be followed in the current 5.0.x releases, by using New Config followed by the "parity is already valid" procedure instead of the set invalidslot command, but it would need to be tested to be sure.

 

I did test it before and can assure you it works; at the time I tested it, I posted the steps at:

http://lime-technology.com/forum/index.php?topic=6728.msg213135#msg213135

(not the best place to post it, I reckon now :) )

You just need to take care and double-check and triple-check that you zero the right disk number ;)  Anyway, the zeroing procedure can very easily be implemented in a script, or even a plugin that does extra checks beforehand to ensure the file system on that disk is empty before doing the task, to be safe.
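
For illustration only, here is a minimal sketch of the kind of pre-check such a script or plugin might run before zeroing anything (the disk number is just an example, and the disk would have to be mounted for this check):

# hypothetical pre-check: refuse to proceed unless disk3's file system is empty
# ('disk3'/'md3' are placeholders -- substitute the slot you actually intend to remove)
if [ -n "$(find /mnt/disk3 -mindepth 1 -not -name '.*' -print -quit 2>/dev/null)" ]; then
    echo "disk3 still contains data -- aborting" >&2
    exit 1
fi
echo "disk3 looks empty -- OK to proceed with zeroing /dev/md3"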

 

The more complex (and error-prone) part of the process is IMO re-assigning all the disks to the right slots after New Config, but even that can be done relatively safely with some care... and anyway Tom seems willing to sort that out, so maybe the zeroing can just be left to a very simple plugin, if Tom doesn't want to implement it natively.

 

For anyone who doesn't understand how parity works, and why the zeroing is needed: think of the parity disk as a disk where every single byte holds a "sum" of the bytes (at the same position) of all the data disks. To remove a disk from the array and still keep parity valid, you need to "subtract" the data of the disk being removed from the parity, for every single byte on it. One way to do that is to write zeros to every byte of the disk while it is still part of the array; doing so updates parity to match, which performs exactly the required "subtraction", because a disk that is all zeros contributes to the "sum" (parity) as if it didn't exist at all. After that, the disk can be removed (i.e. not included after New Config) and we can still tell unRAID that we know "parity is already valid" - this forces unRAID to believe it without any check - and since parity really is valid, a parity check afterwards should end with 0 errors.
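
To make the "sum"/"subtraction" idea concrete, here is a tiny illustration using the bitwise XOR that parity is actually built from (the byte values are made up, purely for demonstration):

# parity byte for three data disks (illustrative values only)
d1=0xA5; d2=0x3C; d3=0x0F
printf 'parity with disk3 present:   %#04x\n' $(( d1 ^ d2 ^ d3 ))
# zero disk3 while it is still in the array: parity is updated to d1 ^ d2 ^ 0x00 ...
printf 'parity after zeroing disk3:  %#04x\n' $(( d1 ^ d2 ^ 0x00 ))
# ... which is identical to the parity of just the remaining disks, so disk3 can be dropped
printf 'parity without disk3 at all: %#04x\n' $(( d1 ^ d2 ))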

Link to comment

Let me take that post a bit further and explain how simple in concept this is.

 

You remove files from disk. (anyone can do this)

Write zeros to disk while it is still protected (this is easy)

Tell unRAID to not protect this disk. This requires no further parity work. (this no longer works in v5)

Done

 

Everyone has an opinion on what is more important to be added, so please let's not start that whole debate again. What we are discussing here is a simple command-line tool to remove a drive.

 

 

However, with Tom's help it could be even easier.

 

Mark disk as RO

Remove it from parity

done

 

 

Link to comment

Let me take that post a bit further and explain how simple in concept this is.

 

You remove files from disk. (anyone can do this)

Write zeros to disk while it is still protected (this is easy)

Tell unRAID to not protect this disk. This requires no further parity work. (this no longer works in v5)

Done

 

That's been said MANY times in this thread already !! (i.e. zeroing the drive)    But if writing zeroes still works fine (which it apparently does) ... then WHY do you say the following ...

 

Tell unRAID to not protect this disk. This requires no further parity work. (this no longer works in v5)

 

It will work perfectly in UnRAID v5 ... you simply do a New Config without including the drive you just zeroed; and check the "Trust Parity" box.    In fact, that's much easier than the kludge you had to do with v4.7 that forced a parity check instead of a parity sync.

 

Done :-)

 

 

Link to comment

eh ...

 

have you seen the docs for trusting parity

 

http://lime-technology.com/wiki/index.php/Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

 

It's a sea of warnings that links to discussions in the forum saying the procedure is outdated and doesn't work... which then leads to the conclusion that it may work, but will kick off a parity check.

 

I am not saying you are wrong; in fact you are probably right. There is just too much conflicting info and too many anecdotal POCs, and no actual validated and documented procedure.

 

 

 

 

Link to comment

eh ...

 

have you seen the docs for trusting parity

 

http://lime-technology.com/wiki/index.php/Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

 

It's a sea of warnings that links to discussions in the forum saying the procedure is outdated and doesn't work... which then leads to the conclusion that it may work, but will kick off a parity check.

 

I am not saying you are wrong; in fact you are probably right. There is just too much conflicting info and too many anecdotal POCs, and no actual validated and documented procedure.

The reason for all the warnings is that one mistake and the data on a disk you did not intend to remove is gone, AND there are many versions of unRAID where the command's effect was accidentally undone by the next command attempted.

 

All it takes is a single "refresh" of the browser at the wrong time and the effect of the "set invalid" command is undone.

 

I agree with Tom.  It is FAR too dangerous an operation to simply allow a zeroing of a drive.  Within the past few days Tom did describe how to use the "set invalid" command to assist a user with a multi-disk issue on 5.0 final.  Use it for your reference (and make sure you are on a version where it will work).

 

Joe L.

 

 

Link to comment

So that is it then? We are left with either community procedures that accept and try to minimise the risks, or nothing?

 

I now no longer need to do this, so my interest in it is lessening by the minute, especially since it comes down to another one of these situations.

 

Frustrating as always.

Link to comment

As Joe said, it's not that "Trust Parity" has any issues -- it's just that ANY mistake in doing it will leave your system "falsely protected" ... i.e. you'll think you have good parity, but you won't.

 

I'm sure that if there was a "supported" way to write zeroes to a drive (which you seem just fine with doing) it would also have "... a sea of warnings ..." about the terrible things that can happen if you do it wrong (i.e. write zeroes to the wrong drive).    In fact, there are MANY examples in this forum of folks asking how to recover when they've made even simpler errors -- assigning the wrong drive as parity;  formatting a drive they shouldn't have;  and even pre-clearing the wrong drive.  Warnings are there for a reason -- the message is that you need to be CERTAIN what you're doing is indeed what you intended.    They still don't work in every case -- folks either don't read them, or simply don't think they could have made a mistake and press on without double-checking.  A warning doesn't mean a process doesn't work -- just that it can have disastrous consequences if used incorrectly.

 

Link to comment

eh ...

 

have you seen the docs for trusting parity

 

http://lime-technology.com/wiki/index.php/Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

 

It's a sea of warnings that links to discussions in the forum saying the procedure is outdated and doesn't work... which then leads to the conclusion that it may work, but will kick off a parity check.

 

I am not saying you are wrong; in fact you are probably right. There is just too much conflicting info and too many anecdotal POCs, and no actual validated and documented procedure.

 

That wiki page is not up to date.

 

If you really want to remove a drive and leave parity undisturbed, this will work:

1. Start array in Maintenance mode.  This ensures no file systems are mounted.

2. Identify which disk you're removing, let's say it's disk3.  Take a screen shot.

3. From the command line type:

dd bs=1M if=/dev/zero of=/dev/md3   <-- 'md3' here corresponds to 'disk3'

4. Go to bed because this will take a long time.

5. When the command completes, Stop array, go to Utils page, click 'New Config' and execute that Utility.

6. Go back to Main, assign Parity, and all devices except the one you just cleared.

7. Click checkbox "Parity is already valid.", and click Start
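
(Purely as an illustration, not part of the official procedure: step 3 could be wrapped with a couple of sanity checks before the destructive write. The device name and confirmation prompt below are assumptions, not an existing unRAID tool.)

DISK=md3                                   # the md device for the slot you intend to clear -- double-check it!
blockdev --getsize64 /dev/$DISK            # sanity check: does the size match the drive you expect?
read -p "Type '$DISK' again to confirm zeroing it: " ANSWER
[ "$ANSWER" = "$DISK" ] && dd bs=1M if=/dev/zero of=/dev/$DISK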

 

Any code changes I make will just refine the above process.  Here are several I can think of:

a) Add ability to mark a disk "offline".  This lets us Start the array normally in step 1 so the server is not down during the process.

b) Add some code to put the array in a mode where all writes are "reconstruct writes" vs. "read-modify-writes".  This requires all the drives to be spun up during the clearing process but would probably let step 3 run 3x faster.

c) Add an explicit "Clear the offline disk and remove from array when done" button.

 

Right, so all above could be done, but if I were to create a poll asking the Community what are the top 5 features that need to go into unRaid, would these refinements rise to that level?  Would they even rise to the top 10?

 

Important warning!!!  The procedure above erases the drive completely, and assumes you have nothing worth saving on the drive.  If you have unsaved files on the drive, COPY THEM OFF FIRST!!!  Or you will lose them!

 

 

Link to comment

b) Add some code to put the array in a mode where all writes are "reconstruct writes" vs. "read-modify-writes".  This requires all the drives to be spun up during the clearing process but would probably let step 3 run 3x faster.

 

Tom, now you've touched on a point I've had in my head for some time... I was just thinking about writing a new topic on the roadmap forum to suggest exactly such a mode!!  My main interest in such a mode is not just for this case (to improve the zeroing speed) but really for normal array usage, in situations where high write speed would be useful, e.g. when needing to write huge amounts of data to a disk on the array we could just switch to that mode, with the downside of needing to have all disks spinning, but it could be a really great feature IMO for when we need it. I guess this could improve writes to the array to near full HDD speed (i.e. similar to parity-sync speed), right?

 

Maybe this should actually be moved/copied to a different topic as a separate enhancement request?

Link to comment

a) Add ability to mark a disk "offline".  This lets us Start the array normally in step 1 so the server is not down during the process.

b) Add some code to put the array in a mode where all writes are "reconstruct writes" vs. "read-modify-writes".  This requires all the drives to be spun up during the clearing process but would probably let step 3 run 3x faster.

c) Add an explicit "Clear the offline disk and remove from array when done" button.

a) -- A very good idea!!  (the disk(s) is/are still part of the array, but we just don't mount them.)

b) -- Not so sure: That will break the protection, and that will be bad, especially if the reason I'm removing the disk is that it's flaky and about to die any moment now.  Besides, with this we won't be able to remove more than one disk at the same time.

c) -- Or/and, there can also be a checkbox like "Trust me that disk is zeroed, and just remove it".  (Like we have the "Trust me the parity is valid").  Emhttp can still make a quick check that I'm not lying to it, by reading only the first few megabytes of the disk and seeing that they are indeed zeros -- that takes less than a second.
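
(For what it's worth, that quick check is also trivial from the command line; a rough example, with the disk number as a placeholder:)

# compare the first 8 MiB of the disk against zeroes; exit status 0 means they are all zero
cmp -n $((8 * 1024 * 1024)) /dev/md3 /dev/zero && echo "first 8 MiB of disk3 are zero"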

 

 

Link to comment

b) Add some code to put the array in a mode where all writes are "reconstruct writes" vs. "read-modify-writes".  This requires all the drives to be spun up during the clearing process but would probably let step 3 run 3x faster.

 

Tom, now you've touched on a point I've had in my head for some time... I was just thinking about writing a new topic on the roadmap forum to suggest exactly such a mode!!  My main interest in such a mode is not just for this case (to improve the zeroing speed) but really for normal array usage, in situations where high write speed would be useful, e.g. when needing to write huge amounts of data to a disk on the array we could just switch to that mode, with the downside of needing to have all disks spinning, but it could be a really great feature IMO for when we need it. I guess this could improve writes to the array to near full HDD speed (i.e. similar to parity-sync speed), right?

 

Maybe this should actually be moved/copied to a different topic as a separate enhancement request?

 

The code in the driver is already there and there's a tunable called "md_write_method" to enable it, but I took out the ability to configure it long ago, though it would be very easy to put back in.  At the time this was implemented we were still mostly using PCI controllers and dual-channel IDE drives, and I found that as the array width increased, the speed benefit decreased sharply.  I think with as few as 4 or 5 drives in the array the speed benefit disappeared, and then writes started slowing down as array width increased.  But maybe the situation has changed now, so I'll hook that control back up and run some tests...
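
(A toy illustration of why the two write methods end up with the same parity but a different I/O pattern -- the byte values are made up and this is not the actual driver code:)

d1=0xA5; old_d2=0x3C; new_d2=0x5A; d3=0x0F
old_parity=$(( d1 ^ old_d2 ^ d3 ))
# read-modify-write: read old data + old parity (2 reads), write new data + new parity (2 writes)
rmw_parity=$(( old_parity ^ old_d2 ^ new_d2 ))
# reconstruct-write: read the same stripe from every OTHER data disk, so all drives must be spinning
rw_parity=$(( d1 ^ new_d2 ^ d3 ))
printf 'read-modify-write parity: %#04x\nreconstruct-write parity: %#04x\n' "$rmw_parity" "$rw_parity"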

Link to comment

If you really want to remove a drive and leave parity undisturbed, this will work:

Joking aside, I have always said disk actions such as remove and expand/replace should be staples of unRAID. This is especially true since one of its main unique selling points is its scalability and ability to mix and match disks.

 

I also think that, as far as humanly possible, once you create parity you shouldn't have to do without it again, especially for basics such as this. In fact this should really be a design tenet.

 

1. Start array in Maintenance mode.  This ensures no file systems are mounted.

2. Identify which disk you're removing, let's say it's disk3.  Take a screen shot.

3. From the command line type:

dd bs=1M if=/dev/zero of=/dev/md3   <-- 'md3' here corresponds to 'disk3'

4. Go to bed because this will take a long time.

5. When the command completes, Stop array, go to Utils page, click 'New Config' and execute that Utility.

6. Go back to Main, assign Parity, and all devices except the one you just cleared.

7. Click checkbox "Parity is already valid.", and click Start

 

This is a very clear procedure and if we could somehow remove step 6 it would be much slicker.

 

Also, step 3 is the only step that requires going to the command line. I know this is a dangerous step, but we should at least consider making it more native so we can make it safer.

 

I especially like the idea of not having to leave unRAID to blank a disk (as most people do currently), and this step is almost a poor man's (but native) pre-clear as well. Food for thought.

 

Any code changes I make will just refine the above process.  Here are several I can think of:

a) Add ability to mark a disk "offline".  This lets us Start the array normally in step 1 so the server is not down during the process.

b) Add some code to put the array in a mode where all writes are "reconstruct writes" vs. "read-modify-writes".  This requires all the drives to be spun up during the clearing process but would probably let step 3 run 3x faster.

c) Add an explicit "Clear the offline disk and remove from array when done" button.

 

Sounds like you are pondering stuff. Nothing to add here.

 

Right, so all above could be done, but if I were to create a poll asking the Community what are the top 5 features that need to go into unRaid, would these refinements rise to that level?  Would they even rise to the top 10?

 

Polls in general, and especially unRAID polls, are pointless. We have a backlog of things people want, and each user will argue (and often does) why their particular feature is more important than another's.

 

In some respects this holds us back because, if we are honest, we spend an awful lot of time debating what can and should be done. Carpe diem.

 

Link to comment

I especially like the idea of not having to leave unRAID to blank a disk (as most people do currently), and this step is almost a poor man's (but native) pre-clear as well. Food for thought.

Actually, writing all zeros to /dev/mdX in the way described will not produce the pre-clear signature that unRAID recognizes (a pre-cleared disk has a special signature), so unRAID would still clear that drive if it were later added again as an additional array drive.  It is not a poor man's native pre-clear.

 

Writing all zeros to /dev/mdX would allow you to remove the drive without affecting parity.

Link to comment

I especially like the idea of not having to leave unRAID to blank a disk (as most people do currently), and this step is almost a poor man's (but native) pre-clear as well. Food for thought.

Actually, writing all zeros to /dev/mdX in the way described will not produce the pre-clear signature that unRAID recognizes (a pre-cleared disk has a special signature), so unRAID would still clear that drive if it were later added again as an additional array drive.  It is not a poor man's native pre-clear.

 

Writing all zeros to /dev/mdX would allow you to remove the drive without affecting parity.

 

Obviously you are correct but equally that is why I said "almost". Details aside it is not a huge conceptual leap to get there and that was, and still is, the point I am making.

Link to comment

I especially like the idea of not having to leave unRAID to blank a disk (as most people do currently), and this step is almost a poor man's (but native) pre-clear as well. Food for thought.

Actually, writing all zeros to /dev/mdX in the way described will not produce the pre-clear signature that unRAID recognizes (a pre-cleared disk has a special signature), so unRAID would still clear that drive if it were later added again as an additional array drive.  It is not a poor man's native pre-clear.

 

Writing all zeros to /dev/mdX would allow you to remove the drive without affecting parity.

 

Obviously you are correct but equally that is why I said "almost". Details aside it is not a huge conceptual leap to get there and that was, and still is, the point I am making.

You are correct.  That would be a very simple enhancement to the process if Tom were to automate it.
Link to comment

I don't think we need to solve this whole process in a day; in fact, we should probably actively try to avoid doing so.

 

There are a few steps in the discussed process that have merit on their own, e.g. blanking a disk. Each of them adds value and is a nice stepping stone.

Link to comment

I also think that, as far as humanly possible, once you create parity you shouldn't have to do without it again, especially for basics such as this. In fact this should really be a design tenet.

 

The case where someone wants to remove a drive and preserve its contents makes this difficult.

If you want to keep the drive intact, you can remove the drive and zero the emulated device. I guess this invalidates the premise of keeping protection throughout the operation, but it keeps the original content on the drive. If you are talking about automating the process of moving the data off the drive to other array members before zeroing it, why not put the target drive on the exclude list for all shares, then copy the data to the rest of the array following all other allocation directives? Basically you would turn the drive to be removed into a second temporary cache drive, and run a modified mover script to empty it.
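
(A rough sketch of that "drain the drive first" idea -- the paths and disk numbers are placeholders, and a disk-to-disk copy like this bypasses the usual share allocation, so it is only meant to show the shape of the operation:)

# move everything from disk3 to disk4; each source file is deleted only after it transfers successfully
rsync -avX --remove-source-files /mnt/disk3/ /mnt/disk4/
find /mnt/disk3 -mindepth 1 -type d -empty -delete   # clean up the empty directory skeleton left behind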
Link to comment

I also think that, as far as humanly possible, once you create parity you shouldn't have to do without it again, especially for basics such as this. In fact this should really be a design tenet.

 

The case where someone wants to remove a drive and preserve its contents makes this difficult.

 

Indeed -- while a "remove from array" button could, of course, do a pass to update parity so it didn't reflect that drive, any power-failure during that process would be catastrophic (unless you did this very slowly ... journaling your position at all times).    Writing zeroes to the drive eliminates this risk, as parity is always up-to-date, and when the drive is removed it's "known" that it has no effect on parity.

 

The reality is that this process is trivially accomplished with a New Config and simply running at-risk for the parity sync time ... and as long as you do a parity check before starting, that risk is very small.    And (as I noted earlier) you can completely eliminate the risk if you use a new parity disk (saving the old one until the new sync is done).    I think there are PLENTY of other things you should spend your time on besides this  :)

 

You had it right when you asked the following ...

... would these refinements ... even rise to the top 10?

 

I doubt they'd even be in the top 25 for most users !!   

 

Link to comment
