Allow for more than two Parity drives.


Can0n

Recommended Posts

It would be nice if unRAID allowed more than two parity disks for those well into the large drive counts, like 20-24 data disks, where the potential to lose 2 or 3 at once increases dramatically. Personally, I think it should be any amount the user wants.

I saw an 88-drive hot-swap case on eBay and thought to myself that if I had that case, I would surely want 10 or possibly more parity drives.


Why do you think you have dramatically increased risk of losing more than 2 disks at once?

 

The biggest danger of losing a larger number of disks is a power supply failure - but that would endanger not just your data disks but also your parity. So you need backups to cover that use case.

 

I think it would be much more useful if unRAID could support multiple arrays, so you don't need to have 20+ drives online just to verify your parity.


I just feel we should have that option. An array with 14 disks and only two parity drives is riskier than one with 8 drives and two parity drives.

 

 

And honestly, I have seen it at work: on a server with over 200 drives, about 10 per week are failing and being replaced.

 

But like you mentioned, these are multiple RAID 5 setups, so there has been no data loss - not to mention all data is duplicated across the country and stored on 3 master library servers in three cities.

 

That said, for the average home user it's prohibitively expensive to run a second server to back everything up when the current server is running a bunch of 8TB drives and currently using about 51TB of the 71TB of usable space. Upping to 3 or 4 parity drives could potentially help in certain cases.


unRAID's parity drive provides a lot of protection. But some failure scenarios actually corrupt parity - like a dying drive spewing junk. It doesn't matter if you have one parity drive or a thousand; this behavior will corrupt all parities and make parity recovery impossible.

 

And when you are talking about low probabilities like dual drive failure, things like flood, hardware failure (e.g. a PSU failure causing a spike), fire, theft, vandalism, and accidents need to be considered. These are things that can easily knock out all or a good percentage of the drives. Tri-parity+ does not help with those - maybe not as much as pictures of your hardware so you could file an insurance claim!

 

Now I get it, a lot of people have media on their servers. And although any one (or even 100) media files might be recreatable, thousands are not. And since it is not economical for some of us to keep a large backup set, anything we can do to raise the probability of not losing our data is worthwhile. In that spirit, with a large array, an extra parity can make sense. But I did a back-of-the-napkin analysis and figured single parity would protect you some 95% of the time, and that dual parity would protect you maybe 0.5% extra. (And that was based on a 100% chance of a disk dying every year.) With a third parity, you are in the hundredths or thousandths of a percent of advantage. Is that worth the price of a drive as large as or larger than your largest drive?
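That kind of back-of-the-napkin estimate can be sketched in code. This is a toy model with my own assumptions (independent drive failures, a fixed per-drive annual failure probability, no rebuild window modeled - none of this comes from the post above):

```python
from math import comb

def p_loss(n_data, n_parity, p_fail):
    """Probability that more than n_parity of the n_data + n_parity
    drives fail in a year, i.e. more failures than parity can absorb.
    Toy model: independent failures, fixed annual failure probability."""
    n = n_data + n_parity
    return sum(comb(n, k) * p_fail**k * (1 - p_fail)**(n - k)
               for k in range(n_parity + 1, n + 1))

# 20 data drives, assumed 5% annual per-drive failure probability:
for parity in (1, 2, 3):
    print(parity, round(p_loss(20, parity, 0.05), 5))
```

Each extra parity drive shrinks the loss probability, but by a rapidly diminishing amount - which matches the hundredths-of-a-percent argument above.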

 

Two parities can really help in one case: users that make mistakes. A shot in the foot or a poorly connected cable while trying to recover from one failure is hugely more likely than a second physical failure. That's why I recommend dual parity for new users, at least until they learn the ropes. I also recommend hot-swap cages to all but eliminate cable issues (the highest risk for data loss!).

 

Another important thing to remember is that, unlike RAID, losing 2 drives with 1 parity results in losing only 1 or 2 disks' worth of data. So even in a large array, the scope of your data loss is limited. In an equivalent RAID5 array, you'd lose the entire array. So while data loss in RAID5 and data loss with single parity are equally likely, the impact is hugely greater for RAID5.
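Single parity here is just an XOR across the data drives. A minimal sketch (illustrative only, not unRAID's actual code) of how one failed disk is rebuilt from the parity plus the surviving disks:

```python
from functools import reduce

def parity_of(disks):
    """XOR corresponding bytes of the given disks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*disks))

disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = parity_of(disks)

# Simulate losing disk 1: XOR the parity with the survivors to recover it.
survivors = [disks[0], disks[2]]
rebuilt = parity_of(survivors + [parity])
assert rebuilt == disks[1]
```

Lose two disks with only one parity and this reconstruction is impossible - but the surviving disks are still intact, independent filesystems, which is the point made above.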

 

With the tools that unRAID gives you to put the array back together and say abracadabra - parity is valid again - many situations become recoverable, even if not 100%. When RFS was popular, we found it to be very good at recovering from data corruption after a partial recovery, salvaging the lion's share of a disk even in less-than-optimal conditions. Unfortunately RFS is now a very poor choice, and XFS is the most popular alternative. But XFS does not provide good recovery from corruption, so unfortunately that hurts our ability to recover. (@c3, maybe you can fix this in your spare time!) I don't know about BTRFS - if it is good at recovering from corruption, that might be a better long-term choice.

 

Would 3rd, 4th, and nth parity be useful to some? Probably not, although a lot of users would think it would be and put those protections in place. But if, instead of loading up on more parities, people used the extra disks for a backup slipped into a safety deposit box, they'd have protected themselves 1000x more than the extra parity would. And everyone should be using hot-swap bays to avoid cabling problems. If we focused on the real risks, and dropped the notion that the real problem is drive failures, we'd all be more protected!


Having more than one drive fail at the same time isn't really very likely.

 

What happens more typically is that people are already running with a failed drive when another fails. They never knew about the first failure because they didn't have Notifications set up to tell them, and that first failed drive was being emulated by parity, so they weren't even missing any data.

 

Do you have Notifications set up?


unRAID is not like other RAID systems: even with dual parity and the loss of three drives, unRAID does NOT lose everything. All the remaining data drives are still valid, independent filesystems. In a traditional RAID with data striping, the data loss is amplified by the striping.

 

XFS has xfs_repair, which is very good at recovering from corruption. There are some dangerous options, which may be the cause of your concern. Typically the filesystem must be unmounted, but xfs_repair has the "dangerously" -d option to allow a mounted filesystem to be repaired, and the -L option, which zeroes the log, losing transactions.

 

XFS is getting enhanced corruption detection with the addition of checksums for data; currently only metadata has checksums (added in v5, kernel 3.15).
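The value of data checksumming - for any filesystem - can be shown with a toy sketch. This is my own illustration using CRC32, not XFS's actual checksum scheme:

```python
import zlib

def store(block):
    """Store a data block alongside its CRC32 checksum."""
    return block, zlib.crc32(block)

def verify(block, checksum):
    """Detect silent corruption: recompute and compare the checksum."""
    return zlib.crc32(block) == checksum

block, crc = store(b"media file contents")
assert verify(block, crc)           # intact data passes

corrupted = b"mediA file contents"  # one silently flipped byte
assert not verify(corrupted, crc)   # corruption is detected
```

Without a checksum, that flipped byte would read back as valid data - and, as noted later in the thread, would propagate into parity on the next sync.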

 

And +1 to making sure notifications are working.

 

Most of my storage has more than three parity blocks, up to 14.

Edited by c3
13 hours ago, Can0nfan said:

I just feel we should have that option. An array with 14 disks and only two parity drives is riskier than one with 8 drives and two parity drives. [...]

There is a big difference between "more risky" and your previous statement "increases dramatically".

 

If you have a server with over 200 drives and 10/week failing, that means roughly 500 drives/year failing - or in other words, the drives in that server have an average life expectancy of about 0.4 years. With servers that have a huge number of drives and multiple-redundancy RAID (where you might have mirrored RAID), it's not uncommon to not replace a broken drive instantly, but to have the server make use of spares and redundancy and now and then do rather large replacements of many drives at the same time.
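The arithmetic behind that life-expectancy figure, assuming a steady replacement rate across a fleet of about 200 drives (the numbers are from the post above; the steady-state assumption is mine):

```python
drives = 200
failures_per_week = 10
failures_per_year = failures_per_week * 52    # 520, i.e. "roughly 500/year"

# In steady state, mean drive lifetime = fleet size / replacement rate.
mean_life_years = drives / failures_per_year  # about 0.38 years
print(round(mean_life_years, 2))
```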

 

Anyway - one parity drive is dangerous because there is zero redundancy while a drive is being replaced. Two parity drives greatly improve the situation, because you can replace a single drive while you still have redundancy. And if you regularly exercise all drives, so you don't have any sleepers that are bad without anyone spotting it, then the probability isn't very large that two more drives fail while one drive is being replaced.

 

The most dangerous setups are systems with drives that are seldom used - lots of systems around the world get no regular SMART tests and the data is seldom accessed - so when one drive is found failing, there may already be multiple failing drives in the array, and those failures aren't noticed until the rebuild after the first drive is replaced. That's one of the reasons I was irritated a couple of years ago when all work on version 6 concentrated on virtual machines, Docker, etc. while the system still didn't have scheduled SMART scanning or any mail functionality to report problems.

 

In the end, it's more important to consider filesystems with checksumming of the individual data, to catch silent errors instead of having those errors propagate into the parity - which would result in a disk replacement actually writing incorrect data to the new drive.

