First NAS Build Advice


jfredson

Recommended Posts

Yes, it's up to the OP whether he wants dual parity, etc. But he is asking us for our advice so he can make the best decision for himself. We would all like dual parity and lots of backups, but cost is always a factor. For the OP, I would recommend starting with three 8TB drives, as he only has 5TB of data, which he predicts will grow by 2-3TB per year.

I personally would then use the drives like this:

drive 1 --- Parity

drive 2 --- Data

drive 3 --- Unassigned drive for backup

 

Then use a container such as Duplicati to make both an onsite backup to drive 3 and another backup to the cloud.

This will last him until he has more than 8TB of data, at which point he can add more drives and rethink the best way forward.
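If you wanted to script the local leg yourself instead of (or alongside) Duplicati, a minimal sketch might look like the following. The mount points are hypothetical and depend on your own shares and where the unassigned drive is mounted:

```python
#!/usr/bin/env python3
"""Minimal sketch of a nightly local-backup job to the unassigned drive.

Assumes the array shares live under /mnt/user and the unassigned backup
drive ("drive 3" above) is mounted at /mnt/disks/backup; both paths are
hypothetical. Duplicati would normally handle this leg plus the cloud
copy, with scheduling and versioning on top.
"""
import datetime
import pathlib
import subprocess

SRC = pathlib.Path("/mnt/user")          # array shares (assumed path)
DST = pathlib.Path("/mnt/disks/backup")  # unassigned backup drive (assumed path)

def nightly_backup() -> None:
    # Mirror the shares onto the backup drive: -a preserves attributes,
    # --delete removes files on the backup that are gone from the array.
    subprocess.run(["rsync", "-a", "--delete", f"{SRC}/", f"{DST}/"], check=True)
    # Leave a timestamp so it's obvious when the last run completed.
    (DST / "last_backup.txt").write_text(datetime.datetime.now().isoformat() + "\n")

if __name__ == "__main__":
    nightly_backup()
```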

 

PS @tdallen lol hey, I love the LED lights in my rig. They're my favorite bit of the whole computer :)

 

Edited by gridrunner
Link to comment
40 minutes ago, tdallen said:

I think the delta is that SSD is highlighting the need for all other disks in the array to be healthy to recover from a failure, not just the two parity drives.

 

That said I think we need to differentiate value and probability.  Yes, the probability of taking advantage of dual parity in a dual data drive failure scenario is low.  To SSD that implies the feature has marginal value.  But value is in the eye of the beholder.  I derive value from offsetting complexity in my environment.  Some people might derive value from increased speed of recovery from a dual failure.  Others might simply be superstitious.  Value pops up in many strange ways.

 

It's important that people understand that dual parity (or single parity, for that matter) isn't a substitute for a backup strategy. To be fair, they should also understand that the probability of taking advantage of dual parity to address dual data drive failure is pretty low. But hey, some people derive value from LED-lit case fans and cases with windows. Dual parity would help me sleep better than some of the other things that people spend money on...

 

To the OP - my opinion is that implementing dual parity with 4 or fewer data drives would place you in the highly conservative category. 6-10 drives, conservative. 12-14 or more is the more common use case. Personally, I am conservative and would implement dual parity by the time I hit 10 drives. But my focus would absolutely be on a solid offsite backup strategy before dual parity.

 

I respect your opinions, and I'm up for a respectful debate on this.

 

I tend to define value a little differently. Value is a tangible, quantifiable amount. Although I may value my relationship with my wife more than my mother-in-law, that's not the type of value I am talking about. While there might be differing opinions on the inputs to a value determination, once the goals and inputs are defined, it should be a pretty objective analysis.

 

If a person has backups and would only need to use them in 2.5% of failure cases (and a typical user probably has 1 or 2 such cases a decade), while spending another $350 would lower the chance to 2.1%, someone would have to attach a value to that 0.4% (I call it marginal). Remember, the first parity gave you 97.5%! I don't call that marginal! Prove to me you couldn't get a better return with a fire extinguisher, deadbolt, or home security system. Rip my assumptions and goals to shreds and you might make a good counter-argument. I'd frankly love to see it.
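To make that arithmetic concrete, here's a small sketch using the numbers above. The percentages and drive cost come from this post; the cost of a restore episode is a made-up placeholder you'd substitute your own figure for:

```python
# Back-of-envelope math for the numbers above. The percentages and drive
# cost come from the post; COST_OF_RESTORE is a hypothetical placeholder
# for what one restore-from-backup episode is worth to you.
SINGLE_PARITY_RESIDUAL = 0.025  # backups needed in 2.5% of failure cases
DUAL_PARITY_RESIDUAL = 0.021    # drops to 2.1% with a second parity drive
FAILURES_PER_DECADE = 2         # "1 or 2 such cases a decade"
DRIVE_COST = 350.0              # the extra drive
COST_OF_RESTORE = 500.0         # hypothetical pain of one full restore

restores_avoided = (SINGLE_PARITY_RESIDUAL - DUAL_PARITY_RESIDUAL) * FAILURES_PER_DECADE
saving = restores_avoided * COST_OF_RESTORE

print(f"Restores avoided per decade: {restores_avoided:.3f}")  # 0.008
print(f"Expected saving per decade: ${saving:.2f} vs ${DRIVE_COST:.2f} spent")
```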

 

If you have an array full of 10-year-old drives and haven't run a parity check since 2009? Darn right I'd want dual parity. But I laid out the best practices to follow so a user would not get to that point.

 

I'd like to better understand the reasons that lead you to your assessments around drive count and dual parity value. If it's all about feeling good about being protected, that's not very helpful to me. If it's about people being lazy and not following best practices, that's valid. If it's got some analytical meat behind it, I'm all ears. But saying dual parity is twice the protection is as wrong as saying 10 parities is 10x the protection, or that two spare tires in your car is twice the protection from being stranded on the side of the road.

Link to comment
1 hour ago, gridrunner said:

Yes, it's up to the OP whether he wants dual parity, etc. But he is asking us for our advice so he can make the best decision for himself. We would all like dual parity and lots of backups, but cost is always a factor. For the OP, I would recommend starting with three 8TB drives, as he only has 5TB of data, which he predicts will grow by 2-3TB per year.

I personally would then use the drives like this:

drive 1 --- Parity

drive 2 --- Data

drive 3 --- Unassigned drive for backup

 

Then use a container such as Duplicati to make both an onsite backup to drive 3 and another backup to the cloud.

This will last him until he has more than 8TB of data, at which point he can add more drives and rethink the best way forward.

 

PS @tdallen lol hey, I love the LED lights in my rig. They're my favorite bit of the whole computer :)

 

I like this for the OP! A local backup would be very fast to restore if ever needed.

Link to comment
3 minutes ago, SSD said:

I like this for the OP! A local backup would be very fast to restore if ever needed.

Heh, I already recommended that "Your current data storage needs are actually pretty small.  While I certainly would encourage you to have the usual 3-2-1 backup strategy with one of the copies in the cloud, have you considered starting with simple offsite cold storage?"

 

49 minutes ago, SSD said:

I'd like to better understand the reasons that lead you to your assessments around drive count and dual parity value.

I think there's a vague correlation to how people who post on the forums have actually implemented dual parity, but I'm better off fessing up and saying this is purely personal opinion. People not following best practices, buying drives in batches, etc. enter into it, but there's no analytical meat behind it.

 

That said, I'd say that your risk based analytical framework is missing a component.  The short version is basically "I know the probability of failure is low but I *really* don't want to deal with the consequences of failure".  The long version (skip over if you're not interested) is:

 

I'd summarize what you've presented as:

(cost of risk mitigation) * (probability of failure) = (value of risk mitigation)

 

In my view you need to have:

(cost of mitigation) * (probability of failure) * (impact of risk realization) = (value of risk mitigation)

 

In other words, what if the risk implies something bad happening? Completely overblown example - carrying two spare tires doesn't make any sense if there's a service station on every corner, but what if I'm participating in an autocross in the desert? It's potentially worth carrying two spare tires in that situation because the impact of risk realization is severe.

 

In a more rational example, let's say I am running a small business (like the OP) and experience a dual data disk failure. If I am running single parity, I need to go to my backup strategy. The files on both failed data drives would be unavailable until I can access the backup, which depends on where the backups are located and the time/effort to restore. Now, if I am running dual parity, unRAID will begin emulating both drives immediately. There is no business interruption.

 

So what is the impact of risk realization? If I say "no biggie, I'll just grab the USB backup tonight when I go home," then the impact is low. If it could cause me to miss a client deadline that impacts revenue, then the impact is significant. In either case, the impact of the risk actually occurring needs to be factored into the value proposition.
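A sketch of that extended framework in code, cashed out as "expected loss avoided, net of the mitigation's cost" (a slight restatement of the equation above). All inputs are illustrative placeholders, not real statistics:

```python
# Sketch of the extended framework: the value of a mitigation reflects the
# probability of the failure it covers *and* the impact if that failure is
# realized. Here it's cashed out as expected loss avoided minus the cost of
# mitigation. All inputs are illustrative placeholders.
def mitigation_value(p_failure: float, impact: float, cost: float) -> float:
    """Expected loss avoided by the mitigation, net of its cost."""
    return p_failure * impact - cost

# Home user: a dual data-drive failure means a slow but cheap restore.
print(mitigation_value(p_failure=0.004, impact=200.0, cost=350.0))      # negative

# Small business: the same failure could blow a client deadline.
print(mitigation_value(p_failure=0.004, impact=200_000.0, cost=350.0))  # positive
```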

Link to comment

Well said, tdallen, and I would say that your last example can apply to home users too. If I suffer two drive failures and have dual parity, I don't have to go to my backup. Am I being lazy? Sure, but since I value my time, the cost vs. convenience trade-off is worth it to me. It may not be the same for everyone. Perhaps not everyone can afford the second drive for parity, or has more time to spend restoring from backups; that is obviously OK and their choice. For me, as I said, it's a value proposition based on time. I value my time and would rather not spend hours restoring lost data from a backup if I can configure a second parity drive to avoid data loss.

Link to comment

@tdallen

 

Generally agree with you. But I just think there are some extremely low-risk scenarios that blend into the background. And dual disk failure on even a 10-drive array that has been well cared for is in that category, IMO.

 

But I do think that lack of array care and feeding, like the 10-year-old array that's never run a parity check, or the person with 27 drives and a rat's nest of cables that get knocked loose on every disk swap, or the newbie who is apt to shoot himself in the foot - all of those scenarios benefit from dual parity, because it is so easy for them to encounter a second coincident failure.

 

My comment about the two spare tires was just meant to say that 2 spares don't provide twice the protection. Whether it is economical or worthwhile is another matter.

 

@ashman -

 

I wonder at what drive count you'd add third, fourth, and fifth parity drives? Not sure you really read the post I referenced; the risk of a failing disk corrupting all parities is higher than the risk of a second drive failure. You might say you're buying flood insurance when the risk is really tornado. We really need something like par sets to protect yourself in another dimension. So if parity were corrupted, your rebuild would be corrupted, and your par set could correct all the corruptions. Dual parity absolutely can't do that. I'd do that one in a second. In fact, I did it. The problem is par sets take a long time to build on content, and adding even one file is impossible without rebuilding the set. The other problem is the par standard itself isn't prepared to deal with a full multi-terabyte disk; the blocks quickly get huge. Maybe the newer par standard is better. If so, you could take a full disk, build a few hundred or thousand par blocks, and be able to put a wonky disk that was rebuilt after parity was corrupted right again, without checksums!
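For anyone unfamiliar with par sets, a rough sketch of what protecting one disk's contents with par2cmdline might look like, driven from Python. The paths are hypothetical, and the flags shown are from par2cmdline (check your version); the sketch illustrates exactly the limitation above, that the set has to be rebuilt whenever files change:

```python
# Rough sketch of building a par set over one data disk with par2cmdline,
# driven from Python. Paths are hypothetical, par2 must be installed, and
# -r sets the redundancy percentage. As noted above, par2 sets can't be
# updated incrementally, so any file change means rebuilding the (slow) set.
import pathlib
import subprocess

DISK = pathlib.Path("/mnt/disk2")  # disk to protect (assumed)
PARSET = "disk2.par2"              # recovery set, kept on the disk itself

# Relative paths, with cwd set to the disk, keep par2's base path happy.
files = sorted(str(p.relative_to(DISK)) for p in DISK.rglob("*") if p.is_file())

# Create ~5% redundancy. On a multi-terabyte disk this takes a long time,
# and with many files the argument list may need to be batched.
subprocess.run(["par2", "create", "-r5", PARSET, *files], check=True, cwd=DISK)

# Later: verify, and (commented out) repair damaged files if possible.
subprocess.run(["par2", "verify", PARSET], check=True, cwd=DISK)
# subprocess.run(["par2", "repair", PARSET], check=True, cwd=DISK)
```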

Link to comment



@tdallen - the risk of a failing disk corrupting all parities is higher than the risk of a second drive failure.


Have to disagree with this one; you only run the risk of corrupting parity if you run a correcting check, which is why all scheduled checks should be non-correcting.

Regarding dual parity, and as someone who has twice had a second disk fail during the rebuild of another, I feel the cost of an extra disk is well worth the extra protection, especially for larger arrays of 10 or more disks, though smaller arrays with very large disks can also benefit.

Link to comment
7 hours ago, johnnie.black said:

I feel the cost of an extra disk is well worth the extra protection, especially for larger arrays of 10 or more disks, though smaller arrays with very large disks can also benefit.

 

+1! Speaking as someone who recently had a drive controller fail, dual parity allowed me to quickly and painlessly recover 2 out of the 3 drives that got corrupted.

 

I've since instituted a nightly sync with CrashPlan to avoid ever losing anything important again, but CrashPlan still won't make a recovery as quick as a disk rebuild from parity, nor will it back up the terabytes of less-important, re-rippable media content that I wouldn't back up to the cloud anyway.

 

The extra upfront cost of a second parity drive is more than worth it to me, especially as it not only protects me from loss if multiple drives fail at the same time, but also, as @johnnie.black mentions, if a second drive fails before I can get the first one rebuilt.

Link to comment

I laid out my facts and factors. If it is true that you are having that level of issue with your array, definitely go for dual parity.

 

I would mention that in @johnnie.black's scenario, the second drive didn't completely fail, but instead had read errors. That would mean some fouled sectors that could be found with checksums, and the few files impacted could be restored. So such a scenario might still represent a very recoverable situation, if you are generating checksums.
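For reference, "generating checksums" can be as simple as keeping a hash manifest per disk. A minimal sketch, with hypothetical paths:

```python
# Minimal sketch of generating and checking per-file checksums, so that
# after an event like the one above you can identify exactly which files
# were fouled and restore just those from backup. Paths are hypothetical.
import hashlib
import json
import pathlib

DISK = pathlib.Path("/mnt/disk1")                 # array disk to fingerprint (assumed)
MANIFEST = pathlib.Path("/boot/checksums/disk1.json")

def sha256_of(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest() -> None:
    # Run once from a known-good state.
    manifest = {str(p.relative_to(DISK)): sha256_of(p)
                for p in DISK.rglob("*") if p.is_file()}
    MANIFEST.parent.mkdir(parents=True, exist_ok=True)
    MANIFEST.write_text(json.dumps(manifest, indent=1))

def check_manifest() -> list[str]:
    # Run after a suspect event; returns the files that are missing or changed.
    manifest = json.loads(MANIFEST.read_text())
    return [rel for rel, digest in manifest.items()
            if not (DISK / rel).is_file() or sha256_of(DISK / rel) != digest]

if __name__ == "__main__":
    build_manifest()
    print(check_manifest())
```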

 

I just don't see that many people posting about dual failures. I haven't had a single failure in over 5 years, and a double failure - never. Am I living on the edge? I don't think so. Drive failure rates are in the 2-5% range per year. And if single parity handles 95% of those cases, you are down to a very small number.

 

But I have hot-swap for all of my disks and shuffle out old drives after 4-5 years. I run monthly parity checks and keep tabs on my SMART attributes.

 

This whole thing started with @ashman70 saying dual parity was twice the protection. It may be able to rebuild twice the data, but the chance dual parity would be needed is dramatically smaller than the chance single parity could be needed, and single parity is not needed very often.

 

But one thing we can all agree on: with the facts, each user can make an informed decision.

Link to comment
8 hours ago, SSD said:

But I have hot-swap for all of my disks and shuffle out old drives after 4-5 years.

 

That is something I don't do, in part because my oldest disks go to the backup servers, which spend most of the time offline. Unless I need to upgrade due to low space, I use all my disks until they fail; I'm still using disks from before 2010, though they have relatively low power-on hours.

 

8 hours ago, SSD said:

if you are generating checksums.

 

I do wish more users were using checksums, either generating them externally or by using btrfs. For me they are invaluable when there's an issue, making it easy to check data integrity.

 

It's also obvious that, for anyone who can afford it, a backup server should be priority one. But dual parity is a very good value: besides the extra protection, it increases your peace of mind when replacing/upgrading a disk. Another advantage is dual upgrades (as long as you have backups and/or keep the server data static, and keep the old disks until the upgrade is done). I try to always upgrade 2 disks at the same time on my dual parity servers; it saves a lot on rebuild times, especially as disks get larger.

 

 

Link to comment

I never rebuild a disk. Instead, I add the new disks much like "unassigned devices." (I have to partition them manually because UD is not able to partition a disk so that it is unRAID-compatible.) I actually mount them in my go file, but UD should work fine for mounting, and I think formatting as well - just not partitioning. I then copy the data across from the array disk to the non-array disk, then do a New Config and rebuild parity when done. If you know what you're doing, you can maintain recoverability in case of a disk failure (which I've never had happen, because I always do a parity check shortly before this exercise).

 

You can copy as many disks in parallel as you like - often I am replacing more disks than I am adding (the added disks are larger) - and I can do the copies in parallel, trying to avoid reading or writing multiple streams from/to the same disk. The writes are without parity overhead. Using screen, you can create a couple of simple shell scripts to copy the disks/directories you want to the disks you want. Since I am replacing smaller disks with larger ones, this lets me copy the data I want to the disks I want and not have to babysit. And if there is any sort of read error (never had it happen), I'd know exactly what file it was on, and know that file needs to be restored. As a side benefit, it also results in defragging the disk. There is a tool I downloaded that will copy and do a checksum comparison in one operation; I use it as an extra check. It slows things down a little, but gives me extra confidence. I've never had it report a mismatch, but I do it anyway.
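A sketch of what those parallel copy-and-verify scripts might look like in Python rather than shell+screen. The source/destination pairs are hypothetical; the point is to pair them so no disk is the source or target of two streams at once:

```python
# Sketch of the parallel copy-and-verify pass described above. Each pair is
# one old-disk -> new-disk copy; mount points are hypothetical. Pair the
# copies so no disk carries two streams at once.
import subprocess
from concurrent.futures import ThreadPoolExecutor

COPIES = [
    ("/mnt/disk1", "/mnt/disks/new1"),  # old disk -> new, larger disk (assumed)
    ("/mnt/disk2", "/mnt/disks/new2"),
    ("/mnt/disk3", "/mnt/disks/new3"),
]

def copy_and_verify(src: str, dst: str) -> str:
    # First pass copies; a read error surfaces as a non-zero exit code,
    # telling you exactly which disk (and file) to investigate.
    subprocess.run(["rsync", "-a", f"{src}/", f"{dst}/"], check=True)
    # Second pass re-reads both sides with --checksum; any output means a
    # mismatch -- the "copy plus checksum comparison" idea from the post.
    out = subprocess.run(
        ["rsync", "-a", "--checksum", "--dry-run", "--itemize-changes",
         f"{src}/", f"{dst}/"],
        check=True, capture_output=True, text=True,
    )
    return f"{src}: {'OK' if not out.stdout.strip() else 'MISMATCH'}"

with ThreadPoolExecutor(max_workers=len(COPIES)) as pool:
    for result in pool.map(lambda pair: copy_and_verify(*pair), COPIES):
        print(result)
```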


I like this method for doing incremental disk modernization.

 

I have not done a disk rebuild in forever - probably 6 years. I would not plan to do one unless I literally had a single disk fail and wanted to rebuild it.

Link to comment

I feel like we've hijacked the OP's thread, but I went back and re-read some of the material that influenced my opinion on dual parity...  gotta post it somewhere :).

 

My guess is that the average unRAID user who experiences a failure (and therefore would be helped by dual parity if a second failure occurred):

  • Acquired their drives in batches, and has drives at least as old as or older than the failed drive in their array.
  • Will run their array in a degraded state for at least 5-7 days, and 10-14 isn't unusual (diagnosis, trial-and-error troubleshooting, acquiring a new drive, rebuilding).
  • Does not retire drives early due to age.
  • The larger the array, the more likely it is that some drives are more than 3 years old.

Drawing from the Backblaze data analysis (big kudos to them for publishing their data freely):

  • The bathtub hypothesis of failure rates doesn't completely pan out, because infant mortality is less than expected. That said, once drives are past 3 years old, the annual failure rate skyrockets from an average of 1.4% to 11.8%. (source)
  • The work by Ross Lazarus with Kaplan-Meier statistics and plots similarly indicates that, for many models of drives in the Backblaze study, you see notable trends in the survivability analysis over time. (source)

[Figure: Kaplan-Meier drive survival curves by model - drivefail_resMay2017_model.png]

The vertical axis represents the fraction of drives which survived at any given point in time and the horizontal axis represents days since time zero.

 

So what does this mean to you and me? Broadly, the statistics say there's a very small chance of dual drive failure in a short period of time, but if you drill down another layer it gets interesting. If I'm running an array of 2-year-old HGST drives and I experience a failure, I'd feel like dual parity is a luxury - my drives are very reliable and I've got plenty of time to replace the failed drive. But if I'm running less reliable drives, or a set of older drives, I might have a problem, because some drives fail with either steady or increasing frequency as they get older. And then there were those Seagates...
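To put rough numbers on that, here's a sketch of the chance that a second drive fails while the array is degraded, under a crude constant-rate independence assumption. The failure rates are the Backblaze figures quoted above, and the window lengths match the 5-14 day degraded-state estimate earlier in this post:

```python
# Crude estimate of a second drive failing while the array is degraded,
# assuming independent failures at a constant annual rate (a big
# simplification; the Kaplan-Meier curves above show rates are anything
# but constant). AFR figures are the Backblaze numbers quoted above.
AFR_YOUNG = 0.014  # ~1.4% annual failure rate, drives under 3 years old
AFR_OLD = 0.118    # ~11.8% annual failure rate once past 3 years old

def p_second_failure(n_remaining: int, afr: float, window_days: float) -> float:
    """Chance at least one of the remaining drives fails in the window."""
    p_one = afr * (window_days / 365.0)  # per-drive chance during the window
    return 1 - (1 - p_one) ** n_remaining

# A 10-drive array with one drive down leaves 9 exposed; window lengths
# match the 5-14 day degraded-state estimate above.
for label, afr in (("under 3 years", AFR_YOUNG), ("over 3 years", AFR_OLD)):
    for days in (5, 14):
        p = p_second_failure(n_remaining=9, afr=afr, window_days=days)
        print(f"{label:>14}, {days:>2}-day window: {p:.2%}")
```

With young, reliable drives the result is a fraction of a percent; with a set of 3+ year old drives and a two-week degraded window it climbs to a few percent, which is the "drill down another layer" effect.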

 

So the user has to make an informed decision. If they a) might have drives that turn out to be among the less reliable, b) might be slow to react to a drive failure, and c) haven't implemented a drive retirement strategy aside from capacity upgrades, resulting in an array with older drives, then those are risk factors, and dual parity has an increasing chance of providing some value. It's still a *small* chance, but it's greater than the broad statistics suggest - and my guess is that it's pretty common for unRAID users to run with one or more of these risk factors.

Link to comment

I too have learned from this thread. Thanks to all who have pushed back on my comments - I hoped they would generate counter-analysis from other experts and a meaningful discussion on this topic.

 

It is no secret that I'm a little disappointed that dual parity is not more useful. I believe that it could, when a parity check finds a mismatch, triangulate on the disk at fault for the parity error. This would be tremendously useful, and there is no other way to do so.

 

@tdallen

I can agree with your analysis. It would be nice to bring in some specific percentages from the analysis.

 

I would add these risk factors:

- Poor cabling

- Not using hotswap bays

- Not running monthly parity checks

- Not running FCP, or equivalent with frequent manual review of SMART data

- Not maintaining checksums

- Using drives with more than 5 years of power-on hours

- Using multiple drives of same model and similar vintage as one (or especially more than one) that have failed or been replaced due to SMART issues

- Using drives with a history of failure (e.g., > 10% at current age) according to Backblaze, or with poor forum / Amazon / Newegg feedback

- No backups of critical data (dual parity is not a replacement for a backup - but if you don't have a backup, it's hard to argue dual parity isn't better than nothing. Still, instead of using a drive for dual parity, it is better to use that drive as a backup of your more critical data and put it in a safe place)

- Unfamiliar / uncomfortable with unRAID and recovery techniques (EVERY new user should run dual parity for some period of time IMO)

- New array during burn in period

- Anticipation of flooding or another natural disaster. The two use cases where I am sure dual parity helped, or would have helped, were literally flooded arrays.

 

All of the above are indicators for dual parity, and there may be more. The nice thing about dual parity is that it is easy to drop if the extra capacity is needed and you have overcome the issues above.

 

I would add one more point - RFS had much better tools for single-drive recovery than XFS. You might say it was as good as dual parity in many situations. With XFS it is, for all practical purposes, impossible to recover meaningful binary data. In the RFS era, we frequently saw users shoot themselves in the foot with parity and still be able to recover with some RFS heroics. That isn't happening with XFS, despite it being a much more reliable and higher-performance file system. Dual parity does not protect you from all shots to the foot, but it might just protect you from one. It is a good idea to have while on your learning curve. I'd be interested in @johnnie.black's thoughts on BTRFS. Does it have superior drive recovery tools to XFS? If it does, that might be a good reason to consider migrating to BTRFS, IMO.

 

Bottom line - if in doubt, run dual parity. But realize that small to medium sized arrays that contain relatively new, well-regarded drives with no SMART issues, and with an experienced user at the helm, might not benefit commensurate with the cost. If all your risk is related to just drive failure rates, and not the confounding risk factors above, dual parity can be an unnecessary and expensive luxury, IMO.

Link to comment
5 hours ago, SSD said:

I'd be interested in @johnnie.black's thoughts on BTRFS. Does it have superior drive recovery tools to XFS? If it does, that might be a good reason to consider migrating to BTRFS, IMO.

 

Difficult to say for sure. Although btrfs fsck is still far from reliable, btrfs restore usually works very well on damaged filesystems; IIRC it has always worked for anyone on the forum who needed to use it, at least in the last couple of years, while I remember at least a couple of instances of xfs_repair not being able to repair the filesystem.

 

IMO btrfs on unRAID has a bad reputation because it's mostly used for cache pools, and it's much more prone to issues in those cases, mostly from hardware problems. For example, when a pool device drops offline, btrfs can have trouble recovering, especially if the device is reconnected later without the user taking the appropriate steps. There are also allocation-related issues, exacerbated by how cache is typically used, i.e., constantly being filled and emptied.

 

I find btrfs on array disks very stable; I've never had any issues, and I'm using it on more than 100 disks. For me data integrity is very important, and that leaves btrfs as the only option, because creating checksums externally is not practical for the way I use most of my servers.
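For anyone curious how btrfs's built-in checksums get exercised in practice, a scrub re-reads every block and validates it. A minimal sketch, with a hypothetical mount point; note that on a single-device btrfs filesystem (like an array disk) a scrub can detect corruption but not repair it, since there's no second copy:

```python
# Minimal sketch of exercising btrfs's built-in checksums with a scrub.
# Mount point is hypothetical; btrfs-progs must be installed. On a
# single-device filesystem the scrub detects (but cannot repair)
# corruption, and damaged files are named in the kernel log so they can
# be restored individually from backup.
import subprocess

MOUNT = "/mnt/disk1"  # a btrfs-formatted array disk (assumed)

# -B runs the scrub in the foreground so the exit status is meaningful.
subprocess.run(["btrfs", "scrub", "start", "-B", MOUNT], check=True)

# Summarize the results and cumulative per-device error counters.
subprocess.run(["btrfs", "scrub", "status", MOUNT], check=True)
subprocess.run(["btrfs", "device", "stats", MOUNT], check=True)
```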

Link to comment
8 hours ago, SSD said:

I never rebuild a disk. Instead, I add the new disks much like "unassigned devices."

 

I can't see the advantage of doing this; it will take much more time - first copying the disk, then re-syncing parity - instead of just one rebuild. Or am I missing something?

 

8 hours ago, SSD said:

(I have to partition them manually because UD is not able to partition a disk so that it is unRAID-compatible.)

 

UD can now be used to partition disks; in fact, it should be used, or you'll probably have issues once you update to v6.4, with the disks not being recognized as containing valid partitions.

Link to comment
4 hours ago, johnnie.black said:

 

I can't see the advantages of doing this, it will take much more time, first copy disk, then re-sync parity, instead of just one rebuild, unless am I missing something?

 

As I said, when replacing one disk with one disk, the rebuild is better (it is just not something I do very often, if at all).

 

But say you're replacing 4 disks with 3.  A parity build is going to happen anyway. And the copies can happen 3 at a time, all at full speed.

 

4 hours ago, johnnie.black said:

 

 

UD can now be used to partition disks; in fact, it should be used, or you'll probably have issues once you update to v6.4, with the disks not being recognized as containing valid partitions.

 

You had sent a partition command that I used the last time I did updates, and it worked fine. Another method is to boot unRAID with a fresh config directory, create a dummy array, and format the disks. A third option is to unassign the cache and reassign it. I am not sure what effect cache pools would have on this, but with a single cache this works OK - although you can wind up with some cache-only shares getting created on the new disk, along with a docker.img and a few other files. Once you stop the array, put the real cache back in place, and mount the newly formatted disks, those files can be deleted from the unassigned device and you can get on with whatever you were planning.

Link to comment
2 minutes ago, SSD said:

But say you're replacing 4 disks with 3.  A parity build is going to happen anyway. And the copies can happen 3 at a time, all at full speed.

 

So you're upgrading some disks but reducing the total number of array disks; in that case, it makes sense.

 

3 minutes ago, SSD said:

You had sent a partition command that I used the last time I did updates, and it worked fine.

 

It works for v6.3 and below; it won't for v6.4.

Link to comment
1 hour ago, johnnie.black said:

So you're upgrading some disks but reducing the total number of array disks; in that case, it makes sense.

 

Yeah - it's rare that I'm doing just one at a time, only if a disk is a problem. I tend to go in cycles, where I upgrade disks and reduce the disk count. Then I add larger disks one at a time until I get to the magic number again, which keeps me going for a few years. By then drives have grown in size, often 2x+, and I can do another upgrade cycle and reduce the drive count again. Old disks become backup disks. I'm normally upsizing parity at the same time. Then I am good for a few more years. I'm hoping that by the time I need to go again, disks are at least 12TB, hopefully 16TB.

Link to comment
