[Feature Request] Add bcache module to kernel



Ever since my early days using virtualization (KVM and Xen), I have always used LVM to provide virtual disks, and it was with great joy (and ultimately what caused me to purchase unRAID) that I saw the LVM module included in the unRAID Linux kernel.

 

After various tests, and wanting to increase performance, I started experimenting with SSD caching, testing both LVM caching and bcache by recompiling the kernel.

 

LVM caching works well, and although I have never benchmarked my different configurations, here are the reasons I decided against it.

[*]It was difficult to find and install the required tools (thin provisioning tools).

[*]The 'better' "smq" cache policy is either not available in the current kernel or has been dropped (not entirely sure which), so only the older "mq" policy is available, which causes a few warnings when creating cached logical volumes.

[*]Possibly due to "smq" being unavailable, I felt there was no visible performance gain.

[*]Snapshotting a cached logical volume cannot be done unless the cache is removed first, resulting in a fresh, empty cache each time you back up the logical volume onto your unRAID array (which I like to do a lot!).

 

Onto bcache...

 

There are quite a few ways to implement this cache to work with LVM:

[*]Create an LVM volume group with the SSD and HDD(s) - create a logical volume on the SSD as well as one on the HDD, then use make-bcache to create a caching and a backing device for the virtual disk, and attach the cache.

[*]Create an LVM volume group using only the HDD(s) and use the SSD as an exclusive caching device - create a logical volume on the HDD and use make-bcache to turn it into a backing device, then attach it to the cache.

[*]Use the SSD as a bcache cache device and the HDD(s) as a backing device - then use the /dev/bcache[#num] devices to create an LVM volume group.

 

Within that list, options 1 and 2 can cause confusion when snapshotting in LVM, and the /dev/bcache[#num] device names are not consistent across reboots (use the UUID instead).

 

My personal preference is option 3 - with LVM on top of bcache, my virtual machines (two gaming VMs with dedicated GPU passthrough - thank you LimeTech for making that easy, and Linus Tech Tips for the video that got me here in the first place) feel a hell of a lot quicker the more we use them. Further performance can be gained by switching the cache to writeback, but writethrough is the default to prevent data loss should the cache device fail.
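
For anyone wanting to try option 3, here is a rough sketch of the commands involved (device, volume group and logical volume names are just examples - adjust them to your own setup):

make-bcache -C /dev/sdX                                      # format the SSD as the cache device
make-bcache -B /dev/sdY                                      # format the HDD as the backing device
echo /dev/sdX > /sys/fs/bcache/register                      # usually handled by udev rules, if present
echo /dev/sdY > /sys/fs/bcache/register                      # /dev/bcache0 appears after this
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach     # the set UUID printed by make-bcache -C
pvcreate /dev/bcache0                                        # build LVM on top of the cached device
vgcreate vg_vm /dev/bcache0
lvcreate -L 100G -n vm_disk1 vg_vm                           # logical volume used as a VM's virtual disk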

 

Snapshotting logical volumes to create backup images on the unRAID array is also a lot easier, as the virtual disk partitions do not carry a bcache header (useful in case of emergency).
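
A backup of one of those virtual disks can then look something like this (LV names and the target path are only examples):

lvcreate -s -L 10G -n vm_disk1_snap /dev/vg_vm/vm_disk1      # snapshot while the VM keeps running
dd if=/dev/vg_vm/vm_disk1_snap of=/mnt/user/backups/vm_disk1.img bs=1M status=progress
lvremove -y /dev/vg_vm/vm_disk1_snap                         # drop the snapshot once the copy is done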

 

My own virtual machine desktop is primarily Ubuntu but I do have a Windows 10 config I boot into when I play some Windows exclusive games on Steam. My daughter uses Windows 10 on her virtual machine and we both enjoy the performance boost bcache provides in my setup.

 

I encourage others to test this out, as I would really like to see bcache included in the unRAID kernel.

Link to comment
  • 1 year later...
  • 1 year later...
  • 3 years later...
16 hours ago, mrpops2ko said:

I'm guessing this was never looked at or implemented?

I don't know if bcache makes a lot of sense nowadays, because NAND flash is getting cheaper and cheaper and most people put their vdisks on an SSD or NVMe anyway.

 

Another thing to consider is that you have to create the bcache device every time Unraid is started. Of course this can be automated, but what is your exact use case for it?
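
For reference, re-creating it after a reboot basically just means re-registering the existing devices from the go file or a script, something like this (device names are examples):

modprobe bcache
echo /dev/sdX > /sys/fs/bcache/register      # cache device
echo /dev/sdY > /sys/fs/bcache/register      # backing device, /dev/bcache0 reappears with its data intact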

From what I know, bcache by default acts only as a read cache and not a write cache, and you have to explicitly enable it to act as a write cache.

Link to comment
8 hours ago, ich777 said:

I don't know if bcache makes a lot of sense nowadays, because NAND flash is getting cheaper and cheaper and most people put their vdisks on an SSD or NVMe anyway.

 

Another thing to consider is that you have to create the bcache device every time Unraid is started. Of course this can be automated, but what is your exact use case for it?

From what I know, bcache by default acts only as a read cache and not a write cache, and you have to explicitly enable it to act as a write cache.

To create a read cache.

 

The idea being that anything read from the array goes into tiered storage:

it first gets read and put into RAM;
once the RAM is exhausted (and the data is about to be evicted from RAM entirely), it then gets moved to an SSD,

and eventually it will be evicted from the SSD too.

 

The point being that it's at the block level, and you can reduce array utilisation for subsequent reads during the window where the data has left RAM but still fits within the SSD's capacity.

Link to comment
43 minutes ago, mrpops2ko said:

The point being that it's at the block level, and you can reduce array utilisation for subsequent reads during the window where the data has left RAM but still fits within the SSD's capacity.

I get the point of your request, but the Array was designed to be a slower, archival type of store, and the Cache is actually where the data that needs to be accessed/changed quickly every day is located.

 

23 minutes ago, mrpops2ko said:

an example of a similar Windows-based tiered storage approach would be PrimoCache

PrimoCache is actually a similar solution, but also not quite, since PrimoCache has algorithms in place that will also write the data (actually blocks) back to the fast storage tier if you access it often, and will also evict old blocks based on how often they were accessed.

 

AFAIK only random I/O will benefit from bcache because it was intentionally optimized for SSDs, and sequential I/O won't even be cached by bcache.

 

Please correct me if I'm wrong about that…

 

Of course in certain use cases this can make sense, but IMHO for Unraid it doesn't make a lot of sense.

Link to comment

it doesn't make sense to you to speed up the array storage?

 

At this point I'd just be writing again what I've previously written. Do you genuinely not see the point?

Imagine, for example, you have a small amount of RAM spare, listen exclusively to FLAC songs, and have a random playlist (I'm making this example up so you get the point). Each song you listen to is 30 MB, and you have enough RAM spare for 10 songs cached.

With an SSD read cache as described above, subsequent random replays (if they land on songs already listened to) would then be serviced by the SSD rather than the array, freeing up that specific array disk for other tasks and potentially reducing I/O wait times (if, in this scenario, the disk is being thrashed for some reason - copying a large file, for example).

Like you mentioned, many large and cheap flash storage options are available, but spinning rust will always be cheaper - having, say, a 20 TB array with 8 GB of RAM and a 2 TB bcache that read-caches any accessed data from the 20 TB sounds like a desirable thing to me. The major argument for it, and the major reason why the current system is not good, is that it's impossible to predict beforehand which parts of that 20 TB array are going to be accessed.

 

Again, I'm not sure why I'm making up stupid justifications for this, but this example doesn't sound far-fetched to me: you are sat at the dinner table with your family and tell them about a movie you recently watched on your family Plex server. The family are rapturous as you regale them with what a great movie it is, and after eating they scurry to their bedrooms to watch it.

 

In the current system (assuming the example above with highly limited RAM - 80% of it used by Docker containers, so a mere 1.6 GB free), that poor array hard drive would be thrashed to shit because it's now serving the same file to three or more people. In a bcache-enabled environment there's no problem: the SSD will chew through their requests like it was nothing (leaving your array sound and ready to serve up whatever any other client wishes to access from it).

 

 

Link to comment
1 hour ago, ich777 said:

I get the point of your request, but the Array was designed to be a slower, archival type of store, and the Cache is actually where the data that needs to be accessed/changed quickly every day is located.

 

PrimoCache is actually a similar solution, but also not quite, since PrimoCache has algorithms in place that will also write the data (actually blocks) back to the fast storage tier if you access it often, and will also evict old blocks based on how often they were accessed.

 

AFAIK only random I/O will benefit from bcache because it was intentionally optimized for SSDs, and sequential I/O won't even be cached by bcache.

 

Please correct me if I'm wrong about that…

 

Of course in certain use cases this can make sense, but IMHO for Unraid it doesn't make a lot of sense.

 

Oh, also re: sequential I/O, it appears you can turn the cutoff off:

 

echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
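
For what it's worth, the current value (and the 4 MB default that comes up below) can be checked the same way, assuming the device shows up as bcache0 - it is set per backing device:

cat /sys/block/bcache0/bcache/sequential_cutoff      # defaults to 4M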

 

Link to comment
11 hours ago, mrpops2ko said:

At this point I'd just be writing again what I've previously written. Do you genuinely not see the point?

11 hours ago, mrpops2ko said:

Again, I'm not sure why I'm making up stupid justifications for this

First of all I have to say that I just wanted to understand your use case, that this should be a friendly conversation, and that I've never said I'm completely against it...

 

I will look into this and try it on my own, but I have to say, bcache wasn't designed by default for either of your described use cases, as is also mentioned in the FAQ over here: Click

Quote

for most people and most drive access patterns that's a massive speed boost right there (exceptions would include streaming large unfragmented files or drives that are very heavy on writes and light on reads).

 

 

11 hours ago, mrpops2ko said:

Oh, also re: sequential I/O, it appears you can turn the cutoff off

That's for sure a thing you can do, but then you basically cache everything on the cache device, and you will reduce the life span of your SSD significantly, since by default only files (or, more precisely, I/O) smaller than 4 MB get cached.

 

The next thing is that bcache needs to write its own superblock to the backing device, and this can also lead to issues on the Array.
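
If you want to see exactly what it puts there, bcache-tools ships a small helper for inspecting that superblock (device name is an example):

bcache-super-show /dev/sdY      # prints the superblock of a cache or backing device, including the data offset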

 

But as said, I'm not completely against it. I will look into it, try it on my test machine and report back; give me a few days, it will be sometime next week.

  • Like 1
  • Thanks 1
Link to comment
2 hours ago, ich777 said:

First of all I have to say that I just wanted to understand your use case, that this should be a friendly conversation, and that I've never said I'm completely against it...

 

I will look into this and try it on my own, but I have to say, bcache wasn't designed by default for either of your described use cases, as is also mentioned in the FAQ over here: Click

 

 

That's for sure a thing you can do, but then you basically cache everything on the cache device, and you will reduce the life span of your SSD significantly, since by default only files (or, more precisely, I/O) smaller than 4 MB get cached.

 

The next thing is that bcache needs to write its own superblock to the backing device, and this can also lead to issues on the Array.

 

But as said, I'm not completely against it. I will look into it, try it on my test machine and report back; give me a few days, it will be sometime next week.

Sorry if I came off as combative, it wasn't my intent, but it's been my observation that discussions sometimes get wholly shut down because the idea isn't within the extremely narrow scope of a particular person's use case (see, for example, the whole political/ideological warfare landscape that is the Linux kernel folks' dislike for FUSE file systems, lol - the Windows equivalent, DrivePool, is significantly better on a lot of fronts, but it just can't do hardlinks). It happens to all of us, I get it: some of us couldn't possibly imagine a scenario where something is useful, and on face value it sounds absolutely daft (not saying you did any of this, just talking about observations - I've also seen a lot of plonkers advocating for stupid stuff too, and when drilled down into, it was stupid overall).

 

The conversation then devolves into an exercise where (rightly or wrongly) you end up having to justify the minutiae of your deployment in extreme detail - writing literal novels in the process... (again, I'm not saying you did any of this, I'm speaking in generalities about online discussions)

Also, yep, that's exactly as you described, and it's the exact desired state in a tiered storage scenario: 'basically cache every file' - that's the absolute bottom-dollar, god-tier, 10/10, dialled-to-11 position that me and probably a bunch more people want to end up in.

Yep, it will significantly reduce your SSD lifespan, and it's going to do some insane level of write amplification - that's also entirely desired.
SSDs themselves have insane levels of longevity. If we're talking meta-commentary on SSD lifespan, we're seeing many SSDs outliving their usefulness because of sheer capacity expansion - SSDs basically just don't die now, at least the good ones. Not only are they rated for significant amounts of writes, but those stated values aren't even reflective of REAL WORLD endurance.

For example, the Samsung 840 Pro (don't buy this, by the way, for anybody reading - it has a hardware-level defect which Samsung 'fixed' in firmware... the 'fix' caused massive write amplification) has a rated TBW of some 100-200 TB? (I'm working from memory), but in torture tests where people actually set out to KILL the SSD, it survived many PETABYTES of writes. More than a 10-20x increase versus what's rated on the box.

https://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead/

Quote

The SSD Endurance Experiment represents the longest test TR has ever conducted. It’s been a lot of work, but the results have also been gratifying. Over the past 18 months, we’ve watched modern SSDs easily write far more data than most consumers will ever need. Errors didn’t strike the Samsung 840 Series until after 300TB of writes, and it took over 700TB to induce the first failures. The fact that the 840 Pro exceeded 2.4PB is nothing short of amazing, even if that achievement is also kind of academic.

 

So yeah, I think overall a GOOD justification can be made for this kind of stuff. The used SSD market, for both NVMe and regular SSDs, is a gold mine IMO. The whole idea with the bcache device is that it's interchangeable and ephemeral and you shouldn't give a shit about it - I'd adopt a pump-and-dump mentality with it.


 

I envisage that for Unraid, with a plugin/script, once sufficiently advanced it could be as simple as pointing it at an SSD and clicking go. From a value proposition standpoint I think this makes TONS of sense.

 

Imagine someone comes to you and says 'hey {user}, for £16 the next 2 PETABYTES of reads are going to be served from a 128 GB cache, which is going to be fast as lightning. Would you be interested in such a deal?'

I'd bite their hand off and thank them profusely for giving me such an opportunity. Probably ask if I could rake their yard in gratitude. 


 

 

Link to comment
On 9/30/2022 at 3:05 PM, mrpops2ko said:

I'd bite their hand off and thank them profusely for giving me such an opportunity. Probably ask if I could rake their yard in gratitude. 

12 hours ago, JDhillon said:

+ 1, would love a more sophisticated caching mechanism for unraid

I've now compiled bcache-tools and the bcache Kernel module to test this.

 

bcache is not suitable for Unraid, since you always have to mount /dev/bcacheX and this is simply not possible on Unraid, at least not without workarounds and a lot of tinkering...

 

Anyways, I've tested it with a device outside the Array, mounted it manually and I have to say, it works.

It is of course pretty cool, but as said, only outside of the Array or Cache pool.
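
For anyone wanting to reproduce such a manual test outside the Array, it boils down to something like this (filesystem and mount point are just examples):

mkfs.xfs /dev/bcache0
mkdir -p /mnt/disks/bcachetest
mount /dev/bcache0 /mnt/disks/bcachetest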

  • Thanks 1
Link to comment
On 10/4/2022 at 1:45 PM, ich777 said:

I've now compiled bcache-tools and the bcache Kernel module to test this.

 

bcache is not suitable for Unraid, since you always have to mount /dev/bcacheX and this is simply not possible on Unraid, at least not without workarounds and a lot of tinkering...

 

Anyways, I've tested it with a device outside the Array, mounted it manually and I have to say, it works.

It is of course pretty cool, but as said, only outside of the Array or Cache pool.

Great to hear, but can you expand more on this?

From what you've said, it wouldn't be capable of caching data accessed on the array? If it isn't, can you elaborate on what use case scenarios we could use this for?

 

I remember reading about using the bcache data offset to simply mount the backing device with or without bcache.

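That is, because the backing device's data simply starts 8 KiB in, the filesystem can be reached directly with a loop device offset, something like this (device and mount point are examples, and the offset should be confirmed with bcache-super-show first):

losetup -f --show -o 8192 /dev/sdY      # skip the 8 KiB bcache superblock
mount /dev/loop0 /mnt/recovery          # use whichever loop device losetup printed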

 

Wouldn't that be something Unraid could do for the array? Like, on a parity level, Unraid could just ignore the first 8 KB of each disk (if the cache is enabled, this would be the bcache hook-in offset) and then we'd all have a transparent, interchangeable, ephemeral cache.

 

I recall reading @limetech played about with it in 2015. If we could get the 8 KB offset hook-in, and have the parity disk ignore the first 8 KB (or assume its values, since they'll never really matter), then that should be sufficient for array integration, wouldn't it?

 

Link to comment
7 hours ago, mrpops2ko said:

From what you've said, it wouldn't be capable of caching data accessed on the array? If it isn't, can you elaborate on what use case scenarios we could use this for?

You can format, create, mount,... (everything that is involved in creating a bcache device) outside the Array or Cache pool and use it there, like is currently the case for the ZFS plugin (everything needs to be done manually).

 

7 hours ago, mrpops2ko said:

2015

However, as is pointed out in the linked post, not much development is going on over on bcache, especially if you look at the issue tracker or even the PRs on the GitHub page for bcache-tools, and from my perspective it is currently dead in the water (the last commit was back in 2014 and there are issues and PRs open dating back to late 2013).

 

I also saw this message a bit too often for my liking when creating a bcache device (one time is already too much for me):

Segfault

 

Besides that, I also had to patch bcache-tools because it is simply out of date and, as currently available on GitHub, wouldn't even compile on modern systems.

Link to comment
9 minutes ago, tjb_altf4 said:

Not sure if that really changes some of the challenges you've noted, but might help stability

In terms of stability maybe.

 

From my perspective it is too complicated to set up on Unraid itself if you have to do it manually.

It also has a lot of downsides on Unraid, because if a user decides to enable writeback mode - which most users would definitely do to speed up writes to the Array, regardless of the downsides - you can definitely lose data and maybe also destroy the validity of your parity.

 

I completely get the point of adding such a read cache, but from my perspective bcache is the wrong tool for this specific use case and was never, at least not intentionally, designed for such tasks.
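
To illustrate how easy it would be for a user to flip that switch anyway, changing the mode is a single sysfs write (assuming the device registers as bcache0):

cat /sys/block/bcache0/bcache/cache_mode                 # the active mode is shown in [brackets]
echo writeback > /sys/block/bcache0/bcache/cache_mode    # exactly the one-liner I'd be worried about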

  • Like 1
Link to comment
2 hours ago, ich777 said:

I completely get the point of adding such a read cache, but from my perspective bcache is the wrong tool for this specific use case and was never, at least not intentionally, designed for such tasks.

What would you suggest is the best method to accomplish this on Unraid as a transparent read cache for the array? Pickings seem pretty slim, and bcache seems the best solution for the job, all things considered.

 

As it stands, it is as you have said, but I don't think it'd be reinventing the wheel to have a native implementation of bcache in the Unraid ecosystem for the array (and maybe even for cache devices too, but that seems a bit redundant).

 

The offset functionality is how we'd maintain each individual array disk's accessibility as an independent filesystem - is it not?

The parity could have an offset too - hell, if we moved this from an 8 KB to a 1 MB offset instead (so we can better support that alignment fix)... is it really the end of the world if the first 1 MB of our array just isn't backed up by parity? We could have each drive just start writing from 1 MB in, and in principle it's all interchangeable, plug and play?

Link to comment
13 minutes ago, mrpops2ko said:

What would you suggest is the best method to accomplish this on Unraid as a transparent read cache for the array? Pickings seem pretty slim, and bcache seems the best solution for the job, all things considered.

I have no real recommendation (please read until the end) because I haven't had the need for such a cache yet. No issues over here, even with four people streaming 4K content from the same hard disk while copying to it at the same time, but as said above, I completely get why it is requested here.

Also, it seems that bcache would be one of the best solutions for this, but from my perspective writeback mode would need to be completely disabled, even if the user tries to manually enable it - but then it isn't really bcache anymore...

 

18 minutes ago, mrpops2ko said:

(and maybe even for cache devices too, but that seems a bit redundant)

This would be a bit too much, at least from my perspective.

Most people over here already have an SSD or NVMe as their cache drive(s), and caching an Unraid Cache Pool is really inefficient, because caching a file that may get moved by the Mover anyway is unnecessary most of the time (95% of the time).

 

23 minutes ago, mrpops2ko said:

The offset functionality is how we'd maintain each individual array disk's accessibility as an independent filesystem - is it not?

The parity could have an offset too - hell, if we moved this from an 8 KB to a 1 MB offset instead (so we can better support that alignment fix)... is it really the end of the world if the first 1 MB of our array just isn't backed up by parity? We could have each drive just start writing from 1 MB in, and in principle it's all interchangeable, plug and play?

I can't help with this, because these are things that need to be changed in Unraid itself and there is nothing I can do about that. As said a few times above, I'm also the wrong person for bcache, because in my case the files that would get cached would be cached for nothing about 90% of the time (but don't get me wrong, I'm not against it - I just wouldn't use it).

 

May I ask why not try ZFS with an L2ARC or similar?

ZFS is on the horizon for Unraid, so why not try the plugin for now and see if it is an alternative? Although I'm not a big fan, because ZFS needs to be maintained and actively monitored, otherwise it can bite you back, really hard...
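
For completeness, with the ZFS plugin an L2ARC read cache can be added to an existing pool with a single command (pool and device names are examples):

zpool add tank cache /dev/nvme0n1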

  • Like 1
Link to comment
2 minutes ago, ich777 said:

May I ask why not try ZFS with an L2ARC or similar?

 

Because it's ZFS, which means I have to incur striping. Most users who are attracted to Unraid or similar solutions do so because of the ease of expansion and the JBOD nature of things.
When you have data at scale (and my scale isn't even that large, only some 80 TB or so), the idea of losing it ALL in one single failure is scary as hell. I love that it's all JBOD and independent.
 

Prior to using Unraid, I had been using SnapRAID for about 7 years. I'm still mulling over the idea of making a guide on how to set up SnapRAID on Unraid, because it's a significantly better solution all round for integrity (it checksums all the files and keeps records of them, and if there's a parity sync mismatch it's very easy to see which files are mismatched, so you can independently verify whether the parity or the filesystem is correct) - it's also very easy to plug and play. The only major downside is that it's not a live parity (it's snapshot-based, so you'd generally execute a script which tells the array to sync, and you can't be writing data to it during that time).
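
For anyone curious, a minimal SnapRAID setup is really just a small config file plus a scheduled sync - roughly like this (paths are purely illustrative):

# snapraid.conf:
parity /mnt/disks/parity1/snapraid.parity
content /boot/config/snapraid.content
content /mnt/disks/data1/snapraid.content
data d1 /mnt/disks/data1/
data d2 /mnt/disks/data2/

# then, on a schedule:
snapraid sync     # snapshot the parity; avoid writing to the array while this runs
snapraid scrub    # periodically re-verify the checksums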

 

Re: writeback mode, I agree with you - it should be disabled, and a native Unraid solution could lock that flag in place (as well as potentially exposing a bunch of these options, like sequential_cutoff, as a GUI field / drop-down menu).

Due to the nature of Unraid's cache system, writeback mode is largely redundant anyway, isn't it? I also agree it makes no sense to bcache a cache pool outside of some super-niche and probably datacentre-level considerations - nothing for us home users.

 

  • Like 1
Link to comment