To Cache drive or not to Cache drive?



I have a 500 GB 7200 RPM IDE drive sitting around that seems like it might be useful as a cache drive. I am running over Gigabit Ethernet; would I see a performance hit writing to the server using an IDE drive, or is the drive still likely to be faster than the network?

Link to comment

Jason & Others;

 

unRAID will automatically look after your files as they go to the cache disk and then get transferred to the array disks. You set each share to use or not use the cache disk. You set the schedule. Then, data copied to the shares will first go to the cache and be moved to the array disks when the mover runs. You do not have to manually copy to the cache disk, and you do not have to manually handle the transfers from the cache disk to the array disks. If the cache disk is full, then data goes directly to the array until the mover runs and empties it again.
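
Roughly, the write path works like this (a minimal Python sketch of the behaviour described above, not unRAID's actual code; the mount points, share name, and free-space check are placeholders):

```python
# Sketch only: illustrates "write to cache first, fall back to the array when
# the cache is full, move later on a schedule". Not unRAID's real implementation.
import shutil
from pathlib import Path

CACHE = Path("/mnt/cache")          # hypothetical mount points
ARRAY = Path("/mnt/disk1")

def write_target(share: str, filename: str, size_bytes: int) -> Path:
    """Pick where an incoming file for a cache-enabled share should land."""
    free = shutil.disk_usage(CACHE).free
    if free > size_bytes:            # cache has room -> it goes there first
        return CACHE / share / filename
    return ARRAY / share / filename  # cache full -> straight to the array

def run_mover(share: str) -> None:
    """Later, on the schedule you set, cached files get moved to an array disk."""
    src = CACHE / share
    for f in src.rglob("*"):
        if f.is_file():
            dest = ARRAY / share / f.relative_to(src)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest))
```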

 

Too many people are over-thinking the use of the cache disk and trying to manually manage it. You shouldn't even have to pay any attention to it or look at the files on it.

 

Peter

 

Link to comment

I'm using the cache disk only for Transmission.

If I want to put files on the server, I always choose disk shares to copy straight to the selected drive.

Using the cache was the easiest option for a drive outside the array.

 

I would write to the share which would utilise the cache for a couple of reasons:

 

1) It might prevent additional disk/s from spinning up (unless the disk/s you're writing to are also being seeded from)

2) It prevents some fragmentation of downloaded files as the full contiguous files are written to your disks each night.

 

But to each their own!

Link to comment


I'm not sure I get what you mean.

All torrents are on the cache drive in a folder that starts with a ".", so it is invisible to the mover script.

All files remain on the cache until they are manually moved to the disk of choice, so there is no fragmentation when the files are moved.
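
Something like this, purely as an illustration of that "hidden from the mover" behaviour (not the real mover script; the ".torrents" folder name is just an example):

```python
# Illustration only: a mover-style walk that leaves dot-prefixed top-level
# folders (e.g. /mnt/cache/.torrents) untouched, as described above.
from pathlib import Path

CACHE = Path("/mnt/cache")   # hypothetical cache mount

def files_the_mover_would_touch(cache_root: Path):
    for top in cache_root.iterdir():
        if top.name.startswith("."):   # hidden folder -> skipped entirely
            continue
        yield from (p for p in top.rglob("*") if p.is_file())

# Everything under /mnt/cache/.torrents stays put; everything else is eligible.
```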

 

The only disk that is active 24/7 is the cache drive.

Personally, I see no point in using a drive inside the protected array for torrents.

Torrents already have protection -> verify  ;D

 

 

Link to comment


This is a completely different kind of protection. The array does nothing like "verify". It protects only from a total disk failure. There is no individual file protection or "verify". This can be added with the md5deep package.

 

If the cache drive fails, everything on it is lost, but you can just download it again. If a file within unRAID becomes corrupt, there is no way to fix it. It will fail when accessed.

Link to comment

That is very true.

For critical data, it's always best to have (offsite) backups.

Apart from letting you know that a file is corrupted, MD5 doesn't really protect, does it? Creating parity (.par) files can, to a certain extent (it just seems like a lot of work).

Having the entire array active to seed some of the files is, for various reasons, not an option for me.

As you said, if the drive fails, most files can be downloaded again.

 

 

 

Link to comment


You're right. I use md5deep in case I ever get a parity error. Then I can determine if the error is in parity or on disk. If none of my data files have errors then I update parity. If I have corrupt data files then I can rebuild that disk.

Link to comment


How is this process configured? Do you have a cron job to scan all of your data drives and calculate the MD5 of each file (if it's been changed or wasn't already calculated)? I'd like to do something like this, too.

Link to comment

A parity error could be the result of a corrupt data disk or corrupt parity. It's most likely parity but this is not certain.

 

I have just done it manually and saved copies of the hashes for all data disks in a .hashes directory at the top level of each drive. For my full drives it never has to change. I occasionally update the hashes for a drive as it fills. A cron job is a good idea, though. It doesn't take very long to compute for an entire 2 TB drive, so a nightly job for all non-full data drives should work.
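
For anyone wanting to script it, here is a rough Python stand-in for the job md5deep is doing here: build a hash manifest per data disk and re-check it later from cron. The .hashes/manifest.md5 name and layout are just assumptions for the example, not anything unRAID or md5deep mandates:

```python
# Sketch: per-disk MD5 manifests, suitable for a nightly cron job.
import hashlib
from pathlib import Path

def md5_of(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.md5()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def build_manifest(disk_root: Path) -> None:
    """Write 'hash  relative/path' lines for every file on the disk."""
    manifest = disk_root / ".hashes" / "manifest.md5"
    manifest.parent.mkdir(exist_ok=True)
    with manifest.open("w") as out:
        for f in sorted(disk_root.rglob("*")):
            if f.is_file() and ".hashes" not in f.parts:
                out.write(f"{md5_of(f)}  {f.relative_to(disk_root)}\n")

def verify_manifest(disk_root: Path) -> list:
    """Return files whose current hash no longer matches the manifest."""
    bad = []
    for line in (disk_root / ".hashes" / "manifest.md5").read_text().splitlines():
        digest, rel = line.split("  ", 1)
        if md5_of(disk_root / rel) != digest:
            bad.append(rel)
    return bad

# e.g. build_manifest(Path("/mnt/disk1")) now, then verify_manifest(Path("/mnt/disk1"))
# after a parity error to see whether the data or the parity is at fault.
```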

 

Since I'm filling my media drives one at a time I only have a single drive to compute. I don't worry about disks that hold backups because if I have a parity error and my media drives are ok then I can just delete the backups and recompute parity. Backups are easy to replace.

Link to comment

True, or neither, if the bit was corrupted in memory. It is one chance out of N (where N = the total number of drives in your system + the number of other hardware items involved). It could be ANY disk, or any part of the I/O hardware, from memory to motherboard chipset.

Link to comment

So how does the md5deep package work? Does it create one file with a hash in it for each file? Or does it use a hash => value scheme, where there is one file with each line containing a file name and its MD5 value? Just want to figure out the best way to do this and make it as automated as possible...

Link to comment

I'm confused about the different drives that have been talked about. I'm currently using two WD 1 TB Greens for data and a 1 TB WD Blue for parity. These are all SATA drives and run at 3 Gbps. I now have the Pro license, so I'm thinking of adding a cache drive. The WD Blue is capable of 6 Gbps, same as the WD Black, BUT only if it's connected to the new SATA 3 ports. Since these were not available until now, how can a WD Black be any faster than my Greens or Blue, which all run at 3 Gbps on SATA 2 ports? I do understand that the Black has a 64 MB cache, the Greens have 32, and I think the Blue has 16.

Link to comment

Don't get sucked in by the marketing. Today's SATA disks, regardless of who they are made by, can basically attain a sustained max read speed of between 120 and 150 MB/s. (Multiply by 8 to get bps.)

 

150 * 8 = 1200 Mbps, or 1.2Gbps. 
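
To put some rough numbers side by side (rounded figures, just to illustrate the point):

```python
# Back-of-the-envelope comparison: sustained disk throughput vs. link speeds.
sustained_disk_mb_s = 150                      # ~max sustained read of a typical spinner
disk_gbps = sustained_disk_mb_s * 8 / 1000     # 150 MB/s -> 1.2 Gbps

link_speeds_gbps = {"SATA-1": 1.5, "SATA-2": 3.0, "SATA-3": 6.0, "Gigabit LAN": 1.0}
for name, gbps in link_speeds_gbps.items():
    bottleneck = "disk" if gbps > disk_gbps else "link"
    print(f"{name}: {gbps} Gbps link vs ~{disk_gbps} Gbps from the disk -> {bottleneck} is the bottleneck")
```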

 

It does not make a bit of difference if the SATA link to it can theoretically transfer bits faster; it is not going to happen. As has been frequently said, a spinning disk can barely saturate a SATA-1 link.

 

The BIGGEST factor for any disk is the areal density of the bits on the platters and the rotational speed of the platters. The cache on the disk is nearly useless when playing music or movies. (When was the last time you watched a movie that was less than 64 MB in size?)

 

Joe L.

Link to comment

Well, you should see a performance increase, as I believe your other drives are 5900 RPM, correct? Why do you think you'd see a performance "hit"?

 

I have a 500 GB 7200 RPM Hitachi cache drive, with my array drives all being greens--of mixed Hitachi and WD sizes. I never did a benchmark comparison between the two, but, according to those specs, performance should be increased. Plus, I use the cache drive for running YAMJ.

Link to comment

Depends on usage. For sequential read or write access, a slower but denser surface can yield better performance than a higher RPM disk with lower density.

 

But for random access, the higher RPM drive can perform better due to faster access time.
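
A quick worked example of the access-time point: average rotational latency is half a revolution, so it falls as RPM rises.

```python
# Average rotational latency = half a revolution.
for rpm in (5400, 5900, 7200):
    ms_per_rev = 60_000 / rpm
    print(f"{rpm} RPM: ~{ms_per_rev / 2:.2f} ms average rotational latency")
# 5400 -> ~5.56 ms, 5900 -> ~5.08 ms, 7200 -> ~4.17 ms
```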

 

Due to the way unRAID writes to the array, a higher RPM drive will provide better performance than a higher density, slower RPM drive for array disks.

Link to comment


Correct, my data drives are 5900 RPM. My concern about a performance hit is that the 500 GB drive I am thinking about using as a cache drive has an IDE interface, not SATA.

Link to comment


Do you expect to put more than 500 GB worth of data on your server per day? I don't see why that'd be an issue--actually, a smaller cache drive would be better. Use a cache drive that's as small as the daily amount of data you'd transfer, plus the size of whatever applications will permanently reside on your cache drive...
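
As a toy sizing estimate (the numbers are made up; plug in your own):

```python
# Rule of thumb from above: daily transfer + resident apps, plus a little slack.
daily_transfer_gb = 60    # hypothetical: the most data you'd copy in one day
resident_apps_gb = 20     # hypothetical: apps/appdata living on the cache
headroom = 1.25           # slack so the cache doesn't fill mid-copy

print(f"Suggested minimum cache size: ~{(daily_transfer_gb + resident_apps_gb) * headroom:.0f} GB")
```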

 

And yes, you should get a SATA interface cache drive...

Link to comment
