Checksum Suite


Squid


THIS PLUGIN IS DEPRECATED

 

 

Squid's Checksum Creator

 

A new tool to automatically (and manually, if so desired) create md5 / sha1 / sha256 / blake2s checksums for user-selected shares, i.e. to verify the integrity of the data stored within your shares.  (This is a more "old school" / traditional way to generate checksums vs the approach that bunker utilizes, along with offering a GUI for control, and it is purposely designed with a limited feature set to be as user friendly as possible.  If you require the advanced filtering controls that bunker offers (eg: by file size, etc.), install that script instead.)

 

Paste this link into the Install Plugin Text Box:

 

https://raw.githubusercontent.com/Squidly271/checksum/master/plugins/checksum.plg

 

This plugin is only compatible with 6.1+ (it will not work at all under 6.0 due to its design) and also requires the Nerd Pack plugin (specifically, the inotify-tools package).  If inotify-tools is not installed, the plugin will complain and direct you to the forum thread for Nerd Pack.

 

The plugin is specifically designed to output its data files to be compatible with corz checksum for Windows (http://corz.org/windows/software/checksum/), although its generated files are also compatible with md5sum, sha1sum, sha256sum, etc.
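Once generated, these files can also be checked directly on the server with the standard Linux tools.  A quick sketch (the share, folder, and file names here are made up, and it assumes the md5 algorithm with a single folder-level .hash file named after the folder; adjust to whatever your settings actually produce):

    # Verify a folder-level checksum file with coreutils (all names below are hypothetical).
    cd "/mnt/user/Movies/Some Movie (2015)"
    md5sum -c "Some Movie (2015).hash"     # re-hashes each listed file and reports OK / FAILED
    # sha1sum -c and sha256sum -c work the same way for those algorithms.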

 

Description of Settings:

 

Share Settings

http://i46.photobucket.com/albums/f109/squidaz/Untitled_zpssp76gt4y.png

Share To Monitor: A drop-down list of unRaid's user shares.  Note that appdata (if it exists) will not be displayed, to prevent constant scanning of and checksum creation for that share.  If you set the share to Custom, then you can manually enter a path below.  Note that "/" and "/mnt" are not allowed paths under any circumstances: using /mnt would scan every folder a minimum of 3 times and a maximum of (number of drives present + 2) times, which is just a waste of resources.  Note that for monitored shares, only the folder within the share that has changes is recalculated.

Algorithm: Either md5, sha1, sha256, or blake2 (actually blake2s, for compatibility with corz).  md5 is arguably the most portable of the bunch.  Speed-wise for calculations, there is very little difference between them.  Note: if you will be using corz to check any file in the future, do NOT use sha256; corz doesn't support it.

Update Changed Files: If, when scanning the folder, the plugin detects that a file has changed (based upon the last scan time and the current file modification time), you have the option of either updating the checksum for the file or not.

Single Checksum File Per Folder: If yes, then a single file containing all of the generated checksums within that folder will be created (see the sketch after this list).  If no, a separate checksum file is created for each file.

Monitor: Sets whether the share is automatically monitored for changes.

Extension For Checksum File: Either .hash or the algorithm used as the extension (.md5, .sha1, etc.).  If using corz checksum, you'd be better off with .hash since it is already associated with corz.  If not, it really doesn't matter which you choose.

Include All Files: Scan all files within the folder for new additions or changes.  If set to no, then the Included / Excluded Files settings become available.

Included / Excluded Files: Wildcards for files to include / exclude, separated by a space, e.g. *.mkv *.avi *.mp3 etc.
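For illustration only, the "single checksum file per folder" option produces roughly the same end result as the following sketch; this is not the plugin's internal code, and the folder / file names are made up:

    # Rough equivalent of "single checksum file per folder" (illustrative sketch, not the plugin's code).
    # Hashes every file in one folder (non-recursively) into a single md5 list.
    cd "/mnt/user/Movies/Some Movie (2015)"           # hypothetical folder
    find . -maxdepth 1 -type f ! -name '*.hash' -print0 |
        xargs -0 md5sum > "Some Movie (2015).hash"    # one "<md5>  ./<name>" line per file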

 

 

Global Settings

 

http://i46.photobucket.com/albums/f109/squidaz/Untitled_zpsmvnqjpdi.png

 

Pause during parity check / rebuild: Self-explanatory.  Pauses the current job after the current file is finished (see Queue below) if it detects a parity check / rebuild is in progress.  After the parity check is finished, the job queue will continue (within 10 minutes).  Note: the current job in the queue must be completed before any changes to this setting take effect (i.e. if it's already paused for a parity check, you can't unpause it unless you stop the parity check).

 

Pause before calculating: This setting will pause a queued job for the selected amount of time before it begins.  This is useful for waiting for the current writes to a new folder to finish before beginning to calculate the checksums for it.  It is also useful when cutting and pasting within Windows, to avoid having the plugin recalculate the checksums of the cut folder before it is actually deleted.  Changes to this setting take effect after the current job in the queue is completed.

 

 

Apply: The apply button operates a little out of the ordinary here.  Each section (i.e. global settings / each share section) has its own apply button.  Any change to a section requires that you hit the apply button for that section.  Still working on alternatives here.

 

Add To Queue: (See Queue below.)  Manually adds the selected share to the job queue.  Note that otherwise only new / changed folders will be scanned, which may be all that some users want.  To generate checksum files for all of your existing files, you need to add the share to the queue.

 

Monitor Control

http://i46.photobucket.com/albums/f109/squidaz/Untitled_zpssjxvlwq8.png

After certain changes to share settings (whether to monitor, and which share is monitored; the plugin will tell you), the background process must be restarted.  Note: during this restart, any jobs already queued will be lost (i.e. wait for the status to say idle first), and any changes that happen during the restart will not be handled.  Also, depending upon the complexity of your folder structure, it may take a minute or so.  The on-screen log display will show a message that it is generating watches.  After it has completed, it will state "Watches Established".  At that point the system is back up and running.

 

Queue

 

Because this plugin is designed to automatically monitor for changes to folders, everything operates under a FIFO queue system.  (Otherwise you could potentially have a couple of hundred instances of checksum generation running concurrently if you had Plex save the .nfo files to the array.)

 

The queue works like this: the monitoring script sends a message to the checksum generator.  If the time between the message being sent and the job reaching the front of the queue is less than the Pause before calculating setting, then the script will pause for the remaining time.  In plain English, assuming that the pause time is set to 10 minutes: if a job is queued at 1:00pm, it will not start until 1:10pm.  Let's say that job takes 3 minutes to run, finishing at 1:13pm.  If another job was queued at 1:05pm, then at the completion of the first job it only has to pause a further 2 minutes (until 1:15pm, 10 minutes after it was queued) before it begins.
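A minimal sketch of that timing logic (an illustration of the behaviour described above, not the plugin's actual code):

    # Sketch of the pause-before-calculating behaviour (assumed logic, not the plugin's code).
    PAUSE=600                                  # "Pause before calculating" set to 10 minutes
    queued_at=$1                               # epoch time the monitor queued this job (passed in)
    now=$(date +%s)
    elapsed=$(( now - queued_at ))
    if [ "$elapsed" -lt "$PAUSE" ]; then
        sleep $(( PAUSE - elapsed ))           # only wait out the remainder of the window
    fi
    # ...generate / update checksums for the queued folder here...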

 

Every change to any monitored folder is classed as a job.  The net result is that if you are updating tons of folders concurrently (e.g. Plex writing out its .nfo files), the total delay is only going to be 10 minutes.

 

Manually queuing up a folder is exempt from the pause (i.e. if the plugin is idle and not generating any checksums, it will run immediately).

 

Speed of Generation

On both my systems, this plugin can pretty much match the sustained read speed of the drives with any of the algorithms.  I.e. on average for my media files I get between 160MB/s and 195MB/s (for drives which are rated at a 210MB/s max transfer rate).  The big upshot of monitoring changes as they happen is that the odds are decent that unRaid still has the freshly written file cached in memory.  In my case, for freshly written media files less than around 5GB, I get checksums generated at ~480MB/s for a file which is technically stored on my ancient 5400rpm hard drive.  Your mileage will vary depending upon your hard drives, controllers, and available RAM.

 

Preexisting checksum files: If you already have checksum files generated by corz or by another program, this plugin will read them to save time.  The preexisting filename must match the file that this plugin would have generated.  I.e. in single-checksum (per-file) mode, the preexisting file should be filename.ext.hash (or .md5 / .sha1 / etc.), depending upon what your extension settings are.  In folder mode, the file name should be folder.hash (or .md5 / .sha1 / etc.).  Side note: in the course of writing this I discovered a bug with corz where the time stamp it records does not always match the time stamp of the file; sometimes it is out by exactly an hour, up or down.  The time stamps that this plugin generates always match the time stamp reported by Windows and by unRaid.  The net result is that this plugin thinks the file has been modified since corz ran against it, and if set to update mode it will regenerate the checksum.  Side note: if someone really wants me to incorporate bunker support, send me a copy of the exported file, and I should be able to write a script to pre-generate the checksum files.

 

 

Coming within the next week:

 

Cron jobs for each non-monitored share to be added to the queue.  An apply-all button.  A GUI for automatically checking a portion (or all) of the generated checksum files for consistency (cron job and percentage to check).  And whatever else the user base suggests that meets the design goals of being simple to use and manage.

 

Technical Notes

 

In order to limit this plugin's impact upon the responsiveness of your system, the generation of checksum files runs at a low priority (nice=12).
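For reference, running the same calculation by hand at that priority looks like this (the path is just an example):

    # Hash a file at the same reduced priority the plugin uses (nice=12); the path is hypothetical.
    nice -n 12 md5sum "/mnt/user/Movies/Some Movie (2015)/movie.mkv"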

The log file generated by this plugin is automatically capped at 500K, at which point it is deleted and started over.  If you have the separate log window open and it stops responding yet the log display within the plugin is still continuing, the log has restarted.  Close the window and re-open it.

 

Change Log

 

2015.10.31 - Revised GUI

2015.10.29 - Fix monitor not starting after a reboot

2015.10.24 - Add in manual verifications, revised logging

2015.10.18 - Add in inotifywait settings

2015.10.12 - Initial Release


I like the way this sounds, but have a few questions ...

 

=>  It sounds like this is nearly 100% Corz compatible ... i.e. in "single checksum per folder" mode it's going to generate the same folder_name.hash checksum files that Corz generates.    Are these fully Corz-compliant?  i.e. if I point to the folder, right-click, and select "Verify checksums" with Corz is it going to verify okay?

 

=>  Is there a verification function?    Or is this just (as the name suggests) a checksum "creator" ?

 

=>  If the "Update changed files" choice is NO, do you get any notifications that there were changed files found?

 

=>  If this is pointed to a share, and every folder in the share already has a Corz-generated checksum, will it recognize this and simply "do nothing" ... i.e. the same way Corz will react if you select "Create checksums" and then choose "Synchronize".

 

 

With my usage pattern, the "Next" tool you're probably working on is what I'd be most interested in -- I presume that's going to be a "Checksum Validator"  :) :)      I already generate a folder .hash file with Corz for everything I add to my server ... but a validation utility that runs on the server should be appreciably faster than doing the verifications across the network.

 

 

 


I like the way this sounds, but have a few questions ...

 

=>  It sounds like this is nearly 100% Corz compatible ... i.e. in "single checksum per folder" mode it's going to generate the same folder_name.hash checksum files that Corz generates.    Are these fully Corz-compliant?  i.e. if I point to the folder, right-click, and select "Verify checksums" with Corz is it going to verify okay?

Yes.  They will verify OK using corz (or md5sum / sha1sum, etc.).  Corz' format is merely a standard checksum format with comment lines added for time stamps.
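As a rough illustration (the exact corz comment syntax may differ from this sketch, and the hash values are placeholders), a folder-level .hash file is just a normal md5sum listing with comment lines carrying the time stamps:

    # movie.mkv  2015.10.12 @ 14:22:30
    d41d8cd98f00b204e9800998ecf8427e  movie.mkv
    # movie.nfo  2015.10.12 @ 14:22:31
    b026324c6904b2a9cb4b88d6d61c81d1  movie.nfo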

 

=>  Is there a verification function?    Or is this just (as the name suggests) a checksum "creator" ?

Within the week.  My initial impetus was to create the files locally, as I didn't like how corz would always change the modified time for each and every folder.  Verification, however, is really going to be limited to advanced things like, say, 10% of the hashes each cron job, or an entire share, etc.  For simple single file / folder checking, it's easier to just use whatever checksum tool you're already used to (i.e. corz; no sense reinventing the wheel here).

=>  If the "Update changed files" choice is NO, do you get any notifications that there were changed files found?
Not at the moment, but not a big deal to add.  Like anything else in the world, this was made for how I use things, so I never considered it, but it is a good idea.

=>  If this is pointed to a share, and every folder in the share already has a Corz-generated checksum, will it recognize this and simply "do nothing" ... i.e. the same way Corz will react if you select "Create checksums" and then choose "Synchronize".
Yes, with one caveat (and I've been talking with cor about this).  There is a bug in corz checksum related to daylight saving time where sometimes the time stamp that he writes is off by an hour.  (He just found this problem when he was creating his own automated checking feature.)  The problem is that if his time stamp is earlier than the actual time stamp on the file, there is no choice but to assume the file has been changed since its checksum was done.  If his time stamp is after the file's time stamp, then it's no problem; you just assume that the checksum is correct.

 

The time stamp that I write out into the hash file 100% matches what Linux says is the modified time, and 100% matches what Windows says.  Corz is wrong in this situation.

 

The plugin will also read a hash file without the comment lines (date stamps) (NB: this is the standard checksum file format), assume that the checksum is correct, and rewrite the file with appropriate time stamps.  This is also how it operates in single-file mode.

 

With my usage pattern, the "Next" tool you're probably working on is what I'd be most interested in -- I presume that's going to be a "Checksum Validator"  :) :)      I already generate a folder .hash file with Corz for everything I add to my server ... but a validation utility that runs on the server should be appreciably faster than doing the verifications across the network.

 

Hugely faster.  Since corz is limited to doing things over the network, you're limited to, say, around 70MB/s.  Locally, on what I think is a pretty run-of-the-mill server (my server-a), I get between 160-195MB/s for files that are already on the array.  But, since I'm also monitoring new writes and generating the hash files a couple of minutes after they get written, the odds are decent that the file is still cached in the server's memory.  In that circumstance, I get an amazing 490MB/s (for a file which is technically stored on an ancient 5400rpm Seagate drive).  Of course, the file has to fit in memory to still be cached there, which in my case limits me to a 5-6GB file size before the system actually has to read it from the drive.

 

Even my secondary server (server-b), with less RAM, a slow processor, and a slow BR10i controller, averages between 105 and 130MB/s on the slow drives and around 180MB/s on the fast ones.

Any verification will give the same results

 


does not like spaces in share names

 

Monitoring /mnt/user/BD Movies

Setting up watches. Beware: since -r was given, this may take a while!

Couldn't watch /mnt/user/BD: No such file or directory

 

Myk

Ah ok... I don't have any shares with spaces, so didn't test for that.  There'll be an update tomorrow then.

Brilliant plugin Squid.  Will you add the option to actually generate checksums for existing files?  I have tons of existing files which still need checksums generated.

Already there.  Run a manual scan (Add To Queue).  In around a week you will also be able to run full scans as a cron job.

This, coupled with the forthcoming "Verify" plugin, is a GREAT addition to UnRAID.    I've often wished there was a built-in feature that would provide the same functionality as Corz's Checksum ... and this pair of plugins is going to do exactly that !!

 

Great work.

 

[Corz's utility works very nicely, but it requires a Windows box to run on and is notably slower when working across the network => these plugins will let it run natively on the server, which should be MUCH faster (2-3 times as fast)]

 

 


This, coupled with the forthcoming "Verify" plugin, is a GREAT addition to UnRAID.    I've often wished there was a built-in feature that would provide the same functionality as Corz's Checksum ... and this pair of plugins is going to do exactly that !!

 

Great work.

 

[Corz's utility works very nicely, but it requires a Windows box to run on and is notably slower when working across the network => these plugins will let it run natively on the server, which should be MUCH faster (2-3 times as fast)]

 

+1

 

I’ve also been using corz, very happy to be able to create/verify checksums without using a windows pc.

 

Many thanks!

 


 

Awesome! I use Corz to checksum everything on my computer before transferring to unraid (though I have it use .md5, not .hash), BUT it takes a week++ to hash check everything over the network. The ability to have this hash check everything on unraid would dramatically cut that time down -- sweet! :)


Agree the local verification is going to be very nice.

 

Squid:  When you do the validation plugin, please be sure you can choose to validate a specified DISK in addition to selecting by shares / folders / etc.  For a share that spans many disks (e.g. my media share) I'd prefer to validate one disk at a time, rather than having multiple disks spun up while it traverses the share.  Even though I expect it to be much quicker than over the network, it's still likely to take a good while with over 4000 movies and 14 disks.

 


Updated to 2015.10.13

Relatively minor update.  Fixed the issue with monitoring shares if spaces were present.  Fixed the issue where stopping for parity checks only happened at the beginning of a job, not after each file.  Added "paranoia" code to stop execution if certain configuration values are corrupted on the flash drive.

 

 


 

Awesome! I use Corz to checksum everything on my computer before transferring to unraid (though I have it use .md5, not .hash), BUT it takes a week++ to hash check everything over the network. The ability to have this hash check everything on unraid would dramatically cut that time down -- sweet! :)

I think that I'll make another option in the plg to ignore time stamp differences on a preexisting hash file if they are out by exactly an hour (the corz bug).

 

 

 

 

Oct 14 2015 11:47:43 Background Monitor Starting

Monitoring /mnt/user/Movies

Setting up watches. Beware: since -r was given, this may take a while!

Failed to watch /mnt/user/Movies; upper limit on inotify watches reached!

Please increase the amount of inotify watches allowed per user via `/proc/sys/fs/inotify/max_user_watches'.

 

I increased it to 728000 and it starts now, but it is a bit annoying if we need to set this manually every time?

This server powers down at 01:00 AM and powers automatically back on at 09:30 AM.

It also seems to run as root and writes the hash files owned by root...

So I don't think we can check the hashes from Windows.

 


I'm interested in this.

I'm looking for a possibility to detect silent file corruption.

 

I understand this plugin keeps the hashes up-to-date, but then file corruption wouldn't be noticed.

This means that you have to check the hash against a backup set?

What is the exact intention / use case for this approach?

 

 


That is a very promising plugin and should be basic NAS functionality.

 

I saw some silent file corruption recently and would have the same question as fireball.

 

I was reading Corz' site (it seems that this is THE checksum utility, based on some comments here) and his tool reports missing / corrupt / changed files and updates the .hash file accordingly. I do not understand how the tool differentiates between a "corrupt" and a "changed" file, so maybe someone can explain?

 

Two more questions for Squid:

1.) Will your Verification Tool also distinguish between missing / changed / corrupt / deleted files?

2.) Will the Creator Tool write one checksum file or many?

 

Thanks for the great plugin Squid!


...  I do not understand how the tool is differentiating between "corrupt" and "changed" file, so maybe someone can explain?

 

Corz' utility reports a file as being CHANGED if the date/time stamp for the file has changed and the MD5 doesn't match; or as CORRUPT if the date/time stamp has NOT changed but the MD5 validation fails.    Alternatively (it's an option) Checksum will automatically update the MD5 for changed files.
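In shell terms, that decision boils down to something like the following sketch (it assumes the previously recorded hash and time stamp are at hand; it is not corz' or the plugin's actual code):

    # Classify a file as OK / CHANGED / CORRUPT from a previously stored hash and time stamp (sketch).
    file="movie.mkv"                                # hypothetical file
    stored_hash="d41d8cd98f00b204e9800998ecf8427e"  # hash recorded when the checksum was created
    stored_mtime=1444652550                         # modification time recorded at the same point
    current_hash=$(md5sum "$file" | awk '{print $1}')
    current_mtime=$(stat -c %Y "$file")
    if [ "$current_hash" = "$stored_hash" ]; then
        echo "OK"
    elif [ "$current_mtime" -ne "$stored_mtime" ]; then
        echo "CHANGED"    # hash and time stamp both differ: the file was modified
    else
        echo "CORRUPT"    # hash differs but the time stamp does not: silent corruption
    fi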

 

It looks like Squid's creation plugin pretty much mirrors Corz' Checksum behavior => i.e. you can choose whether or not to update hashes for changed files; and I assume the validation plugin is going to report any corrupt or changed files.

 


Oct 14 2015 11:47:43 Background Monitor Starting

Monitoring /mnt/user/Movies

Setting up watches. Beware: since -r was given, this may take a while!

Failed to watch /mnt/user/Movies; upper limit on inotify watches reached!

Please increase the amount of inotify watches allowed per user via `/proc/sys/fs/inotify/max_user_watches'.

 

I increased it to 728000 and it starts now, but it is a bit annoying if we need to set this manually every time?

This server powers down at 01:00 AM and powers automatically back on at 09:30 AM.

Good point.  Shouldn't be a problem for me to modify that setting on initialization
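In the meantime, anyone who wants to raise the limit by hand can do something like this (the value is arbitrary; pick one large enough for your folder count):

    # Raise the inotify watch limit for the current boot only (run as root).
    sysctl -w fs.inotify.max_user_watches=1048576
    # or, equivalently:
    echo 1048576 > /proc/sys/fs/inotify/max_user_watches
    # The value does not survive a reboot, so it has to be reapplied at startup
    # (e.g. from the go file) until the plugin handles it itself.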

It also seems to run as root and writes the hash files owned by root...

So I don't think we can check the hashes from Windows.

Works for me.

I'm interested in this.

I'm looking for a possibility to detect silent file corruption.

 

I understand this plugin keeps the hashes up-to-date, but then file corruption wouldn't be noticed.

This means that you have to check the hash against a backup set?

What is the exact intention / use case for this approach?

Yes, it's designed to keep the hashes up to date.  Nothing bothers me more than modifying a file, forgetting to rehash it, then running a check a couple of months later, having it come up as corrupt, and trying to remember what I did to the file originally.

 

Hashes are only updated if you set the system to look for modified files, AND the file modification date has changed forward.  Silent corruption will not update the modification time for the file.  So no backup set is required.


...  I do not understand how the tool is differentiating between "corrupt" and "changed" file, so maybe someone can explain?

 

Corz' utility reports a file as being CHANGED if the date/time stamp for the file has changed and the MD5 doesn't match; or as CORRUPT if the date/time stamp has NOT changed but the MD5 validation fails.    Alternatively (it's an option) Checksum will automatically update the MD5 for changed files.

 

Actually, my copy of Corz doesn't report the file as being changed, only that the MD5 doesn't match.  But the verification tool that's in progress right now does report it.

 

2.) Will the Creator Tool write one checksum file or many?

Your choice.  Either a single checksum file for each file it checks, or a single checksum file for each folder it checks.  The single file for each folder is far neater when navigating through your files