Utility to show duplicate directories/files & create rsync cmd for drive cleanup



Thanks.  There are a couple of utilities being worked on for unRAID:

 

unBalance - http://lime-technology.com/forum/index.php?topic=39707.0

 

Consolidate: http://lime-technology.com/forum/index.php?topic=36201.0

 

For now I will stick with my excel utility where I have complete control as to which directories are moved and the associated rsync command to do the move.  Now that I stack the rsync commands, I can clean up a system very quickly.

  • 3 weeks later...

Hello,

 

Thank you for this tool, it's perfect for what I'm trying to achieve.

 

The only issue I'm experiencing is that it doesn't deal well with the space in my share name "multimedia/tv shows". I realise I can manually fix the copied command by enclosing the entire /mnt path in quotes, but I'd prefer a proper fix.  Is this easily achieved?

 

Cheers,

 

Mark

 

P.S. Can I use this to move all my season folders to one drive, or is that out of scope?


@markswift

What version are you using?  You should be using Indexer v 04.07-02-02-01e.

 

What exactly is the issue you are having?  I just tried a test folder "Test Share2" and "Test Share2\UNRAID Files".  Make sure you do not have any leading or trailing spaces in the share name, and make sure you have spelled the share name correctly.  If you change the spelling of the share, you have to re-pick it in the drop-down; Excel does not dynamically update the data.

 

As far as consolidating directories goes, yes, it can do that.  Did you read this post (link) and look at the annotated screenshots I made?  Hint: you use the "Rsync Command Build" to build an rsync command that you paste into a command window to execute the move.

 

You might want to read all of the posts in this thread; there are not very many.

 

Now that I reread your post, it appears there is an issue with the rsync command that is built.  Let me look at it and see what I can do.  If I can't come up with a quick fix, I don't know that I will get it done any time soon.  I specifically don't use spaces for this very reason.  I also originally built it to work with top-level shares.

 

Column E on the rsync tab is where the stacked command is built.


All, please do not use this version unless you need the specific case this version supports.  Use the "Indexer v 04.07-02-02-01e" version.

 

 

@markswift

Give "Indexer v 04.07-02-02-01e-02_Test" a try.  You can get it here.

https://drive.google.com/folderview?id=0B8FJEotSXLrxQllJdTBrVmxNd2c&usp=sharing

 

This version wraps the rsync command in quotes like this:

rsync -avh  --remove-source-files "/mnt/disk5/Test Share2/UNRAID Files/couch - sab - other"  "/mnt/disk2/Test Share2/UNRAID Files"

 

The downside of doing this is that the "/mnt/disk2/Test Share2/" path must exist.  Rsync can't create it.  This is a hack to accommodate spaces in the path name.  There is probably a way to accommodate them and provide a better solution, but I don't know what it is.
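For comparison, here is a minimal Python sketch (not part of the spreadsheet, which builds its commands in Excel) that quotes each path with the standard library's `shlex.quote`, so spaces such as "Test Share2" survive the shell.  The `mkdir -p` step is an assumed workaround for the must-already-exist limitation, and the function name `build_move_command` is hypothetical.

```python
import shlex


def build_move_command(src_dir: str, dest_parent: str, dry_run: bool = False) -> str:
    """Return a shell command that moves src_dir into dest_parent with rsync.

    Every path goes through shlex.quote, so spaces in share or directory
    names are passed to rsync as single arguments.
    """
    opts = ["-avh", "--remove-source-files"]
    if dry_run:
        opts.append("--dry-run")  # list what would be moved without touching files
    rsync = " ".join(["rsync", *opts, shlex.quote(src_dir), shlex.quote(dest_parent)])
    # Assumed workaround: create the destination and any missing parents first
    # (this happens even on a dry run); rsync only runs if mkdir succeeded.
    return f"mkdir -p {shlex.quote(dest_parent)} && {rsync}"


if __name__ == "__main__":
    print(build_move_command(
        "/mnt/disk5/Test Share2/UNRAID Files/couch - sab - other",
        "/mnt/disk2/Test Share2/UNRAID Files",
    ))
```

With the example paths above, the printed line starts with `mkdir -p '/mnt/disk2/Test Share2/UNRAID Files' && rsync -avh --remove-source-files`.  Chaining with `&&` means nothing is moved if the destination could not be created.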

 

This should work OK, but you may want to try it with some test files first.  If it works, great; if it doesn't, sorry about that, but I don't plan on making any additional changes.  The best solution would be to remove the spaces from your path names and use an underscore "_" in their place.

 

Use the command that is built in column "E".  It stacks the rsync commands to move from the source drives to the destination drive picked in column "G".
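A rough Python sketch of what a stacked command amounts to, assuming the /mnt/diskN/<share>/<directory> layout used in this thread.  The helper name `build_stacked_rsync` is hypothetical, and joining the commands with " ; " is borrowed from the stacked find command later in this thread rather than taken from the spreadsheet's code.

```python
import shlex


def build_stacked_rsync(share: str, directory: str, source_disks: list[int], dest_disk: int) -> str:
    """Stack one rsync move per source disk into a single command line."""
    # rsync copies a source directory given without a trailing slash INTO the
    # destination, so the destination is the share folder on the target disk.
    dest = shlex.quote(f"/mnt/disk{dest_disk}/{share}")
    commands = []
    for disk in source_disks:
        if disk == dest_disk:
            continue  # nothing to move off the destination disk itself
        src = shlex.quote(f"/mnt/disk{disk}/{share}/{directory}")
        commands.append(f"rsync -avh --remove-source-files {src} {dest}")
    return " ; ".join(commands)


print(build_stacked_rsync("TV_Shows", "Castle", [3, 4, 2], 2))
```

This prints two rsync commands separated by " ; ", one moving /mnt/disk3/TV_Shows/Castle and one moving /mnt/disk4/TV_Shows/Castle into /mnt/disk2/TV_Shows.  With " ; " the second move still runs if the first one fails; chaining with "&&" instead would stop at the first error.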

 

It also appears the "delete empty folder" command does not work with spaces in the path unless you wrap the paths in quotes.  You probably don't want to use that command.

 

The code is unlocked, so feel free to modify it if you want.  The code that builds the rsync formula is:

  ' Sub Insert_Rsync_Formulas: writes the formula that builds one rsync command
  ' (R1C1 notation: RC1 = column A and RC7 = column G of the same row)
  ActiveCell.FormulaR1C1 = _
    "=IF(OR(RC1="""",RC7=""""),"""",IF(RC1="""","""",IF(R8C[-1]=RC7,"""",IF(RC[-1]="" "","""",CONCATENATE(IF(Rsync_Model=""Rsync Copy"",Rsync_Production,Rsync_Dry_Run),R9C[-1],""/"",ShareToCheck_2,""/"",RC1,"""""""","" "",Mount,RC7,""/"",ShareToCheck_2,"""""""")))))"

 

You can find it in the Sub Insert_Rsync_Formulas.  It is kind of messy in there as I hacked it together as I went.  I am not a programmer and it shows.

  • 2 months later...

I’ve updated my spreadsheet, Indexer v 04.07-02-06, to display the disk with the largest quantity of data for each directory in the “Rsync Command” tab.  Using the attached JPG as a reference, you can see this data in column D.  Previously, as you can see in the picture, you had to decide which disk you wanted to consolidate the directory onto.  Most of the time that is probably the disk with the largest quantity of existing data, so you would work out which drive it was and then select that disk in the destination drive cell.  Now you can copy all of the data in column D, starting on row 10, and paste it, using PASTE SPECIAL VALUES, into the corresponding cells in column G.  Then use the filter in column F to display only values greater than 1.  This will show you a list of the directories you want to work with.
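The column D idea can be sketched roughly in Python (an illustration, not the spreadsheet's VBA): for one directory, pick the disk that already holds the most data as the destination and treat every other disk holding part of it as a source.  `pick_destination` and the byte counts below are made up for the example.

```python
def pick_destination(sizes_by_disk: dict[str, int]) -> tuple[str, list[str]]:
    """Return the disk holding the most data for one directory, plus the other
    disks whose share of that directory would be moved onto it.

    sizes_by_disk maps a disk name (e.g. "disk3") to the bytes the directory
    uses on that disk; disks where the directory is absent are left out.
    """
    if not sizes_by_disk:
        raise ValueError("directory not found on any disk")
    dest = max(sizes_by_disk, key=sizes_by_disk.get)  # on a tie, the first disk listed wins
    sources = [disk for disk in sizes_by_disk if disk != dest]
    return dest, sources


# A directory spread over three disks; only directories on more than one disk
# (the "greater than 1" filter on column F) need any move at all.
sizes = {"disk2": 40_000_000_000, "disk3": 5_000_000_000, "disk4": 12_000_000_000}
print(pick_destination(sizes))  # ('disk2', ['disk3', 'disk4'])
```

Picking the fullest disk minimises how much data has to be copied, which is why it is usually the right default; you can still override it per directory, as described below.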

 

You can still select each disk individually if you want.  In the example, you can see that I have selected disk2, and it will build the rsync command(s) to move the data from disks 3 and 4 to disk2.

This update is against the Indexer v 04.07-02-02-01e version.  I will not be updating the Indexer v 04.07-02-02-01e-02_Test version.

 

NOTE:

After you update the “Input Data” tab with your data, you MUST run the reports twice.  On the first run, the rsync command will be off by one row for the selected directory.  You can see this by assigning a disk to move the data, copying the rsync command, and pasting it into a text file to see the actual command being built.  The second run will align the data.  Then save the file.  You only have to do this the first time you use the utility.  This happens because of the manner in which I clear my personal information before I upload the file.  Unfortunately, there’s really nothing I can do to fix it.

 

File https://drive.google.com/file/d/0B8FJEotSXLrxcFptMmVCVTZYN1k/view?usp=sharing

 

Note: if you downloaded the -05 version of the file, you may want to replace it with the -06 version.  I forgot to add the code to write the header in column D; this corrects that oversight.

[Attached image: MaxDisk.jpg]

  • 3 months later...
  • 1 month later...

On the Input Data tab, can you copy and paste cells B14-B39 so I can see how you have the inputs setup.

 

Also, what version are you using?  And what version of Excel?  Are you running the Windows or the Mac version of Excel?

 

Have you modified or changed any sheets or cells other than  Input Data tab cells B14-B39?  If so, start with a clean copy of the file.  You can't really modify the file without breaking it.

 

Also, look at the disk1-24 tabs and see if there is anything listed in Columns A and B.


On the Input Data tab, can you copy and paste cells B14-B39 so I can see how you have the inputs setup.

[Image: Fji9l46.png]

 

Also, what version are you using?  And what version of Excel?  Are you running the Windows or the Mac version of Excel?

v 04.07-02-06

 

Excel 2013 on Windows 7

 

 

Have you modified or changed any sheets or cells other than  Input Data tab cells B14-B39?  If so, start with a clean copy of the file.  You can't really modify the file without breaking it.

I have not.

 

 

Also, look at the disk1-24 tabs and see if there is anything listed in Columns A and B.

All are blank.

 

Thank you!


I'll have to look more when I get home, but have you also exported your disks?  I.e., can you go to these destinations from your PC:

 

From "My Computer" paste the address in and see if it can open the path.

 

\\172.16.15.114\TV

\\172.16.15.114\Disk1

\\172.16.15.114\Disk2 etc?

 

The fact that there is no data on the disk tabs indicates it is not retrieving the data.  Also, it is looking for a structure with subdirectories under the main share, not just a bunch of files.  Depending on the report run, it pulls from the user shares (i.e., the combined view), the individual disks, or both.

 

\\172.16.15.114\TV\Castle

\\172.16.15.114\TV\Rome

\\172.16.15.114\TV\Bones

 

\\172.16.15.114\Disk1\TV\Castle

\\172.16.15.114\Disk2\TV\Rome

\\172.16.15.114\Disk3\TV\Bones

etc.
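To illustrate what the disk tabs are built from, here is a hypothetical Python sketch (not the spreadsheet's code) that walks the per-disk layout above and reports top-level directories split across more than one disk.  It assumes the disks are mounted as <root>/diskN, e.g. /mnt/disk1 when run on the server; `find_split_directories` is an illustrative name.  The cache drive does not match the diskN pattern, so it is ignored.

```python
import os
import re
import tempfile

DISK_RE = re.compile(r"disk(\d+)")  # matches disk1 ... disk24, not cache


def find_split_directories(mnt_root: str, share: str) -> dict[str, list[str]]:
    """Map each top-level directory of `share` to the disks holding part of it,
    keeping only directories that appear on more than one disk."""
    disks = sorted(
        (name for name in os.listdir(mnt_root) if DISK_RE.fullmatch(name)),
        key=lambda name: int(DISK_RE.fullmatch(name).group(1)),  # disk2 before disk10
    )
    locations: dict[str, list[str]] = {}
    for disk in disks:
        share_path = os.path.join(mnt_root, disk, share)
        if not os.path.isdir(share_path):
            continue  # this disk holds nothing for the share
        with os.scandir(share_path) as entries:
            for entry in entries:
                if entry.is_dir():
                    locations.setdefault(entry.name, []).append(disk)
    return {name: found for name, found in sorted(locations.items()) if len(found) > 1}


if __name__ == "__main__":
    # Build a small throwaway layout and scan it.
    with tempfile.TemporaryDirectory() as root:
        for path in ("disk1/TV/Castle", "disk2/TV/Castle", "disk2/TV/Rome", "disk3/TV/Bones"):
            os.makedirs(os.path.join(root, path))
        print(find_split_directories(root, "TV"))  # {'Castle': ['disk1', 'disk2']}
```

Only directories directly under the share are considered, matching the share/subdirectory structure described above; loose files in the share root are skipped.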


Also, it does not touch the cache drive to move files.  It leaves that up to "Mover".

 

Another thing, cache_dirs really helps it go faster.  If you don't have cache_dirs enabled, spin up your drives before you kick the script off.  I have seen it have issues if it has to wait for drives to spin up.


Run it with these options if you are just using the rsync tab.  While the other data can be useful, it takes longer to run and is not required to fill out the rsync tab.  This utility has kind of grown organically: I had a basic understanding of what I wanted, but I changed it as I went.  As such, there is other data that can be retrieved beyond the minimum required to just build an rsync command.  Set B16 to No if you want to watch the data being retrieved and used, to get a feel for what is going on.  Status messages also appear in the lower left corner of the screen.

 

 

B16 = Yes

B17 = Yes

B18 = Gigabytes

B19 = No

B20 = Yes

B21 = No

B22 = No

 

Also, make sure you run it twice the first time, then save the file, and use the saved file going forward.  The rsync command that is built will be off by one row if you run the macro against a directory that does not exist; the second run gets everything in sync.  To clear my data before I post it, I run it against a non-existent directory.


It ended up working great, once I enabled the disk shares. Took me a moment to figure out where those were enabled. I've shifted a *lot* of files around, removing the duplicates, checking the split levels, balancing a couple of disks, and it all worked flawlessly. Thank you for making this and helping me out!

 

Peter


No problem.  Glad it worked for you.  I spent a lot of time developing it.  I personally like to maintain control over exactly what is being moved where.  It can also be used to keep your drives clean.

 

I know some people like to have every file associated with a directory on the same disk, while others let them go where they want and let unRAID keep track of it all.  I'm in the first camp; I don't want my drives spinning up unless it is actually needed.  About once a month I fire it up and clean my drives up.

 

If you didn't do it already, you will want to make sure you delete all the empty directories.  Even with no files in them, they are part of the directory structure and will cause the drive to spin up.  When you run a report on a share, if you push the yellow "cp "dlt emt dir"" button, it will copy a stacked command to the clipboard that you can paste into the telnet session; it deletes all of the empty directories on all the drives in that particular share.  For example, if you had a share called "TV_Shows", this is the command that would be built.

 

find /mnt/disk1/TV_Shows/ -type d -empty -delete ; find /mnt/disk2/TV_Shows/ -type d -empty -delete ; find /mnt/disk3/TV_Shows/ -type d -empty -delete ; find /mnt/disk4/TV_Shows/ -type d -empty -delete ; find /mnt/disk5/TV_Shows/ -type d -empty -delete ; find /mnt/disk6/TV_Shows/ -type d -empty -delete ; find /mnt/disk7/TV_Shows/ -type d -empty -delete ; find /mnt/disk8/TV_Shows/ -type d -empty -delete ; find /mnt/disk9/TV_Shows/ -type d -empty -delete ; find /mnt/disk10/TV_Shows/ -type d -empty -delete ; find /mnt/disk11/TV_Shows/ -type d -empty -delete ; find /mnt/disk12/TV_Shows/ -type d -empty -delete ; find /mnt/disk13/TV_Shows/ -type d -empty -delete ; find /mnt/disk14/TV_Shows/ -type d -empty -delete ; find /mnt/disk15/TV_Shows/ -type d -empty -delete ; find /mnt/disk16/TV_Shows/ -type d -empty -delete ; find /mnt/disk17/TV_Shows/ -type d -empty -delete ; find /mnt/disk18/TV_Shows/ -type d -empty -delete ; find /mnt/disk19/TV_Shows/ -type d -empty -delete ; find /mnt/disk20/TV_Shows/ -type d -empty -delete ; find /mnt/disk21/TV_Shows/ -type d -empty -delete ; find /mnt/disk22/TV_Shows/ -type d -empty -delete ; find /mnt/disk23/TV_Shows/ -type d -empty -delete ; find /mnt/disk24/TV_Shows/ -type d -empty -delete
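If a share name contains spaces, the same stacked command can be generated with every path quoted.  A minimal Python sketch, with `build_delete_empty_dirs` as a hypothetical helper; with share "TV_Shows" and the default 24 disks it reproduces the command above.

```python
import shlex


def build_delete_empty_dirs(share: str, disk_count: int = 24) -> str:
    """Rebuild the stacked 'delete empty directories' command for one share.

    Each path is passed through shlex.quote, so a share such as "TV Shows"
    works without hand-editing.  disk_count=24 mirrors the disk1-24 tabs.
    Review the output before pasting it: find -delete does not prompt.
    """
    parts = [
        f"find {shlex.quote(f'/mnt/disk{n}/{share}/')} -type d -empty -delete"
        for n in range(1, disk_count + 1)
    ]
    return " ; ".join(parts)


print(build_delete_empty_dirs("TV Shows", disk_count=2))
# find '/mnt/disk1/TV Shows/' -type d -empty -delete ; find '/mnt/disk2/TV Shows/' -type d -empty -delete
```

Because the parts are joined with " ; ", a disk that has no folder for the share just makes find print a "No such file or directory" message and the remaining disks are still processed.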

  • 4 years later...

Hi. I don't know whether you are still maintaining this, but I have a question. Will it report only the folders that are split across different disks, or all the folders anyway? The reason I'm asking is that I see folders in the report that, according to it, sit on only one disk. If it is meant to report all folders, then it does not run properly, as I have some 3,500 and it only reports around a hundred of them. Apart from that, it seems that due to a security change in the recent unRAID version the undist.sh script cannot be executed. Waiting for your feedback.
