Some powerful and useful find commands. As with anything on the web, test before running! (I forget the syntax, so this is mainly self-documentation.)


 

 

Find all *.tsv files larger than 1 MiB, compress them with the very fast lz4 at high compression (-9), and remove each source file afterwards.

find . -name "*.tsv" -size +1M -exec lz4 -9 --rm '{}' \;

In the same vein: gzip all files ending in .fastq. The extra ! -name '*.gz' test skips anything that already ends in .gz (redundant in this case, but it is an extra safety).

find . -type f -name "*.fastq" ! -name '*.gz' -exec gzip "{}" \;
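The gzip command above can be tried safely in a scratch directory first. The following is a minimal sketch, assuming gzip and mktemp are available; the directory and file names are made up for illustration. It uses the `+` terminator instead of `\;` so that find hands many files to a single gzip call.

```shell
# Scratch directory so no real data is touched (hypothetical demo files).
demo=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$demo/a.fastq"
printf '@read2\nTTGA\n+\nIIII\n' > "$demo/b.fastq"
printf 'x' | gzip > "$demo/done.fastq.gz"

# Checksum of the already compressed file, to show it is left alone.
before=$(cksum < "$demo/done.fastq.gz")

# "+" batches the matching names into as few gzip invocations as possible.
find "$demo" -type f -name "*.fastq" ! -name '*.gz' -exec gzip {} +

after=$(cksum < "$demo/done.fastq.gz")
compressed=$(find "$demo" -type f -name "*.fastq.gz" | wc -l | tr -d ' ')
echo "compressed files present: $compressed"
```

With `\;` gzip is started once per file, which is fine for a handful of files but slower for thousands.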

 

Recursively remove all directories matching the name *.tsv.index with a single rm call. Written this way, it is easy to swap rm for echo first as a dry run.

find . -type d -name "*.tsv.index" -exec echo {} +
find . -type d -name "*.tsv.index" -exec rm -rf {} +
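The echo-then-rm swap can be rehearsed on throwaway directories. This is a minimal sketch, assuming mktemp is available; the `action` variable and the directory names are hypothetical. It also adds `-prune`, which is not in the commands above, so that find does not descend into directories it is about to delete.

```shell
# Throwaway tree with two index directories and one directory to keep.
demo=$(mktemp -d)
mkdir -p "$demo/run1/a.tsv.index" "$demo/run2/b.tsv.index" "$demo/run2/keep"

# Dry run: echo prints the arguments rm would receive.
action="echo"
find "$demo" -type d -name "*.tsv.index" -prune -exec $action {} +

# Real run: same expression, only the command differs.
find "$demo" -type d -name "*.tsv.index" -prune -exec rm -rf {} +

remaining=$(find "$demo" -type d -name "*.tsv.index" | wc -l | tr -d ' ')
echo "index directories left: $remaining"
```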

A combination of a few commands: calculate the storage used by all files larger than 1 MiB, with no extra hard links, ending in .tsv.

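find . -type f -name "*.tsv" -size +1M -links 1 -print0 | du -hc --files0-from=- | tail -n 1

(du does not read file names from standard input on its own, so a plain "| du -hc" reports the current directory instead. The --files0-from=- option of GNU du makes it read the NUL-separated list that -print0 produces.)

A simpler variant lets find pass the names to du directly, here for everything named log_jobs: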

find . -name "log_jobs" -exec du -hc {} +
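To check that a total like the ones above covers only the matching files, it helps to run the pipeline on files of known size. This is a minimal sketch, assuming GNU find and GNU du (for `--files0-from` and `-b`); the file names and sizes are made up for illustration.

```shell
demo=$(mktemp -d)
# Two .tsv files above 1 MiB, one below, and one large file with another suffix.
head -c 2097152 /dev/zero > "$demo/big1.tsv"
head -c 3145728 /dev/zero > "$demo/big2.tsv"
head -c 1024 /dev/zero > "$demo/small.tsv"
head -c 2097152 /dev/zero > "$demo/big.txt"

# -b reports apparent size in bytes, so the grand total is easy to verify.
total=$(find "$demo" -type f -name "*.tsv" -size +1M -links 1 -print0 \
  | du -cb --files0-from=- | tail -n 1 | cut -f1)
echo "total bytes: $total"
```

Only big1.tsv and big2.tsv pass all the tests, so the total should be the sum of their two sizes.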

 

Find files that were modified less than 5 minutes ago:

find . -type f -mmin -5

and more than 5 minutes ago:

find . -type f -mmin +5
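The sign in front of the number is easy to mix up, so here is a minimal sketch that backdates one file and checks both forms. It assumes GNU touch (for the relative date in `-d`); the file names are hypothetical.

```shell
demo=$(mktemp -d)
touch "$demo/fresh.txt"
# GNU touch accepts relative dates; this sets the mtime ten minutes back.
touch -d "10 minutes ago" "$demo/stale.txt"

# -mmin -5: modified less than 5 minutes ago; -mmin +5: more than 5 minutes ago.
newer=$(find "$demo" -type f -mmin -5)
older=$(find "$demo" -type f -mmin +5)
echo "newer: $newer"
echo "older: $older"
```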

Hard links are nice, but also a (enter curse-word) to track. Luckily we have find to locate them:

find /data -xdev -samefile file.txt

This finds every name that refers to the same inode as file.txt (so only hard links, not soft links or copies). Hard links cannot cross file systems, so it is logical to add -xdev, which tells find not to descend into other file systems. If you are also looking for soft links, remove -xdev and add -L (find -L /data -samefile file.txt).
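The difference between a hard link, a soft link and a copy shows up clearly in a small experiment. This is a minimal sketch, assuming GNU find; the file names in the scratch directory are made up for illustration.

```shell
demo=$(mktemp -d)
echo "data" > "$demo/file.txt"
ln "$demo/file.txt" "$demo/hard.txt"      # hard link: same inode
ln -s "$demo/file.txt" "$demo/soft.txt"   # symbolic link: its own inode
cp "$demo/file.txt" "$demo/copy.txt"      # copy: same content, new inode

# Without -L only the names sharing the inode are reported.
hard_only=$(find "$demo" -xdev -samefile "$demo/file.txt" | sort)
# With -L symbolic links are followed, so soft.txt is reported too.
with_soft=$(find -L "$demo" -samefile "$demo/file.txt" | sort)

echo "hard links only:"; echo "$hard_only"
echo "including soft links:"; echo "$with_soft"
```

The copy never appears in either list, because it only shares content, not the inode.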

Generate an md5sum for every file under the current directory (recursively), except files named “mylog.log” and “md5.lst”.

find . -type f ! -name "mylog.log" ! -name "md5.lst" -exec md5sum "{}" + > md5.lst
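A checksum list is mostly useful when it is checked again later. This is a minimal sketch of that round trip, assuming GNU md5sum (for `-c` and `--quiet`); the scratch files are hypothetical, and the subshells keep the `cd` from affecting the calling shell.

```shell
demo=$(mktemp -d)
printf 'alpha\n' > "$demo/a.txt"
printf 'beta\n' > "$demo/b.txt"
printf 'log line\n' > "$demo/mylog.log"

# Build the list from inside the directory so the stored paths are relative.
(cd "$demo" && find . -type f ! -name "mylog.log" ! -name "md5.lst" -exec md5sum "{}" + > md5.lst)
listed=$(wc -l < "$demo/md5.lst" | tr -d ' ')

# md5sum -c re-reads the list and exits non-zero if any file no longer matches.
if (cd "$demo" && md5sum -c --quiet md5.lst); then before="ok"; else before="failed"; fi

# Change one file and verify again.
printf 'changed\n' > "$demo/b.txt"
if (cd "$demo" && md5sum -c --quiet md5.lst >/dev/null 2>&1); then after="ok"; else after="failed"; fi

echo "entries: $listed, first check: $before, second check: $after"
```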

 

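A quick and dirty way to find directories (= experiments) that have been made in the last 90 days, sorted on date, with the hard-linked .save directories filtered out. Note that -ctime tests the inode change time, which is only an approximation of when a directory was made.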

find . -maxdepth 1 -name "*_machine_ID_*" -type d -ctime -90 | grep -v '\.save' | sort -t_ -k 2

 

Certain files can be ignored with a negated test, for example ! -name "*file". This finds all directories starting with 17 and not ending in .save (a hard link for us), and shows the size of those directories.

find . -maxdepth 1 -name "17*" ! -name "*.save" -type d -exec du -hs '{}' +
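Before running du on a large tree, the name tests can be checked on their own by printing the matches. This is a minimal sketch, assuming GNU find (for `-printf`); the directory names are made up to resemble the pattern above.

```shell
demo=$(mktemp -d)
mkdir "$demo/170301_exp" "$demo/170301_exp.save" "$demo/180101_exp"
touch "$demo/17_notes.txt"   # a file, so -type d rejects it

# %f prints only the last path component of each match.
matches=$(find "$demo" -maxdepth 1 -name "17*" ! -name "*.save" -type d -printf '%f\n')
echo "$matches"
```

Once the list looks right, `-printf '%f\n'` can be swapped back for `-exec du -hs '{}' +`.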

 

Count files of a certain type in a single directory (not recursively):

find . -maxdepth 1 -name "*.fastq" | wc -l
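Counting lines with wc -l is fine for ordinary names, but a file name that contains a newline gets counted twice. A minimal sketch of a more robust count, assuming GNU find (for `-printf`), prints one dot per match and counts characters instead. The file names are hypothetical.

```shell
demo=$(mktemp -d)
touch "$demo/a.fastq" "$demo/b.fastq" "$demo/c.txt"
mkdir "$demo/sub" && touch "$demo/sub/d.fastq"   # deeper level, excluded by -maxdepth 1
# A deliberately awkward name with an embedded newline.
touch "$demo/odd
name.fastq"

by_lines=$(find "$demo" -maxdepth 1 -type f -name "*.fastq" | wc -l | tr -d ' ')
by_dots=$(find "$demo" -maxdepth 1 -type f -name "*.fastq" -printf '.' | wc -c | tr -d ' ')
echo "wc -l says $by_lines, one dot per file says $by_dots"
```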