GNU find tips (again!)

I am reminded of one fact (by this post) about find that you may not be aware of (or have forgotten). The find command, by default, does not follow symbolic links: they are just another file for find to scan.

In order for a symbolic link to be recognized and used in its usual manner, the -follow option is needed.
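The difference is easy to see in a scratch directory. The following is only a sketch: the directory layout and variable names are made up for illustration, and it uses the POSIX -L option, which on most current finds has the same effect as -follow.

```shell
# Scratch area: one real directory holding a file, plus a symlink to that directory.
demo=$(mktemp -d)
mkdir "$demo/real"
touch "$demo/real/data.txt"
ln -s real "$demo/link"

# Default behaviour: the link is just another file, so find does not descend into it.
default_hits=$(find "$demo" -name data.txt | wc -l | tr -d ' ')

# With -L the link is followed, so data.txt is reached by two different paths.
followed_hits=$(find -L "$demo" -name data.txt | wc -l | tr -d ' ')

echo "default: $default_hits, following links: $followed_hits"
rm -rf "$demo"
```

Note that -L goes before the starting directory, not among the conditionals.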

This brings up another point about find that will clear up some confusion, even for experts. The find command mixes standard options (which change its overall behavior) with conditionals (picking and choosing files). Thus, some options can occur anywhere and change the operation of find as a whole; others are used in an implicit command line conditional. The man page calls these options “position-independent terms”.

So, even though a find option appears after the -a option (AND operator) it could actually be a true option to find, changing the command’s behavior (and thus not be relevant to the -a “option”).

The previously mentioned -follow option is a true option. However, options like -name, -a, -o, -size are all conditionals. If you keep this in mind, the occasional option in the middle of a conditional will not throw you.

The HP-UX man page for find lists the following position-independent terms:

  • -depth
  • -follow
  • -fsonly
  • -xdev

Let’s consider some more ideas on how we can use find to help us daily.

If this hasn’t been mentioned already, the -nouser and -nogroup options can be quite useful in a cron job for finding files that don’t have a proper user or group associated with them (these files are a security risk):

find / \( -nouser -o -nogroup \) -print
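A word on mixing -o with -print: the implicit AND binds more tightly than -o, so an ungrouped -print applies only to the last condition. The sketch below shows the effect with two harmless -name tests in a scratch directory (the file names are invented for the demonstration).

```shell
demo=$(mktemp -d)
touch "$demo/a.log" "$demo/b.tmp"

# Parsed as: -name '*.log'  -o  ( -name '*.tmp' -a -print )
# so only b.tmp is printed.
ungrouped=$(find "$demo" -name '*.log' -o -name '*.tmp' -print | wc -l | tr -d ' ')

# With escaped parentheses, -print applies to a match on either side of -o.
grouped=$(find "$demo" \( -name '*.log' -o -name '*.tmp' \) -print | wc -l | tr -d ' ')

echo "ungrouped: $ungrouped, grouped: $grouped"
rm -rf "$demo"
```

The same reasoning applies to -nouser and -nogroup: group them if an action such as -print follows.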

As we discussed earlier, using find in automated mode with an rm command is not a good idea – but at the command line, there is a replacement for -exec that will provide you with a safety net: -ok:

find . -name "f*" -ok rm -f {} \;

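This command is similar in intent to:

find . -name "f*" | xargs -p -n 1 rm -f

Both commands will prompt you as to whether you want to erase the file found or not. Note that piping into xargs rm -i is not a reliable substitute: rm -i would read its answers from the pipe rather than from your terminal, whereas xargs -p asks on the terminal itself.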

The option -xdev is a favorite of mine: it tells find not to descend into directories on other filesystems, so mount points are not crossed. So if you want to search /tmp for various files but have things mounted in /tmp/mnt or other locations, this command will allow you to skip files under such mountpoints:

find /tmp -xdev -name "foo*"

This will find /tmp/foobar – but not /tmp/mnt/fooey.

An enhancement to the -exec command can improve performance – and reduce the need for xargs. Instead of terminating the command string with a semicolon (;), terminate it with a plus sign (+) right after the {} which signifies the location of the filenames in the command line:

find /tmp -name "f*" -exec ls -ld {} +

The plus sign must be a separate argument, with a space between it and the {}. Unlike the semicolon, the plus sign is not special to the shell, so it does not need a backslash (although escaping it is harmless).
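To see what the two terminators actually do, one can count how often the command is run. This is a minimal sketch in a scratch directory; the sh -c wrapper and the file names are only there for the demonstration.

```shell
demo=$(mktemp -d)
touch "$demo/f1" "$demo/f2" "$demo/f3"

# \; form: the command runs once per file, so three lines come out.
per_file=$(find "$demo" -type f -exec sh -c 'echo "invoked with $# file(s)"' sh {} \; | wc -l | tr -d ' ')

# + form: the file names are batched, so a single invocation prints one line.
batched=$(find "$demo" -type f -exec sh -c 'echo "invoked with $# file(s)"' sh {} + | wc -l | tr -d ' ')

echo "invocations with \\;: $per_file, with +: $batched"
rm -rf "$demo"
```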

More tips on using find

My entry on using find turned out to be popular; I thought I’d throw out some tidbits on using find for various things. So sit back and enjoy the list!

find . -type f | xargs ls -ld

List all regular files in the hierarchy starting at the working directory.

find . -type d | xargs ls -ld

Get a list of directories, showing the tree structure starting at the working directory.

find . -mtime +2

Show all files last modified more than two days ago.

find . -size +10000

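Show all large files (possibly for freeing up some space?). Without a suffix, the size is counted in 512-byte blocks, so this matches files larger than roughly 5 MB.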

find /*bin /usr/*bin /usr/local/*bin -name "somebin"

Find an executable file by the name given in any of the usual locations (including both bin and sbin).

find . -cpio dev

Find files (as specified) and write them to a cpio archive on the device given as the -cpio parameter. This is a vendor extension rather than a standard option; check if your find has it first.

find / -nogroup -o -nouser

Find all files on the system that have no group or owner; files such as these are a security risk and should be associated with groups and owners in /etc/group and /etc/passwd. Remember to run a find command that starts at / at a time when users will not be inconvenienced by the massive search.

find / -type f \( -perm -4000 -o -perm -2000 \)

Find all files that are setuid or setgid – again, these may be a security risk. You should know which binaries are (and need to be) setuid and setgid.

On using the -exec parameter:
if you use the “old school” form, with {} and \; (such as find . -exec rm -f {} \;) then the command will be executed once for each file found. If you use the “new school” form with {} followed by + (such as find . -exec rm -f {} +) then the command will be executed with as many of the files found as will fit on one command line. In both forms there must be a space between the {} and the terminator.

With the + form, find builds each command line only up to the system's limit on argument length, and runs the command again if more files remain, much as xargs does. The xargs command is still the usual way to do this, and its advantage over -exec is control: it allows you to specify the command line size limit (-s), or a limit on the number of arguments per invocation (-n) for that matter.
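As a small illustration of limiting the argument count, here is xargs with -n. The five words fed to it are arbitrary placeholders, and echo stands in for whatever command you would really run.

```shell
# Five arguments, at most two per invocation: echo runs three times,
# printing "one two", "three four" and "five" on separate lines.
batches=$(printf '%s\n' one two three four five | xargs -n 2 echo | wc -l | tr -d ' ')

echo "echo was run $batches times"
```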

Newer xargs appear to work with shell scripts as well; traditionally, xargs required binaries and shell scripts (or other scripts) would not work.

Assorted Tips and Tricks

First, there is one that I learned recently myself. This trick is ingenious!

One of the most challenging things to explain is why this doesn’t work (when the outfile is write-restricted to root):

sudo command > outfile

This will fail because when the shell tries to open outfile, it is not running as root – and thus does not have access. Solving this problem is not simple because of the shell’s quoting mechanisms and when it opens (and doesn’t) the file in question.

However, there is a simple solution that I’d never considered before:

sudo command | sudo tee outfile

This takes care of all the problems involved – and if the command itself is not restricted to root, then the first sudo isn’t necessary either.

Another thing that can be seen often in shell scripts is something like the following:

cmd >> $logfile
cmd2 >> $logfile
cmd3 >> $logfile
print "New stuff...." >> $logfile

This entire section can be replaced like this:

( cmd
cmd2
cmd3
print "New stuff...." ) >> $logfile

In the first example, the $logfile is opened four times – and many situations would include many more than just that. The second opens $logfile only once.
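A variation worth knowing is the brace group, which gives the same single redirection without starting a subshell. The sketch below uses echo instead of the ksh print builtin so that it runs in any POSIX shell, and the log file is a throwaway temporary file.

```shell
logfile=$(mktemp)

# A brace group runs in the current shell; the redirection on the closing
# brace opens the log file once for everything inside the group.
{
  echo "step one"
  echo "step two"
  echo "New stuff...."
} >> "$logfile"

lines=$(wc -l < "$logfile" | tr -d ' ')
echo "$lines lines written"
rm -f "$logfile"
```

The braces need the newline (or a semicolon) before the closing brace, which is the usual stumbling block with this form.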

Another tip – this time in the find command. A sequence like this:

find dir1 -mtime +1 -type f
find dir2 -mtime +1 -type f
find dir3 -mtime +1 -type f

…can be replaced by a much more succinct command, like so:

find dir1 dir2 dir3 -mtime +1 -type f

Thus, instead of three process invocations, there is just one.

One more tip: if you find yourself with a .tar.gz file (or whatever) and want to unpack it somewhere else, you don’t have to move the file at all. If you utilize this general sequence, the archive can be anywhere and the unpacked data can go anywhere. Assume that the working directory contains the archive, and the unpacking is to be done in another directory (such as /tmp):

gunzip -c myfile.tar.gz | ( cd /tmp ; tar xvf - )

Using the parentheses allows you to change the working directory temporarily, and thus to utilize the tar command in a different directory. Conversely, if you were in this same situation but were located in the /tmp directory (and unpacking the archive located in your home directory) – you can do this:

( cd $HOME; gunzip -c myfile.tar.gz ) | tar xvf -
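Here is the whole round trip as a self-contained sketch, using two scratch directories in place of the home directory and /tmp. The file and directory names are invented, and it uses && rather than ; after cd so that tar never runs in the wrong directory if the cd fails.

```shell
src=$(mktemp -d)    # where the archive lives
dest=$(mktemp -d)   # where it should be unpacked
echo "hello" > "$src/hello.txt"

# Build a small .tar.gz inside $src without leaving the current directory.
( cd "$src" && tar cf - hello.txt | gzip -c > myfile.tar.gz )

# Unpack it into $dest; only the subshell changes directory.
gunzip -c "$src/myfile.tar.gz" | ( cd "$dest" && tar xf - )

unpacked=$(cat "$dest/hello.txt")
echo "unpacked content: $unpacked"
rm -rf "$src" "$dest"
```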

Why not: yet one more tip. Let’s say you want to go to this directory:

/etc/supercalifragilistic/expialidocious/atrocious!/

Rather than having to type that in (and try and get the spelling right!) use something like this to get there:

cd /etc/super*/expi*/atro*/

First, these will match the appropriate directories (assuming there is only one that matches all wildcards). However, with the final slash character in place, that means that only directories that begin with “atro” will be matched – files will not. Nifty, eh?

What’s more, once you’ve gotten to that directory with the nasty name – you can switch to another then back simply:

cd /etc/foo
cd -

That last command switches back to the previous directory – the very long-named directory mentioned before, all compressed down to a single character.
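The shell keeps the previous directory in the OLDPWD variable, which is what cd - uses. A quick sketch, with a temporary directory standing in for the long-named one:

```shell
long_dir=$(mktemp -d)

cd "$long_dir" || exit 1
cd / || exit 1

# cd - prints the directory it switches to; the output is discarded here.
cd - > /dev/null || exit 1
back=$PWD

echo "back in: $back"
cd / && rmdir "$long_dir"
```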

Tips on using the UNIX find command

When I first started using find, it took a while before I was able to use it regularly without looking it up.  For a smart introduction, this article from the Debian/Ubuntu Tips and Tricks site is good.  The GNU project has all of their manuals on the web, including the GNU find manual.

There is much more to the find command than just these introductory topics, however.  First, let us consider the tricks and traps of the find command:

  • The original find command required the -print option or nothing was printed at all.  Today, the GNU find does not require -print, and most other find commands seem to have followed suit.
  • Using the -exec option to find is less efficient than using the xargs command; in the Sun Manager’s mailing list there was a nice summary from Steve Nelson of this contrast.
  • Watch out for filenames with spaces and other things; the GNU find contains a -print0 option (and GNU xargs has a -0 option to match) just for this reason.  These options use an ASCII NUL to separate filenames.
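The trouble with spaces can be reproduced in a scratch directory. This sketch assumes a find and xargs that support -print0 and -0 (GNU and the BSDs do), and that the temporary directory path itself contains no spaces; the file names are made up.

```shell
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/two words.txt"

# Plain pipe: xargs splits on whitespace, so "two words.txt" becomes two arguments.
naive=$(find "$demo" -type f | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-separated: each file name arrives intact, spaces and all.
safe=$(find "$demo" -type f -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "arguments seen without -print0: $naive, with -print0: $safe"
rm -rf "$demo"
```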

Some tips for using find:

Multiple options can be placed in sequence with AND and OR boolean options (and parentheses). For example, to find all files containing “house” in the name that were modified within the last two days and are larger than 10K, try this:

find . -name "*house*" -size +10240c -mtime -2
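The size units are easy to get wrong: a trailing c means bytes, while a bare number means 512-byte blocks. The sketch below builds a few test files (names and sizes chosen arbitrarily) and checks that only the large, recently modified "house" file is matched.

```shell
demo=$(mktemp -d)

# Two 20 KiB files and one tiny file.
dd if=/dev/zero of="$demo/big-house.dat" bs=1024 count=20 2>/dev/null
dd if=/dev/zero of="$demo/big-barn.dat" bs=1024 count=20 2>/dev/null
echo "tiny" > "$demo/small-house.txt"

# All three conditions are ANDed: name matches, larger than 10240 bytes,
# and modified less than two days ago.
hits=$(find "$demo" -name '*house*' -size +10240c -mtime -2)

echo "matched: $hits"
rm -rf "$demo"
```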

This is where some of the power of find can be seen.

Use all appropriate options.  The more you can narrow down the selection, the less you have to look.  For example, the -type and -xdev options can be quite useful.  The -type option selects a file based on its type, and -xdev prevents the file “scan” from going to another disk volume (refusing to cross mount points, for example).  Thus, you can look for all directories on the current disk from a starting point like this:

find /var/tmp -xdev -type d -print

Get to know all of find’s options.

Use xargs instead of -exec.  With the \; form, find spawns a new process for each file it finds (the + form of -exec, where available, batches files much as xargs does).  xargs reads the file names from its standard input, packs as many as will fit onto a command line, and runs the command, repeating as often as necessary.

For example, an -exec of rm with \; would spawn a process, load the rm binary, run it, and release the process memory once for every file.  Using xargs, as many arguments as possible are read from the standard input and rm is run once with all of them.  If there are more arguments, xargs repeats the process.

Don’t use find / .  Doing a find on a large number of files can slow the system down drastically.  Typically this is used by an administrator in order to find a file somewhere on the hard drive.  Better yet is to perform this command sequence overnight:

find / -print > /.masterfile

Then the /.masterfile can be searched using grep instead of tying the system up with lots of disk I/O during the day when users are counting on excellent system performance.
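The same idea in miniature, using a scratch tree and a temporary index file instead of / and /.masterfile (all names here are placeholders for illustration):

```shell
tree=$(mktemp -d)
index=$(mktemp)
mkdir -p "$tree/etc/app"
touch "$tree/etc/app/app.conf" "$tree/notes.txt"

# Walk the tree once and save every path.
find "$tree" -print > "$index"

# Later lookups only read the index; the disk is not scanned again.
matches=$(grep -c 'app\.conf$' "$index")

echo "paths ending in app.conf: $matches"
rm -rf "$tree" "$index"
```

The index is only as fresh as the last run, so this suits files that do not move around much.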

Remember to quote special characters.  In particular, any file name patterns (such as those given to -name) and the left and right parentheses should be quoted.  Typically, the patterns are put into double quotes, and left and right parentheses are quoted with a backslash.

Be wary of extensions to POSIX.1 find.  It’s not that they are bad, but rather that you cannot count on them being present.  Unfortunately, some of the most useful options fall into this category – but as long as you are aware of them, they can be used appropriately.  Some options in this category are:

  • -print0
  • -maxdepth
  • -mindepth
  • -iname
  • -ls

In particular, -print0 is the most useful of the lot.

The BSD man page also brings up an interesting point about find and find options:

Historically, the -d, -L and -x options were implemented using the primaries -depth, -follow, and -xdev. These primaries always evaluated to true. As they were really global variables that took effect before the traversal began, some legal expressions could have unexpected results. An example is the expression -print -o -depth. As -print always evaluates to true, the standard order of evaluation implies that -depth would never be evaluated. This is not the case.

This has been a source of confusion in the past; considering them as global options (and placing them first) will provide some relief. Note that the -d, -L and -x options are likely BSD-specific.
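The man page's example can be tried directly. In this sketch (scratch directory, invented names), -depth is written last and after an -o, yet it still changes the traversal order. GNU find prints a warning about the placement, which is sent to /dev/null here.

```shell
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/sub/file"

# Normal traversal: a directory is listed before its contents.
first_default=$(find "$demo" -print | head -n 1)

# -depth takes effect for the whole run even though -print "short-circuits"
# the -o: contents are now listed before the directory that holds them.
first_depth=$(find "$demo" -print -o -depth 2>/dev/null | head -n 1)

echo "first line by default:  $first_default"
echo "first line with -depth: $first_depth"
rm -rf "$demo"
```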