GNU find tips (again!)

A recent post reminded me of one fact about find that you may not be aware of (or may have forgotten): by default, find does not follow symbolic links. A link is just another file for find to examine, and if it points to a directory, find does not descend into it.

For find to follow symbolic links in the usual way, use the -follow option. (GNU find deprecates -follow in favor of the -L option, which POSIX also specifies and BSD find supports.)
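A small experiment makes the difference visible. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the directory and file names are made up for illustration. It builds a tree containing a symbolic link to a directory and counts how many copies of one file find reports with and without link-following:

```shell
#!/bin/sh
# Sketch: find's default symlink handling versus -L and -follow.
# The temporary tree and the names real/, link and target.txt are hypothetical.
tmp=$(mktemp -d)
mkdir "$tmp/real"
touch "$tmp/real/target.txt"
ln -s "$tmp/real" "$tmp/link"   # symbolic link to a directory

# Default: "link" is examined as a link and not descended into,
# so only real/target.txt is reported.
default_count=$(find "$tmp" -name target.txt | wc -l | tr -d ' ')

# -L (POSIX, GNU, BSD): the link is followed, so link/target.txt is reported too.
l_count=$(find -L "$tmp" -name target.txt | wc -l | tr -d ' ')

# Older -follow spelling (deprecated in GNU find in favor of -L).
follow_count=$(find "$tmp" -follow -name target.txt | wc -l | tr -d ' ')

echo "default: $default_count  -L: $l_count  -follow: $follow_count"
rm -rf "$tmp"
```

Expect one match by default and two with either -L or -follow.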

This brings up another point about find that can clear up some confusion, even for experts. The find command mixes true options (which change its overall behavior) with conditionals (which pick and choose files). Some options can appear anywhere on the command line and change the operation of find as a whole; everything else is part of an implicit conditional expression. The HP-UX man page calls the former “position-independent terms”.

So, even though an option appears after the -a (AND) operator, it could actually be a true option to find, changing the behavior of the whole command rather than acting as an operand of -a.

The previously mentioned -follow option is a true option. By contrast, -name and -size are tests, and -a and -o are operators that combine tests; all of these belong to the conditional. If you keep this in mind, the occasional option in the middle of a conditional will not throw you.

The HP-UX man page for find lists the following position-independent terms:

  • -depth
  • -follow
  • -fsonly
  • -xdev

Let’s consider some more ideas on how we can use find to help us daily.

If this hasn’t been mentioned already, the -nouser and -nogroup tests can be quite useful in a cron job for finding files whose user or group ID does not belong to any existing user or group (such files are a security risk):

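find / \( -nouser -o -nogroup \) -print

The parentheses (escaped from the shell) are needed: -a binds more tightly than -o, so without them the expression is read as -nouser -o ( -nogroup -a -print ), and files that lack only a valid owner are never printed.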
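Because -a binds more tightly than -o, any -o expression followed by an explicit -print needs parentheses around the alternatives. The sketch below shows the effect with -name tests standing in for -nouser and -nogroup (creating files with no valid owner would require root); the file names are hypothetical:

```shell
#!/bin/sh
# Sketch: operator precedence with -o and an explicit -print.
# -name tests stand in for -nouser/-nogroup; a.txt and b.txt are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt"

# Parsed as: -name a.txt -o ( -name b.txt -a -print )
# so a.txt matches but is never printed.
unguarded=$(find "$tmp" -name a.txt -o -name b.txt -print | wc -l | tr -d ' ')

# Escaped parentheses make -print apply to either match.
grouped=$(find "$tmp" \( -name a.txt -o -name b.txt \) -print | wc -l | tr -d ' ')

echo "without parentheses: $unguarded  with parentheses: $grouped"
rm -rf "$tmp"
```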

As we discussed earlier, using find with rm in an automated job is not a good idea, but at the command line there is a replacement for -exec that provides a safety net: -ok.

find . -name "f*" -ok rm -f {} \;

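A similar effect is possible with xargs, but not with a plain xargs rm -i: GNU xargs connects the command's standard input to /dev/null, so rm cannot read your answers. The -p option makes xargs itself ask for confirmation (reading the answer from the terminal), and -n 1 limits each question to a single file:

find . -name "f*" | xargs -p -n 1 rm -f

Both commands will ask whether you want to erase each file found.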
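Because -ok reads its answer from standard input, its behavior can be demonstrated without typing anything. This sketch (the temporary directory and the file name foo1 are assumptions for illustration) feeds a scripted "n" and then "y" to the -ok command from the text; interactively, you would simply answer at the prompt:

```shell
#!/bin/sh
# Sketch: scripted answers to find -ok. LC_ALL=C makes "y" a valid
# affirmative answer regardless of the user's locale; the prompt itself
# goes to standard error and is discarded here.
tmp=$(mktemp -d)
touch "$tmp/foo1"   # hypothetical file matching "f*"

# Answer "n": rm is not run, so the file survives.
printf 'n\n' | LC_ALL=C find "$tmp" -name "f*" -ok rm -f {} \; 2>/dev/null
if [ -e "$tmp/foo1" ]; then kept=yes; else kept=no; fi

# Answer "y": rm runs and the file is gone.
printf 'y\n' | LC_ALL=C find "$tmp" -name "f*" -ok rm -f {} \; 2>/dev/null
if [ -e "$tmp/foo1" ]; then removed=no; else removed=yes; fi

echo "kept after n: $kept  removed after y: $removed"
rm -rf "$tmp"
```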

The option -xdev is a favorite of mine: it tells find not to descend into directories on other filesystems (that is, not to cross mount points). So if you want to search /tmp for various files but have filesystems mounted at /tmp/mnt or elsewhere under /tmp, this command skips the files under those mount points:

find /tmp -xdev -name "foo*"

This will find /tmp/foobar – but not /tmp/mnt/fooey.

An enhancement to the -exec action can improve performance and reduce the need for xargs. Instead of terminating the command with a semicolon (;), terminate it with a plus sign (+) placed right after the {}, which marks where the filenames go on the command line:

find /tmp -name "f*" -exec ls -ld {} \+

The semicolon must be escaped (or quoted) because it is special to the shell: it separates commands. The plus sign is not special to the usual shells, so escaping it as \+ is harmless but not required.
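One way to see the difference between the two terminators is to count how many times the command runs. In this sketch (the three-file temporary tree is an assumption for illustration), sh -c 'echo run' stands in for ls -ld and prints one line per invocation:

```shell
#!/bin/sh
# Sketch: count command invocations with ";" versus "+".
tmp=$(mktemp -d)
touch "$tmp/f1" "$tmp/f2" "$tmp/f3"   # hypothetical files

# ";" form: one sh process per file, so "run" is printed three times.
per_file=$(find "$tmp" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# "+" form: all three names fit on one command line, so sh runs once.
batched=$(find "$tmp" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "with ';': $per_file runs  with '+': $batched run(s)"
rm -rf "$tmp"
```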

The UNIX find command and efficiency

The find command is very demanding: it can generate a great deal of disk I/O, slow a system down considerably, and take a long time to run. There are some ways to avoid these drawbacks.

First, when the urge strikes you to use find / -name "somename", just don't do it. This will take a very long time and will noticeably slow the system down. If you are using GNU findutils, you may have the locate program on your system; it searches a prebuilt database of filenames (maintained by updatedb) and is much faster. Otherwise, you can save the output of a find / command (run during off hours) to a file and search that file with grep whenever you need to.

Secondly, you may wish to avoid the -exec ... \; form. It runs a command on each file that is found: every time a file matches, find starts a new process, loads the specified command, gives it the filename, and runs it, which is very inefficient. GNU find can stack filenames together with the {} + form, which removes most of that overhead, but older versions of find may not support it.

A portable way to get this batching is to use xargs instead:

find . -mtime -1 | xargs ls -ld

This combines as many filenames as possible into each command line and runs the command only as many times as necessary to handle them all, rather than once per file. Of course, if you have GNU findutils, the built-in -ls action may be even faster, since it runs no external command at all:

find . -mtime -1 -ls
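The way xargs groups its input into command lines can be made visible with echo. In this sketch, -n 2 (a standard xargs option that caps the number of arguments per command line) is used only to force several small batches; the input words are arbitrary:

```shell
#!/bin/sh
# Sketch: xargs batching made visible with echo.
# With -n 2, each echo receives at most two arguments.
batches=$(printf '%s\n' one two three four five | xargs -n 2 echo)
printf '%s\n' "$batches"
# one two
# three four
# five

# Without -n, all five words fit on a single command line.
single=$(printf '%s\n' one two three four five | xargs echo)
printf '%s\n' "$single"
# one two three four five
```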

With xargs you can also control where the arguments go on the command line (with -I), repeat an argument more than once, and so on. The operation of xargs is simple once you understand it, but its power is tremendous, and it is available on all UNIX/Linux platforms. Check out the xargs(1) man page from OpenBSD.
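As a small illustration of placing and repeating arguments, the sketch below uses xargs -I, which runs the command once per input line and substitutes that line wherever the placeholder appears. The file names are hypothetical, and echo prints the cp commands instead of running them:

```shell
#!/bin/sh
# Sketch: xargs -I places each input line at every occurrence of "{}".
# notes.txt and todo.txt are hypothetical; echo only prints the commands.
out=$(printf '%s\n' notes.txt todo.txt | xargs -I {} echo cp {} {}.bak)
printf '%s\n' "$out"
# cp notes.txt notes.txt.bak
# cp todo.txt todo.txt.bak
```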

Tips on using the UNIX find command

When I started using find, it took a while before I could use it regularly without looking things up. For a good introduction, this article from the Debian/Ubuntu Tips and Tricks site is worth reading. The GNU project has all of its manuals on the web, including the GNU find manual.

There is much more to the find command than just these introductory topics, however.  First, let us consider the tricks and traps of the find command:

  • The original find command required the -print option, or nothing was printed at all. GNU find does not require -print, and POSIX now specifies that -print is assumed when the expression contains no -exec, -ok, or -print, so most other find commands behave the same way.
  • Using the -exec ... \; form of find is less efficient than using the xargs command; on the Sun Managers mailing list there was a nice summary of this contrast from Steve Nelson.
  • Watch out for filenames containing spaces, newlines, or other unusual characters; GNU find has a -print0 action (and GNU xargs a matching -0 option) for exactly this reason. These options use an ASCII NUL character to separate filenames.
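The sketch below shows the problem these options solve: a single (hypothetical) filename containing a space is split into two arguments by plain xargs but arrives intact with -print0 and -0. It assumes the temporary directory path itself contains no blanks:

```shell
#!/bin/sh
# Sketch: count how many arguments xargs builds from one filename.
tmp=$(mktemp -d)
touch "$tmp/my file.txt"   # hypothetical name containing a space

# Blank/newline-separated input: xargs splits the name into two arguments.
naive=$(find "$tmp" -type f | xargs sh -c 'echo $#' sh)

# NUL-separated input: the name arrives as a single argument.
safe=$(find "$tmp" -type f -print0 | xargs -0 sh -c 'echo $#' sh)

echo "plain: $naive argument(s)  -print0/-0: $safe argument(s)"
rm -rf "$tmp"
```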

Some tips for using find:

Multiple tests can be combined with the AND and OR boolean operators (and parentheses); tests written side by side are implicitly ANDed. For example, to find all files containing “house” in the name that were modified less than two days ago and are larger than 10K, try this:

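find . -name "*house*" -size +10240c -mtime -2

(The c suffix means bytes. A bare -size +10240 counts 512-byte blocks and would match only files larger than 5 MB.)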
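The -size units are easy to get wrong, so it is worth checking them on a throwaway directory. This sketch (the two file names and the 20000-byte size are assumptions for illustration) compares a byte count given with the POSIX c suffix against the same number with no suffix, which find reads as 512-byte blocks:

```shell
#!/bin/sh
# Sketch: -size with and without the "c" (bytes) suffix.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/bighouse.dat" bs=1000 count=20 2>/dev/null  # 20000 bytes
touch "$tmp/smallhouse.txt"                                           # 0 bytes

# -size +10240c: more than 10240 bytes, so only bighouse.dat matches.
bytes=$(find "$tmp" -name "*house*" -size +10240c -mtime -2)

# No suffix: more than 10240 blocks of 512 bytes (5 MiB), so nothing matches.
blocks=$(find "$tmp" -name "*house*" -size +10240 -mtime -2)

echo "with c suffix: ${bytes##*/}"
echo "without suffix: '$blocks'"
rm -rf "$tmp"
```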

This is where some of the power of find can be seen.

Use all appropriate options. The more you can narrow down the selection, the less output you have to look through. For example, the -type and -xdev options can be quite useful. The -type test selects files by type (d for directories, f for regular files, and so on), and -xdev prevents the scan from moving onto another disk volume (it refuses to cross mount points). Thus, you can list all directories on the same filesystem as a starting point like this:

find /var/tmp -xdev -type d -print
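A minimal sketch of the same command on a small temporary tree (the names a, a/b and file.txt are made up). Since the tree lives on a single filesystem, -xdev changes nothing here; the point is only that -type d filters out the regular file:

```shell
#!/bin/sh
# Sketch: -type d keeps only directories; -xdev keeps the walk on the
# starting point's filesystem (this small tree never leaves it).
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/a/file.txt"   # hypothetical regular file, filtered out

dirs=$(find "$tmp" -xdev -type d -print | wc -l | tr -d ' ')
echo "directories found: $dirs"   # $tmp, $tmp/a and $tmp/a/b
rm -rf "$tmp"
```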

Get to know all of find’s options.

Use xargs instead of -exec. With the -exec ... \; form, find spawns a new process for every file (the {} + form, supported by GNU find among others, avoids this). xargs instead reads arguments from standard input (separated by blanks or newlines), collects as many as will fit on one command line, and runs the command with them, repeating this as often as necessary.

For example, -exec rm {} \; spawns a process, loads the rm binary, and runs it once for each file. Using xargs, as many arguments as possible are read from standard input and rm is run once with all of them; if there are more arguments, xargs repeats the process. The rm binary is thus loaded once per batch instead of once per file.
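The xargs -t option, which echoes each command line to standard error before running it, makes this batching easy to observe. In this sketch the three .tmp files and the trace file name are assumptions for illustration:

```shell
#!/bin/sh
# Sketch: trace the rm commands that xargs actually runs.
tmp=$(mktemp -d)
touch "$tmp/a.tmp" "$tmp/b.tmp" "$tmp/c.tmp"   # hypothetical files

# -t writes each command line to stderr, captured here in a trace file.
find "$tmp" -name "*.tmp" -print | xargs -t rm 2>"$tmp.trace"

runs=$(wc -l < "$tmp.trace" | tr -d ' ')               # rm invocations
left=$(find "$tmp" -name "*.tmp" | wc -l | tr -d ' ')  # files remaining

echo "rm invocations: $runs  files left: $left"
cat "$tmp.trace"   # a single "rm ... a.tmp ... b.tmp ... c.tmp" line
rm -rf "$tmp" "$tmp.trace"
```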

Avoid find /. Doing a find on a large number of files can slow the system down drastically. Typically this is done by an administrator trying to locate a file somewhere on the hard drive. A better approach is to run this command overnight (from cron, for example):

find / -print > /.masterfile

Then the /.masterfile can be searched using grep instead of tying the system up with lots of disk I/O during the day when users are counting on excellent system performance.
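Here is the same two-step approach scaled down to a temporary tree so it can be tried safely. The tree, the file names, and the list file (a stand-in for /.masterfile) are all assumptions for illustration:

```shell
#!/bin/sh
# Sketch: build a file list once, then search the list instead of the disk.
tmp=$(mktemp -d)    # stands in for "/"
list=$(mktemp)      # stands in for /.masterfile
mkdir -p "$tmp/home/user"
touch "$tmp/home/user/report.txt" "$tmp/home/user/notes.txt"

find "$tmp" -print > "$list"    # the slow step, e.g. run nightly from cron

hits=$(grep 'report' "$list")   # the fast step, run whenever needed
echo "${hits##*/}"
rm -rf "$tmp" "$list"
```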

Remember to quote special characters. In particular, any patterns (such as the shell-style wildcards given to -name, or the regular expressions given to GNU find's -regex) and the left and right parentheses should be quoted so that the shell passes them to find unchanged. Typically, patterns are put in double quotes, and parentheses are escaped with a backslash.
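To see why quoting matters, this sketch runs the same search with the pattern quoted and unquoted in a temporary directory that happens to contain matching files (the names are hypothetical). Unquoted, the shell expands the pattern before find ever sees it:

```shell
#!/bin/sh
# Sketch: a quoted versus an unquoted -name pattern.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/one.c" "$tmp/two.c" "$tmp/sub/three.c"   # hypothetical files

# Quoted: find receives the pattern *.c and matches all three files.
quoted=$(cd "$tmp" && find . -name "*.c" | wc -l | tr -d ' ')

# Unquoted: the shell expands *.c to "one.c two.c" first, so find sees
# a stray extra argument, reports an error, and prints nothing.
unquoted=$(cd "$tmp" && find . -name *.c 2>/dev/null | wc -l | tr -d ' ')

echo "quoted: $quoted matches  unquoted: $unquoted matches"
rm -rf "$tmp"
```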

Be wary of extensions to POSIX.1 find.  It’s not that they are bad, but rather that you cannot count on them being present.  Unfortunately, some of the most useful options fall into this category – but as long as you are aware of them, they can be used appropriately.  Some options in this category are:

  • -print0
  • -maxdepth
  • -mindepth
  • -iname
  • -ls

In particular, -print0 is the most useful of the lot.

The BSD man page also brings up an interesting point about find and find options:

Historically, the -d, -L and -x options were implemented using the primaries -depth, -follow, and -xdev. These primaries always evaluated to true. As they were really global variables that took effect before the traversal began, some legal expressions could have unexpected results. An example is the expression -print -o -depth. As -print always evaluates to true, the standard order of evaluation implies that -depth would never be evaluated. This is not the case.

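This has been a source of confusion in the past; treating these as global options (and placing them first on the command line) will provide some relief. Note that -d and -x originated in BSD find, while -L is also specified by POSIX and supported by GNU find (GNU find also accepts -d as a synonym for -depth).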
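The man page's example can be checked directly. This sketch (the temporary directory and names are assumptions) runs find with -depth placed first and with -depth buried after -print -o; both list the file before its directory, which is the -depth order:

```shell
#!/bin/sh
# Sketch: -depth takes effect even where it could "never be evaluated".
tmp=$(mktemp -d)
mkdir "$tmp/dir"
touch "$tmp/dir/file"   # hypothetical file inside the directory

# Conventional placement: global option first.
first=$(find "$tmp/dir" -depth -print)

# -depth buried after "-print -o"; GNU find warns about the placement on
# stderr (discarded here), but the traversal order is the same.
buried=$(find "$tmp/dir" -print -o -depth 2>/dev/null)

printf '%s\n' "$buried"   # .../dir/file is listed before .../dir
rm -rf "$tmp"
```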