Tips on using the UNIX find command

When I first started using find, it took a while before I could use it without looking things up.  For a solid introduction, see this article from the Debian/Ubuntu Tips and Tricks site.  The GNU project publishes all of its manuals on the web, including the GNU find manual.

There is much more to the find command than just these introductory topics, however.  First, let us consider the tricks and traps of the find command:

  • The original find command required the -print option or nothing was printed at all.  Today, the GNU find does not require -print, and most other find commands seem to have followed suit.
  • Using the -exec option (with its traditional \; terminator) is less efficient than using the xargs command; in the Sun Manager’s mailing list there was a nice summary from Steve Nelson of this contrast.
  • Watch out for filenames containing spaces, newlines, quotes, and other special characters; GNU find has a -print0 option (and GNU xargs has a -0 option to match) just for this reason.  These options use an ASCII NUL to separate filenames.
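
As a rough sketch of the NUL-separated pipeline from the last point, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do), something like the following could be tried. The scratch directory and the file names are invented for illustration.

```shell
# Scratch directory holding file names with spaces (hypothetical names).
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/two words.txt" "$demo/three more words.txt"

# find ends each name with a NUL byte and xargs -0 splits only on NUL,
# so every name reaches ls as exactly one argument.
safe_count=$(find "$demo" -type f -print0 | xargs -0 ls | wc -l | tr -d ' ')

echo "files listed: $safe_count"
rm -rf "$demo"
```

With plain -print and xargs, "two words.txt" would be split on the space and ls would be handed two names that do not exist.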

Some tips for using find:

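Multiple tests can be combined with boolean AND and OR operators (and parentheses); tests simply placed in sequence are ANDed together. For example, to find all files containing “house” in the name that were modified within the last two days and are larger than 10K, try the command below.  The c suffix makes -size count bytes; without a suffix find counts 512-byte blocks, so a bare +10240 would mean “larger than 5 MB”.

find . -name "*house*" -size +10240c -mtime -2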

This is where some of the power of find can be seen.
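
The example above only uses AND.  A small sketch of OR with parentheses might look like this, assuming a scratch directory with made-up file names:

```shell
# Scratch directory with a few sample files (hypothetical names).
demo=$(mktemp -d)
touch "$demo/house.txt" "$demo/house.log" "$demo/house.bak" "$demo/barn.txt"

# Adjacent tests are ANDed implicitly; -o is OR and binds less tightly,
# so the escaped parentheses group the two suffix tests together.
matches=$(find "$demo" -type f -name "house*" \( -name "*.txt" -o -name "*.log" \) | sort)

printf '%s\n' "$matches"
rm -rf "$demo"
```

Only house.log and house.txt should be listed: house.bak fails the grouped suffix test, and barn.txt fails the first test.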

Use all appropriate options.  The more you can narrow down the selection, the less you have to look through.  For example, the -type and -xdev options can be quite useful.  The -type option selects a file based on its type, and -xdev keeps the file “scan” from descending into another disk volume (that is, it refuses to cross mount points).  Thus, you can look for all directories on the same disk as a starting point like this:

find /var/tmp -xdev -type d -print

Get to know all of find’s options.

Use xargs instead of -exec.  With the traditional \; terminator, find spawns a new process for every file that reaches -exec (though some find implementations also offer a batching form).  xargs instead reads the names from its standard input, packs as many of them as will fit onto one command line, and runs the command once per batch, repeating only as often as necessary.

For example, an -exec of rm spawns a process, loads the rm binary, runs it, and releases its memory once for every single file.  With xargs, rm is started with as many arguments from the standard input as will fit on one command line; if arguments remain, xargs starts rm again with the next batch.
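
One way to see the difference is to count how often the command actually runs.  The sketch below swaps rm for a harmless sh -c 'echo run' so that each invocation prints one line; the scratch directory and file names are made up.

```shell
# Five sample files in a scratch directory (hypothetical names).
demo=$(mktemp -d)
for i in 1 2 3 4 5; do touch "$demo/file$i"; done

# -exec ... \; starts the command once per file: five runs here.
per_file=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# xargs packs all five names onto one command line: a single run here.
batched=$(find "$demo" -type f -print | xargs sh -c 'echo run' sh | wc -l | tr -d ' ')

echo "per-file runs: $per_file, xargs runs: $batched"
rm -rf "$demo"
```

With only five files the time saved is negligible; the gap matters when the list runs to thousands of names.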

Don’t use find / .  Walking a very large number of files can slow the system down drastically.  Administrators typically do this to locate a file somewhere on the hard drive.  A better approach is to run this command overnight:

find / -print > /.masterfile

Then the /.masterfile can be searched using grep instead of tying the system up with lots of disk I/O during the day when users are counting on excellent system performance.
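
A small-scale sketch of the same idea, using a scratch tree in place of / and a scratch index file in place of /.masterfile (both are stand-ins chosen for illustration):

```shell
# Stand-ins: a scratch tree instead of /, a scratch file instead of /.masterfile.
tree=$(mktemp -d)
index=$(mktemp)
mkdir -p "$tree/etc" "$tree/home/user"
touch "$tree/etc/hosts" "$tree/home/user/notes.txt"

# The "overnight" job: one expensive walk, saved to the index file.
find "$tree" -print > "$index"

# Daytime lookups read the index instead of walking the disk again.
hits=$(grep -c '/hosts$' "$index")
grep '/hosts$' "$index"

rm -rf "$tree" "$index"
```

The index only reflects the tree as it was when the walk ran, so files created since then will not show up in it.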

Remember to quote special characters.  In particular, any wildcard patterns or regular expressions and the left and right parentheses should be quoted, or the shell will interpret them before find ever sees them.  Typically, the patterns are put into quotes (single or double), and left and right parens are escaped with a backslash.
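
To illustrate what goes wrong without quotes, here is a small sketch using a scratch directory with one matching file at the top level and one in a subdirectory (names invented for the example):

```shell
# Scratch tree: a.txt at the top level, b.txt one level down (hypothetical names).
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.txt" "$demo/sub/b.txt"

# Quoted: find receives the pattern *.txt and applies it at every level.
quoted=$(cd "$demo" && find . -name "*.txt" | wc -l | tr -d ' ')

# Unquoted: the shell expands *.txt to a.txt before find runs,
# so find only ever looks for that one name.
unquoted=$(cd "$demo" && find . -name *.txt | wc -l | tr -d ' ')

echo "quoted: $quoted, unquoted: $unquoted"
rm -rf "$demo"
```

Had the top-level directory held two .txt files, the unquoted form would have expanded to two names and find would have rejected the command line outright.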

Be wary of extensions to POSIX.1 find.  It’s not that they are bad, but rather that you cannot count on them being present.  Unfortunately, some of the most useful options fall into this category – but as long as you are aware of them, they can be used appropriately.  Some options in this category are:

  • -print0
  • -maxdepth
  • -mindepth
  • -iname
  • -ls

Of these, -print0 is probably the most useful of the lot.
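
Where -print0 cannot be counted on, one possible fallback is the -exec command {} + form, which is part of the POSIX specification of find.  A minimal sketch, with made-up file names and sh -c 'echo "$#"' standing in for a real command so that the argument count is visible:

```shell
# Two sample files, one with a space in its name (hypothetical names).
demo=$(mktemp -d)
touch "$demo/two words.txt" "$demo/plain.txt"

# find passes each name to the command as a single argument, so no
# separator (newline or NUL) is involved at all.
count=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "arguments received: $count"
rm -rf "$demo"
```

The command is run once with both names, and the name containing a space still arrives as one argument.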

The BSD man page also brings up an interesting point about find and find options:

Historically, the -d, -L and -x options were implemented using the
primaries -depth, -follow, and -xdev. These primaries always evaluated
to true. As they were really global variables that took effect before
the traversal began, some legal expressions could have unexpected
results. An example is the expression -print -o -depth. As -print
always evaluates to true, the standard order of evaluation implies that
-depth would never be evaluated. This is not the case.

This has been a source of confusion in the past; considering them as global options (and placing them first) will provide some relief. Note that the -d and -x options originated in BSD and may be missing elsewhere, whereas -L is also specified by POSIX.
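
The man page’s example can be tried on a small scale.  The sketch below assumes a find that treats -depth as a global setting (GNU and BSD versions do) and uses a made-up scratch tree:

```shell
# Scratch tree: one directory containing one file (hypothetical names).
demo=$(mktemp -d)
mkdir "$demo/dir"
touch "$demo/dir/file"

# Default order: a directory is printed before its contents,
# so the starting point comes first.
first_default=$(find "$demo" -print | head -n 1)

# -depth written last, after "-print -o": it still switches the whole
# walk to contents-first order, so the starting point comes last.
# (GNU find may warn about the placement, hence the 2>/dev/null.)
last_depth=$(find "$demo" -print -o -depth 2>/dev/null | tail -n 1)

echo "default, first line: $first_default"
echo "-depth, last line:   $last_depth"
rm -rf "$demo"
```

Both lines should show the starting directory, which suggests that -depth took effect even though the expression’s order of evaluation says it is never reached.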

6 thoughts on “Tips on using the UNIX find command”

  1. You may be interested to know that — in OS X Leopard — the find command has the new enhanced version of -exec which emulates xargs behavior. But new syntax is required to invoke it:

    -exec command {} +

    Just try some of these commands (tailored for Mac files), and compare their execution times as well as their output formats (which provide evidence whether new instances of “command” were spawned):

    time find /Library/Scripts -type f -exec file -n {} \;
    time find /Library/Scripts -type f -print0|xargs -0 file -n
    time find /Library/Scripts -type f -exec file -n {} +

    time find /Library/Scripts -type f -exec ls -lSr {} \;
    time find /Library/Scripts -type f -print0|xargs -0 ls -lSr
    time find /Library/Scripts -type f -exec ls -lSr {} +

    time find /Library/Scripts -type f -exec du -cks {} \;
    time find /Library/Scripts -type f -print0|xargs -0 du -cks
    time find /Library/Scripts -type f -exec du -cks {} +

    I think you’ll “find” the new -exec may even surpass xargs!

    -HI-
