Tips on using the UNIX find command

When I first started using find, it took a while before I could use it regularly without looking things up.  For a good introduction, this article from the Debian/Ubuntu Tips and Tricks site is worth reading.  The GNU project publishes all of its manuals on the web, including the GNU find manual.

There is much more to the find command than just these introductory topics, however.  First, let us consider the tricks and traps of the find command:

  • The original find command required the -print option, or nothing was printed at all.  Today, GNU find does not require -print, and POSIX specifies that -print is assumed when the expression contains no action, so most other find implementations have followed suit.
  • Using the -exec option to find (in its traditional one-file-per-command form) is less efficient than using the xargs command; on the Sun Managers mailing list there was a nice summary from Steve Nelson of this contrast.
  • Watch out for filenames containing spaces, newlines, and other special characters; GNU find provides a -print0 option (and GNU xargs a matching -0 option) for exactly this reason.  These options separate filenames with the ASCII NUL character, the one byte that can never appear in a filename.
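To see the problem concretely, here is a small sketch, assuming a POSIX shell plus a find and xargs that support -print0 and -0 (GNU and the BSDs do).  It builds a throwaway directory with mktemp; the file names are invented for the demonstration.  Each pipeline prints one bracketed argument per line, so you can see where the default pipeline splits “my house.txt” in two.

```shell
# Sketch only: the scratch directory and file names are invented for the demo.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/my house.txt" "$dir/garage.txt"

# Default: find prints one name per line, but xargs splits its input on
# blanks as well as newlines, so "my house.txt" becomes two arguments.
echo "find -print | xargs:"
find "$dir" -type f -print | xargs -n 1 printf '  [%s]\n'

# NUL-separated: each file name reaches the command as a single argument.
echo "find -print0 | xargs -0:"
find "$dir" -type f -print0 | xargs -0 -n 1 printf '  [%s]\n'

# Count the arguments each pipeline produced (only two files exist).
split_count=$(find "$dir" -type f -print | xargs -n 1 echo | wc -l)
nul_count=$(find "$dir" -type f -print0 | xargs -0 -n 1 echo | wc -l)
echo "arguments: default=$split_count, NUL-separated=$nul_count"
```

The default pipeline reports three arguments for two files; the NUL-separated one reports two.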

Some tips for using find:

Multiple tests can be combined with the boolean AND (-a) and OR (-o) operators and grouped with parentheses; tests written one after another are implicitly ANDed.  For example, to find all files whose names contain “house”, that were modified within the last two days, and that are larger than 10K, try this:

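find . -name "*house*" -size +10240c -mtime -2

The c suffix makes -size count bytes; without a suffix, find counts in 512-byte blocks, so +10240 alone would select only files larger than about 5MB.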

This is where some of the power of find can be seen.
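That example relies only on the implicit AND.  Once OR is involved, keep in mind that AND binds more tightly than OR, so parentheses are often needed.  A minimal sketch, using a scratch directory and invented file names, compares an ungrouped and a grouped expression:

```shell
# Sketch with made-up names: select *.c OR *.h, but only regular files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/main.c" "$dir/util.h" "$dir/notes.txt"
mkdir "$dir/dir.c"   # a directory whose name also matches *.c

# Without grouping: AND binds tighter than OR, so this means
# -name "*.c" OR (-name "*.h" AND -type f), and the directory dir.c sneaks in.
ungrouped=$(cd "$dir" && find . -name "*.c" -o -name "*.h" -type f | LC_ALL=C sort)

# With grouping: (-name "*.c" OR -name "*.h") AND -type f.
# The parentheses are escaped so the shell passes them through to find.
grouped=$(cd "$dir" && find . \( -name "*.c" -o -name "*.h" \) -type f | LC_ALL=C sort)

echo "ungrouped:"; echo "$ungrouped"
echo "grouped:";   echo "$grouped"
```

Only the grouped form excludes the directory dir.c.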

Use all appropriate options.  The more you can narrow down the selection, the less you have to look through.  For example, the -type and -xdev options can be quite useful.  The -type option selects files by type (d for directories, f for regular files, and so on), and -xdev keeps the search on the starting file system, refusing to cross mount points.  Thus, you can list all directories on the current file system below a starting point like this:

find /var/tmp -xdev -type d -print
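Here is a small sketch of -type on a throwaway tree (the names are invented).  Because nothing is mounted inside the scratch directory, -xdev has no visible effect here; on a real system it would keep the search from wandering into other file systems.  The output is sorted with LC_ALL=C only to give a stable order.

```shell
# Sketch: -type d keeps only directories, -type f only regular files.
# -xdev would stop find at mount points; none exist inside this scratch tree.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/a/b"
touch "$dir/a/file1" "$dir/a/b/file2"

dirs=$(cd "$dir" && find . -xdev -type d | LC_ALL=C sort)
files=$(cd "$dir" && find . -xdev -type f | LC_ALL=C sort)

echo "directories:"; echo "$dirs"
echo "regular files:"; echo "$files"
```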

Get to know all of find’s options.

Use xargs instead of -exec.  In its traditional form (-exec command {} \;), find spawns a new process for every file it finds.  xargs instead reads filenames from standard input (separated by blanks or newlines), packs as many as will fit into one argument list, and runs the command on that list, repeating as often as necessary.  (POSIX also defines -exec command {} +, which batches filenames much as xargs does; GNU find and modern BSD find support it.)

For example, with -exec, rm is started once per file: find spawns a process, loads the rm binary, runs it on that single file, and releases the process memory, over and over.  With xargs, as many filenames as will fit are read from standard input, rm is loaded and run once with all of them, and if more filenames remain, xargs repeats the process.
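One way to see the difference is to count how often the command actually runs.  In this sketch (scratch directory and file names invented; -print0 and -0 assume GNU or BSD tools), the command is sh -c 'echo run' sh, which prints one line per invocation no matter how many filenames it receives.  The POSIX -exec ... {} + form is included for comparison.

```shell
# Sketch: count invocations. "sh -c 'echo run' sh" prints one line per run,
# whatever filenames it is given. Scratch names are invented for the demo.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
for i in 1 2 3 4 5; do touch "$dir/file$i"; done

# -exec ... \; starts the command once per file.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# xargs batches the names and starts the command as few times as possible.
batched=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l)

# POSIX -exec ... {} + batches much as xargs does.
plus=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l)

echo "runs: per-file -exec=$per_file, xargs=$batched, -exec +=$plus"
```

With five files, the per-file form runs five times, while both batched forms run once.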

Don’t use find / .  Running find across a large number of files can slow the system down drastically.  Administrators typically do this to locate a file somewhere on the hard drive.  A better approach is to run this command overnight:

find / -print > /.masterfile

You can then search /.masterfile with grep instead of tying up the system with heavy disk I/O during the day, when users are counting on good performance.
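A scaled-down sketch of the same idea: a scratch tree stands in for / and a temporary file for /.masterfile (both hypothetical stand-ins created with mktemp).  The find step is the expensive one you would schedule overnight, for example from cron; the grep step is the cheap daytime lookup.

```shell
# Sketch: a scratch tree plays the role of "/", a temp file that of
# "/.masterfile". All names here are hypothetical.
root=$(mktemp -d)
masterfile=$(mktemp)
trap 'rm -rf "$root" "$masterfile"' EXIT
mkdir -p "$root/home/pat" "$root/var/log"
touch "$root/home/pat/house-plans.txt" "$root/var/log/messages"

# Expensive part, meant to run overnight: walk the tree once.
find "$root" -print > "$masterfile"

# Cheap part, run during the day: search the saved list, not the disk.
matches=$(grep 'house' "$masterfile")
echo "$matches"
```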

Remember to quote special characters.  In particular, shell wildcard patterns (such as the argument to -name), any regular expressions, and the left and right parentheses must be quoted so that the shell passes them to find unchanged.  Typically, patterns are put in double quotes, and parentheses are escaped with a backslash.
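An unquoted pattern can fail silently.  In this sketch (scratch directory and invented names), the shell expands *.txt to top.txt before find ever runs, so the unquoted search misses the nested file.  Had the pattern matched two files in the current directory, find would have received an extra argument and most likely complained; had it matched none, most shells would pass it through unchanged, hiding the bug until later.

```shell
# Sketch: what happens when a -name pattern is left unquoted.
# File names are invented for the demo.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/sub"
touch "$dir/top.txt" "$dir/sub/nested.txt"

# Unquoted: after the cd, the shell expands *.txt to "top.txt",
# so find only looks for files named exactly top.txt.
unquoted=$(cd "$dir" && find . -name *.txt)

# Quoted: find receives the pattern itself and matches at every depth.
quoted=$(cd "$dir" && find . -name "*.txt" | LC_ALL=C sort)

echo "unquoted: $unquoted"
echo "quoted:"; echo "$quoted"
```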

Be wary of extensions to POSIX.1 find.  It’s not that they are bad, but rather that you cannot count on them being present.  Unfortunately, some of the most useful options fall into this category, but as long as you are aware of them, they can be used appropriately.  Some options in this category are:

  • -print0
  • -maxdepth
  • -mindepth
  • -iname
  • -ls

Of these, -print0 is the most useful.
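When portability matters, some of these extensions have POSIX equivalents.  A sketch, assuming a scratch directory with invented names: the -mindepth/-maxdepth form lists only a directory’s immediate children, and the classic POSIX idiom ! -name . -prune does the same using only standard primaries.

```shell
# Sketch: list only the immediate children of a directory.
# Names are invented for the demo.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/a/deeper"
touch "$dir/top.txt" "$dir/a/inner.txt"

# Extension (GNU/BSD): skip depth 0 (".") and never go below depth 1.
ext=$(cd "$dir" && find . -mindepth 1 -maxdepth 1 | LC_ALL=C sort)

# POSIX-only equivalent: every entry other than "." is printed and then
# pruned, so find never descends into it.
posix=$(cd "$dir" && find . ! -name . -prune -print | LC_ALL=C sort)

echo "$ext"
if [ "$ext" = "$posix" ]; then echo "both forms agree"; fi
```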

The BSD find man page also raises an interesting point about find and its options:

Historically, the -d, -L and -x options were implemented using the primaries -depth, -follow, and -xdev. These primaries always evaluated to true. As they were really global variables that took effect before the traversal began, some legal expressions could have unexpected results. An example is the expression -print -o -depth. As -print always evaluates to true, the standard order of evaluation implies that -depth would never be evaluated. This is not the case.

This has been a source of confusion in the past; treating these primaries as global options (and placing them first in the expression) avoids most of it.  Note that the -d and -x options are BSD-specific, although -L is also specified by POSIX.
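A small sketch of why -depth behaves like a global option (scratch directory, invented names).  Written first, it switches the whole traversal to post-order; written at the end of the man page’s -print -o -depth example, it still takes effect even though, by the normal evaluation rules, it would never be reached.  Some implementations may print a warning about the misplaced option, so the sketch discards stderr for that one command.

```shell
# Sketch: -depth changes traversal order for the whole run.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/sub"
touch "$dir/sub/file"

# Default (pre-order): a directory is visited before its contents.
pre=$(cd "$dir" && find sub)

# -depth placed first (post-order): contents first, then the directory.
post=$(cd "$dir" && find sub -depth)

# The man page's example: -depth is never "evaluated", yet it applies.
# Any warning about its position goes to stderr, discarded here.
odd=$(cd "$dir" && find sub -print -o -depth 2>/dev/null)

echo "default:"; echo "$pre"
echo "with -depth first:"; echo "$post"
echo "with -print -o -depth:"; echo "$odd"
```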

The root account (and toor)

Traditionally, the root account (user ID 0) is not used for daily tasks.  This is widely known, and it is also the reason root’s home directory was usually / (the root directory): root had no need for .profile, .login, .Mail, and so forth.  Under Mac OS X, the root account is even created with a locked-down password (that is, no password is valid for root, so it is impossible to log in as root).

However, this is most certainly not the case today: more and more administrators use the root account for many tasks.  One common problem arises when someone changes root’s shell and thereby breaks the startup process, since some scripts assume that root’s shell is the Bourne shell.  This was more of a problem under BSD, where the standard shell was the C shell while the startup scripts usually assumed the Bourne shell, whose syntax is incompatible with the C shell’s.  The toor account (root spelled backwards) was created for exactly this situation: a person can log in as toor and use the C shell (csh) without affecting the standard startup process.  The toor account still has user ID 0, so it is, for all intents and purposes, the root user.
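Because an account like toor is simply another name for user ID 0, you can spot such accounts by looking for a zero in the third field of /etc/passwd.  A minimal sketch; it reads only the local passwd file and ignores directory services such as NIS or LDAP.  On a typical Linux system it prints just root; FreeBSD, for example, also lists toor.

```shell
# Sketch: list every account in the local passwd file whose user ID is 0.
# Field 3 of /etc/passwd is the numeric user ID; comment lines never match.
uid0=$(awk -F: '$3 == 0 { print $1 }' /etc/passwd)
echo "$uid0"
```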

This also makes it possible to give the toor user its own home directory.

On Mac OS X, where root is locked down and no one can log in as root, root access comes through the sudo utility, run by an admin user (normally the account created when Mac OS X was installed).

The wheel group also plays a part: membership in wheel can extend the capabilities of selected users (on traditional BSD systems, for example, only members of wheel may use su to become root), further reducing the need to use the root account as a shell account.

Thus, you can see that there is really no reason to use the root account.  But is that going to stop us? Perhaps it should…