Digital Data: Lost Forever?

In the past, I have worked with and for statisticians, satisfying their needs both as an administrator and as a programmer. Two things I learned about statisticians: you can never have enough disk space, and you can never keep data long enough.

Thus, losing data because the medium or the format can no longer be read is indeed a crisis. I’ve heard people talk about this before, but just recently physorg.com ran an article by Jerome McDonough titled “‘Digital dark age’ may doom some data”. While it is mostly those in library science sounding the alarm, the problem is not limited to libraries and archives: statisticians may lose old data, researchers can lose old research, and lawyers may find critical digital legal documents unreadable.

Take a moment to read the article and decide how you’re going to resolve the problem for your data. While you’re at it, you might look into your data retention policies (if you have any) – but that’s a topic for a different day.

What to do when the system libraries go away…

You’ve been hacking away at this system (let’s be positive and upbeat and say it’s a test system and not production). Through a slip of the fingers, you move the system libraries out of the way – all of them. Now nothing can find the libraries. Now what? Is everything lost?

Don’t despair! You can do a lot without libraries. Already loaded software has the libraries in memory, so that is okay. This includes the shell, so the shell should be okay.

There may be some statically compiled binaries on the system that don’t require libraries; these can be run. If a scripting language like perl or ruby is statically compiled, then all is well: these languages can stand in (temporarily) for binaries such as mv, cp, and others. However, since vi is probably not statically linked, you may have to make any file changes at the command line (and not in an editor).

Here are some things one can do:

echo *

Through the use of the shell’s filename expansion, this works out to a reasonable imitation of ls (ls -m, in particular). If you have to empty a file (make the contents nothing), use this command:

> file

Most standard utilities today are dynamically linked; this means that in situations like these you may be stuck with only what the shell itself provides. Remember that things like cat, ls, mv, cp, vi, rm, ln, and so on are all separate system executables, not part of the shell, and quite possibly dynamically linked.
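As a rough sketch of how far the shell alone can go, here are stand-ins for ls, cat, and cp built only from shell builtins. The function names (sh_ls, sh_cat, sh_cp) are made up for illustration, and the sketch assumes printf, read, and [ are builtins in your shell, as they are in bash and dash. It handles text files only.

```shell
#!/bin/sh
# Stand-ins built only from shell builtins; no external binary is executed.
# Names are hypothetical. Text files only: NUL bytes will not survive.

# ls substitute: one name per line (dotfiles are skipped, as with plain ls).
sh_ls() {
    for f in "${1:-.}"/*; do
        [ -e "$f" ] || continue      # an unmatched glob stays literal; skip it
        printf '%s\n' "${f##*/}"     # strip the leading directory part
    done
}

# cat substitute: read and reprint the file line by line.
sh_cat() {
    # The extra test prints a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}

# cp substitute for text files, built on sh_cat.
sh_cp() {
    sh_cat "$1" > "$2"
}
```

For example, sh_cat /etc/fstab prints the file, and sh_cp /etc/fstab /tmp/fstab.copy makes a copy of it. Note that nothing here replaces mv or ln; those need a real (statically linked) binary.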

The best thing for a situation like this is to have prepared in advance: keep a statically linked copy of busybox handy, and possibly a statically compiled perl or ruby (or both). Don’t forget editors – either have a copy of e3 or of a statically compiled editor. Busybox provides many of the standard utilities in a single binary, and e3 is a tiny (and i386-specific) editor that emulates vi, pico, wordstar, or emacs depending on the name it is invoked under. Neither a static busybox nor e3 requires additional libraries.
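Here is a minimal sketch of how such a rescue copy of busybox might be used. The path /mnt/rescue/busybox, the variable RESCUE_BIN, the helper name bb, and the directory /lib.moved are all assumptions made up for this example; the only busybox behavior relied on is that it runs an applet when called as busybox APPLET ARGS.

```shell
#!/bin/sh
# Hypothetical location of a statically linked busybox; adjust to taste.
RESCUE_BIN=${RESCUE_BIN:-/mnt/rescue/busybox}

# Run any busybox applet through the rescue binary.
bb() {
    "$RESCUE_BIN" "$@"
}

# Typical use once the libraries are gone (paths are examples only):
#   bb ls -l /lib.moved
#   bb mv /lib.moved/* /lib/
```

Keeping the path in a variable means the same helper works whether the binary lives on a CDROM, a USB stick, or in /root.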

A good tool to have (and one that is also useful after a security breach) is a small CDROM of tools, all statically linked for your environment. Such a disk requires no libraries at all – and could have all of these necessary tools and more.

Of course, the best thing is to avoid doing this kind of thing in the first place…

Listing shared libraries in running processes

The lsof utility is very useful, and can be used to list the shared libraries being used by a running process. It can be important to know whether a running process is using a particular library, perhaps for forensic reasons or before a library upgrade.

To list all the libraries in a particular process, try this command:

lsof -a -c name +D /usr/lib

This will list all files used by name in /usr/lib. To list all files used by name, just use:

lsof -c name

Alternately, to find all processes using a file (library) in /usr/lib, use this command:

lsof /usr/lib/libname

The -c option specifies the beginning of a name of a process to list. The -a option is used to create a boolean AND set; otherwise, lsof assumes a boolean OR set of options. With the +D option (which scans for files recursively down the directory tree), the first example looks for the process name that also has open files from the /usr/lib directory tree.
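The option handling described above can be wrapped in two small shell functions. This is only a sketch: the function names and the LSOF variable are invented for the example, and only the -a, -c, and +D options already discussed are used.

```shell
#!/bin/sh
# Hypothetical wrappers around the lsof invocations discussed above.
LSOF=${LSOF:-lsof}

# Files under a directory tree (default /usr/lib) that are open by
# processes whose command name begins with $1.
# Without -a, lsof would OR the two selections instead of ANDing them.
libs_in_use() {
    "$LSOF" -a -c "$1" +D "${2:-/usr/lib}"
}

# Processes that have a given file (for example, a library) open.
procs_using() {
    "$LSOF" "$1"
}

# Usage (names are examples only):
#   libs_in_use sshd
#   libs_in_use sshd /lib
#   procs_using /usr/lib/libname
```

Be aware that +D walks the whole directory tree, so it can be slow on a large tree.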

Another good use of lsof is finding files that are open but deleted. Such a situation could happen with a shared library if the library was deleted while a process was using it, for instance during a library upgrade. Use this command to find such files:

lsof +L1

The +L option selects files by link count; with +L1, any open file with fewer than one link (that is, zero links) will be listed. Files with zero links no longer have a name in the filesystem but are still open and in use by a process. The inode and blocks of such a file remain allocated until the last process closes it, but the file cannot be found by name anywhere.
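To see an open-but-deleted file for yourself, the following sketch holds a temporary file open in the current shell, removes its name, and then asks lsof about it. The function name and the choice of descriptor 9 are arbitrary, and the lsof step is skipped if lsof is not installed. On some filesystems the link count may be reported differently, so treat the lsof output as illustrative.

```shell
#!/bin/sh
# Hold a file open, delete its name, and look at it with lsof +L1.
demo_unlinked() {
    tmp=$(mktemp) || return 1
    exec 9>"$tmp"                 # keep the file open on descriptor 9
    rm -f "$tmp"                  # remove its only name: link count is now 0
    echo "still here" >&9         # writing still works; the inode is intact
    if command -v lsof >/dev/null 2>&1; then
        # -p limits the listing to this shell; -a ANDs that with +L1
        lsof -a -p "$$" +L1 || echo "lsof found no unlinked files"
    fi
    exec 9>&-                     # last close: the kernel frees the blocks
}
```

Call demo_unlinked directly rather than inside $(...), since $$ names the main shell and the descriptor must be open in that same process.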

There is a nice concise article by Joe Barr at Linux.com about what you can do with lsof. Lsof is available for download.