Investigating Mysterious Outbound TCP Connections

Recently, I had a situation where a firewall had outgoing TCP connections I knew nothing about. If you are to maintain a secure system and a secure network, this sort of thing demands investigation. (I won’t report full details in order to maintain anonymity for various entities.)

Where to start, then? First, use tcpdump and capture the traffic. It may be useful to capture it into a file for looking at with Wireshark. I watched the traffic flow across all interfaces by using tcpdump:

tcpdump -s0 -n -i any host 999.999.999.999

I noticed that there was no traffic coming from anywhere except the outgoing port on the firewall.

Then I became more interested in the IP address being connected to and the port (443 or HTTPS in this case). Connecting to the IP on port 443 didn’t turn up anything interesting (except they used Red Hat Enterprise Linux). Looking up the IP address in a whois listing showed that the IP address was very similar to that of the firewall maker – very interesting indeed. Looking up the IP in reverse DNS showed it to be an Amazon AWS host in Ireland.

Then I wrote a script that used lsof to watch for a connection and find the program making the connection:



# Prep: erase if present
rm -f $WORK

while true ; do
    if $(lsof -ni $PORT > $WORK) ; then
        ( echo "Found ports open:" ; echo ; cat $WORK ; echo ; echo "Process data:" ; echo
          lsof $(cat $WORK | sed -n '1d; s/^[^ ]*  *\([^ ]*\).*$/-p \1 /p;') ) | \
        mail -s "Found something on port $PORT"
        echo "Sent message at $(date)..."

# Clean up
rm -f $WORK

Because lsof returns 0 only if it has something to report, this works beautifully. I could have slowed it down with a sleep command, but this worked for my purposes.

It showed a program being run that was part of the firewall. Since it was running periodically, I went and looked for it in the crontab files:

grep program /etc/cron/

I found this program in a file in the /etc/anacron.hourly directory. If I had wanted to, I could have stopped the program from running at all by changing this file. I ran the commands independently of the crontab file to see what the output would be.

I was also able to get help from the program by using the option --help. The program was actually a python script located in /usr/bin, and I searched out the actual code that was called: it was compiled python source (a *.pyc file) found in /usr/lib/python2.4/site-packages/ – the compiled source can be decompiled and investigated.

If I wanted to take complete control, the program could have been renamed and a script put in its place which called the original script and did a little extra – such as report by mail every time the command runs, what the command line was, what the output was, and more.

There’s a lot that can be found out if you just know where to look.

Using Perl Modules from CPAN with Distribution Packaged Modules

When I first started using CPAN, this was a concern of mine that I was never able to fully address: how do Perl modules from CPAN interact with modules installed via distribution packages?

It turns out that the answer is simple. First, however, let’s install a set of tools called pmtools; these tools will tell you a lot and make it simple. Using these will help you to understand how your system is set up as well. In Ubuntu, the package is called pmtools; in Red Hat, it is perl-pmtools. To install, use this command on an Ubuntu system as root:

$ sudo apt-get install pmtools

On a Red Hat Enterprise Linux system, use this command instead:

$ sudo yum install perl-pmtools

Once that is installed, use the tool pmdirs to see what directories are searched for modules in order. Here is one example from an Ubuntu 10.04 LTS server system:

$ pmdirs

With this, you can see that several site-specific directories in /usr/local are searched before system directories in /usr.

Same thing holds true for Red Hat; here is an example from a Red Hat Enterprise Linux 5.7 system:

$ pmdirs

In Red Hat, system-installed Perl modules go into /usr/lib/perl5/vendorperl/$version/ while modules installed with CPAN go into /usr/lib/perl5/siteperl/$version/. On the other hand, in Ubuntu, CPAN modules go into /usr/local/share/perl5/$version/ while packaged modules are placed in /usr/lib/perl/$version/.

This means that modules installed with CPAN will be searched (and found) first – whether they are newer or not. Usually they will be newer, but if they are the same version you’re better off staying with the packaged version of the Perl module as it will often have bugfixes and patches not present on the original.

If you want to see where a particular module lives – use pmpath:

$ pmpath CPAN

This shows that CPAN was installed from (or updated from) CPAN as well.

If you want to see what version is installed, use pmvers:

$ pmvers CPAN

Note: In Ubuntu you may find that the directory /usr/local/lib/site_perl is missing and that some pmtools utilities will fail because of this. Use this command to create the directory (and stop the error messages):

# mkdir -p /usr/local/lib/site_perl

Why doesn’t my /bin/sh script run under Ubuntu?

This is a very interesting question – and the resolution is simple. In Ubuntu 6.10 (known as Edgy Eft) the decision was made to replace the Bourne Again Shell (bash) with the Debian Almquist Shell (or dash) as /bin/sh in Ubuntu. There was considerable uproar in Ubuntu brainstorm (community ideas) and in Ubuntu bug reports, as using dash instead of the original bash caused numerous scripts to break.

In particular, the entire reasoning given for this change was efficiency: dash is more efficient (i.e., faster) than bash. According to the explanatory document created by the Ubuntu developer team, Debian has required scripts to work on POSIX-compliant shells for some time (even pre-dating the Ubuntu project). Thus, any scripts that broke were, in essence, not “following directions” and deserved what they got.

To undo this change by the Ubuntu team, one can do this:

sudo dpkg-reconfigure dash

When this command executes, specify that you do not want dash to act as /bin/sh. This will make every script that runs /bin/sh run bash as has traditionally been the case.

You can also make your scripts run /bin/bash instead of /bin/sh; this provides all of the bash capabilities without any concern as to whether /bin/sh will change again.

Making the boot process faster is a laudable goal, but like the removal of OSS from the kernel, it caused a lot of problems for users.

In both cases, it appears that the Ubuntu team is more focused on doing the technologically “right” thing rather than providing a stable and reliable platform. Unfortunately, this means that you cannot rely on Ubuntu to stay reliable – at least from one version to the next. The response of Ubuntu to such system failures has always been that they are doing the “right” thing and the problem must be fixed by someone else (i.e., it’s not Ubuntu’s problem).

Users – many of them system administrators – take the brunt of this: they don’t care whose fault it is, nor do they care whether the boot process is faster or whether the Linux sound environment is “cleaner”; they care about the stability of their systems. A system that boots faster doesn’t matter if it crashes during the boot process because of a broken script.

If the focus of Ubuntu were to provide a stable and unchanging environment, then their decisions would be different – and would result in an improved customer experience.

Tips and Tricks for Using the Shell

There are many things that trip one up in using the shell – normally a Bourne shell, POSIX shell, or Korn shell. These tips can help you understand more of what happens during the usage of the shell, and will help you understand why things might go wrong.

One thing to realize is that the shell can be anything you want; it is a personal choice (unless it is the root shell). While commonly used shells include the Bourne Again Shell (bash), the Korn Shell, or the C shell, there are a lot more than just these. Consider these two alternatives for instance:

  • rc – a small shell used by Plan 9
  • scsh – a shell that incorporates a full Scheme48 interpreter

Now – assuming a Bourne-style shell – consider these two possible commands:

$ mybinary a b c
$ mybinary a* b* c* < f

The first command does not require the shell; any program that executes a command line (such as scripting languages) can execute a command line like that one without using the shell.

The second command requires a shell be started. Why? Because of the use of shell meta-characters like filename wildcards, redirection, and pipes. All of these require parsing by a shell before executing.

When using wildcards and other shell metacharacters, remember that the shell manipulates them first. The executable command in the first example gets three arguments: “a”; “b”; and “c”. The program running in the second command may see: “able”; “baker”; “charlie”; and who knows how many others – the command will not see “a*”, “b*”, or “c*” – unless the wildcard cannot expand to any files at all; in that case, the argument is passed directly into the command as is.

This can cause problems if you don’t watch out for it:

vi m*

If you are trying to edit Makefile and you’ve no files that start with m in that directory, then you start editing the file named m*.

This tidbit also comes in handy if you ever find that the command ls is bad or doesn’t work: echo works just as well as ls -m:

$ echo a*

This will cause the shell to expand the file wildcard, then echo prints the results.

This “pre-scanning” done by the shell also explains why a command like this fails when run in a directory that a user has no access to:

$ sudo echo foobar > restricted.file

The shell sets up redirection before sudo runs – so it is the shell that attempts to write to the file restricted.file – and as the original user, too.

To make this work, you have to find a way to defer the opening of the file (for writes) until after you have root access; a classic way is like this:

$ sudo ksh -c "echo foobar > restricted.file"

Thus, it is not the running shell that opens restricted.file but the executed ksh, which interprets the -c option as a command to run. The quotes prevent the active shell from interpreting the shell characters, leaving them for ksh.

This shell interpretation also explains why the first command may fail with a Too many arguments error, while the second will almost certainly work:

$ ls *
$ ls

In the first case, the shell expands the wild card to include all the files in the current directory; if there are too many files, this becomes too many arguments. In the second case, there are no arguments: it is up to the program itself to handle all the files (which ls does well).

Understanding how the shell scans its input is critical and allows you to understand how things should work. Consider a fragment like this one:

$ AB="*"
$ echo $AB
$ echo "$AB"
$ echo '$AB'

The output from this will be something like the following:

$ echo $AB
able baker charlie
$ echo "$AB"
$ echo '$AB'

Update: Fixed error in filename wildcard expansion – thanks to Brett for catching the error.

An Easier Way to Mount Disks in Linux (by UUID)

When mounting a disk, the traditional way has always been to use the name given to it by the operating system – such as hda1 (first partition on first hard drive) or sdc2 (second partition on third SCSI drive). However, with the possibility that disks may move from one location to another (such as from sdc to sda) what can be done to locate the appropriate disk to mount?

Most or all Linux systems now install with the /etc/fstab set up to use the Universally Unique Identifier (UUID). While the disk’s location could change, the UUID remains the same. How can we use the UUID?

First, you have to find out the UUID of the disk that you want to mount. There are a number of ways to do this; one traditional method has been to use the tool vol_id. This tool, however, was removed from udev back in May of 2009. There are other ways to find the UUID of a new disk.

One way is to use blkid. This command is part of util-linux-ng. The vol_id command was removed in favor of this one, and is the preferred tool to use instead of vol_id. Find the UUID of /dev/sda1 like this:

$ blkid /dev/sda1
/dev/sda1: UUID="e7b85511-58a1-45a0-9c72-72b554f01f9f" TYPE="ext3"

You could also use the tool udevadm (which takes the place of udevinfo) this way:

$ udevadm info -n /dev/sda1 -q property | grep UUID

Alternately, the tune2fs command (although specific to ext) can be used to get the UUID:

# tune2fs -l /dev/sda1 | grep UUID
Filesystem UUID:          e7b85511-58a1-45a0-9c72-72b554f01f9f

The tune2fs utility is also only available to the superuser, not to a normal user.

There are utilities similar to tune2fs for other filesystems – and most probably report the UUID. For instance, the XFS tool xfs_info reports the UUID (in the first line) like this:

$ xfs_info /dev/sda6
meta-data=/dev/disk/by-uuid/c68acf43-2c75-4a9d-a281-b70b5a0095e8 isize=256    agcount=4, agsize=15106816 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=60427264, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=29505, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

The easiest way is to use the blkid command. There is yet one more way to get the UUID – and it also allows you to write programs and scripts that utilize the UUID. The filesystem contains a list of disks by their UUID in /dev/disk/by-uuid:

$ ls -l /dev/disk/by-uuid
total 0
lrwxrwxrwx 1 root root 10 2011-04-01 13:08 6db9c678-7e17-4e9e-b10a-e75595c0cacb -> ../../sda5
lrwxrwxrwx 1 root root 10 2011-04-01 13:08 c68acf43-2c75-4a9d-a281-b70b5a0095e8 -> ../../sda6
lrwxrwxrwx 1 root root 10 2011-04-01 13:08 e7b85511-58a1-45a0-9c72-72b554f01f9f -> ../../sda1

Since a disk drive is just a file, these links can be used to “find” a disk device by UUID – and to identify the UUID as well. Just use /dev/disk/by-uuid/e7b85511-58a1-45a0-9c72-72b554f01f9f in your scripts instead of /dev/sda1 and you’ll be able to locate the disk no matter where it is (as long as /dev/disk/by-uuid exists and works).

For example, let’s say you want to do a disk image copy of what is now /dev/sda1 – but you want the script to be portable and to find the disk no matter where it winds up. Using dd and gzip, you can do something like this:

dd if=/dev/disk/by-uuid/${UUID} of=- | gzip -c 9 > backup-$UUID.gz

You can script the retrieval of the UUID – perhaps from a user – this way:

eval `blkid -o udev ${DEV}`
echo "UUID of ${DEV} is ${ID_FS_UUID}"

In the /etc/fstab file, replace the mount device (such as /dev/sda1) with the phrase UUID=e7b85511-58a1-45a0-9c72-72b554f01f9f (or whatever the UUID actually is).

Be aware that the UUID is associated with the filesystem, not the device itself – so if you reformat the device with a new filesystem (of whatever type) the UUID will change.

Programming Languages on OpenVMS

Looking at OpenVMS administration, and investigating the possible languages to use on OpenVMS, there are many options – some supported by HP and some community open source options.

One of the first options is Perl; apparently Perl was originally designed as a way to write scripts that worked under UNIX and VMS both. Bernd Ulmann wrote an article for OpenVMS Technical Journal 13 about Perl on OpenVMS and gave a presentation on it in the spring of 2009 at an HP Connect OpenVMS Meeting (English translation) in Germany.

The HP version of Perl appears to be tied to the Secure Web Server (SWS) but it can stand alone.

Another language that is growing on OpenVMS is Java. Jean-Yves Bourles and Thierry Uso wrote on Java and OpenVMS in OpenVMS Technical Journal 10. Netbeans is available from HP to facilitate Java development on OpenVMS.

With Java available, that opens up the possibility of perhaps using a language that runs on the Java JVM as well. That brings to mind JRuby, Jython, Groovy, Scala, and Clojure. Information on most of these is rather scarce unfortunately; only Scala and JRuby have ports (both by the aforementioned Thierry Uso). Personally, these two are the most interesting to me; Scala has unmatched integration with Java itself as well.

Python is also available. Python seems to be the new administration tool of choice; at least, Red Hat seems to think that way.

As part of the Secure Web Server (SWS), you also get HP’s version of PHP. However, this does not seem to be a separate product as Perl is, and there is no description of using PHP as a scripting language (which you can do by running PHP against a file from the command line).

Lua is graciously made available for OpenVMS by our friends over at Hoffman Labs. Lua is a fantastic scripting language that doesn’t get the cover that it deserves.

Lastly, Tcl/Tk is available as well.

So which do I recommend? Well, Perl, PHP, and Java are all HP supported products, so one could start there. With Java, I see Scala and JRuby being fantastic languages as well, although they are not supported by HP. Lua is also a favorite of mine; an OpenVMS version is wonderful; however, Lua is not as widely available for other platforms as is Perl and Java.

I should mention that PL/I is still active on OpenVMS; it is commercially sold and supported by Kednos. PL/I was an interesting language, but it doesn’t have modern capabilities.

At the German site there is also a big list of OpenVMS ports (including languages).

Are you ready for programming on OpenVMS? I am!

Using make and rsync for Data Replication

When maintaining a cluster environment (such as HP Serviceguard) there are often directories and configurations which need to be maintained on two different local disks (on different machines). Using make and rsync (with ssh) are excellent for this.

The rsync command allows you to replicate the local data onto the remote side copying only that which is necessary. This is not necessarily the fastest, but it is the most efficient: rsync was designed for efficiency over slow links, not speed over high speed links. Configure rsync to use ssh encryption automatically in the Makefile, then use rsync as the way to copy the files over:

RSYNC_RSH=/usr/bin/ssh -i /path/to/mykey

rsync -av $(LOCAL_FILES) remoteserver:$(PWD)

To automate this properly, an ssh key will have to be created using keygen and transfered to the other host. The private key (/path/to/mykey in this example) is used by ssh in the background during rsync processing; with the key in place, no interactive login is necessary.

For best purposes, create an “all” tag (at the top of the file) that explains the usable tags, and create a “copy” tag that does the relevant rsync.

I recommend copying only relevant files, not the entire directory: this way, some files can be retained only on one node – this is good for log files and for temporary files.

For example:

LOCAL_FILES=*.pkg *.ctl *.m4
RSYNC_RSH=/usr/bin/ssh -i /path/to/mykey

    echo "To copy files, use the copy tag..."

    rsync -av $(LOCAL_FILES) remserver:$(PWD)

Make sure to verify the code before you use it in normal operation. Use the rsync option -n to perform a dry run which affects nothing. Also make sure that you don’t update files on different hosts; things might get interesting (and unfortunate…)

After performing the update, the Makefile can trigger a reconfiguration or a reload of the daemons to put the configuration in place.

m4: 5 Places You Should Use It

The utility m4 is underutilized and underappreciated, and when paired with make can be indispensible. The mailer sendmail has made m4 legendary, and behind GNU autotools is a lot of serious m4 code. Why not use it for other purposes as well?

Here are some areas in which m4 can be used to ease your work:

  • cfengine. You can template the configuration of a “standard file” or a “standard directory” as m4 macros and save yourself a lot of typing (and errors). You could also define a standard “shell script” configuration and use m4 to create it every time.
  • Nagios. Nagios configurations benefit a lot from heavy use of m4. Macros can be used to template large configurations, and to template standard configurations.
  • DHCP server. A DHCP server can be set up to assign specific addresses to specific hosts; the configuration, however, can be tedious if there are more than a couple of hosts. Macros can be used to simplify static host configurations.
  • rpmrc. The /etc/rpmrc file is used to configure RPM during package build time. While not used as heavily, m4 can be used to create an rpmrc file specially tailored for the type of CPU running – this configuration will then be used when RPMs are built.
  • HTML generation. When creating hand-crafted HTML, m4 macros can simplify the creation of the standard tags, especially taking care of the beginning and ending tags transparently with only the content visible.

m4 makes templating easy. Anywhere that you need to set up a configuration with multiple sizable elements, m4 can help. A single m4 macro can expand to numerous configuration lines, and a couple dozen m4 lines can result in an extensive configuration set.

Using m4 macros in this manner prevents errors (because of less typing) and all appropriate configuration elements are the same (no mistaken copies).

m4 also adds “include” file capabilities to any configuration file where it is used. This permits common configurations to be reused everywhere, even though the configuration file may not support include files directly.

Even though make is not present on most system installs, m4 is. Adding make will complete this pair and set you on your way to automatic configuration. Try it today!

Scripting on the Java Virtual Machine

Now that OpenVMS has Java, and that HP-UX has Java, I started wondering about the possibility of running a scripting language on the Java Virtual Machine (JVM) as a way of supporting all these diverse environments with the same code and tools.

Choosing a language can be difficult, especially when there are so many good ones to choose from. I’ll assume for purposes of discussion that one is already familiar with at least one or more computer languages already (you should be!).

So what are the criteria a system administrator should use to choose a language on the JVM?

  • Does the language have a strong and vibrant community around it? The language might be nice, but if no new development is being done on it, it will eventually fail and stop working on the newer JVM. Bugs will not be fixed if development has halted. It also helps to have a large variety of people to call on when trouble arises, or when your code has to be maintained in the future.
  • Does the language support your favored method of programming? If you have no desire to learn functional programming, then don’t choose a language that is a functional language. Find a language that thinks the way you do (unless you are specifically trying to stretch your mind…).
  • Is your preferred langauge already available for the JVM? There are implementations of Ruby (JRuby), Python (Jython), LISP (Armed Bear Common Lisp), Tcl (Jacl) and many others. A language that you already know will reduce your learning time to near zero on the Java Virtual Machine.
  • What are the requirements? For example, JRuby requires a dozen libraries; Clojure and Armed Bear Common Lisp have no outside requirements. Which is simpler to install onto a new machine?

So what languages am I looking at? I am looking at these:

  • Clojure – a LISP-like functional programming language which seems to be taking off handsomely.
  • JRuby – Ruby is my all-time favorite scripting language, and having it available whereever the Java VM is is a very tintillating prospect. It’s also directly supported by Sun, the makers of Java.
  • Groovy – this is a new language that takes after Ruby and Smalltalk, and it is growing in popularity at a dramatic pace.
  • Scala – this is a language with a strong developer base and an object-oriented and functional design. Don’t know much more about it yet.
  • Armed Bear Common Lisp – ABCL is a full Common LISP implementation for the Java VM, and is used as part of the J editor. Unlike ABCL, development on J seems to have stopped; development on ABCL has gone through a resurgence after not quite dying for several years. ABCL is the closest thing to LISP on the JVM, and is usually the first mentioned – even though its development community is not nearly as strong as Scala or JRuby.

These are only the ones I’ve chosen to focus on; there are many, many more.

Getting Passwords from Random Data (portably!)

Over at Mark Kolich’s blog, he wrote several months ago about using a source of randomness (/dev/urandom) to generate passwords. The idea is simple enough: take the random data, strip out only the printable characters, and then print the desired length of characters for a password.

Shortly thereafter, he described how to use a simple shell script to generate many passwords – such as for setting up many different accounts.

Working with HP-UX and OpenVMS as I do, I immediately thought: how could I do this in Perl, making the idea portable and making a program that will work on both UNIX and OpenVMS? It was easy – and easy to make it flexible as well. Here is the program that I came up with:


# code released by David Douthitt into the public domain

use Getopt::Long;

GetOptions( 'l=i' => \$opt_l,
            'p=s' => \$opt_p,
            'm=i' => \$opt_m );

$pat{"ext"} = "[[:alnum:][:punct:]]";
$pat{"alnum"} = "[[:alnum:]]";
$pat{"alpha"} = "[[:alpha:]]";
$pat{"simple"} = "[a-km-z2-9]";
$pat{"normal"} = "[a-km-z2-9A-HJ-NPR-Z]";

if (defined($opt_p)) {
   if (defined($pat{$opt_p})) {
      $pat = $pat{$opt_p};
   } else {
      print "undefined pattern!\n";
} else {
   $pat = $pat{"normal"};

$max = (defined($opt_m) ? $opt_m : 1000);
$len = (defined($opt_l) ? $opt_l : 6);

$x = $len;

for $i (0..$max) {
   $c = chr(int(rand(255)));
   if ($c =~ /$pat/o) {
      $s .= $c;
      if (--$x == 0) {
         print "$s\n";
         $x = $len;
         $s = "";

Note that since OpenVMS does not use the “#!” notation, that this line will be ignored as a comment and the program needs to be invoked via direct invocation of perl itself.

As an aside, Mark says how he prefers random passwords. Me, I prefer “pronouncable” passwords – still random, but using phoenemes which makes the generation process just that more complicated – and complicates internationalization. Apple’s MacOS X comes with a password generator that can generate random and pronouncable passwords.

However, with the proper password storage system a fully randomized password is good – or is it? A completely random password of eight characters could be zzzzzzzz as much as anything else. Perhaps a password with a random distribution of characters (rather than a random selection of characters) would be better. I’m not aware of any password generators that guarantee a random distribution instead of a random collection.

Powered by ScribeFire.