Using Nagios from an Android Phone

I got an Android phone in the last year, and started looking in earnest for a Nagios client for it. With a Nagios client, you can check the current status of the systems that Nagios monitors right from the phone.

There are several available; the two most often mentioned are NagRoid and NagMonDroid. However, neither of these worked for me – fortunately, there are others that did.

All of the clients use the same basic method to get data from Nagios: scraping it from a web page. The biggest problem comes when that web page is not available – or is incorrect. Most of these applications request a URL, but are sometimes unclear about exactly which URL they want. Add to that the fact that Nagios changed its URL structure slightly between versions, and it gets even more complicated.

To discover what was happening, I used tcpdump to watch the accesses to the web server from the Nagios clients, as well as watching the Apache logs. By doing this, I was able to discern what URLs were being loaded.
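A short awk sketch of the log side of this: in Apache’s common log format the request path is field 7, so the URLs the clients fetch can be pulled straight out of the access log. The two-line sample log below is a stand-in for a real /var/log/apache2/access.log (the path varies by distribution).

```shell
# Create a small stand-in for an Apache access log (common log format).
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.5 - - [24/Mar/2010:17:18:00 +0000] "GET /cgi-bin/status.cgi HTTP/1.1" 200 512
10.0.0.5 - - [24/Mar/2010:17:18:02 +0000] "GET /cgi-bin/tac.cgi HTTP/1.1" 404 219
EOF

# Field 7 is the requested path; list each unique path the clients asked for.
awk '{print $7}' "$log" | sort -u
```

Run against the real access log, this shows at a glance which URL each client is expecting.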

Here are some of the URL paths being looked for by the various clients:

  • /cgi-bin/tac.cgi
  • /cgi-bin/status.cgi
  • /cgi-bin/nagios3/statuswml.cgi?style=uprobs
  • /cgi-bin/nagios3/status.cgi?style=detail
  • /cgi-bin/nagios3/status.cgi?&servicestatustypes=29&serviceprops=262144

Further complicating matters in my case was the fact that any unrecognized URL was rewritten (via mod_rewrite) to serve the main Nagios page over SSL.

However, by using mod_rewrite it was also possible to rewrite the old /cgi-bin paths to the newer /cgi-bin/nagios3 paths, and things started working.
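A sketch of what such rules might look like – these directives are illustrative assumptions, not the exact configuration I used, and the pattern will depend on your layout:

```apache
RewriteEngine On
# Map the old /cgi-bin paths the clients request onto the Nagios 3 CGI
# directory, passing any query string along (QSA) and letting later
# handlers (e.g. ScriptAlias) process the result (PT).
RewriteRule ^/cgi-bin/((?:tac|status|statuswml)\.cgi)$ /cgi-bin/nagios3/$1 [PT,QSA]
```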

In the case of the statuswml.cgi file, Google Chrome wanted to download the resulting file rather than render it.

The main choices for Nagios clients on Android, then, are aNag, NagRoid, NagMonDroid, and jNag.

I have gone with aNag – it has a nice interface, makes good use of notifications, and worked without trouble once the URL was fixed up. Several of the others never did work right – or gave no indication that they were working at all. jNag also requires a modified Nagios server and the installation of mkLivestatus. aNag was the easiest to set up and get working.

aNag does use a mostly text-based format to show data, but it can also manipulate services and offers one-button access to the web interface.

Instant 10-20% boost in disk performance: the “noatime” option

Many people already know about this option, but it is worth mentioning again. First, though, a description of what atime is is in order.

The atime is one of the three times associated with a UNIX file: the ctime (change time – when the inode was last changed), the mtime (modification time – when the file’s contents were last changed), and the atime (access time – when the file was last read).
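On Linux, GNU stat will show all three timestamps at once:

```shell
# Create a scratch file and print its three timestamps:
# %x = atime (access), %y = mtime (modification), %z = ctime (inode change).
f=$(mktemp)
stat -c 'atime: %x%nmtime: %y%nctime: %z' "$f"
```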

It is the continual updating of the atime that causes so much grief. Compared to the mtime and the ctime, the atime changes with alarming frequency: every single time a file is read, the atime is updated to the current time – even when the read is satisfied entirely from the cache.

There was a Linux kernel mailing list discussion thread that gave rise to some interesting quotes on this topic.

The discussion became quite heated when Ingo Molnar suggested that noatime should be the kernel default. He had this to say:

Atime updates are by far the biggest IO performance deficiency that Linux has today. Getting rid of atime updates would give us more everyday Linux performance than all the pagecache speedups of the past 10 years, _combined_.

and:

It’s also perhaps the most stupid Unix design idea of all times. Unix is really nice and well done, but think about this a bit: ‘For every file that is read from the disk, lets do a … write to the disk! And, for every file that is already cached and which we read from the cache … do a write to the disk!’

and later, this:

Measurements show that noatime helps 20-30% on regular desktop
workloads, easily 50% for kernel builds and much more than that (in
excess of 100%) for file-read-intense workloads.

and, this:

Give me a Linux desktop anywhere and i can
tell you whether it has atimes on or off, just by clicking around and
using apps (without looking at the mount options). That’s how i notice
it that i forgot to turn off atime on any newly installed system – the
system has weird desktop lags and unnecessary disk trashing.

Linus had this to say:

yeah, it’s really ugly. But otherwise i’ve got no real complaint about
ext3 – with the obligatory qualification that “noatime,nodiratime” in
/etc/fstab is a must. This speeds up things very visibly – especially
when lots of files are accessed. It’s kind of weird that every Linux
desktop and server is hurt by a noticeable IO performance slowdown due
to the constant atime updates, while there’s just two real users of it:
tmpwatch [which can be configured to use ctime so it’s not a big issue]
and some backup tools. (Ok, and mail-notify too i guess.) Out of tens of
thousands of applications. So for most file workloads we give Windows a
20%-30% performance edge, for almost nothing. (for RAM-starved kernel
builds the performance difference between atime and noatime+nodiratime
setups is more on the order of 40%)

Changing a file system to run without atime is simple; use this command on a mounted filesystem:

# mount -o remount,noatime /disk

Don’t forget to change /etc/fstab to match, so the option persists across reboots.
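For example, a matching line in /etc/fstab might look like this (the device, filesystem type, and mount point here are placeholders):

```
/dev/sdb1   /disk   ext3   defaults,noatime   0 2
```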

Making the Case for Partitioning

What is it about partitioning? The old-school rule was that there were separate partitions for /, /usr, /home, /var, and /tmp. In fact, default server installations (including Solaris) still use this partitioning setup.

Has partitioning outlived its usefulness?

This question has come up before. There are negative and positive aspects to partitioning, and the case for partitioning might not be as strong as it once was.

Partitioning means that you may end up with no space in one partition and plenty in another; this is, in fact, the most common argument against partitioning. However, with LVM or ZFS, volumes can be grown dynamically, which makes this a moot point: you can expand a volume and its filesystem any time you need to.

However, this still means that the system will require maintenance – but that is what administrators are for, right? If a filesystem fills – or is about to fill – it is up to a system administrator to find the disk space and allocate it to that filesystem.

Another argument against partitioning says “disk is cheap.” If that were true, why do companies still balk at adding terabytes of disk to their SANs? The phrase is trite but untrue in practice: 144 GB disks are not bought in bulk. Companies still have to watch the budget, and getting more disk space is not likely to be a high priority until disk actually runs out.

So, what are the benefits to partitioning disks? There are many.

Each partition can be treated separately – so the /usr filesystem can be mounted read-only, and /home can be mounted with the noexec and nosuid options – which makes for a more secure and more robust system.
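As a hypothetical illustration, /etc/fstab entries applying those options might look like this (devices and filesystem types are placeholders):

```
/dev/vg00/lvol4   /usr    ext3   defaults,ro              0 2
/dev/vg00/lvol5   /home   ext3   defaults,nosuid,noexec   0 2
```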

Also, disk errors may then affect a single partition rather than the entire system; on a reboot, the system still comes up instead of being blocked by a trashed root filesystem. In the same vein, if a filesystem requires a check, checking a 144 GB filesystem can take a very long time – whereas a 10 GB partition takes far less, and the system comes back up that much faster.

Backups – and restores – are another thing that is simplified by having multiple partitions. For example, when backing up an HP-UX system using make_tape_recovery, you specify which partitions to back up to tape. These partitions are then restored when the tape is booted. If you used a single partition for everything (data, home, etc.) then you would probably not be able to make this sort of backup at all.

One of the nicest reasons to partition is the ability to separate user data from system data. This allows the system to be reinstalled while user data (and application data) remains untouched, saving time and effort. I recently installed Ubuntu Server in place of Red Hat Enterprise Linux; since the system was a single partition, there was no way to install Ubuntu Server without wiping out 200 GB of application data and restoring it – which took around 9 hours each way on a gigabit network link (with nothing else sharing the network). In contrast, when I converted my OpenSUSE laptop to Xubuntu, I was able to keep all of my user settings because /home was on a separate partition. Keeping that server on a single partition cost the company somewhere on the order of a full day’s worth of time and effort – how much money would it have saved by having a separate partition for /var/lib/mysql?

Performance is another reason for partitioning – though this only matters with separate disks. With multiple disks, filesystems can be segregated onto separate spindles, so a disk that is heavily used for one purpose can be dedicated to that purpose – your database accesses won’t slow down because of system log writes, for example. This problem can be reduced or eliminated – or made moot – by moving data to a striped volume, and possibly with disk cache as well. Yet as long as there are disk accesses, do you want them competing with your database?

Having said all of this, how does using a virtual machine change these rules? Do you need partitioning in a virtual machine?

A virtual machine makes all of the arguments relating to physical disks moot – performance on a virtual disk has little correlation with the physical hardware unless the virtual machine host is set up with segregated physical disks. Even then, those “disks” may actually be separate LUNs carved from a RAID array.

However, the ability to secure a filesystem (such as /usr), save filesystem check time, prevent excessive /home usage, and other reasons suggest that the case for partitions is still valid.

DOS Partitions (fdisk) and the 2TB Limit

If you are trying to create disk volumes over two terabytes (2TB), you’ll find that fdisk won’t let you do it. The problem lies not with fdisk, but with the old PCDOS disk label used on disks for the last three decades or so. Back in 1981 when the IBM PC was introduced, a disk of over two terabytes would have seemed inconceivable.

Thus, we struggle with the limitations of PCDOS disk labels today.

Some (newer?) versions of fdisk report the problem with large drives, giving this warning:

WARNING: The size of this disk is 8.0 TB (8001389854720 bytes).
DOS partition table format can not be used on drives for volumes
larger than (2199023255040 bytes) for 512-byte sectors. Use parted(1) and GUID
partition table format (GPT).

To get around the size limitation, there is only one solution: replace the PCDOS disk label with another. The usual recommendation is GPT (the GUID Partition Table), created by Intel. GPT has a much larger limit, making partitions over 2TB feasible.

However, the Linux utility fdisk does not work with drives that use GPT; thus, you have to use a different partitioning tool. The usual recommendation to Linux users is GNU parted. GNU parted handles multiple partition formats, including GPT. Documentation is available in multiple formats (PDF is one).

The steps to getting a large partition done with parted are simple:

  1. Create a disklabel (partitioning) on disk.
  2. Create a partition of the appropriate size.
  3. Create a filesystem (if needed).
  4. Mount.

First, create the GPT disklabel – in other words, create the partitioning tables to support GPT:

# parted /dev/sdc
GNU Parted 2.2
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: Ext Hard  Disk (scsi)
Disk /dev/sdc: 8001GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
 
Number  Start   End     Size    Type     File system  Flags
 
(parted) mklabel gpt
Warning: The existing disk label on /dev/sdc will be destroyed and all data on this disk will be lost. Do you want to
continue?
Yes/No? yes

Then after this, create a partition:

(parted) mkpart primary xfs 0 8001.4GB
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? c

This is what happens when the partition is not aligned on the appropriate boundary. Traditionally, the boundary was 512 bytes; now it is changing to 4K. GPT also apparently uses a significant portion of the beginning of the disk.

To get around the alignment problem, you can use a start position of 1MB and an end position 1MB from the end:

(parted) mkpart primary xfs 1 -1
(parted) p
Model: Ext Hard  Disk (scsi)
Disk /dev/sdc: 8001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  8001GB  8001GB               primary

Parted supports a wide variety of units (and defaults to megabytes), all of which can be specified during numeric input – such as for start and end points. Using an end point of “100%” is probably just as good as using “-1” (1MB from end).

Jamie McClelland has a nice article about 2TB drives and alignment. Don’t blindly trust the tools to get the partitioning right; specify the appropriate numbers and units in order to force correct alignment.

GNU parted also supports an option that at least suggests it will use the best alignment when it can:

parted --align optimal

Again, don’t blindly trust it: check the numbers.

Bringing the Network up to (Gigabit) Speed

When looking at increasing network speed in the enterprise, there are a lot of things to consider – and missing any one of them can result in a slowdown in part or all of the network.

It is easy enough to migrate slowly by replacing pieces with others that support all of the relevant standards (such as 10/100/1000 switches). However, such a migration can bog down, leaving old equipment in place and slowing everyone down.

First, determine if the infrastructure can handle the equipment. Is the “copper” of a sufficient grade to handle the increased demands? If not, the cables will have to be replaced – perhaps with Cat-6 or better, or even fiber if your needs warrant it. Check for undue interference – fiber is immune to the electromagnetic interference that affects copper.

After the cabling is ready, check the infrastructure – all of it; it is easy to miss a piece. Also check the capabilities of every device: can the switch, for example, handle full gigabit speeds on all ports at once? You might be surprised at the answer.

Once the equipment is in place, make sure that all clients are connecting at gigabit speed. Most switches have indicators that show whether a port is running at a gigabit.
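On Linux hosts, the negotiated speed can also be checked from the machine itself – ethtool reports it, or a quick loop over sysfs (a sketch; virtual interfaces such as lo report no speed):

```shell
# Print the negotiated link speed (in Mb/s) for each network interface.
for dev in /sys/class/net/*; do
    name=$(basename "$dev")
    speed=$(cat "$dev/speed" 2>/dev/null) || speed="n/a"
    echo "$name: ${speed:-n/a} Mb/s"
done
```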

Make doubly sure that servers are running at full speed, as a slowdown there affects everyone who uses that server. This is especially important for firewalls, given the traffic that passes through them.

Lastly, don’t forget telco equipment. If the link to the T1 router is still running at 100 megabits, it will slow down Internet access for the entire enterprise.

One more thing – an upgrade such as this is a perfect time to get more advanced equipment in house. Just be conscious of the corporate budget. It also helps to present improvements that the executives can see and experience personally, rather than elusive benefits that only the IT staff will notice.

Good luck in your speed improvement project!

Statistical Analysis is Valuable for Understanding

In system administration – and many other areas – statistics can help us find the real meaning hidden in data. Statistical data can be gathered and analyzed from many sources, including sar data and custom-designed scripts in Perl, Ruby, or Java.

How about the number of processes, when they start, when they finish, and how much processor time they take over their lifetimes? Programs like HP’s Performance Agent (now included in most HP-UX operating environments) and SGI’s fabulous Performance CoPilot can help here. In fact, products like these (and PCP in particular) can gather incredibly valuable data. For example, how much time does each disk spend above a certain amount of writing, and when? How much time does each CPU spend above 80% utilization, and when?

With the proper programming, statistical data from a system could be fed back into a learning neural network or a Bayesian network, providing alarms for statistically unlikely events.
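As a small sketch of the idea – using a simple mean and standard-deviation test in awk rather than a neural network – here the sample numbers stand in for a CPU-utilization column pulled from sar output:

```shell
# Flag readings more than two standard deviations from the mean.
printf '12\n15\n14\n13\n95\n14\n' | awk '
    { v[NR] = $1; s += $1; ss += $1 * $1 }
    END {
        mean = s / NR
        sd   = sqrt(ss / NR - mean * mean)
        for (i = 1; i <= NR; i++)
            if (v[i] > mean + 2 * sd || v[i] < mean - 2 * sd)
                printf "outlier: %s (mean %.1f, sd %.1f)\n", v[i], mean, sd
    }'
```

The same filter, fed continuously from collected performance data, is the germ of a statistical alarm.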

Statistical analysis can provide useful data in areas other than performance, too. How about measuring the difference between a standard image and a golden image, based on the packages used? How about analyzing the number of users on a system, when they use it, and for how long? (Side note: I once had a system with 20 or 30 users who each used the system heavily for one or two straight weeks every six months… managing password aging was a nightmare…)

There are many opportunities to analyze a system and provide statistical data; this capability, however, remains underutilized. So what are you waiting for?

Intel’s New Upgradeable CPU: Not a New Idea – But is it a Good One?

There has been some discussion about the new processor from Intel which comes with some features disabled and unlockable only by purchasing an unlock code from Intel. Peter Bright has an excellent write-up on the idea of an upgradeable processor.

If you administer mainframes or enterprise servers, you’ve likely already seen this idea. HP Superdomes, for example, can be purchased with deactivated processors, which can later be turned on temporarily or purchased outright. IBM System z machines have a similar capability, often called Capacity on Demand.

The main question is whether the consumer will find this a desirable thing or not; it is possible that the idea will not sell. I find that system “upgrades” are actually done by replacing the system completely.

It is also probably a better idea to increase system memory than to upgrade to a faster, more capable processor: more memory means more can be done without going to disk, and disk remains the slowest component.

A New Init for Fedora 14

Apparently, a new project named systemd – intended to replace init, inetd, and cron – is nearing release and will replace upstart in Fedora 14 (to be released in November, with the alpha release due today!).

There is a healthy crop of init replacements out there, and the field is still shaking out. Replacing init – or specifically, System V init and init scripts – seems to be one of those never-ending projects: everyone has an idea on how to do it, no one can agree on how.

Let’s recap the current crop (excluding BSD rc scripts and System V init): systemd, upstart, SMF, simpleinit, and a number of others.

I am still waiting for the shakeout – it bugs me that there are dozens of different ways to start a system, and that none of them have taken over as the leader. For years, BSD rc scripts and System V init have been the standard – and both have stood the test of time.

My personal bias is towards SMF (O’Reilly had a nice article on it) and towards simpleinit – but neither has spread the way upstart has.

So where’s the replacement? Which is The One? It appears that no one is willing to work within a promising existing project; instead, each developer starts over and creates yet another replacement for init, fragmenting the field further.

Lastly, if the current init scheme is so bad, why hasn’t anything taken over? Commercial UNIX environments continue to use the System V scheme, with the sole exception of Solaris, which made the break to the Service Management Facility (SMF). If the current environment is so horrible, why doesn’t HP-UX or AIX use SMF or upstart?

Sigh. It’s not that the current choices of replacement are bad – it’s just that there are so many – and more keep coming up every day. Perhaps we can learn something about the causes of this fragmentation from a quote from a paper written about the NetBSD rc.d startup scripts and their design:

The change [in init] has been one of the most contentious in the history of the [NetBSD] project.

The Wikipedia Outage and Failover

The recent Wikipedia outage shows the problems with a typical failover system. Wikipedia’s European data center experienced a cooling failure, which caused the servers there to shut down. This is a typical failure – one that can happen anywhere (though it should be prevented).

The event was logged in the admin logs starting at 17:18 on March 24. All of Wikimedia’s server administration is at wikitech.wikimedia.org.

What happened next extended the outage longer than it should have been: the failover from the European data center to Wikipedia’s Florida data center failed to complete properly.

Certainly, to prevent this failure, the failover (and fail-back) could have been tested further, the process refined, and the tests done routinely.

However, there is another possibility: use an active standby instead. That is, instead of having a failover process kick in when failure occurs, use an active environment where there are redundant servers serving clients.

A failover process is a sort of “opt-in”: the standby servers must actively take over from the failed ones. Thus there is a process (the failover) that must be tested – and tested often – to make sure it will work when needed; in many cases, testing means an actual service outage must be experienced. This is the active-passive high-availability cluster model: the passive server must be brought online to take over from the failed nodes.

Using an active but redundant environment means that if any server – or data center – dies, service is degraded slightly and nothing more. This is the active-active high-availability cluster model. There is no need for monthly testing – perhaps no testing at all: during upgrades, the servers can be taken out of service one at a time and the results monitored.

The usual argument against such redundancy is cost: the redundant servers need to be able to take on a particular load, which is thus unavailable to other uses in normal operation. Yet, how much downtime can you experience before you start losing money or public good will?

If Wikipedia had put their servers together so that failover was not necessary, it might have saved them several hours of downtime.

Is Oracle killing Sun Solaris support?

Recently, Oracle has made several changes in Solaris support that have people wondering if Solaris just got too expensive to run.

The first change was a move to a pay-for-security model, which some have already compared to extortion: patches for Solaris will only be available to paying support customers, leaving everyone else insecure and without recourse.

The other change is to force paying support customers into an “all or nothing” support model: either all Solaris systems in an environment are under a support contract, or Oracle will not enter into a support agreement. This means that every Solaris system must be accounted for and under Oracle support.

With this latter change, it may be that this pay-for-security model, while still unseemly, will have less of an effect than previously suggested. It may also convince many smaller businesses to scale back their Solaris installations and to get rid of older machines instead of holding on to them.

At its worst, it may mean that support for software on older Sun machines withers faster, and that older machines become obsolete – and useless – sooner. This increases “churn” in the data center and (perhaps) makes it more energy-efficient, while costing companies more and making Oracle more money.

However, one thing Oracle has not done is clarify the future of OpenSolaris. The community is waiting for a definitive statement; even former Sun employees working with OpenSolaris have seen no sign from Oracle in either direction.

UPDATE: Ben Rockwood over at the Cuddletech blog has excellent coverage, with detailed analysis of the relevant licenses and what it means for Solaris end-users. On the 26th he discusses the “all-or-nothing” support model, and on the 28th he writes about Oracle’s choice to remove the ability to use Solaris for free.