Puppet error: already in progress; skipping

Sometimes when you try to run your puppet agent, you get an error like this:

# puppet agent --test
notice: Run of Puppet configuration client already in progress; skipping

If there is indeed another puppet agent running, it is simple enough to stop it and try again. However, what if this message appears, and there aren’t any other puppet instances running?

This happens because a lock file left over from an earlier run was never removed. Delete it as follows, but only if puppet is in fact not running:

# cd /var/lib/puppet/state
# rm -f puppetdlock

This deletes the stale lock, and puppet should start cleanly the next time. This tip works with puppet version 2.6.3.
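The "check first, then delete" step can be scripted so the lock is never removed out from under a live agent. This is a sketch, not part of the original tip: it assumes a Linux /proc filesystem and that a running agent shows up with the process name puppet or puppetd (both assumptions), and it removes the lock only when no such process exists.

```shell
#!/bin/sh
# Sketch: remove a stale Puppet 2.6 agent lock only when no agent is running.
# Assumptions (not from the post): Linux /proc is available, and a running
# agent appears with the process name "puppet" or "puppetd".
# LOCKFILE can be overridden, e.g. for testing.
LOCKFILE="${LOCKFILE:-/var/lib/puppet/state/puppetdlock}"

# Return 0 if any process named puppet or puppetd exists.
puppet_running() {
    for status in /proc/[0-9]*/status; do
        # Processes can exit mid-scan; skip files that have vanished.
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$status" 2>/dev/null) || continue
        case "$name" in
            puppet|puppetd) return 0 ;;
        esac
    done
    return 1
}

clear_stale_lock() {
    if [ ! -d /proc/self ]; then
        echo "no /proc here; check for a running puppet by hand" >&2
        return 2
    fi
    if puppet_running; then
        echo "puppet is running; leaving $LOCKFILE alone" >&2
        return 1
    fi
    if [ -e "$LOCKFILE" ]; then
        rm -f "$LOCKFILE" && echo "removed stale lock $LOCKFILE"
    else
        echo "no lock file at $LOCKFILE"
    fi
}

clear_stale_lock
```

Matching on the process name rather than the full command line avoids false hits from shells or editors that merely mention "puppet" in their arguments.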

Puppet refuses to run: “run already in progress”

Recently, one of our servers appeared not to be picking up configuration changes. Since it is managed by Puppet, this is a problem: it means that changes made on the puppet server are not being propagated to this client. The server is running Ubuntu Server 10.04.3 LTS (Lucid Lynx) and Puppet 2.6.3.

So I shut down the puppet agent and tried running it manually:

# service puppet stop
 * Stopping puppet agent
   ...done.
# puppet agent --test
notice: Run of Puppet configuration client already in progress; skipping
#

Since puppet was definitely not running, I had to do some research to find out why it refused to run.

I found Puppet bug #2888, which reports that puppet sometimes does not remove its lockfile, /var/lib/puppet/state/puppetdlock. Sure enough, the lockfile was still there on my system. I deleted it and puppet ran normally.

There was also a bug report (Puppet bug #5246) suggesting that puppet sometimes does not remove its pidfile, /var/lib/puppet/run/agent.pid. Some of the testing in that report suggests the bug is confined to running puppet --onetime (without other options). I don’t think this bug affected me: once the lockfile was removed, puppet ran normally.
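To check whether that pidfile is stale on your own system, a quick test like the one below may help. It is a sketch assuming Linux /proc; pidfile_is_stale is a hypothetical helper name, and a recycled PID could make a stale file look live, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Sketch: report whether Puppet's agent pidfile points at a dead process.
# Assumption: Linux /proc. The default path is the one named in the post.
PIDFILE="${PIDFILE:-/var/lib/puppet/run/agent.pid}"

# Return 0 if PIDFILE exists but no live process has that PID.
pidfile_is_stale() {
    [ -f "$PIDFILE" ] || return 1        # no pidfile, nothing stale
    pid=$(cat "$PIDFILE")
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;         # empty or garbage: treat as stale
    esac
    # /proc/<pid> exists only while that process exists.
    [ ! -d "/proc/$pid" ]
}

if pidfile_is_stale; then
    echo "stale pidfile: $PIDFILE (safe to remove if puppet is not running)"
else
    echo "no stale pidfile at $PIDFILE"
fi
```

Checking /proc instead of using kill -0 avoids a false "stale" verdict when the script runs as a user without permission to signal the agent.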

System Management Software (Spacewalk and Landscape)

System management software is a nebulous term; the discussion here is about software to provision new servers, manage packages, control updates, and monitor servers, all from a central location. This does not necessarily include server hardware inventory, software build management, and other related tasks.

The Red Hat Network is a perfect example; Spacewalk is the open-source version of the Red Hat Network Satellite. Spacewalk has been out for a while and recently released version 0.7. Originally, Spacewalk required Oracle as the back-end database; the project may since have removed this dependency by supporting PostgreSQL instead. The CentOS Wiki has a very nice HowTo describing how to install and run Spacewalk.

However, before deploying Spacewalk 0.7, be aware that Lee Verbern reports that the 0.7 client is broken (rhnsd does not work properly). The problems should be fixed in the next release.

Canonical’s Landscape is a counterpart to the Red Hat Network for Ubuntu systems. Like the Red Hat Network, Landscape is a commercial, closed-source product. Canonical has a blog for Landscape news, but the blog hasn’t been updated since November 2009. The Landscape project has a nice page with links to descriptions, tours, frequently asked questions, and more.

The blog WorkswithU has a nice article describing Landscape (albeit from February 2009).

Amazingly, the Canonical Landscape team even has a YouTube account with many useful videos describing Landscape, including a number of tutorials and a video introduction to Landscape you might want to see.

Finding an open source provisioning tool (outside of Spacewalk) is difficult; these tools are neither common nor used by the average user.

One tool that appears powerful is ControlTier, although it leans more towards package (and service) management than provisioning. ControlTier seems extremely flexible, letting you write scripts that interface with a variety of products and systems. ControlTier also has a blog, though it hasn’t been updated since November 2009.

The ControlTier team worked with Reductive Labs (the folks behind the open source configuration management tool Puppet) to create an interesting whitepaper about integrating ControlTier with Puppet.

I think I’d like to try ControlTier with Puppet; in particular, learning Puppet would be a good thing. I’ll report my experiences.

RAID is not a backup!

This post describes the author’s experience of almost losing his data on a RAID disk set. He also explains in detail why RAID is not a backup and how he rectified the situation. Remember: RAID is not a backup!

When working with corporate systems, a complete, reliable, and tested backup system is important. RAID does not protect you against many (or even most) disasters that could happen.

RAID is designed to protect against one thing: disk failure. It does not protect against user error, operator error, site destruction, and many other possibilities; a file deleted by mistake, for example, is deleted from every disk in the array at once.

So how do I back things up? I must admit, I’ve improved my backup strategy of late. I currently use several tools that I would recommend to you:

  • SpiderOak. This is an online backup service that offers the first 2 GB of storage free. It also keeps multiple versions of each file, so if you want a file from two versions back, it’ll still be there. I’d say this service is worth paying for.
  • For my Mac, I’ve used PsyncX periodically (albeit not automatically). It has come in handy more than once, as my laptop has died several times: I have one of those iBooks notorious for video hardware that failed annually (Apple would fix it for free, but never admitted fault). If you have a Mac, get an external drive and use PsyncX to copy your home directory to it. I also recommend putting your applications in your home directory rather than the system-wide Applications folder: restoring your home directory will then be enough to get your applications back.
  • For UNIX, the equivalent of PsyncX is rsync: again, get an external drive and copy your home directory to it regularly.
  • Also, come at it from the other direction: capture your system configuration in a cfengine or puppet setup and back that up as well. If the machine fails, running cfengine or puppet on a freshly installed replacement will restore its configuration (your data still needs one of the backups above).
  • One other item, which may seem a bit unusual, is Thinkfree Office. It lets you save documents locally and have them mirrored in the cloud, and you can also edit your documents on the web. Of course, this only fully applies to documents that Thinkfree Office can edit.
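For the rsync approach, a minimal sketch might look like this. The SRC and DEST defaults are hypothetical; point DEST at wherever your external drive is actually mounted.

```shell
#!/bin/sh
# Sketch: copy a home directory to an external drive with rsync.
# SRC and DEST are hypothetical defaults; point DEST at your drive's mount.
SRC="${SRC:-$HOME/}"
DEST="${DEST:-/media/backup/home-$(id -un)/}"

backup_home() {
    drive=$(dirname "$DEST")
    # Only checks that the directory exists, not that the drive is mounted.
    if [ ! -d "$drive" ]; then
        echo "backup target $drive not found; is the drive attached?" >&2
        return 1
    fi
    mkdir -p "$DEST"
    # -a: recurse and preserve permissions, times, symlinks, and so on.
    # The trailing slash on SRC copies its contents, not the directory itself.
    rsync -a "$SRC" "$DEST"
}

backup_home || echo "backup failed" >&2
```

Running it from cron (nightly, say) takes care of the "regularly" part. This version never deletes anything from the backup; adding rsync's --delete option would make the copy an exact mirror, but would also propagate accidental deletions, which is exactly the kind of error RAID cannot protect you from.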

It appears that cfengine 3 is now available for download; that will have to be the subject of a future article.

Automation: Live and Breathe It!

Automation should be second nature to a system administrator. I have a maxim that I try to live by: “If I can tell someone how to do it, I can tell a computer how to do it.” I put this into practice by automating everything I can.

Why is this so important? If you craft every machine by hand, then you wind up with a number of problems (or possible problems):

  • Each machine is independently configured, and each machine is different. No two machines will be alike – which means instead of one machine replicated one hundred times, you’ll have one hundred different machines.
  • Problems that exist on one machine may or may not exist on another, and may or may not get fixed when found. If machine alpha has a problem, how do you know that machine beta or machine charlie doesn’t have the same problem? How do you know the problem is fixed on all machines? You don’t.
  • How do you know all required software is present? You don’t. It might be present on machine alpha, but not machine delta.
  • How do you know all software is up to date and at the same revision? You don’t. If machine alpha and machine delta both have a particular package installed, it may or may not be the same version.
  • How do you know if you’ve configured two machines in the same way? Maybe you missed a particular configuration requirement – which will only show up later as a problem or service outage.
  • If you have to recover a machine, how do you know it will be recovered to the same configuration? Often the configuration was never backed up, so it has to be recreated. Are the same packages installed? The same set of software? The same patches?

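As one small example of answering the "same revision?" question, the sketch below compares package lists captured on two Debian or Ubuntu machines. The capture command in the comments uses dpkg-query's ${Package} and ${Version} fields; compare_pkgs and the .pkgs file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare package versions captured from two machines.
# On each Debian/Ubuntu machine, first capture a list such as:
#   dpkg-query -W -f='${Package} ${Version}\n' > "$(hostname).pkgs"
# compare_pkgs is a hypothetical helper, not part of any standard tool.
compare_pkgs() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    echo "only in $1:"
    LC_ALL=C comm -23 "$a_sorted" "$b_sorted"
    echo "only in $2:"
    LC_ALL=C comm -13 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Usage: compare-pkgs.sh alpha.pkgs delta.pkgs
if [ $# -eq 2 ]; then
    compare_pkgs "$1" "$2"
fi
```

A package installed at different versions shows up in both sections; a package missing from one machine shows up in only one.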
To avoid these problems and more, automate every system wherever possible: setup, configuration, reconfiguration, backups, and so forth. Don’t miss anything, and if you do, add the automation as soon as you know about it.

Scripting languages such as Perl, Tcl, Lua, and Ruby are all good for this.

Other tools that help tremendously in this area are automatic installation tools: Red Hat Kickstart (as well as Spacewalk), Solaris JumpStart, HP’s Ignite-UX, and openSUSE AutoYaST. If configured properly, these systems can install a machine completely unattended.

When combined with a tool like cfengine or puppet, these automatic installations can be nearly complete, taking a system from first power-on to full operation without operator intervention. Automating the install not only improves reliability, but can free up hours of your time.