Using README Files for System Documentation

I’ve become a huge fan of README text files scattered throughout the filesystem of any managed system. When creating directories for various purposes – SFTP-only users, home directories for non-person users, data directories, and so on – a README serves well to document the files and the directory in general.

There are many places you can use a README file for better clarity. Consider some of these locations:

  • NFS mount: where is the directory actually located?
  • Mountpoint root: what is normally supposed to be mounted here?
  • Network shares: like NFS, describe where the directory is actually located – and why it is shared.

Lots of things can be described in a README file.

What are these files? This is the most obvious question to answer. Are they backup files? Are they one-time backup files? Are these files part of a project?

How long is the directory or files needed? Can the directory be eliminated at another time?

Where is the directory located? For network shares – and for disk mounts – labelling the directory in the README can help. This way, when the disk is mounted elsewhere (for recovery for instance) its actual mount point will be recorded. In the case of a network share, it helps to identify where the share is coming from without having to resort to other tools.

What should be done in the future? Putting a modest amount of future plans in the file helps to show future admins what was going on at the time the file was written.

When was the file written? The file itself should contain the date; otherwise, no one will know when the file was written. The filesystem metadata (file time) cannot be relied on as it can change for various reasons.

How does this configuration or set of files deviate from the norm? If you have a different setup, then a description of the configuration is in order. Otherwise, a future admin – or you yourself – will not realize that things are different and may spend time figuring that out unnecessarily.

Give an overview. Many directories will be comparable to large “containers” (like /data); describe what each of the directories is used for in a short paragraph or two.
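As an illustration, a README for a data directory might look something like this (every detail here – the paths, dates, and names – is made up for the example):

    README for /data/acme-reports
    Written 2011-02-01 by J. Admin; last reviewed 2011-02-01.

    These files are the monthly report extracts for the ACME project.
    This directory is an NFS mount from filesrv1:/export/acme-reports.
    Files older than 18 months may be deleted; the project is expected to
    wrap up at the end of 2012, at which point this directory can go away.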

When you go about creating a text file like this, remember several things:

  • Use plain text with no formatting. Don’t make it a WordPad document or an OpenOffice document; it should be readable in a minimal environment.
  • Check clarity. It won’t do to go to this trouble and then find later that the writing is incomprehensible.
  • Use proper spelling, grammar, and punctuation. Reading something that requires basic fixes in grammar and spelling can be grating and is not good for your professional image.
  • Revisit the README file once in a while. Updating the file with new information will prevent mistakes and errors from creeping into the document, and will prevent future admins from relying on out-of-date information.
  • Make more than one if necessary. This is almost never necessary – but when it is, don’t be afraid to do so. Name the file README.something to help future readers.
  • Do your research. If you have questions about what a directory is for, what the files are for, or why things are the way they are, ask. Someone will have to do this one day anyway – so do it now and document it.

If you make a regular habit of sprinkling a little documentation like snow, you won’t regret it.

    A Statistical Analysis of Android Issues

    I was recently researching an issue with Android phones – the oft-requested ability to record a phone call – and found that this has been an issue in Android since March of 2009 and remains classified as “New.” In fact, most of the issues in the top 1000 (as ranked by users) remain classified as new and have not been allocated to any developers.

    Thus, I started wondering about the various Android issues and their relative importance to users and developers. I downloaded the list of the top 1000 issues (in CSV format) – according to the number of stars – and analyzed these results.

    Here is what I found out:

    • 63% of issues are more than one year old (12% of issues are over two years old!)
    • 86% of issues are listed as “new” (instead of “assigned” or “needsinfo” or others)
    • 10% of issues are assigned to a person
    • Average age of issues is 431 days (1 year, 2 months)
    • Average defect age is 457 days (1 year, 3 months)
    • Average enhancement request age is 543 days (1 year, 6 months)
    • Reviewed items were the oldest on average, with reviewed defects at 550 days and reviewed enhancements at 749 days.

    When you rank the items by the number of stars (user importance) per day of age, some very interesting things emerge. The most important issues in this ranking are the following:

    1. Change refund time in Android Market (issue #13116) – 32 days old
    2. Arabic Language Support (issue #5597) – 386 days old
    3. Nexus S reboots during an active call (issue #13674) – 8 days old
    4. Ability to limit Internet access (issue #10481) – 149 days old
    5. IPSEC VPN compatible with Cisco VPNs (issue #3902) – 484 days old
    6. Poor browser performance (issue #13404) – 19 days old
    7. Google Docs support on Android (issue #1865) – 714 days old

    These items suggest one of two things – probably both: either what users think is important is irrelevant to Google, or the items are acted on while the issue-tracking list is ignored. People commenting on the issues routinely ask where the Google responses are.

    Another interesting item came up during the analysis: not one item in the top 1000 that was listed as requested by a user or a developer had a status of Assigned or Reviewed. There were other items – not listed as requested by either a user or a developer – and many of those were assigned or reviewed (or, indeed, Unassigned). I can only guess at the true meaning of this, but it suggests that Google only acts when an issue comes from within Google.

    In all, this statistical exercise would have been much more exciting were it not for the disappointing results. I did check to see whether Google’s main Android page on Google Code was marked obsolete; no such statement was anywhere to be found.

    Do You Have a Data Retention Plan?

    If you don’t, then when a lawsuit occurs your company could find itself having to save documents it would much rather have gotten rid of. More importantly, customer information is protected by law, and not handling it with care can lead to significant adverse consequences.

    Consider the tale reported over at the Clutter Diet blog. The company in this tale did not handle customer data properly at all.

    Shredding documents isn’t enough either; there are companies that will reconstruct shredded documents for a hefty fee – even cross-cut ones. In the New York Times (July 17, 2003), Douglas Heingartner reports on an effort to reconstitute hundreds of documents from the East German Stasi (the secret police).

    The best thing to do is to have a written and accurate plan for disposing of documents, and a method of disposal that precludes reconstitution. The US military now uses pulping and pulverizing of paper; it should be possible to do this with corporate documents in some fashion as well.

    A data retention plan should, of course, cover electronic documents as well. Sensitive documents should be deleted and the hard drive space wiped. If the hard drive is to be disposed of, physical destruction is the only way to be completely assured that the data is unrecoverable; however, your company may well be satisfied with a complete wipe of the drive using a tool like Darik’s Boot And Nuke.
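    For individual files on a Linux system, one common (if imperfect) approach is GNU shred; this is only an illustration – the filename is made up – and note that overwriting in place is not reliable on journaling or copy-on-write filesystems:

    $ shred -u -n 3 customer-list-2003.xls    # overwrite the file three times, then remove it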

    Just do it. Your lawyers and customers will thank you.

    Contributing to Wikipedia: Getting Deep

    Writing is much more a part of system administration than most people acknowledge. A good writer is more likely to produce quality documentation, presentations, and other materials.

    Writing for Wikipedia can be one method of improving your writing (by seeing others’ writing, and by getting reviews from others). However, contributing to Wikipedia is also a way to give back to the community, a way to show appreciation for what Wikipedia provides.

    The most obvious way is simply to edit any article that needs it, improving its wording and spelling. However, there are less obvious ways in which you can participate in the growth of Wikipedia.

    First, there is the Wikipedia Community Portal – a sort of single point of entry for contributing to Wikipedia. This should be a starting point for anyone "going deep" into Wikipedia contributions.

    One can also join a Wikipedia Patrol. Wikipedia patrols watch over a certain type of Wikipedia page for problems and assist in making pages better.

    For example, one can join the Recent Changes Patrol: watch the Recent Changes page (reloading it every so often) for new edits, and check any that appear to need closer examination – edits from IP addresses, edits with no comments, or edits from users making a lot of changes rapidly.

    One could also join the Random Page Patrol, selecting a random page to improve.

    Another thing to do is to join a Wikipedia WikiProject. For instance, there is the Wikiproject Circus – which is focused on improving pages about circuses. (By the way – you really should visit the Circus World Museum in Baraboo, Wisconsin, sometime…)

    Contributing – and improving your own writing and others’ – is not limited to Wikipedia: you can also add your abilities to Wikipedia’s sister projects.

    Add your voice to Wikipedia!

    You can also work on similar sites that are not directly related to Wikipedia – such as WikiHow (how-to manuals) or others – but Wikipedia and its related sites are probably better because of their respectability, their focus on writing (articles and the like), and their wide audiences.

    Building a Checklist

    When you are undertaking an invasive and complicated process, you should have a checklist to go by. This will help you make sure you cover all the bases and don’t forget anything. I’ve written about this before.

    However, how do you build a checklist that will be of the most assistance?

    First, “build” is the right term: in the days or weeks leading up to your process (system maintenance, for example), come back to the checklist over and over. Review it several days in a row, or better yet, several times a day. You’ll think of new things to add, and you’ll flesh it out until it is comprehensive and complete. You might want to leave it open on your workstation so you can come back to it whenever the mood strikes.

    Secondly, break the checklist down into major sections. For example, in patching a system you might have sections for: 1) preparing the system; 2) patching the system; 3) rebooting the system. Other processes will have different major sections. These major sections should be set apart on your checklist, preferably with titles and bars that segregate the checklist into its component parts. I recommend a different color background and a large bold font to set it apart.

    Thirdly, there should be a “point of no return” – which should fall at a major section break. This is the point where you can no longer turn back and return to the way things were. At this point during the process, you have to choose: have things gone smoothly enough that completion is likely – even inevitable – or is the process in such disorder and disarray that a return to the status quo would be better?
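    Putting these ideas together, a bare-bones skeleton for a patching checklist might look something like this (the individual steps are placeholders, not a complete procedure):

    ===== 1. PREPARING THE SYSTEM =====
    [ ] Verify that last night's backup completed successfully
    [ ] Notify users of the maintenance window
    [ ] Record current kernel and patch levels

    ===== 2. PATCHING THE SYSTEM =====    <-- POINT OF NO RETURN
    [ ] Install the patch bundle
    [ ] Review the installation logs

    ===== 3. REBOOTING THE SYSTEM =====
    [ ] Reboot and watch the console
    [ ] Confirm all subsystems restarted; note anything unusual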

    With such a checklist, your process will be much smoother, and you won’t have to explain to the boss why you missed something critical. It’ll also document what you did (along with the notes you take).

    User Interface Design: the Command Line

    The command line is not immune from user interface design. Especially because a command line is a kind of language, one has to choose options, names, and argument order carefully in order to make things work just the way the user expects.

    If the program is too different, people will be tripping over it all the time. The UNIX tar command comes to mind as one that failed here: its options (or “actions”) specifically did not start with a dash. Likewise, UNIX find also failed: if you didn’t include the parameter -print at the end, you saw no output – your find command found nothing! (In reality, it just didn’t report it.) Both of these errors have been rectified over the last several decades: UNIX find has an implied -print, and tar will usually accept the dash as optional – which makes it work both the way it always did and the way it should have.
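    To make the comparison concrete, this is roughly how the two commands behave on most modern systems (an illustration using GNU and POSIX behavior, not a survey of every UNIX):

    $ find /var/log -name '*.log'           # POSIX find implies -print, so this produces output
    $ find /var/log -name '*.log' -print    # the old, explicit form still works

    $ tar xvf archive.tar                   # the traditional dashless form
    $ tar -xvf archive.tar                  # GNU tar (and most others) accept the dash as well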

    As an example of what seems to be a colossal user interface failure – including poor writing – consider these articles from Scott Remnant, which are absolute gems (albeit from way back in February 2009). He wrote an article titled Git Sucks – which was then followed by a second and then a third – followed by yet another titled Revision Control Systems Suck.

    What Scott is railing about is how hard these systems are to learn (he targets not just git, but also GNU Arch and Bazaar). From his standpoint, he finds these systems to be complicated and hard to understand.

    He also points out (rightly) that the most common actions should be the simplest, and finds that with git these common actions are rarely ever simple. He specifically mentions reviewing the changes that someone else has made compared to his own – and says that there’s not a revision control system that makes it easy.

    An example of how user interface design can be incorporated into things like the command line, and even programming, is this quote from an interview with Yukihiro Matsumoto, the developer of the programming language Ruby, about his guiding principle in designing the language:

    [It’s] called the “principle of least surprise.” I believe people want to express themselves when they program. They don’t want to fight with the language. Programming languages must feel natural to programmers.

    and later in the same interview:

    In addition, Ruby is designed to be human-oriented. It reduces the burden of programming. It tries to push jobs back to machines. You can accomplish more tasks with less work, in smaller yet readable code.

    Another example: I was just rereading my copy of The Humane Interface written by Jef Raskin. In it, he had a section titled Noun-Verb versus Verb-Noun Constructions (section 3-3, p. 59). This mirrors a problem I have experienced with some command line software in the past: the command wants an action as the first argument, and the object of the action second. I despised it enough that it was the genesis of my writing a wrapper for the command that reversed the order: object first, action second. Imagine my surprise to find my troubles validated right there in Raskin’s book.
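    The idea is simple enough to sketch (the tool name widgetctl here is invented for the example – the real command doesn’t matter):

    #!/bin/sh
    # Hypothetical wrapper: the underlying tool insists on "widgetctl <action> <object>",
    # but this wrapper takes the object first and the action second.
    object="$1"
    action="$2"
    shift 2
    exec widgetctl "$action" "$object" "$@"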

    There are many examples of command line programs doing the wrong things, and of programs doing the right things. One of the right things comes from HP-UX and its software management tools such as swinstall: if the program can use an X display for a graphical interface, it will; if not, it falls back to a text display instead.

    There are many such examples of programs just doing what you need and leaving you free to think about other things. I wonder what would happen if a company like Apple decided to tackle the command line – although, in a way, they already did. In Mac OS X, consider the open command, for instance… absolutely brilliant, in contrast to the open command sometimes found in other UNIXes (never standard).

    One very important point to remember: “It’s only hard until you learn it” is not a valid excuse. The learning curve for a program should not be any steeper than it has to be.

    OpenVMS and Network Information

    If you don’t know where to look, OpenVMS networking information can seem to be confined inside a mysterious black box. It doesn’t have to be.

    The ANALYZE command can provide a lot of good information. Be sure to have a large enough scroll-back buffer on your terminal when you do this:


    $ ANALYZE /SYSTEM
    SDA> SHOW LAN /FULL

    You can also find out a lot of good information in a hurry with the LANCP command:


    $ RUN SYS$SYSTEM:LANCP
    LANCP> SHOW CONFIGURATION

    You can also look up information using the TCPIP command:


    $ TCPIP
    TCPIP> ifconfig -a

    However, while this information is all good, it isn’t complete without marking the back of the computer in some way so that you know which port is which. If you have to, you can hook up a laptop with a network cable and watch the traffic: the DECnet clustering traffic is such that you’ll see it on every active interface – which gives you the MAC address for that port.
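    On the laptop end, something like tcpdump will do; this is just a sketch, and the interface name eth0 is an assumption:

    $ tcpdump -e -n -i eth0    # -e shows link-level headers, -n skips name resolution

    The -e option prints the source MAC address with each frame, so you can match the clustering traffic seen on that port against the addresses reported by LANCP.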

    System Reboots Require These Tools and Practices

    When a long-running server needs to be rebooted, what are the most important tools? Remember, reboots on many systems can be weeks, months, or even years apart. So a reboot is not a normal occurrence for the machine.

    So what are the best tools to have on hand? Paper and pen. Take extensive notes of everything that happens out of the ordinary as the system comes up – things to fix, things to watch out for, and so on. Recording how much time it takes is not a bad idea either. Watch for services that are not required and shut them down as needed.

    When debugging the reboot process, make sure to get evidence of a completely clean startup before considering the job done. The job may look like it is done, but if a reboot exposes a failure in configuration or other problems, then it’s not done – and you won’t know unless you reboot.

    Also, when you reboot, make sure that all subsystems are up and running. Often, important subsystems are not set to start automatically – the idea being that, in case the system crashes, it stays off-line until the reason for its demise is fully known. So don’t forget these important subsystems; start them up after booting – whether the subsystem is Caché or Oracle or something else.
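    A small set of post-reboot checks can help here; a rough sketch for a UNIX-like system follows (the service name patterns are assumptions – adapt them to your environment):

    $ dmesg | grep -i -e error -e fail               # anything unusual during boot?
    $ netstat -an | grep -i listen                   # are the expected ports listening?
    $ ps -ef | grep -E 'ora_|cache' | grep -v grep   # did Oracle or Caché start, if they should have?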

    6 Ways to Improve Documentation

    Documentation is very important, and can often get pushed to the back of the queue only to be forgotten. However, this only becomes a problem later when someone tries to figure out just what this thing is supposed to do.

    Here are six ways you can improve your documentation easily and quickly:

    Embed help into your scripts. When you write a script (Perl, ksh, Ruby, et al.), provide a help screen that is printed when the script is run without arguments, or when run with -h or --help … or even -help or -? (though the first two are best). Also document the purpose and other details within the script itself.
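    A minimal sketch of what this can look like in a shell script (the script name and its one-line purpose are invented for the example):

    #!/bin/sh
    # myscript -- example of a script with embedded help.

    usage() {
        echo "Usage: myscript [-h|--help] FILE..."
        echo "Copies the named files into the archive area (illustrative only)."
    }

    case "$1" in
        ""|-h|--help|-help|-\?)
            usage
            exit 0
            ;;
    esac

    echo "Would process: $*"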

    Write to your audience. Remember, when you write, who your documentation is for. Most of the time, this will be a junior administrator (or someone who fits that description, title notwithstanding). However, sometimes it will be an executive (which demands a different set of knowledge and context).

    Document in more places. When writing documentation for a script, write the documentation in all of the logical places (not just some). Here are some logical places where your documentation should be:

    • Inside the script (function descriptions, syntax, arguments, etc.)
    • During script operation (using -h or --help)
    • Man pages (man myscript)
    • Info pages (info myscript)
    • (VMS) Help system
    • (Windows) Help files
    • System technical documentation (how does the script fit into the entire environment or system)
    • System overview (does this need to be changed?)

    Use paper. Create printed documentation – the system itself will not always be running, and when you need the documentation most, it might not be available online. The best documentation will include printed source listings as well as the general documentation.

    Encourage people to use it. If the documentation gets used, then it will have served its purpose well. It also allows you to ask people what they thought of it, and to improve it. This becomes a self-perpetuating review process for the documentation writer.

    Just do it! Okay, so that’s a stretch – but if it doesn’t get done, you won’t have any (now there’s a statement!). Some programmers, administrators, and analysts may think they don’t have the time. However, you must make time – if you don’t, the person trying to comprehend just what this thing does might be you.