A Statistical Analysis of Android Issues

I was recently researching an issue with Android phones – the oft-requested ability to record a phone call – and found that this has been an open issue in Android since March of 2009 and remains classified as “New.” In fact, most of the issues in the top 1000 (as ranked by users) remain classified as new and have not been allocated to any developers.

Thus, I started wondering about the various Android issues and their relative importance to users and developers. I downloaded the list of the top 1000 issues (in CSV format) – ranked by number of stars – and analyzed the results.

Here is what I found out:

  • 63% of issues are more than one year old (12% of issues are over two years old!)
  • 86% of issues are listed as “new” (instead of “assigned” or “needsinfo” or others)
  • 10% of issues are assigned to a person
  • Average age of issues is 431 days (1 year, 2 months)
  • Average defect age is 457 days (1 year, 3 months)
  • Average enhancement request age is 543 days (1 year, 6 months)
  • Reviewed items were the oldest on average, with reviewed defects at 550 days and reviewed enhancements at 749 days.

When you rank the items by the number of stars (user importance) per day of age, some very interesting things emerge (a rough sketch of the calculation follows the list below). The most important issues by this ranking are the following:

  1. Change refund time in Android Market (issue #13116) – 32 days old
  2. Arabic Language Support (issue #5597) – 386 days old
  3. Nexus S reboots during an active call (issue #13674) – 8 days old
  4. Ability to limit Internet access (issue #10481) – 149 days old
  5. IPSEC VPN compatible with Cisco VPNs (issue #3902) – 484 days old
  6. Poor browser performance (issue #13404) – 19 days old
  7. Google Docs support on Android (issue #1865) – 714 days old
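
For the curious, the ranking boils down to a few lines of Python. This is only a sketch: the column names (“Stars”, “Opened”, “Summary”), the date format, and the file name are assumptions about how the CSV export is laid out, so adjust them to match the actual file.

    # Rank issues by stars per day of age. Column names, date format, and the
    # file name are assumptions about the CSV export; adjust as needed.
    import csv
    from datetime import datetime

    def stars_per_day(path, today=None):
        today = today or datetime.now()
        ranked = []
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                stars = int(row["Stars"])
                opened = datetime.strptime(row["Opened"], "%b %d, %Y")  # assumed date format
                age_days = max((today - opened).days, 1)                # avoid division by zero
                ranked.append((stars / age_days, age_days, row["Summary"]))
        return sorted(ranked, reverse=True)

    for rate, age, summary in stars_per_day("android-issues.csv")[:10]:
        print(f"{rate:6.1f} stars/day  {age:4d} days  {summary}")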

These items suggest one of two things – probably both: either what users think is important is irrelevant to Google, or the items are being acted on while the issue tracker is ignored. People commenting on the issues routinely ask where the Google responses are.

Another interesting item came up during the analysis: not one item in the top 1000 that was marked as requested by a user or a developer had a status of Assigned or Reviewed. The remaining items – those not marked as requested by either a user or a developer – were frequently Assigned or Reviewed (or, indeed, Unassigned). I can only guess at the true meaning of this, but it suggests that Google only acts when an issue originates from within Google.

In all, this statistical exercise would have been much more exciting if it weren’t for the disappointing results. I did check to see whether Google’s main page for Android on Google Code had been marked obsolete; no such statement was anywhere to be found.

A Single Character Causes Downtime for… WordPress.com!

Last Thursday, an error in the WordPress.com software caused some user settings to be overwritten, resulting in lost settings for some customers. The site was taken down for checks, and an hour later 99% of users were back online.

The cause? A single-character coding error. Certainly checks and balances are needed, but according to Matt Mullenweg, founder of WordPress.com, reviews and testing are already in place.

It was less than a month ago that Toni Schneider, CEO of Automattic, wrote in glowing terms about the use of “continuous deployment” at wordpress.com. Is this event going to lead to the death of “continuous deployment” at WordPress? I suspect not.

In fact, Paul Graham described in an essay how he used Lisp at Viaweb in just this fashion – pushing changes straight to the live site. Viaweb was bought by Yahoo! and became the Yahoo Store; in effect, Viaweb was practicing continuous deployment before the idea had even become mainstream.

Let this WordPress.com downtime be a lesson in what a single character can do – and a reminder that none of us is immune to such mistakes.

Saving Face and Customer Service

Not too long ago, there was a posting in Sharky’s Column – a column about the dumb things that happen in IT – about an IT person’s experience with a very cheerful and helpful strong man.

I don’t want to spoil the story too much, but the IT person’s response helped the “gentle giant” save face – and he didn’t even know it. The story is a good example of how we can offer excellent service to our customers while keeping them out of trouble at the same time.

When we can keep our customer happy we should – even if it means that the “cable was faulty”…

Customer Service More Important Than Ever

Sometimes a company will have a customer service failure; it happens every day. However, with the advent of the Internet – and especially services like Twitter – a customer service failure can turn into a disaster with serious fallout.

Consider what happened to Virgin America recently. They suffered a classic (if unusual) misstep for an airline: an aircraft stuck on the tarmac for six hours. That alone would have been notable; what made it truly unusual was that on board was David Martin, CEO of a social network startup (kontain.com) that enhances users’ experience of Twitter and YouTube. He took the opportunity to document the entire ordeal, using the airplane’s wifi to update his Kontain.com account. The story was picked up by CNN, the New York Post, CBS, and ABC, and followed up by the blogs Technically Incorrect and AeroChannel.

Also on board was “Dancing With the Stars” judge Carrie Ann Inaba, who tweeted about her experience. Another passenger, Uana Coccoloni, posted her experiences to Facebook as they happened. David Martin also posted a video of the experience, which can be seen at ABC.

The story began with Virgin America Flight 404, bound for John F. Kennedy Airport, being forced to divert north to New York’s Stewart Airport. After landing, the passengers were twice offered a chance to disembark; about 20 took the chance to leave. After six hours on the aircraft, the captain was able to contact a competitor with service in the terminal – JetBlue – to take the remaining passengers to JFK by bus.

Also quite notable: had this event happened just a few weeks later, it would have resulted in a fine of over US$3 million. A US regulation (14 CFR 259) going into effect on April 29 will fine airlines US$27,500 per passenger if passengers are kept waiting on board for more than three hours. Several airlines have requested waivers for JFK (where this catastrophe started) because of it.

In this age of constant Internet presence, when anyone can reach thousands easily and quickly, companies (and employees) need to be prepared for an instant backlash to customer service failures like this one. Virgin America later responded by sending letters of apology to the passengers of flight 404, along with a ticket refund and a $100 credit towards future travel. Unfortunately, the initial response was just the $100 credit towards a future flight; the refund was David Martin’s idea (though Virgin America’s CEO readily agreed). Even so, the response came too late. Next time these passengers have to fly, will they think of Virgin America or JetBlue?

What should Virgin America have done? What would you have done?

I would posit that some sort of “first responder” group needs to be created that can respond to customer service issues large and small, and that the personnel “on the ground” should be empowered to respond as needed. Such a First Responder Group could have been notified as soon as the flight was diverted and then had buses waiting at Stewart to take passengers to JFK.

The airport itself (Stewart International) could have had a First Responder Group that would be practiced and ready to go for just such an incident. There could be buses on standby and pagers distributed to appropriate responders, with appropriate responsibility given to them.

Why does it take a federal regulation (14 CFR 259) to force appropriate customer service from the airlines?

The final insult? After the plane was cleared, it was finally able to take off for JFK – and beat the passengers travelling by bus. The passengers’ original flight was already at JFK when they arrived.

Why Internet Explorer 6 Refuses to Die

Internet Explorer 6 was one of the vectors used in the recent attacks on Google and many other companies. Web developers have hated it for a long time because of its instability and poor standards support.

IE 6 is the default browser shipped with Windows XP, and it routinely appears on lists of the worst technical products ever. Google announced in January that it would stop supporting IE 6 (which means YouTube will no longer work in IE 6). The French and German governments strongly advised (link in French) against using Internet Explorer in January 2010, in part because of security risks in IE 6. There are campaigns everywhere advocating against the use of IE 6.

So why is it still alive and supported by Microsoft? Over at the IT Expert Voice, one writer was determined to find out. The article is very interesting, and it lists a number of reasons that IE 6 is still being used in spite of it all:

  • Upgrades come slowly. If you upgrade your systems on a three- to five-year cycle, then IE 6 is very likely still present on the network.
  • A critical application requires IE 6. This is quite unfortunate, but happens often enough. If the vendor hasn’t converted to a more standards-compliant environment, the users can’t either.
  • “If it isn’t broke, don’t fix it.” This is almost a “head-in-the-sand” approach – or an extreme reluctance to upgrade at all. Hopefully, this is not common.
  • Using IE 6 can limit users to more “appropriate” sites. This reason is also incomprehensible: certainly the more popular sites will eventually stop working with IE 6 – but IE 6 is also a security risk, and more and more work-related sites will stop supporting it as well. I can’t imagine anyone would seriously use this as a reason to keep IE 6 – but apparently some have.

CNet also had an interesting article about why Intel continues to use IE 6; it is an excellent read.

Error Messages: Who Needs ‘Em?

There was a very interesting post over at Slashdot where a reader asked the gathered populace how to get users to read error messages.

The question might sound trite to some – like the standard complaining of technical support staff – and yet this is a real problem, wrestled with by technical staff and usability engineers alike.

There were a number of interesting responses (and thoughts) to this question:

  • Force users to read it. In one case, the support staff put a code within the text of the error message and required entry of the code to continue (a minimal sketch of this idea follows the list). Users who called saying they were stuck were told to read the error message to the support staff – at which point the user typically said “Oh, never mind…” Alternately, another story told of a site that stopped people cold at an error message, requiring them to call technical support. If they tried to bypass the message with a reboot, the offending application would not work for 15 minutes.
  • Start using error messages appropriately. This requires more work by developers, but the idea is that users are so used to error messages that are meaningless that they ignore them entirely.
  • People filter out things that are unexpected or don’t fit their mental model. This is backed up by research (apparently) on pilots who, during a simulation, overwhelmingly failed to notice large obstacles in their path on the runway. In another famous case, people asked to count passes in a basketball video were so focused on the task that about half of them failed to report the huge gorilla (yes!) that wandered through the picture.
  • Pictures are more easily noted and remembered. Some suggested using pictures – one or two said they had done this with excellent results.
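
The first idea – embedding a code in the error text and requiring the user to type it back – is simple to implement. Here is a minimal sketch for a console application; the code format and the wording are just illustrations, not anyone’s actual implementation.

    # Embed a short code in the error text and require the user to type it back
    # before continuing. The code is derived from the message text, so support
    # staff can verify the user really has the message in front of them.
    import hashlib

    def confirmation_code(message):
        # A short, stable code derived from the message text (illustrative only).
        return hashlib.sha1(message.encode("utf-8")).hexdigest()[:6].upper()

    def show_error(message):
        code = confirmation_code(message)
        print(f"ERROR: {message}")
        print(f"Confirmation code: {code}")
        while input("Type the confirmation code to continue: ").strip().upper() != code:
            print("That does not match the code shown in the error message above.")

    if __name__ == "__main__":
        show_error("The nightly export failed: the target share is not mounted.")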

What do you think about error messages? How do you get users to read them?

Email Productivity: Smack Down that Email!

I believe I have a somewhat unusual approach to email – at least, unusual in that it doesn’t seem to be discussed much. It works for me, and might just work for you.

I get a ton of emails – mainly because I either a) have notices and warnings and logs coming from systems I manage, or b) subscribe to way too many newsletters, mailing lists, and so forth. At work, I get notices; at home, I get mailing lists…

This is what I do.

Sort everything!

If you can categorize it, put it into a folder. Nothing should be in your inbox except mail you haven’t had a chance to categorize yet – or haven’t seen before.

Create rules to sort things automatically. This is the crux of the system: everything is sorted as it arrives in your mailbox. Also, if necessary, force each message to be sorted only once: once a rule is triggered, it should stop processing further rules. Thunderbird does this automatically; Outlook has to be told.

As you create the rules, most email clients will allow you to create a folder at the same time. Use this capability.

Many clients also have the ability to create a rule from a message – sometimes even to the point of automatically creating a filter on a sender or on a mailing list sender: use it. Both Thunderbird and Outlook will provide much of this capability from a right click on the message to be sorted.

Also remember, as you create each rule, to apply it to the messages already in your inbox: that is the whole point. Before the rule existed they couldn’t be sorted automatically – so sort them now.

Here are some examples (a script-based sketch of the same idea follows the list):

  • Mail from the boss. Move it to a folder with his name.
  • Mail from the system administration mail group. Put it into a folder named after the group.
  • Newsletter from a system manufacturer. Move it to a folder named after the newsletter or the manufacturer.
  • Automatic log messages sent by mail from a system. If these are “alarm” type messages, separate them. System messages could go into a folder named after the system, or into a folder named after the monitoring tool that reported them.
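
If your mail sits on an IMAP server, the same kind of sorting can also be scripted rather than clicked together in the client. The sketch below is only an illustration of the idea – the server, credentials, senders, and folder names are placeholders, and the target folders must already exist.

    # Illustrative only: sort inbox messages into folders by sender, roughly
    # mirroring the client-side rules above. Server, login, senders, and folder
    # names are placeholders; the target folders must already exist.
    import imaplib

    RULES = {
        "boss@example.com": "Boss",                    # mail from the boss
        "sysadmin-group@example.com": "SysAdmin",      # the admin mail group
        "newsletter@vendor.example.com": "Newsletters/Vendor",
    }

    def sort_inbox(host, user, password):
        with imaplib.IMAP4_SSL(host) as imap:
            imap.login(user, password)
            imap.select("INBOX")
            for sender, folder in RULES.items():
                status, data = imap.search(None, "FROM", f'"{sender}"')
                if status != "OK" or not data[0]:
                    continue
                for num in data[0].split():
                    imap.copy(num, folder)                  # copy into the target folder...
                    imap.store(num, "+FLAGS", "\\Deleted")  # ...then mark the original deleted
            imap.expunge()                                  # remove the flagged originals

    if __name__ == "__main__":
        sort_inbox("imap.example.com", "me", "app-password")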

That last example (the automated log messages) brings up the next point:

Use saved searches to sort in different ways.

For example, all automated messages from a system could go into a folder by system name. Then create saved searches that show all messages from a particular monitoring system (such as Nagios or HP’s EMS).

Add alarms for vital mail.

In contrast to what others have said, I believe in message alarms – but only use them for mail that is truly important. For example, when the boss sends you an email, you’d better look at it, yes? Likewise, if you are responsible for help desk tickets, you’d better know about new ones right away.

The general suggestion still holds however: turn off global message alarms!

Change the inbox view to show only unread mail.

This is how I achieve Inbox Zero (I cheat!). I do still create rules as much as possible for everything that comes in – but there are stragglers.

Create a list of favorites.

Lastly, create a list of favorites. Outlook allows you to mark a folder as a favorite; KMail has a similar capability. This provides you with a way to sort everything but only see (directly) what is most important.

Preventing Problems (or: How to Appear Omniscient to Your Users!)

When a user comes to you with a problem they are experiencing on one of the servers you manage, what goes through your mind (aside from “How may I help you?”)? For me, there are two thoughts: “How can I prevent this from happening again?” and “Why didn’t I know about this already?”

Let us focus on the second of these. If a user is experiencing problems, you should already know about them – yes, you really should. Whether the server is down, overloaded, or lagging behind, these are things you should already know.

Most servers leave messages in the system syslog or other log files; write or use something that will scan the log files for appropriate entries and send you a warning. SEC (Simple Event Correlator) is one of the best at this.
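
Even a small script can cover a surprising amount of ground if SEC feels like overkill. Below is a minimal sketch of the idea; the log path, patterns, and mail settings are placeholders, and a real version would also remember how far into the file it had already read.

    # Minimal log scanner: look for suspicious patterns in a log file and mail a
    # warning when any are found. Log path, patterns, and mail settings are
    # placeholders; a real version would track its position between runs.
    import re
    import smtplib
    from email.message import EmailMessage

    LOG_FILE = "/var/log/syslog"
    PATTERNS = [r"error", r"failed", r"out of memory", r"i/o error"]
    ADMIN = "admin@example.com"

    def scan(path):
        regex = re.compile("|".join(PATTERNS), re.IGNORECASE)
        with open(path, errors="replace") as f:
            return [line.rstrip() for line in f if regex.search(line)]

    def notify(lines):
        msg = EmailMessage()
        msg["Subject"] = f"{len(lines)} suspicious log entries found"
        msg["From"] = ADMIN
        msg["To"] = ADMIN
        msg.set_content("\n".join(lines[:200]))        # cap the size of the report
        with smtplib.SMTP("localhost") as smtp:        # assumes a local mail relay
            smtp.send_message(msg)

    if __name__ == "__main__":
        hits = scan(LOG_FILE)
        if hits:
            notify(hits)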

Another invaluable tool is monitoring software such as Nagios, Zabbix, or Zenoss. With such software, it is possible to be notified when a particular event occurs or when a threshold is actually crossed.

When a tool like Nagios is combined with SEC, much more powerful reporting becomes available. For example, if a normally benign error (ugh! who said errors were normal?) occurs too many times within a period of time, the error can be reported to the Nagios monitoring software and someone notified (see the sketch below).
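
The “too many times in a period of time” check boils down to a sliding window. Here is a sketch of that idea on its own; the threshold and window are arbitrary examples, and in practice the result would be fed into whatever alerting you already have (a Nagios passive check, an email, and so on).

    # Sliding-window counter: a normally benign error is only worth reporting if
    # it occurs more than THRESHOLD times within WINDOW seconds. The threshold
    # and window values are arbitrary examples.
    import time
    from collections import deque

    class WindowedCounter:
        def __init__(self, threshold, window):
            self.threshold = threshold      # occurrences allowed...
            self.window = window            # ...within this many seconds
            self.events = deque()

        def record(self, now=None):
            """Record one occurrence; return True if the threshold is exceeded."""
            now = time.time() if now is None else now
            self.events.append(now)
            while self.events and now - self.events[0] > self.window:
                self.events.popleft()       # drop occurrences outside the window
            return len(self.events) > self.threshold

    counter = WindowedCounter(threshold=10, window=300)
    # Call counter.record() each time the benign error appears in the logs:
    if counter.record():
        print("benign error is no longer benign: notify someone")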

Other tools provide system monitoring with time-related analysis. For example, if disk utilization is too high for too long, a warning can be issued. Another example: if too many CPUs average more than 60% utilization for the last 30 seconds, someone could be notified.

HP’s GlancePlus (part of OpenView, bundled with HP-UX 11i v3) and the now open source Performance Co-Pilot (PCP) from SGI are two tools that provide these capabilities. They support averaging, counts per minute, and many more metrics. PCP also supports remote monitoring, so all systems can be monitored (and data archived) in a central location.

Again, these tools can be integrated with SEC or Nagios to send out notifications or post outage notices and so forth.

With tools like these in your arsenal, next time someone comes to you with an outage or sluggish performance complaints, your response can be: “Yes, I’m already working on it.” Your users will think you omniscient!

Helios Linux Attacked as Illegal Enterprise

I saw this article from Ken Starks, the maintainer of the Helios Linux distribution, about a letter he received. It came from a teacher who had confiscated a number of Linux live CDs from a student and then accused the Helios maintainer of illegal activities. The teacher’s letter is astounding in its misunderstanding of the true nature of open source.

Setting aside the teacher’s audacity and ignorance for this article… it goes to show that not everyone is as well-informed as many of us. The teacher in this case has perhaps never heard of Edubuntu, a distribution built just for education – nor of OLPC, a nonprofit organization trying to get laptops (Linux laptops, mind you) into the hands of children in Africa and the rest of the developing world.

We must be prepared to educate our supervisors, users, and others who rely on us as to why this or that open source project is worthwhile. In many cases, the fact that a product is open source (or not) is not a selling point: many folks will not use something simply because it is open source, but would rather pay for something that is better – or meets their needs – or is “what everyone uses.”

Examples of this abound: Linux vs. Windows – Linux vs. UNIX – Red Hat Enterprise vs. CentOS – OpenOffice vs. Microsoft Office – OpenSSH vs. SSH – GnuCash vs. Quicken – and more. Put aside the open source nature of the product and explain why it is better than the commercial product. Does it have more features? Does it work in more places? Is it easier to use? Does it cost less? (Okay, the last isn’t exclusive to the open source movement – freeware is out there too…) Does it have a lighter footprint? Is it more widely used than the commercial product?

All of this must be explained to those who have no idea what open source is about – and perhaps have no technological background, much less an understanding of technical history.

Let’s get out there with our heads held high and educate the masses!

Update: this story has a happy ending. I’m also glad he didn’t name the teacher involved, and I can just imagine the vitriol that flew his way. The fact that he stood his ground speaks tremendously to his character. Kudos, Ken!

Working in a Development Environment

In a standard business environment, a production system is one that must be up and stable, and cannot be changed without a great deal of forethought, coordination, and sign-off. A development system is one that administrators use to prepare for bringing systems into production.

However, if your users are developers, then things may be different – especially if you are also using the software in a stable environment.

Development, by its nature, produces unstable code that is prone to crashes and other undesirable behavior. This stands the usual system administration goals on their head: your systems, though they are in “production” (that is, used by normal users on a daily basis), behave like test systems in that they are not reliable. With such reliability issues it may not seem as if they are production systems – but they are.

What’s more, there may also be actual “production” systems – systems running the same software, not under development but simply in use. These systems should not change (they are in production, we would say), yet they do not have the same reliability problems.

Even though the development environment may feel like a test lab at times, with systems experiencing hangs and so forth, these systems still need to be treated like normal production systems. Never forget that your users, even though they seem to do “bad” things to the systems, still rely on them being there on a daily basis.

It also means that you will have to respond to problems faster, and be proactive in preventing problems – and that you will have more problems to resolve.

In short, the typical software development environment is more challenging for the admins who support it – but also more exciting.