A Statistical Analysis of Android Issues

I was recently researching an issue with Android phones – the oft-requested ability to record a phone call – and found that this has been an open issue in Android since March of 2009 and remains classified as “New.” In fact, most of the issues in the top 1000 (according to users) remain classified as new and have not been allocated to any developer.

Thus, I started wondering about the various Android issues and their relative importance to users and developers. I downloaded the list of the top 1000 issues (in CSV format) – according to the number of stars – and analyzed these results.

Here is what I found out:

  • 63% of issues are more than one year old (12% of issues are over two years old!)
  • 86% of issues are listed as “new” (instead of “assigned” or “needsinfo” or others)
  • 10% of issues are assigned to a person
  • Average age of issues is 431 days (1 year, 2 months)
  • Average defect age is 457 days (1 year, 3 months)
  • Average enhancement request age is 543 days (1 year, 6 months)
  • Reviewed items were the oldest on average, with reviewed defects at 550 days and reviewed enhancements at 749 days.
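Figures like these take only a few lines to compute. The following is a minimal sketch in Perl, not the analysis actually used here: the rows are invented, and each is assumed to hold a status and an age in days already extracted from the CSV export. The helper names percent_where and mean_age are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical rows: [status, age in days]; invented for illustration.
my @issues = (
    [ 'New',      700 ],
    [ 'New',      400 ],
    [ 'Assigned', 100 ],
    [ 'New',       50 ],
);

# Percentage of rows for which the given test returns true.
sub percent_where {
    my ($test, @rows) = @_;
    return 0 unless @rows;
    my $hits = grep { $test->($_) } @rows;
    return 100 * $hits / @rows;
}

# Arithmetic mean of the age column.
sub mean_age {
    my @rows = @_;
    return 0 unless @rows;
    my $total = 0;
    $total += $_->[1] for @rows;
    return $total / @rows;
}

printf "older than one year: %.0f%%\n",
    percent_where(sub { $_[0][1] > 365 }, @issues);
printf "status New:          %.0f%%\n",
    percent_where(sub { $_[0][0] eq 'New' }, @issues);
printf "average age:         %.0f days\n", mean_age(@issues);
```

Passing the test as a code reference keeps one helper usable for every "percent of issues that..." question in the list.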

When you rank the items by the number of stars (user importance) per day (age) some very interesting things come out. The most important issues in this ranking are the following:

  1. Change refund time in Android Market (issue #13116) – 32 days old
  2. Arabic Language Support (issue #5597) – 386 days old
  3. Nexus S reboots during an active call (issue #13674) – 8 days old
  4. Ability to limit Internet access (issue #10481) – 149 days old
  5. IPSEC VPN compatible with Cisco VPNs (issue #3902) – 484 days old
  6. Poor browser performance (issue #13404) – 19 days old
  7. Google Docs support on Android (issue #1865) – 714 days old
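The stars-per-day ranking can be sketched in a few lines of Perl. This is an illustration only: the issue labels, star counts, and ages below are invented, and a real run would first have to parse the CSV export into rows of this shape.

```perl
use strict;
use warnings;

# Hypothetical sample rows: [issue label, stars, age in days].
# These numbers are invented for illustration only.
my @issues = (
    [ 'A', 900, 600 ],
    [ 'B', 320,  32 ],
    [ 'C',  96,   8 ],
    [ 'D', 450, 450 ],
);

# Stars per day; an issue opened today counts as one day old
# to avoid dividing by zero.
sub rate {
    my ($row) = @_;
    my $age = $row->[2] < 1 ? 1 : $row->[2];
    return $row->[1] / $age;
}

# Return the rows sorted by stars per day, highest first.
sub rank_by_stars_per_day {
    my @rows = @_;
    return sort { rate($b) <=> rate($a) } @rows;
}

for my $row (rank_by_stars_per_day(@issues)) {
    printf "%-3s %6.2f stars/day (%d days old)\n",
        $row->[0], rate($row), $row->[2];
}
```

Dividing by age is what lets a week-old issue with a burst of stars outrank a two-year-old issue with many more stars in total.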

These items suggest one of two things, and probably both: either what users think is important is irrelevant to Google, or the items are being acted on while the issue tracking list itself is ignored. People commenting on the issues routinely ask where the Google responses are.

Another interesting item came up during statistical analysis: not one item (in the top 1000) which was listed as requested by a user or by a developer was listed with a status of Assigned or with a status of Reviewed. There were other items, but these were not listed as requested by either a user or a developer – and many of these were assigned or reviewed (or, indeed, Unassigned). I can only guess at the true meaning of this; it suggests that Google only acts when an issue comes from within Google.

In all, this statistical exercise would have been much more exciting if it weren’t for the disappointing results. I did check whether Google’s main page for Android on Google Code had been marked obsolete; no such statement was anywhere to be found.

Statistical Analysis is Valuable for Understanding

In System Administration – and many other areas – statistics can help us understand the real meaning hidden in data. There are many sources from which statistical data can be gathered and analyzed, including sar data and custom-designed scripts in Perl, Ruby, or Java.

How about the number of processes, when they are started, when they finish, and how much processor time they take over the length of time they operate? Programs like HP’s Performance Agent (now included in most HP-UX operating environments) and SGI’s fabulous Performance CoPilot can help here. In fact, products like these (and PCP in particular) can gather incredibly valuable sorts of data. For example, how much time does each disk spend above a certain amount of writing, and when? How much time does each CPU spend above 80% utilization and when?
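A question such as "how much time does each CPU spend above 80% utilization, and when?" reduces to counting samples over a threshold. The sketch below assumes utilization samples taken at a fixed interval and already reduced to an hour of day and a percentage; the sample values, the ten-minute interval, and the function name minutes_above are all invented for illustration and are not the output format of any particular tool.

```perl
use strict;
use warnings;

# Hypothetical samples: [hour of day, CPU utilization in percent],
# assumed to be taken once every $interval minutes.
my $interval = 10;
my @samples = (
    [ 9, 35 ], [ 9, 85 ], [ 9, 92 ],
    [ 10, 81 ], [ 10, 40 ], [ 14, 95 ],
);

# Minutes spent above the threshold, keyed by hour of day.
sub minutes_above {
    my ($threshold, $minutes_per_sample, @rows) = @_;
    my %busy;
    for my $row (@rows) {
        my ($hour, $util) = @$row;
        $busy{$hour} += $minutes_per_sample if $util > $threshold;
    }
    return %busy;
}

my %busy = minutes_above(80, $interval, @samples);
for my $hour (sort { $a <=> $b } keys %busy) {
    printf "%02d:00  %d minutes above 80%%\n", $hour, $busy{$hour};
}
```

The same function answers the disk question if the rows carry write rates instead of CPU percentages.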

Statistical data from a system could, with the proper programming, be fed into a learning neural network or a Bayesian network and used to raise alarms for statistically unlikely events.
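A neural or Bayesian network is more than a short example can show, so the sketch below substitutes a much simpler technique: it flags a value that lies more than a chosen number of standard deviations from the mean of earlier observations. The baseline numbers, the limit of 3, and the function names are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Mean and (population) standard deviation of a list of numbers.
sub mean_and_stddev {
    my @values = @_;
    die "empty baseline\n" unless @values;
    my $total = 0;
    $total += $_ for @values;
    my $mean = $total / @values;
    my $squares = 0;
    $squares += ($_ - $mean) ** 2 for @values;
    return ($mean, sqrt($squares / @values));
}

# True if $value lies more than $limit standard deviations from the
# mean of the baseline. A flat baseline (stddev 0) flags any change.
sub is_unlikely {
    my ($value, $limit, @baseline) = @_;
    my ($mean, $sd) = mean_and_stddev(@baseline);
    return $value != $mean if $sd == 0;
    return abs($value - $mean) / $sd > $limit;
}

# Hypothetical baseline: process counts seen at this hour on earlier days.
my @baseline = (118, 122, 120, 119, 121, 120);

for my $today (121, 160) {
    printf "%d processes: %s\n", $today,
        is_unlikely($today, 3, @baseline) ? "ALARM" : "ok";
}
```

A fixed threshold on the raw count would need tuning per system; measuring distance from the system's own history does not.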

Statistical analysis can provide useful data in areas other than performance, too. How about measuring the difference between a standard image and a golden image based on packages used? How about analyzing the number of users that use a system, when they use it, and for how long? (Side note: I once had a system with 20 or 30 users who each used it heavily for one or two straight weeks in every six months… managing password aging was a nightmare…)
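Comparing a system against a golden image could start from a plain set difference of package names, as in this sketch. The package lists are invented; on a real system they would come from the package manager's own listing command, which is not shown here. The function name only_in_first is likewise made up.

```perl
use strict;
use warnings;

# Packages in the first list that are absent from the second.
sub only_in_first {
    my ($first, $second) = @_;
    my %seen = map { $_ => 1 } @$second;
    return grep { !$seen{$_} } @$first;
}

# Hypothetical package lists, invented for illustration.
my @golden = qw(bash openssh perl sudo);
my @system = qw(bash gcc openssh perl telnet);

my @missing = only_in_first(\@golden, \@system);
my @extra   = only_in_first(\@system, \@golden);

print "missing from system: @missing\n";
print "not in golden image: @extra\n";
printf "drift: %d of %d golden packages missing, %d extra\n",
    scalar @missing, scalar @golden, scalar @extra;
```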

There are many places for analyzing a system and providing statistical data; this capability, however, has not been utilized appropriately. So what are you waiting for?

Firefox 3.5 Market Share and Statistics

Recently, a lot of folks have reported that Firefox 3.5 is the most widely used browser version. This is true; however, the statistics do not show that Firefox as a whole is the most widely used browser (a distinction most responsible blogs have also reported).

Geeksmack had an excellent article on this topic. Examine the graph shown there critically and several things stand out:

  • The loss of market share by IE 7 seems to correspond to the growth of IE 8.
  • Firefox 3.0, at its height, had a larger percentage of users than Firefox 3.5 does right now.
  • Firefox 3.5 seems to be affecting Firefox 3.0 the most: after Firefox 3.5 was introduced, Firefox 3.0 dropped in user count precipitously.
  • In the middle of 2009, Firefox 3.0 lost users to IE 7 for a period of time.

A final caution: these are all conjectures based on statistical evidence; a true causal relationship may not exist. Truly, statistics must be analyzed with care.

There is a good article over at Blog of Metrics at mozilla.org describing all the places where one can find the current statistics on Firefox market share.

Getting Passwords from Random Data (portably!)

Several months ago, Mark Kolich wrote on his blog about using a source of randomness (/dev/urandom) to generate passwords. The idea is simple enough: take the random data, keep only the printable characters, and then print as many characters as the password requires.
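The filtering idea can be sketched in Perl as follows. This is not Mark's script: it is a minimal illustration that assumes a Unix-like system where /dev/urandom exists, and the function names printable_only and password_from are invented for it. "Printable" is taken here to mean ASCII 0x21 through 0x7e, which leaves out the space character.

```perl
use strict;
use warnings;

# Keep only printable, non-space ASCII characters from a byte string.
sub printable_only {
    my ($bytes) = @_;
    $bytes =~ tr/\x21-\x7e//cd;
    return $bytes;
}

# Read random bytes from $source until $length printable characters
# have been collected.
sub password_from {
    my ($source, $length) = @_;
    open(my $fh, '<', $source) or die "cannot open $source: $!\n";
    binmode($fh);
    my $password = '';
    while (length($password) < $length) {
        my $read = read($fh, my $chunk, 64);
        die "cannot read from $source\n" unless $read;
        $password .= printable_only($chunk);
    }
    close($fh);
    return substr($password, 0, $length);
}

# Only attempt this where the random device is actually present.
print password_from('/dev/urandom', 12), "\n" if -e '/dev/urandom';
```

Because it depends on a device file, this version does not carry over to OpenVMS.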

Shortly thereafter, he described how to use a simple shell script to generate many passwords – such as for setting up many different accounts.

Working with HP-UX and OpenVMS as I do, I immediately thought: how could I do this in Perl, making the idea portable and making a program that will work on both UNIX and OpenVMS? It was easy – and easy to make it flexible as well. Here is the program that I came up with:


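#!/usr/bin/perl
# code released by David Douthitt into the public domain

use strict;
use warnings;
use Getopt::Long;

# -l length of each password, -p pattern name, -m number of random draws
my ($opt_l, $opt_p, $opt_m);
GetOptions( 'l=i' => \$opt_l,
            'p=s' => \$opt_p,
            'm=i' => \$opt_m );

# named character classes; "simple" and "normal" leave out easily
# confused characters such as l, 1 and 0
my %pat = (
   ext    => "[[:alnum:][:punct:]]",
   alnum  => "[[:alnum:]]",
   alpha  => "[[:alpha:]]",
   simple => "[a-km-z2-9]",
   normal => "[a-km-z2-9A-HJ-NPR-Z]",
);

my $pat;
if (defined($opt_p)) {
   if (defined($pat{$opt_p})) {
      $pat = $pat{$opt_p};
   } else {
      # stop here: an undefined pattern would match every character
      die "undefined pattern!\n";
   }
} else {
   $pat = $pat{"normal"};
}

my $max = (defined($opt_m) ? $opt_m : 1000);
my $len = (defined($opt_l) ? $opt_l : 6);

my $x = $len;   # characters still needed for the current password
my $s = "";

# draw random byte values with rand(), Perl's built-in pseudo-random
# generator, and keep only the characters that match the pattern
for my $i (0..$max) {
   my $c = chr(int(rand(256)));
   if ($c =~ /$pat/o) {
      $s .= $c;
      if (--$x == 0) {
         print "$s\n";
         $x = $len;
         $s = "";
      }
   }
}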

Note that OpenVMS does not use the “#!” notation: a “#!” first line is ignored there as an ordinary comment, and the program must be run by invoking perl directly.

As an aside, Mark says that he prefers random passwords. Me, I prefer “pronounceable” passwords – still random, but built from phonemes, which makes the generation process that much more complicated – and complicates internationalization. Apple’s Mac OS X comes with a password generator that can generate both random and pronounceable passwords.

However, with the proper password storage system a fully randomized password is good – or is it? A completely random password of eight characters is as likely to be zzzzzzzz as any other string. Perhaps a password with a random distribution of characters (rather than a random selection of characters) would be better. I’m not aware of any password generators that guarantee a random distribution instead of a random selection.
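One possible reading of a "random distribution" is a password in which no character is over-represented. The sketch below takes that reading, which is an assumption and not an established technique: it draws characters uniformly and discards any candidate in which a character appears more than a set number of times. The function names, the limit of 2, and the choice of alphabet (the characters of the “simple” pattern) are illustrative only.

```perl
use strict;
use warnings;

# True if no character occurs more than $max_repeat times in $candidate.
sub evenly_spread {
    my ($candidate, $max_repeat) = @_;
    my %count;
    for my $char (split //, $candidate) {
        return 0 if ++$count{$char} > $max_repeat;
    }
    return 1;
}

# Draw $length characters uniformly from @alphabet with rand(),
# retrying until the result passes evenly_spread().
sub spread_password {
    my ($length, $max_repeat, @alphabet) = @_;
    die "alphabet too small\n" if @alphabet * $max_repeat < $length;
    while (1) {
        my $candidate = join '',
            map { $alphabet[rand @alphabet] } 1 .. $length;
        return $candidate if evenly_spread($candidate, $max_repeat);
    }
}

my @alphabet = ('a' .. 'k', 'm' .. 'z', 2 .. 9);
print spread_password(8, 2, @alphabet), "\n";
```

Rejecting candidates shrinks the set of possible passwords, so this trades a little randomness for the guarantee that zzzzzzzz can never come out.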
