Living with Statistics
Off the Beat: Bruce Byfield's Blog
One of the prices of software freedom is the impossibility of getting accurate figures for usage. As a user, I consider that a small price to pay for not having to register or activate software. However, as a journalist I'm often frustrated, because accurate figures can be useful for establishing a point or debunking rumors.
The questions for which I would like accurate stats include: how many GNU/Linux users are there? Has Linux Mint really overtaken Ubuntu as the most popular distribution? Has GNOME gained or lost users with the start of its third release series? All these questions and more would benefit from reliable figures, yet we don't have any. Instead, we have a series of indicators that are approximate at best, and completely unreliable at worst.
One problem is external biases. For example, when NetApplications places Linux usage at 1.6%, that total is derived "from the browsers of site visitors to our exclusive on-demand network of live stats customers." But when I consider that the same methodology based on visits to my personal blog would suggest a figure of 19% for Linux, I have to wonder if NetApplications' figures aren't as skewed as mine, but in the opposite direction.
Similarly, since NetApplications' headquarters are in California, American companies are probably the most likely to use its services. Unofficially, I am always told that free software usage is lightest in North America, Microsoft's home, and higher in Europe and in developing countries.
However, other problems arise when I rely on sources that are friendlier to free software, such as Distrowatch's page views for distributions. My guess is that most people who visit Distrowatch are already familiar with free and open source software (FOSS), so its figures reflect only the tastes of relatively experienced users.
Yet even that assumption may be questionable. Page views might tell which distributions people are curious about, but that is at best a rough indicator of what people are actually downloading and using.
Moreover, Distrowatch's numbers are small enough that a new release or a lively discussion elsewhere online can skew results for days or weeks at a time. A handful of fans might easily distort results, although nothing indicates that such an effort has ever been made. Armed with such doubts, you can easily dismiss Distrowatch figures altogether, as Canonical employee Michael Hall did when Distrowatch reported Linux Mint as receiving more views than Ubuntu.
User surveys share some of the problems of Distrowatch's figures, but also come with their own. For instance, FLOSSPOLS' survey of gender in the community frames all discussions of women's under-representation in FOSS. Yet the FLOSSPOLS data was collected seven to eight years ago, making it decidedly obsolete, especially in a field that changes as rapidly as FOSS. Today, we have no idea whether the situation in the community is better than the survey reports (it could hardly be worse).
Still, at least the FLOSSPOLS survey was designed according to research standards. Community surveys, such as the Linux Journal's Readers' Choice Awards or the LinuxQuestions' Members Choice Awards, can't even claim that. In both, participants are self-selected and answers are open ended. The number of participants may or may not be given, and margins of error never are, although, if they were, they might be as high as five percent. If so, then in many cases where GNOME was declared the most popular desktop environment over KDE, or Mozilla the most popular web browser over Chrome, a more accurate result would probably be to declare a tie.
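For readers curious where a figure like five percent can come from, here is a minimal Python sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample and a 95% confidence level; both are assumptions of this sketch, not anything these polls publish. With a 50/50 split, a sample of about 384 respondents gives a margin of roughly five percentage points. Self-selected polls violate the random-sample assumption, so their real uncertainty is likely larger. The function names and poll numbers are hypothetical, and the overlap check is a crude rule of thumb rather than a formal significance test.

```python
import math

# z-score for a two-sided 95% confidence level (normal approximation)
Z_95 = 1.96


def margin_of_error(share: float, sample_size: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion.

    Assumes a simple random sample. Self-selected polls are not random
    samples, so their true uncertainty is probably larger than this.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(share * (1.0 - share) / sample_size)


def is_statistical_tie(share_a: float, share_b: float, sample_size: int) -> bool:
    """Rough heuristic: treat two results as a tie if their intervals overlap."""
    moe_a = margin_of_error(share_a, sample_size)
    moe_b = margin_of_error(share_b, sample_size)
    return abs(share_a - share_b) <= moe_a + moe_b


if __name__ == "__main__":
    # Worst case (50/50 split) with 384 respondents: about 0.05
    print(round(margin_of_error(0.5, 384), 3))
    # Hypothetical poll: 1,000 respondents, 38% vs. 34%
    print(is_statistical_tie(0.38, 0.34, 1000))
```

On those hypothetical numbers, a four-point lead falls inside the combined margins, which is the kind of result the paragraph above suggests should be reported as a tie.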
None of what I am saying is meant to be a reflection upon those who collect the data. With the exception of FLOSSPOLS and NetApplications, none of these sources has ever claimed to be providing scientifically reliable information. In some cases, entertainment is probably more of a motivation than anything else.
But for those of us in search of accurate information, the shortcomings of what is available are annoying, to say the least.
Living with Imperfection
So what's a writer to do? The high road would be to ignore such sources of information, and learn to live with uncertainty. As much as I want accurate information about FOSS, I might have to accept that it just doesn't exist.
However, that is hardly a solution. Even if I ignore these figures, others don't. Such sources as are available are always being cited to support various arguments, and, if nothing else, I might want to debunk those arguments with something more than the reasonable doubt of meta-arguments.
Besides, the issues that such sources touch upon are ones that I, and many other people, want to talk about. As limited as these information sources may be, they at least give some context to discussions that would otherwise be even less informed.
As a result, the way I use these figures is an uneasy compromise. Whenever I cite them, I try to indicate, however briefly, that they're not reliable. I also try not to make arguments that depend on a difference of a couple of percentage points.
Most of all, I try not to base an argument on any single set of results. If a survey gets the same results several years running, I'm more likely to trust the figures than if they appear in a single year. Better yet are times when more than one source shows similar results over several years.
Of course, if I were paranoid enough, I might worry about whether all surveys were being manipulated by a small group of users or corporate employees. Realistically, though, I think that, under the conditions I describe, these statistical sources can indicate general trends to a degree that no other sources of information can. But I try not to forget that these sources are tentative, and can never be used with any precision.