We are not programming in 1991 anymore!


Paw Prints: Writings of the maddog

Jul 07, 2014 GMT
Jon maddog Hall

As I write this I am also copying a talk given in February of 1996 at Digital Equipment Corporation (DEC) about the port of Linux to DEC's Alpha AXP processor.

It is interesting to hear Linus Torvalds and other people talking about spending three thousand dollars (or more) to buy a “high-end” PC, and to have that PC consist of a 32-bit address machine with eight megabytes of main memory, two or three gigabytes of storage on a hard drive and disk transfer rates of two megabytes per second.

Linus talks about the “Big Kernel Lock” and how this issue was not too important since Linux was aimed toward “low end systems” and most of those systems did not have multiple CPUs per board nor (at that time) multiple cores per CPU.

Another recurring theme is the lack of optimization and performance of the GNU compiler suite versus the “commercial compilers”.

In this time frame (and even before) some of the earliest pieces of GNU/Linux code were written.

Today we have 64-bit virtual address spaces, CPUs that have multiple threads, memories that are gigabytes in size (and cost much less), disks (and controllers) that are much larger and faster, and GNU compilers that are very good at optimization.

Finally, the GNU/Linux system itself was much simpler back in the days when some of these programs were written or ported to the platform. Many of the APIs and system facilities we enjoy today did not appear until later in the life of the Linux kernel.

In a lot of the GNU/Linux distributions such as Ubuntu and Fedora, there are approximately 1400 programs that have assembly language in them. This assembly language was sometimes inserted in the program a long time ago, reacting to the slower CPUs, smaller memories, and less optimal compilers.

When CPUs were single core (and the systems were not SMP), assembly language was relatively straightforward, but when you start to have multiple cores it becomes much more difficult to code in assembly correctly. Compilers can keep track of the data and data-flow in a parallel environment much more successfully than the typical programmer using assembly language.
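As a small illustration of letting the compiler handle the multi-core details, here is a minimal sketch of a shared counter written with C11 atomics. The names `hit_count`, `record_hit` and `read_hits` are hypothetical and do not come from any of the modules discussed here. The point is only that the same source lets the compiler choose suitable instructions for each target (for example a locked add on x86, or an exclusive load/store loop or atomic add on ARM-64), instead of the programmer hand-coding each one.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; static storage starts at zero. */
static atomic_ulong hit_count;

/* Safe to call from several threads at once. The compiler picks the
 * atomic instruction sequence that fits the target architecture. */
static void record_hit(void)
{
    atomic_fetch_add_explicit(&hit_count, 1, memory_order_relaxed);
}

/* Read the current value of the counter. */
static unsigned long read_hits(void)
{
    return atomic_load_explicit(&hit_count, memory_order_relaxed);
}
```

Relaxed ordering is enough for a plain statistics counter. Code that uses the counter to publish other data to another thread would need a stronger ordering.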

Some of these modules were written so long ago that the first assembly language used was for either the IBM 360/370 or the DEC VAX architecture. Over time these modules were ported to other architectures, but the assembly language “port” was often poorly done: instead of using instructions that would have been optimal for the new architecture, the port tended to match up the instructions of the existing assembly language with those of the new machine, often producing a less-than-optimal solution. In other cases higher-level code was created as a “fall back”, but the existing assembly language code was left in-line, so those architectures did not take advantage of the multi-core capabilities of the compiled and optimized code.
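The “fall back” situation described above might look something like the following sketch. It is an invented example, not code from any real module: the names `swap32`, `swap32_portable` and the `USE_LEGACY_ASM` switch are assumptions made for illustration. The legacy x86 path is only compiled when explicitly requested, and every other build (including ARM-64) uses the plain C version, which current GCC and Clang typically recognize and turn into a single byte-reverse instruction when optimizing.

```c
#include <stdint.h>

/* Portable C version: reverse the byte order of a 32-bit value. */
static inline uint32_t swap32_portable(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}

static inline uint32_t swap32(uint32_t x)
{
#if defined(USE_LEGACY_ASM) && (defined(__i386__) || defined(__x86_64__))
    /* Hypothetical legacy path: hand-written x86 assembly left in-line. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* Default path for every architecture: let the compiler optimize. */
    return swap32_portable(x);
#endif
}
```

With this layout, removing the old assembly later is a matter of deleting one `#if` branch once measurements show the C version is at least as fast.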

In addition to all of this, in days gone past the sole criterion of a good program might be speed of execution or the size of the memory in which it runs. These days another criterion has emerged, that of efficiency. How much electricity does your server need? How much cooling will you need for your server farm? How long do you want the batteries in your phone to last? These are other needs that affect programs being written today.

Now it is the twenty-first century, and ARM is designing a new 64-bit chip which will need these 1400 modules ported to it, or at least certified to work on it. This is a perfect time to:

  1. make sure that the 1400 modules run on ARM-64
  2. remove the old, crufty assembly language from other architectures whenever possible
  3. look at new algorithms that could be used with larger memory sizes (while still maintaining sensitivity to embedded system applications that require smaller footprints)
  4. look at compiler intrinsics, add new libraries or make changes to old libraries that would eliminate having redundant code scattered throughout the operating system
  5. think about how these programs might be operating in different environments
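To make the compiler intrinsics item in the list above a little more concrete, here is a minimal sketch of counting the set bits in a 64-bit word. The function name `count_set_bits` is hypothetical. `__builtin_popcountll` is a builtin provided by GCC and Clang; the loop is an assumed fall-back for other compilers. One source file then serves every architecture, and the compiler emits a dedicated instruction where the target has one.

```c
#include <stdint.h>

/* Count the number of 1 bits in a 64-bit value. */
static inline unsigned count_set_bits(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* GCC/Clang builtin: lowered to a native instruction when available. */
    return (unsigned)__builtin_popcountll(x);
#else
    /* Portable fall-back: clear the lowest set bit until none remain. */
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
#endif
}
```

A helper like this, kept in one shared library, is the kind of change that could replace several per-architecture assembly copies of the same routine.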

Linaro, an association of companies that build ARM chips, has been working on making sure that GNU/Linux works well on ARM's new 64-bit architecture. In doing so they created a contest to have members of the community help with porting and certifying the existing modules of GNU/Linux.

In setting up this contest, however, the real issues behind these ancient pieces of assembly-language-ridden code came to light, and Linaro extended the contest to try to help GNU/Linux become more efficient and more portable.

There are now two parts to the contest.


One part has to do with porting and verification. Contestants are encouraged to first register at our site, then select a piece of code to work on from the list of code modules at the web site (http://performance.linaro.org). If that code compiles and works on ARM-64, then the module can be marked “ported” and the contestant is authorized to receive an “entry prize” of a Linaro T-shirt, as well as having the glory of having worked on the GNU/Linux operating system.

If the module does not work, then the contestant should file a bug against the module and start the process of fixing the code so it works on ARM-64. This could be done by writing ARM-64 assembly code or by writing a fall-back set of higher-level “C” code that would work not only for ARM-64 but for other architectures as well. After confirming that the module runs at least as fast on the various supported architectures and on ARM-64, the contestant should submit the patch to the maintainers of the code and mark the module as “patched” at the Linaro site.

The more modules that you test and patch, and the earlier in the contest you do the work, the more likely you are to win the grand prize of an all-expenses-paid trip to a Connect meeting to be held in the United States of America or Asia, depending on the time of year. Please see the official contest rules at the web site for more details.


Another part of the contest is dedicated to improving the performance of GNU/Linux. While getting the code to work on ARM-64 is important to Linaro, so is the goal of having GNU/Linux perform very well on every architecture.

Linaro recognized that a lot of the modules which used assembly language used it because that particular part of the code was very critical, and the compilers of the day were not as efficient as small amounts of assembly could be. However, for reasons stated previously, Linaro feels that these modules (and the options on their compile lines) might be examined again to see if they might be made more efficient or run faster.

In this case the modules may or may not have first been ported to the ARM-64 architecture. If they have not been ported, then Linaro would assume that the performance work would be done in such a way as to make sure the code works on ARM-64. However, ARM does not want this work to penalize any of the other existing architectures or environments for this code, so contestants are strongly encouraged to discuss their plans of enhancement with the existing upstream maintainers/developers to see if the contestant's ideas match up with what the maintainers/developers have envisioned for the code.

After the contestant has obtained a “go ahead” from the upstream maintainers/developers, they should measure the performance of the code on various architectures, do the optimization, then measure the performance again. These measurements (and perhaps input and output data) may have to be submitted to the contest site, and in every case a report of how much more efficient the code is, the work done, and an affidavit stating that the code was accepted by the upstream maintainers/developers will be mandatory for the contest submission.
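The measure, optimize, measure again cycle can be sketched with nothing more than the standard C library. This is only an illustration under stated assumptions: `sample_workload`, `time_function` and `percent_improvement` are hypothetical names, the workload is a stand-in for the real module code, and the contest's actual measurement requirements are whatever the official rules say. The sketch uses `clock()`, which reports CPU time used by the program.

```c
#include <time.h>

/* Hypothetical stand-in for the code being measured. */
static void sample_workload(void)
{
    volatile unsigned long sum = 0;  /* volatile keeps the loop from being removed */
    for (unsigned long i = 0; i < 100000; i++)
        sum += i;
}

/* Run fn() the given number of times and return the CPU seconds used. */
static double time_function(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Percentage of time saved: 2.0 s before and 1.0 s after gives 50.0. */
static double percent_improvement(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

In practice one would call `time_function` on the old and the new version with the same iteration count, repeat the runs several times on each architecture, and report the figures alongside the input data used.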

As with the “porting” part of the contest, the first thing the contestant should do is go to the site (http://performance.linaro.org), sign up for the contest and choose one of the code segments to work on from the 1400 listed.

Every performance update completed will also be entered into the porting part of the contest to win a trip to Linaro's Connect meeting. However, there will be a second way for the performance person to win. Twice a year the contestant with the greatest percentage of performance improvement in their code module will also win a free, all-expenses-paid trip to Connect.

Over the next couple of months, examples of code speedups, new algorithms, and ways of improving code (including some “classics” from maddog's own history) will appear here in maddog's blog.

We hope this will be an exciting, educational and useful exercise for people who wish to join the GNU/Linux programming community.

Welcome aboard!
