Preparing code for 64-bit ARM
"maddog" and Linaro are collaborating on a contest to improve the performance of certain GNU/Linux source code modules.
Last month, I wrote briefly about a project I am doing for Linaro, which is an association of ARM processor companies that collaborate to achieve better support for Linux on ARM systems. The original project was to help with approximately 1,400 source code modules in the GNU/Linux system containing assembly language and to make sure these modules were ready for the 64-bit version of ARM processors now appearing from various manufacturers.
Steve McIntyre, a friend of mine who works for ARM directly, located the 1,400 modules and analyzed some for content and difficulty in "porting." Steve has a lot of experience with this because he has been a long-time kernel programmer and was the Debian project leader for a while.
Steve noticed that while some of the modules were very good, in others the 32-bit assembly language had been added in a sub-optimal way, not taking advantage of some of the ARM features. In other cases, the assembly language was doing tasks (e.g., identification of hardware) that would be better performed in a compiler "intrinsic," which could then be used by every machine architecture in a consistent way.
As I started to analyze the task, I also realized that some of these modules had been written quite a long time ago, and although they had been maintained – with bugs fixed and ports done – the underlying design was for smaller, more expensive memory and slower single-core processors. In addition, Steve analyzed only the source code of each module; he did not look at the compiler options used to build the software. In some cases, the compiler flags used did not take advantage of the newer optimizations available in the compilers.
Taking larger and cheaper memory into account, as well as multicore processors, it is possible that code that was very efficient 10 years ago could use some analysis and performance improvements today.
After talking with the Linaro association, I decided to turn the "porting project" into a "performance contest" to see if we could get some performance improvement out of these modules.
For example, Digital Equipment Corporation (DEC) had a very good math library in which we had invested a lot of money. DEC was willing to contribute the binary to the Alpha/Linux project for free but refused to expose the source code of the library for fear that its competitors would simply copy it. Of course, the GNU/Linux community wanted the source code, and they hounded me for it.
Eventually, I replied, "If you are such good programmers, write a better math library." Three or four days later, an email announced that the sin(3) function was "2% faster" and later cos(3) was "1.5% faster." Day by day, the FOSS people changed the source code of the math library to make each and every subroutine faster than the one in the DEC proprietary code. Only one subroutine was never written to execute faster, and that was because no one used it … no one cared.
A second example was a reduction of memory footprint for the DEC UNIX (née OSF/1) operating system that allowed it to boot and run in 32MB of main memory instead of 64MB (yes, you read that correctly). After the work on it, the OS not only ran in less memory, it ran 7% faster on the same hardware because of better cache utilization simply by reducing code size and improving how the cache was used.
Therefore, I am starting a contest not only to ensure GNU/Linux code works on a 64-bit system, but also to see how much we can improve the performance and maintainability of the code in various ways.
Linaro has agreed to be the main sponsor of this contest to improve the code. They will identify potential code candidates and ask the contestants to measure code performance before and after modification. We will ask participants to document their work on the code and the algorithms they might have considered and changed and why. And, of course, we will be working with the upstream developers and module maintainers to make sure the improvements are acceptable.
We also will be looking for people to write various compiler intrinsics and make them available to the FOSS community, and we will be looking for mentors and judges. Finally, we will take the output of this contest and publish it. I hope to gather enough material for a course in code optimization that will be freely available.
I don't have enough room here to explain all the rules and options for this contest, so I will cover those issues online in my blog.