Improving performance of Linux on ARM
"maddog" looks at some of Linaro's efforts to improve GNU/Linux performance on ARM architectures.
For the past several months, I have been working with Linaro, an association of companies that want to see GNU/Linux working well on ARM architectures. Although ARM Holdings designs the ARM architecture chips, various other companies manufacture the CPUs, GPUs, and SoCs (Systems on a Chip) from ARM's licensed designs. Some of these companies use these manufactured units in their own products, and some sell the manufactured units to other companies and to the general public. For the past couple of years, ARM has been working on a 64-bit chip, and their licensees are getting close to having ARM 64-bit hardware ready.
One of the ARM engineers determined that 1,400 different source code modules in either Ubuntu or Fedora (or both) have assembly language in the code. This is not to say that the assembly language (or lack of it) will stop the module from working on the ARM64 system, because there may be higher level fallback code (e.g., code written in C) that will take over and be compiled in place of the missing ARM64 assembly language. However, the modules have not been tested and verified either on actual hardware or on the emulators for the ARM64 architecture that currently exist. Thus, Linaro decided to enlist the community in porting some of these modules and has created a contest with prizes for those people who help out.
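The fallback pattern described above might look something like the following sketch. The function names (bswap32, bswap32_portable) are hypothetical and not taken from any of the modules in question; the sketch assumes a GCC-compatible compiler for the inline assembly path. On x86 the hand-written instruction is used, while on ARM64 (or any other architecture with no assembly path) the plain C version is compiled instead.

```c
#include <stdint.h>

/* Portable fallback: reverse the byte order of a 32-bit value in plain C.
 * This compiles on any architecture, including ARM64. */
static inline uint32_t bswap32_portable(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Hypothetical module entry point: use hand-written assembly where it
 * exists, and fall back to the C version everywhere else. */
static inline uint32_t bswap32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* x86 path: a single BSWAP instruction swaps the register in place. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* No assembly for this architecture (e.g., ARM64): use the C code. */
    return bswap32_portable(x);
#endif
}
```

A module built this way will work on ARM64 without any porting effort, but only the C path runs there, so it still needs to be tested on that architecture and, where it matters, measured against what a modern compiler generates.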
The engineers also noticed that a lot of the code containing assembly language was fairly old. It was designed in an age when systems had a single CPU; CPUs were much slower, with a single core; memory was measured in megabytes, not gigabytes; Ethernet was 10Mbps, not 1,000Mbps; and the GNU compilers were not as good at optimization as commercial compilers. Therefore, people wrote assembler for the tightest, fastest parts of the system.
If those programs were written today, however, they might have a lot less assembly language, and the code would be more portable. Thus, the contest was expanded to include improving the performance of these modules and (perhaps) eliminating some of the old assembly language where it made sense.
Embedded systems exemplify how our perspective of "performance" has changed over time, in that the size of the memory footprint is often a measure of performance, with a small footprint providing savings in the manufacturing process. Extended battery life, achieved by allowing parts of the system to be turned off after the application is finished, also represents an improvement in performance. In large server farms, performance is often measured in electricity savings, savings on cooling, or in reduction of equipment purchases and floor space.
In the early years of my programming career, my job was not to write new functionality but to get other people's programs to work "better." My manager told me that if I could not get the application to work in half the time, not to bother with it. In almost every case, I could make an application run not only in half the time, but often five to 10 times faster. It was a very satisfying job, so it has been interesting to start investigating new techniques for profiling code, finding the bottlenecks, and seeing new performance improvements and efficiencies that can be made since I did this work 30 years ago.
At the same time, I am working with some very small systems that have some really interesting features. GPUs used for computation, digital signal processing chips, and field-programmable gate arrays (FPGAs) all existed as concepts years ago, but they were cost and space prohibitive. These concepts now have become not only feasible but even competitive in price/performance with other, more "mainstream" types of circuitry.
A board from a company called Adapteva not only has a SoC with a two-core ARM processor, FPGA, and digital signal processing chips, it also has a 16- or 64-core CPU. All of this, plus some system memory and USB ports, comes on a board in the US$100-150 price range. The opportunity to learn about these architectures has now become practical.
Recently some people attracted a lot of attention by building a "supercomputer" out of Raspberry Pi boards, single-core systems that do not invite the type of programming that might occur in a real HPC system. In an HPC system, each board can have several CPUs or several cores in a single CPU and use OpenMP in conjunction with MPI and other heterogeneous computing environments. Substituting computers such as the Banana Pi or ODROID-U3 would create a higher performing "supercomputer" at a reasonable increase in price and would afford a more realistic mix of programming styles.
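As a rough illustration of the within-board half of that programming style, here is a minimal OpenMP sketch in C. The function name sum_of_squares is made up for this example, and the MPI layer that would distribute work between boards is left out entirely. When built with OpenMP enabled (for example, -fopenmp with GCC), the loop is divided among the available cores; built without it, the same code simply runs on one core.

```c
/* Hypothetical example: compute 0^2 + 1^2 + ... + (n-1)^2.
 * With OpenMP enabled, each core sums part of the range and the
 * partial sums are combined by the reduction clause. */
static inline unsigned long long sum_of_squares(unsigned long long n)
{
    unsigned long long sum = 0;
    long long i;

#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (i = 0; i < (long long)n; i++)
        sum += (unsigned long long)i * (unsigned long long)i;

    return sum;
}
```

On a single-core board, the parallel loop has nothing to spread the work across, which is why a multicore board gives a more realistic picture of how such code behaves.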
I encourage readers to sign up for Linaro's contest and help GNU/Linux be the best that it can be.