The ARM architecture – yesterday, today, and tomorrow

The Future: 64 Bits

History tends to repeat itself: In the 1990s, the makers of large Unix servers ran up against the 4GB limit that a 32-bit machine can address directly. As a result, 64-bit architectures, such as Ultra Sparc Sun were developed. Shortly after the turn of the millennium, x86 machines faced the same problem, which led to the development of x86-64.

When AMD launched x86-64, this was a real extension of the existing 32-bit architecture, which in turn was based on a 16-bit architecture – the older versions are all subsets of the newer ones. Although this approach enables easy recycling of existing knowledge and code, it rules out profound architectural changes.

ARM systems are now on the same cusp: For one thing, small devices are being equipped with more and more memory, which attracts applications that benefit from it. On the other, the ARM architecture is becoming increasingly attractive for servers. For both servers and small devices, the limit of 4GB is increasingly becoming an obstacle.

Newer ARMv7 cores, such as the Cortex A15, only partially work around the problem: The Large Physical Address Extension (LPAE) allows up to 1TB to be addressed physically (in a style similar to PAE in x86), but this does not affect the limitation to 4GB of virtual memory per thread. The radical solution is to switch to a 64-bit architecture.

In 2011, ARMv8 defined two architectures – Aarch64 and Aarch32 – which, in turn, use the A64 and A32 instruction sets (as well as Thumb 2 as T32 for Aarch32). Aarch32 and A32 are downwardly compatible with ARMv7 (but not vice versa), whereas Aarch64 has a new instruction set by the name of A64. Thus, in contrast to x86, there is no need to rely on existing gaps in the old instruction set or craft new commands with complicated prefix structures – you can build a clean instruction set.

For example, all A64 commands can be encoded in just 32 bits of code, even though the number of registers has doubled: X0 through X29, as well as X30 as a link register and X31 as a hard-wired zero register. The program counter, which occurs in the A32 instruction set as R15, is now a special register that can only be accessed via customized commands. The commands themselves are based on their counterparts from A32 in essence, but the following architectural changes have made some adjustments necessary.

The biggest of these adjustments relates to the processor modes and the dependent mode-specific registers: With the eight modes of the later versions of A32, only a small part of the many registers is used. Additionally, dealing with the various exception modes is awkward. In A64, ARM introduces a highly simplified model, featuring four exception levels (Figure  5), that has similarities with the rings in x86 architecture.

Figure 5: ARMv8 processor modes and their compatibility with 32-bit software. The exception levels are strongly reminiscent of the rings in x86.

The lowest level – EL0 – is intended for use by applications (user-mode), whereas EL1 is equivalent to the old system mode and is used to execute privileged parts of an operating system. EL2 is assigned to hypervisors, and EL3 is part of ARM's Trust Zone security concept, that is, for the operation of security monitors (see the box "Encrypted ARM processors"). Each of these modes has only three private registers: a link register for exceptions, the stack pointer, and the stored status register.

Encrypted ARM Processors

In addition to complex, virtualization-based security concepts such as Trust Zone (Figure 5), for some years, high-performance ARM processor versions have supported fully encrypted booting. In practical terms, the processor draws on OTP (one-time programmable) processor registers, which manage a key that encrypts or signs the bootloader. This approach ensures that the program you are running really does come from the developers or manufacturers.

Freescale [7] offers processors with these features under the Vybrid Imx28 brand. The LPC3143 from NXP [8] has been on the market even longer. The Picosafe [9] open source project offers a complete tool chain that supports a fully encrypted boot process from the bootloader to the root file system. (Benedikt Sauter)

Conditional execution of all instructions no longer exists in A64. Although this allows for very elegant code, it greatly complicates the implementation of an out-of-order machine. This is also why load-store multiple instructions have had to give way to simpler instructions that only support loading or saving two registers.

Minor changes are the result of practical experience with the 32-bit architecture. There are two virtual address spaces, each 248 bytes (256TB) – one for the applications starting at address 0, and one for the operating system kernel from address 264 downward. Virtual addresses are mapped to physical addresses with four-level (4KB pages) or three-level (64KB pages) page tables, depending on page size. Addressing itself continues to follow the load-store approach, but the addressing modes have been adjusted to simplify computations.

As in x86-64, where 32-bit applications run on a 64-bit system, ARMv8 also provides for compatibility. In the event of exceptions and on leaving them, the processor can switch between Aarch64 and Aarch32 and reciprocally map the registers, depending on the mode; 32-bit access only addresses the lower half of the register (Figure 5). Thus, it is possible to run Aarch32 applications on an Aarch64 operating system and run guests with both architecture versions side by side on an Aarch64 hypervisor [6].

Linux and ARMv8

Although ARMv8 was first introduced in 2011 and the first IP cores are already available in the form of the Cortex A53 and Cortex A57, no implementations existed beyond simulators and FPGA-based prototypes until 2013. In September, Apple introduced an ARMv8 chip with its new iPhone 5S and the Apple A7 SoC. Other manufacturers have announced products that will probably be launched later this year or early next year.

Unfazed by the lack of hardware, developers are already working very actively on software for the new architecture: the code for Aarch64 can easily be generated with current versions of GCC, and the Linux kernel supports the architecture as of version 3.7. When the first 64-bit ARM machines hit the markets, Linux will be ready for them.

Infos

  1. ARM licensees: http://www.arm.com/products/processors/licensees.php
  2. Samsung's Exynos 5 Dual: http://www.samsung.com/global/business/semiconductor/minisite/Exynos/products5dual.html
  3. Qualcomms Krait cores: http://www.qualcomm.com/snapdragon/processors#CPU
  4. Furber, Steve. ARM System-on-Chip Architecture, 2nd edition. Pearson, 2000
  5. Robin Randhawa's whitepaper on Big-Little: http://arm.com/files/downloads/System_Software_for_b.L_Systems_Randhawa.pdf
  6. "ARM Goes 64-bit" by David Kanter: http://www.realworldtech.com/arm64/
  7. Vybrid and Imx28 by Freescale: http://www.freescale.com
  8. LPC3143 by NXP: http://www.nxp.com
  9. Picosafe: http://www.picosafe.de

The Author

Jan Richling is a visiting professor for operating systems and embedded systems at the Technical University of Berlin (TU Berlin). Anselm Busse is a research assistant and PhD student in the field of communication and operating systems at the TU Berlin. Their research focuses in part on increasing the energy efficiency of many-core systems.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Canonical Ports Ubuntu on ARM Platform

    End of last week ARM Ltd and Canonical Ltd announced that they would port Linux to the ARMv7 processor architecture. If all goes well, the two collaborating firms should provide further hardware manufacturers with the basis to develop new, energy-efficient mobile devices, especially for the popular netbooks and so-called hybrid computers.

  • Xeon 7300: Intel's First Quadruple-Core Processor Platform

    The Xeon 7300er processor family is Intel's first quad-core processor for multiple processor servers. The energy efficiency of the new processors differs depending on the speed with 2.93 GHz requiring 130 Watts compared to 50 Watts for a 1.86 GHz version.

  • Chromebooks on Linux

    Chromebooks are firmly locked in the Google jail, but with the right know-how, you can break out of vendor lock-in and operate the devices with free software.

  • Programming the Cell

    The Cell architecPIture is finding its way into a vast range of computer systems – from huge supercomputers to inauspicious Playstation game consoles. We'll show you around the Cell and take a look at a sample Cell application.

  • Hildon vs. QT: Developers at Ubuntu Mobile consider Change to KDE

    Since November 2008 it's been popular knowledge that Ubuntu has been working on a version of its operating system for the mobile ARMv7 architecture, especially for Cortex A8/A9 processors. Now it seems that Ubuntu mobile might drop Gnome Mobile Hildon framework in favor of Qt.

comments powered by Disqus

Direct Download

Read full article as PDF:

Price $2.95

News