The ARM architecture – yesterday, today, and tomorrow
The Special Bit
Another special feature in the instruction set is the S bit, which serves several purposes. For one, it allows finer-grained control of conditional execution. On many other architectures, the processor updates the condition flags on every instruction; for example, if the result of a computation is 0, it sets the zero flag. In ARM, a data-processing instruction updates the flags only if its S bit is set, so the state of the condition flags can be kept independent of intervening computations.
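The effect of the S bit on the flags can be sketched with a small toy model. The following Python snippet is not an emulator; the function name sub, the s_bit parameter, and the flags dictionary are made up for illustration, and only the negative (N) and zero (Z) flags are modeled.

```python
# Toy model of an ARM-style subtraction; all names here are illustrative.
MASK32 = 0xFFFFFFFF

def sub(a, b, flags, s_bit=False):
    """Return (a - b) modulo 2**32; touch N and Z only when s_bit is set."""
    result = (a - b) & MASK32
    if s_bit:
        flags["Z"] = result == 0          # zero flag
        flags["N"] = bool(result >> 31)   # negative flag (top bit of result)
    return result

flags = {"N": False, "Z": False}
r0 = sub(5, 5, flags, s_bit=True)    # S bit set: result 0 sets Z
r1 = sub(7, 3, flags, s_bit=False)   # S bit clear: flags stay as they were
```

Although the second subtraction produces a nonzero result, Z remains set, so a later conditionally executed instruction could still depend on the outcome of the first one.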
The S bit also plays a role in handling the processor modes and their mode-specific registers. In a load or store multiple instruction, setting the S bit gives access to the user mode registers. If an instruction with the S bit set writes to the program counter, the processor also restores the saved status register and thereby automatically switches back to the previous mode. In combination with a load multiple, a programmer can thus implement a very elegant way of returning from an interrupt.
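As a rough illustration of this interrupt return, the sketch below models a load multiple that pops registers from a stack and, if the S bit is set and the program counter is among the loaded registers, restores the saved mode. The ToyCpu class, its fields, and the stack contents are hypothetical and greatly simplified; real hardware restores the complete status register, not just a mode name.

```python
# Toy model of returning from an interrupt with a load multiple.
class ToyCpu:
    def __init__(self):
        self.regs = {"r0": 0, "r1": 0, "pc": 0}
        self.mode = "irq"         # current mode (held in the CPSR on real hardware)
        self.saved_mode = "usr"   # mode saved on interrupt entry (SPSR on real hardware)

    def load_multiple(self, stack, names, s_bit=False):
        # Pop one value per register; index 0 stands for the lowest address.
        for name in names:
            self.regs[name] = stack.pop(0)
        # S bit set and PC written: switch back to the mode saved at entry.
        if s_bit and "pc" in names:
            self.mode = self.saved_mode

cpu = ToyCpu()
stack = [11, 22, 0x8000]   # saved r0, saved r1, return address (invented values)
cpu.load_multiple(stack, ["r0", "r1", "pc"], s_bit=True)
```

A single operation thus restores the working registers, jumps back to the interrupted code, and leaves interrupt mode.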
Neon, Thumb, Jazelle
The ARM design is easy to extend: Up to 16 coprocessors can be added and controlled by special ARM coprocessor instructions, thus supporting, for example, floating-point computations. If no coprocessor responds to an instruction of this kind, the resulting exception allows simple emulation in software. Other notable and commonly used extensions are a memory management unit and a media processing unit by the name of Neon.
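The fallback to software emulation can be pictured as follows. This is a minimal Python sketch under simplifying assumptions: The names issue, execute, and soft_float, as well as the operation strings, are invented for the example and do not correspond to any real ARM interface.

```python
# Toy model: a coprocessor instruction falls back to software emulation.
class UndefinedInstruction(Exception):
    """Raised when no coprocessor accepts an instruction."""

def issue(coprocessors, cp_num, op, a, b):
    # coprocessors maps a number (0 to 15) to a handler; a missing entry
    # means that no hardware responds to the instruction.
    handler = coprocessors.get(cp_num)
    if handler is None:
        raise UndefinedInstruction(cp_num)
    return handler(op, a, b)

def soft_float(op, a, b):
    # Software emulation of two floating-point operations.
    if op == "add":
        return a + b
    if op == "mul":
        return a * b
    raise ValueError(op)

def execute(coprocessors, cp_num, op, a, b):
    try:
        return issue(coprocessors, cp_num, op, a, b)
    except UndefinedInstruction:
        # The exception handler emulates the instruction instead.
        return soft_float(op, a, b)

emulated = execute({}, 1, "add", 1.5, 2.0)   # no coprocessor present
```

The calling code is the same whether or not a hardware unit is present; only the execution path differs.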
Besides the actual ARM instruction set, most ARM processors support up to three other instruction sets. First, there are now two versions of Thumb mode, which allow a higher code density through the use of 16-bit instructions.
Whereas the first version of Thumb could access only half of the registers and had to switch back to the ARM instruction set to handle exceptions, its successor, Thumb 2, allows both 16- and 32-bit instructions and waives most of the restrictions, so it achieves performance comparable to ARM mode despite the higher code density. Additionally, the Jazelle instruction set provides hardware acceleration for Java bytecode; however, ARM has reduced the scope of this support in recent versions [4].
Multicore
When the clock speed is increased, energy consumption grows faster than computing speed. Energy efficiency therefore decreases as clock speed increases, which is a problem for mobile devices in particular. Using multiple cores partly solves the problem, because several cores achieve the same number of computations per unit of time at a lower clock speed and so improve energy efficiency. This makes multicore CPUs an attractive option for the ARM architecture.
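A back-of-the-envelope calculation illustrates the argument. The sketch below assumes the common approximation that dynamic power is proportional to voltage squared times frequency and, additionally, that the supply voltage scales linearly with the clock speed, so that power per core grows with the cube of the frequency. It also assumes a perfectly parallel workload. The function names, the constant k, and the frequencies are arbitrary illustration values, not measurements.

```python
# Simplified power model; every number here is an assumption.
def dynamic_power(freq, cores=1, k=1.0):
    # Voltage is assumed to scale linearly with frequency: P ~ k * f**3 per core.
    return cores * k * freq ** 3

def throughput(freq, cores=1):
    # Assumes the workload is spread perfectly across all cores.
    return cores * freq

single_power = dynamic_power(2.0)           # one core at clock speed 2.0
dual_power = dynamic_power(1.0, cores=2)    # two cores at clock speed 1.0
ratio = dual_power / single_power
```

Under these idealized assumptions, the two slower cores deliver the same throughput for a quarter of the power. Real workloads rarely parallelize perfectly, which is one reason why multiple cores solve the problem only partly.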
A variety of solutions come both from ARM itself (for example, the MPCore with up to four cores) and from architecture licensees. Two of the biggest challenges in the design of multicore CPUs are cache coherence and interrupt distribution. For the ARM11, ARM offers only a single IP block that contains the cores, the cache coherency logic, and the interrupt distribution.
Starting with the Cortex A9, ARM has sold these components as separate IP blocks to give SoC designers more design freedom. Holders of an architecture license have even greater freedom, but the details of their designs are generally documented only sparsely in public, if at all.