Edge AI acceleration with Raspberry Pi
Machine Learning Workshop – Raspberry Pi AI Kit

Lead Image © robsnowstock, 123RF.com
Raspberry Pi enters the artificial intelligence accelerator fray with a low-cost solution.
Public clouds are doubtless the most convenient and cost-effective way for individuals to access the latest neural processing unit (NPU) artificial intelligence (AI) accelerator hardware. With prices running into the tens of thousands of dollars and limited availability, a freelance developer can hardly justify acquiring one of the now famous (and infamous) NVIDIA A100 [1] Ampere cards with tensor cores, hardware destined for inevitable obsolescence as the newer Hopper (NVIDIA H100) [2] units arrive (as I write, one cloud vendor is advertising A100s for $1.29 per hour). The economic choice is easy: Develop the code offline and only then provision a souped-up cloud instance for the actual training or inference task.
Low-cost development strategies include Google's Colab [3], a hosted Jupyter Notebook environment providing free access to GPU and TPU (Google's own AI chip) resources for research and learning, and purchasing older accelerators sold secondhand at heavily discounted prices (with varying degrees of residual usefulness). At the Dragon Propulsion Laboratory, we are partial to another strategy: edge AI accelerators. Designed to enhance devices at the edge of the network, this class of chips is low in both power demands and cost, and worth understanding in its own right (Figure 1).
First Look
The Raspberry Pi AI Kit just became available, giving me the opportunity to present a first look at what promises to be a very interesting new entry in the edge accelerator class. The AI Kit consists of the standard Raspberry Pi M.2 HAT+ daughterboard and a Hailo-8L AI accelerator. Connections to the host Pi 5 board run through the GPIO pins and a dedicated ribbon connector, resulting in a single-lane PCIe 3.0 connection with 8Gbps of bandwidth (Figure 2). Rated at 13 Tera Operations per Second (TOPS) of inference performance, the accelerator should offload neural network execution entirely from the Pi's CPU, leaving it free to respond to the results, as hinted by the object recognition, segmentation, and tracking demos showcased in the launch announcement [4].
[...]