A Rasp Pi HAT for clustering Pi Zeros

More than Zero

Lead Image © lightwise, 123RF.com

Article from Issue 205/2017

Inexpensive, small, portable, low-power clusters are fantastic for many HPC applications. One of the coolest small clusters is the ClusterHAT for Raspberry Pi.

When I started in high-performance computing (HPC), the systems were huge, hulking beasts that were shared by everyone. The advent of clusters allowed the construction of larger systems accessible to more users. I always wanted my own cluster, but with limited funds, that was difficult. I could build small clusters from old, used systems, but the large cases took up a great deal of room. The advent of small systems, especially single-board computers (SBCs), allowed the construction of small, low-power, inexpensive, but very scalable systems.

Arguably, the monarch of the SBC movement is the Raspberry Pi [1]. It is now the third best-selling computer of all time [2], overtaking the Commodore 64 and trailing only the PC and the Mac, and it has sparked a whole industry around small, inexpensive, low-power, but expandable computers that can be used for anything from sensors in the field, to desktops, to retro game consoles, and even to experiments on the International Space Station. The top-end Raspberry Pi, the Raspberry Pi 3 (RPi3), costs about $35, and the introduction of the Raspberry Pi Zero (Pi Zero) in 2015 set the low-end price at $5.

People have been building clusters from Raspberry Pi units, starting with the original Raspberry Pi Model A, ranging from two to more than 250 nodes [3]. That early 32-bit system had a single core running at 700MHz with 256MB of memory. You can build a cluster of five RPi3 nodes [4] with 20 cores connected by a Gigabit Ethernet switch for about $300, including a case and case fan.

Fairly recently, a company created a Hardware Attached on Top (HAT) [5] add-on board for a single "host" RPi2 or RPi3. The ClusterHAT [6] fits onto the GPIO pins of the host (master node) and has four micro-USB slots that accept up to four Pi Zeros, providing both power and a network connection between the Pi Zeros and the host node.

The ClusterHAT kit costs a little over $25 and includes the HAT board, four standoffs, a handy USB cable, and some plastic feet you can put on the bottom of your host node (Figure 1). Putting everything together is very easy, and a video [7] on the ClusterHAT website shows how it's done.

Figure 1: ClusterHAT components.

After threading the USB cable between the HAT and the RPi3, attaching the HAT, and snapping the Pi Zero boards into the HAT, you should have something that looks like Figure 2. Next, attach a keyboard, a mouse, an external power supply, and a monitor (Figure 3). Notice that the Pi Zeros are powered on in this image (i.e., the lights near the boards are lit).

Figure 2: ClusterHAT and Pi Zeros attached to a RPi3.
Figure 3: Completed ClusterHAT configuration.

The RPi2 or 3 needs a good power supply capable of 2 to 2.5A and at least one microSD card for the master node. You can put a card in each Pi Zero, if you want, or you can NFS boot each one (although that's a little experimental). I put a 16GB microSD card into each of the five Raspberry Pis.

The ClusterHAT site provides Raspbian Jessie-based [8] software images that have been configured with a few simple tools for the ClusterHAT. Jessie is a little different from past Raspbian versions, and the biggest difference that affects the ClusterHAT is the use of DHCP by default.

For this article, the target cluster configuration uses an RPi3 as the master node and the Pi Zeros in the ClusterHAT as compute nodes. The master node will act as an NFS server, exporting /home and /usr/local to the four Pi Zeros. Additionally, the cluster will use passwordless SSH and pdsh [9], a high-performance, parallel remote shell utility. MPI and GFortran will be installed for building and testing MPI applications.

At this point, the ClusterHAT should be assembled and the operating system (OS) images copied to the microSD cards for the five nodes. The next steps are to boot the images for the first time on the RPi3 and configure them to meet the previously mentioned target configuration.

Master Node

Because the master node effectively controls the cluster, getting the OS configuration correct is important. Fortunately, only a small number of changes need to be made to the image provided on the website.

The first step is to boot the master node (RPi3) with its microSD card. Be sure it is plugged in to your local network and can access the Internet. After the RPi3 boots, you should be in the Pixel desktop [10] (Figure 4). A few "classic" configurations are called for at this point with the help of the raspi-config [11] command:

Figure 4: Pixel desktop.
  • Expand the storage to use the entire microSD card
  • Change the default password for the pi account
  • Enable SSH (it is no longer enabled by default in Raspbian)
  • Switch the keyboard to a US keyboard (by default, Raspbian uses a UK keyboard)
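
If you prefer to script these steps instead of walking through the raspi-config menus, something along the following lines should work. This is only a sketch: it assumes a Raspbian release whose raspi-config supports the non-interactive (nonint) mode, and the keyboard layout is easier to change from the menu.

$ passwd                                      # change the pi account password
$ sudo raspi-config nonint do_expand_rootfs   # use the whole microSD card
$ sudo raspi-config nonint do_ssh 0           # 0 enables the SSH server
$ sudo reboot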

After these configuration steps, you need to install the NFS server packages on the RPi3 (master node):

$ sudo apt install nfs-common nfs-kernel-server

Next, the NFS exports, the list of filesystems to be exported, need to be defined. Create or edit /etc/exports so that it includes the following:

/home        *(rw,sync,no_subtree_check)
/usr/local   *(rw,sync,no_subtree_check)

Notice that the filesystems are exported globally with the "*" wildcard. This is usually not a good idea, because anyone could mount the filesystems, but I almost always run this system with no Internet access, so I'm not too worried. To make things safer, you can instead specify the IP range that is allowed to mount the filesystems, as shown below.
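
For example, to limit the exports to a single private subnet, /etc/exports could look like the following; the 192.168.1.0/24 range is only a placeholder for whatever addresses your compute nodes actually receive:

/home        192.168.1.0/24(rw,sync,no_subtree_check)
/usr/local   192.168.1.0/24(rw,sync,no_subtree_check)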

To ensure that the NFS server starts when the RPi3 is booted, run the following commands:

$ sudo update-rc.d rpcbind enable
$ sudo /etc/init.d/rpcbind start
$ sudo /etc/init.d/nfs-kernel-server restart

Note that these commands need to be run whenever the master node is rebooted. The filesystems can be exported and checked to make sure they are actually exported:

$ sudo exportfs -ra
$ sudo exportfs
/home
/usr/local

Next, SSH is configured so that passwordless logins can be used on the cluster. Many tutorials on the web explain how to accomplish this, and a minimal approach is sketched after the next command. Once SSH is set up, GFortran, the GCC Fortran compiler, and MPICH, a high-performance implementation of the Message Passing Interface (MPI) standard, are installed using apt:

$ sudo apt install gfortran mpich

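For the passwordless SSH logins mentioned above, one simple approach takes advantage of the fact that /home will be NFS-mounted on the compute nodes, so a key added to the pi user's own authorized_keys file on the master node is automatically visible everywhere. A minimal sketch, assuming the default pi account on all nodes:

$ ssh-keygen -t rsa                                # accept the defaults and an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
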
The HPC world uses tools that allow commands to run across the entire cluster or a subset of nodes in the cluster. My preferred "parallel shell" tool is pdsh. Consult one of my previous articles [12] for directions on how to build, install, and use pdsh. I installed pdsh for the ClusterHAT in /usr/local. A /home/pi/PDSH directory was created with a /home/pi/PDSH/hosts file that lists the default nodes to be addressed by pdsh when no nodes are specified. For the ClusterHAT, the list of nodes is:

p1.local
p2.local
p3.local
p4.local

As an option, controller.local could be added to the list if the master node is to be a default target for commands.
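
Once the compute nodes are up, pdsh makes a handy sanity check. The following assumes pdsh was built with its SSH module and that the WCOLL environment variable points at the hosts file:

$ export WCOLL=/home/pi/PDSH/hosts
$ pdsh uptime                        # run uptime on every node in the hosts file
$ pdsh -w p1.local,p2.local date     # or address an explicit subset with -w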

Configuring Compute Nodes

The ClusterHAT site provides several ways to build images for the compute nodes. I took the easier route and downloaded five images – one for the controller (head node or master node), and one each for the four compute nodes – and copied them to a microSD card for each node. By taking this approach, each image could be booted in the RPi3 so that changes could be made before booting the entire cluster.
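
Writing an image to a microSD card follows the usual Raspbian procedure. The following sketch uses placeholder names: substitute the real image filename from the download and double-check the /dev/sdX device name, because dd overwrites it completely:

$ sudo dd if=ClusterHAT-controller.img of=/dev/sdX bs=4M status=progress conv=fsync
$ sync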

Unlike the master node, the compute nodes do not boot into the desktop; they just boot to the command line so you can log in to the node.

Booting each image lets you make some basic changes that are needed on the first boot of a Raspbian system. First, the raspi-config command must be used, as discussed for the master node, to extend the filesystem to use the entire microSD card and to enable SSH (see the bulleted list above). A third action, changing the password, should be done on all the compute node images; they should share the same password, but not the default, raspberry.

To make life easier, I like to have my clusters share a common /home for the users and /usr/local for applications shared by the nodes. The ClusterHAT cluster is no exception: I want to mount /home and /usr/local from the master node on all of the compute nodes. To do so, I added NFS mount entries to /etc/fstab on all of the compute node images (Listing 1 shows typical entries).

Listing 1

/etc/fstab Additions

controller.local:/home       /home       nfs  defaults  0  0
controller.local:/usr/local  /usr/local  nfs  defaults  0  0

The gfortran and mpich packages also have to be installed on each compute node: on the master node, they were installed from the Raspbian repositories rather than into /usr/local or /home, so the compute nodes do not pick them up over NFS. One way to install them on every node at once is shown below.
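
Once all of the compute nodes are up and reachable, one convenient way to handle a per-node install like this is to push the same apt command to every node with pdsh, relying on the pi user's default passwordless sudo on Raspbian:

$ pdsh -w p1.local,p2.local,p3.local,p4.local 'sudo apt install -y gfortran mpich'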

By default, the Pi Zero nodes are named p1.local, p2.local, p3.local, and p4.local. If you look at the ClusterHAT from above, at one end you can see the labels p1, p2, p3, and p4, for the four slots. The master node has the node name controller.local.

After setting up the master node, the ClusterHAT, and the compute nodes, it's time for the first boot!

First Boot

Booting is pretty simple: Plug the HDMI cable in to the monitor, then plug in the power cable to boot the RPi3 master node, which should come up in the Pixel desktop. The first time I booted the ClusterHAT, I left it unplugged from my local network, because my router acts as a DHCP server and assigns IP addresses to the compute nodes, which can sometimes cause problems.

Once the master node has booted, it is a good idea to check the node to see if it looks correct – look especially to see that the two filesystems are NFS exported and that gfortran, mpich, and pdsh are functioning.
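
A few quick commands cover those checks:

$ sudo exportfs                  # /home and /usr/local should both be listed
$ gfortran --version
$ mpiexec --version              # MPICH's process launcher
$ pdsh -V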

The ClusterHAT images come with a very useful tool to start and stop the compute nodes. The clusterhat tool [13] is a simple Bash script that uses gpio [14] (General Purpose Input/Output) pin commands to control power to the compute nodes, letting you turn nodes on and off individually, in groups, or all together, with a two-second delay between the commands for successive nodes. For example, to turn on all of the compute nodes, you run:

pi@controller:~ $ clusterhat on all
Turning on P1
Turning on P2
Turning on P3
Turning on P4
pi@controller:~ $

People have also taken the spirit of the clusterhat tool and created something a little different. For example, clusterctl [15] lets you turn the compute nodes on and off, but also lets you query the status of a node, cut its power, and even run a command across all of the compute nodes.

The first time, it's probably a good idea to boot the cluster nodes one at a time. For example, to boot the first node, run:

pi@controller:~ $ clusterhat on p1
Turning on P1
pi@controller:~ $

Booting nodes one at a time allows each to be checked to make sure everything is installed and has booted properly.

Remember that the master node NFS-exports two filesystems to the compute nodes. Given that the Pi Zeros use a bridged network [16] over USB 2.0, the network performance is not expected to be very good. Therefore, it will take a little longer for the filesystems to mount. One suggestion is to ping the node (ping p1.local) until it responds. If the filesystems don't mount for some reason, you can use the clusterhat tool to turn the node off and then on again.
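
In practice, the check can be as simple as the following, run from the master node for each compute node in turn (p1.local shown here):

$ ping -c 1 p1.local                      # repeat until the node answers
$ ssh p1.local df -h /home /usr/local     # both should list controller.local as the source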

After testing each node independently and ensuring that everything works correctly, you can then reboot all of the nodes at once. Now you can test the cluster by running some MPI code on it.
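
A minimal smoke test doesn't even need a compiled program, because MPICH's mpiexec can launch any executable across the nodes listed in a machine file. Assuming a file named machinefile in /home/pi that contains the four node names (p1.local through p4.local, one per line):

pi@controller:~ $ mpiexec -f machinefile -n 4 hostname

Each Pi Zero should report its own hostname. Real MPI programs are built the same way with the MPICH wrapper compilers (mpicc for C, mpifort for Fortran) and launched with the same mpiexec invocation.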
