Linux containers with systemd-nspawn and rkt

Container Time

Article from Issue 184/2016
Author(s): Jonathan Boulle

The systemd project has given rise to lots of other interesting tools and technologies. Meet systemd-nspawn, a container tool that serves as a simple Docker alternative.

Systemd-nspawn [1] is a lightweight container tool that can run a command or a full operating system in a contained environment on Linux. According to the systemd-nspawn man page, systemd-nspawn is "…similar to chroot(1) but more powerful since it fully virtualizes the filesystem hierarchy as well as the process tree, the various IPC subsystems, and the host and domain names." (See also the "Other Container Tools" box.)

Other Container Tools

To understand systemd-nspawn, it can be helpful to contrast it with a few different but related tools.

Chroot [3] is one of the oldest and simplest ways to provide some process isolation on Linux. The chroot() system call changes the root directory of the calling process. After that, every absolute pathname the process uses is resolved relative to the new root directory instead of the host's /. The chroot command wraps this system call and must be run as root, for example:

chroot /home/editorial/images/jessie/ /bin/ls

The first argument names the new root directory, and the second is the command to run inside it (Figure 1). On the host, the new root resides at /home/editorial/images/jessie, so /bin/ls actually refers to /home/editorial/images/jessie/bin/ls. The ls process started this way cannot see any files outside of /home/editorial/images/jessie.
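To make the pathname mapping concrete: because /bin/ls inside the chroot is really a file below the new root on the host, that root must contain the ls binary and every shared library it loads. The following minimal sketch builds such a tree for ls alone. It assumes a dynamically linked ls and an ldd tool on the host; make_ls_root is a hypothetical helper name, and only the final chroot call needs root.

```shell
#!/bin/sh
# Sketch: build a minimal root tree holding only ls and its shared libraries.
# make_ls_root is a hypothetical helper, not part of any package.
# Assumes a dynamically linked ls and an ldd tool (glibc or musl) on the host.
make_ls_root() {
    root=$1
    ls_bin=$(command -v ls) || return 1
    mkdir -p "$root/bin" || return 1
    # Inside the chroot, /bin/ls resolves to $root/bin/ls on the host.
    cp "$ls_bin" "$root/bin/ls" || return 1
    # ldd prints lines such as
    #   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)
    #   /lib64/ld-linux-x86-64.so.2 (0x...)
    # Copy every absolute path it mentions to the same place under $root.
    for lib in $(ldd "$ls_bin" 2>/dev/null | grep -o '/[^ ]*'); do
        mkdir -p "$root$(dirname "$lib")" &&
            cp -L "$lib" "$root$lib" || return 1
    done
}
```

For example, after make_ls_root /tmp/lsroot, running chroot /tmp/lsroot /bin/ls / as root should list only the directories the helper created (such as bin and lib), because / inside the chroot is /tmp/lsroot on the host.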

Fundamentally, all chroot does is change how pathnames are resolved when a process accesses the filesystem, which provides a basic level of isolation at the filesystem level. Unfortunately, this simple isolation is easy to break: Various methods exist for "escaping" a chroot jail (e.g., if a process already holds a file descriptor pointing outside of the chroot before the call is made), so chroot alone does not provide sufficient security. Nor does chroot offer any of the other kinds of process isolation that can be desirable on Linux, such as limits on memory usage or a separate set of network interfaces.

The past five years have seen the emergence of more powerful containment tools, like systemd-nspawn, rkt [4], and Docker [5], that take advantage of Linux kernel features to provide much greater isolation between processes on a system.

Rkt and Docker are both targeted at end users and admins wanting to run applications in containers. Systemd-nspawn is a lower-level tool, targeted more at developers and testers.

Rkt is an application container runtime developed at CoreOS, and it is an implementation of the App Container Specification (appc) [6]. When running application containers, rkt internally uses a staged architecture. The first stage, stage0, is the rkt command line itself, which is responsible for things like discovering application container images on the Internet or from repositories, downloading them across the network, and managing a local disk cache. Stage1 is responsible for setting up the actual isolated environment, using the necessary kernel features to isolate the applications from the host. Finally, stage2 refers to the user-specified applications themselves; in the case of rkt, multiple applications can run in a single pod.

The rkt version delivered with CoreOS uses systemd-nspawn directly to do all of the heavy lifting of setting up the container. Another variant of rkt uses the kvm tool [7] to set up a lightweight virtual machine (VM) that takes advantage of the hardware isolation offered by the Linux kernel's KVM driver.

Docker is a container platform made up of many parts, with duties ranging from executing individual containers on a host to scheduling and orchestrating containers across large clusters of servers. For the purposes of this comparison, the docker command-line tool operates in two key modes:

  • daemon mode, which performs all of the heavy lifting involved in running and managing containers
  • client mode, which is how most users interact with the Docker engine [8]. For example, a simple docker run command is translated into an API call that is passed on to the local Docker engine, which is then responsible for setting up and running the container that the user specified.

The Docker engine is responsible for a huge number of different functions: from retrieving container images over the Internet, to managing the lifecycle of containers on a system, to serving the REST HTTP API that receives those calls (whether from the actual "docker" client or from any other HTTP client). The Docker engine is thus necessarily long-running, because it directly manages the lifecycle of all "Docker containers" on a system.
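The client/daemon split can be observed by bypassing the docker client and sending requests to the engine's HTTP API directly. The sketch below assumes the engine listens on its default Unix socket, /var/run/docker.sock (access normally requires root or membership in the docker group), and curl 7.40 or later for the --unix-socket option; docker_api is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: query the Docker engine's REST API over its Unix socket, much as
# the docker client does. docker_api is a hypothetical helper name.
# Usage: docker_api ENDPOINT [SOCKET]
docker_api() {
    endpoint=$1
    sock=${2:-/var/run/docker.sock}
    # Fail early if no engine socket exists at the given path.
    [ -S "$sock" ] || { echo "no Docker socket at $sock" >&2; return 1; }
    # The host part of the URL is only a placeholder; curl connects to the socket.
    curl --silent --show-error --unix-socket "$sock" "http://localhost$endpoint"
}
```

With a running engine, docker_api /version returns a JSON description of the engine, and docker_api /containers/json lists the running containers, roughly what docker ps shows. A docker run is likewise turned into requests such as POST /containers/create followed by POST /containers/{id}/start.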

Figure 1: A chroot environment offers a simple, but insecure, form of isolation.

The systemd-nspawn container tool began as a means for systemd developers [2] to test building and running systemd itself without affecting the host operating system. Systemd-nspawn lets you launch an application in an isolated container with a single command, making it quite handy for developers who want to run buggy pre-release code without risking damage to the system.

Since the first release, systemd-nspawn has evolved to include a broad range of functionality, from advanced networking configurations to SELinux integration and native overlay filesystem support. Modern systemd-nspawn is a versatile and full-featured tool you can use for a variety of Linux use cases, but its primary purpose is to serve as a tool for developing and testing.

Namespaces and Cgroups

Internally, systemd-nspawn uses several features of the Linux kernel to provide process and resource isolation. The first and foremost of these features is namespaces [9].

A Linux namespace wraps a global system resource so that the processes inside the namespace appear to have their own isolated instance of that resource. For example, if a process is in its own unique PID (process ID) namespace, it will not see any other processes on the system that aren't in that same namespace. In this way, users can restrict processes from interacting with each other along several different axes. The Linux kernel provides a number of different namespaces (Table 1).

Table 1

Kernel Namespaces

Namespace    Function
IPC          System V IPC, POSIX message queues
Network      Network devices, stacks, ports, etc.
Mount        Mount points
PID          Process IDs
User         User and group IDs
UTS          Hostname and NIS domain name

A process can create new namespaces with the unshare() system call, which detaches the calling process from some of its existing namespaces and places it in newly created ones. (The clone() system call can likewise start a child process in new namespaces.) A process can also use the setns() system call to join an existing namespace on the system.
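The util-linux commands unshare(1) and nsenter(1) are thin wrappers around these system calls, which makes them handy for experiments. The following sketch gives a command its own UTS namespace and changes the hostname there, leaving the host's name untouched. Changing the hostname requires CAP_SYS_ADMIN, so the sketch also requests a new user namespace (--user --map-root-user), which lets an unprivileged user run it on systems where unprivileged user namespaces are enabled; uts_demo is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: run hostname(1) in a new UTS namespace via unshare(1).
# uts_demo is a hypothetical helper name. The --user --map-root-user options
# create a user namespace in which the caller holds CAP_SYS_ADMIN; this only
# works where unprivileged user namespaces are enabled (or when run as root).
uts_demo() {
    name=${1:-demo-container}
    # Inside the new UTS namespace, the new hostname is visible...
    unshare --user --map-root-user --uts \
        sh -c 'hostname "$1" && echo "inside:  $(uname -n)"' sh "$name" \
        || echo "unshare failed (need root or unprivileged user namespaces)" >&2
    # ...while the host keeps its original name.
    echo "outside: $(uname -n)"
}
```

In the other direction, nsenter --target PID --uts hostname, run as root, uses setns() to join the UTS namespace of an existing process PID before running hostname there.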

Systemd-nspawn's extensive use of namespaces is reflected in its name: "Nspawn" refers to the fact that the tool spawns new namespaces. By default, systemd-nspawn runs processes in their own IPC, mount, PID, and UTS namespaces. Optional flags give the container an independent network namespace (--private-network) and enable rudimentary user namespace support (--private-users). For more information on namespaces, refer to the excellent series of introductory articles on LWN [10].
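To check which of the namespaces in Table 1 a process actually occupies, you can compare the symbolic links under /proc/PID/ns: Each link reads like pid:[4026531836], and two processes are in the same namespace exactly when their links are identical. The sketch below compares an arbitrary process with the calling shell; ns_compare is a hypothetical helper name, and reading another user's /proc/PID/ns entries generally requires root.

```shell
#!/bin/sh
# Sketch: report, for each namespace type in Table 1, whether process PID
# shares it with the calling shell. ns_compare is a hypothetical helper.
# Link names: ipc, net (Network), mnt (Mount), pid, user, uts.
ns_compare() {
    pid=$1
    for ns in ipc net mnt pid user uts; do
        mine=$(readlink "/proc/$$/ns/$ns")
        theirs=$(readlink "/proc/$pid/ns/$ns")
        if [ -z "$theirs" ]; then
            echo "$ns: unreadable"
        elif [ "$mine" = "$theirs" ]; then
            echo "$ns: shared"
        else
            echo "$ns: separate"
        fi
    done
}
```

Run as root against the leader PID of a systemd-nspawn container (machinectl status NAME reports it as Leader), you would expect ipc, mnt, pid, and uts to show up as separate, while net stays shared unless the container was started with --private-network, and user stays shared unless user namespace support was enabled.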

Another key container technology for Linux is cgroups [11]. (When people use the term "Linux containers," they're typically referring to a combination of cgroups and namespaces.) The name cgroups is an abbreviation for "control groups." Cgroups are a means for organizing processes on a Linux system into a hierarchical tree, and then optionally applying different resource parameters to sections of the hierarchy. For example, you can use cgroups to apply memory limits to a particular process or group of processes, and these limits are then enforced by the kernel.
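As a concrete sketch of such a limit, the snippet below assumes the cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup, which most current distributions use; systems from 2016 typically used the older v1 layout with one directory per controller. It also assumes the memory controller is enabled for child cgroups of the root. show_cgroup and limit_memory are hypothetical helper names; only limit_memory needs root.

```shell
#!/bin/sh
# Sketch assuming cgroup v2 mounted at /sys/fs/cgroup.
# show_cgroup and limit_memory are hypothetical helper names.

# Print the cgroup of the current process and, if readable, its memory limit.
show_cgroup() {
    # On cgroup v2, /proc/self/cgroup contains a line of the form "0::/path".
    path=$(sed -n 's/^0:://p' /proc/self/cgroup)
    echo "cgroup: ${path:-unknown}"
    if [ -n "$path" ] && [ -r "/sys/fs/cgroup$path/memory.max" ]; then
        echo "memory.max: $(cat "/sys/fs/cgroup$path/memory.max")"
    fi
}

# Requires root. Usage: limit_memory NAME LIMIT PID
# Creates cgroup NAME below the root of the hierarchy, caps its memory at
# LIMIT (e.g., 100M), and moves process PID into it. The kernel then
# enforces the limit for PID and for any children it forks afterward.
limit_memory() {
    cg="/sys/fs/cgroup/$1"
    mkdir -p "$cg" &&
        echo "$2" > "$cg/memory.max" &&
        echo "$3" > "$cg/cgroup.procs"
}
```

For example, limit_memory demo 100M $$ caps the current shell and everything it subsequently starts at 100MiB. On a systemd host, the more conventional route is to let systemd create the cgroup, for example with systemd-run --scope -p MemoryMax=100M COMMAND.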

Now, systemd-nspawn itself doesn't do a whole lot with cgroups; it just makes sure the cgroup tree is available within the mount namespace it sets up.

Getting Started with systemd-nspawn

Systemd-nspawn ships with systemd, so it is available on any modern Linux distribution that uses systemd as its init system (which these days is almost all of them); some distributions, such as Debian and Ubuntu, package it separately as systemd-container. In its most basic invocation, you point systemd-nspawn at a directory and tell it to execute a binary in that directory, but systemd-nspawn also provides over 30 command-line flags to customize different aspects of the containers it creates.

Recent versions of systemd-nspawn also support settings files with the .nspawn extension, which encode most of the settings available through the command-line flags in a reusable format.
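As an illustration, a settings file for a container named jessie might look like the following. The section and option names come from the systemd.nspawn(5) man page; the file location assumes the machine name is jessie (systemd-nspawn derives it from the container's directory name), and the bind-mount paths are placeholders.

```ini
# /etc/systemd/nspawn/jessie.nspawn (example file)
[Exec]
# Run the container's own init system, like the -b/--boot flag.
Boot=yes

[Network]
# Create a veth link between host and container, like -n/--network-veth;
# this implies Private=yes, that is, a separate network namespace.
VirtualEthernet=yes

[Files]
# Bind-mount a host directory into the container, like --bind=.
# Both paths are placeholders.
Bind=/srv/shared:/mnt/shared
```

When the container is started under the name jessie, systemd-nspawn reads this file in addition to its command-line flags; see systemd.nspawn(5) for the search paths and for how conflicts with flags are resolved.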

The simple example in Listing 1 shows systemd-nspawn in action. The example downloads an image of the Debian Jessie [12] distribution and then launches it in a container (Figure 2). You need to run these steps with root privileges.

Listing 1

Retrieving and Starting Jessie

 

Figure 2: Systemd-nspawn lets you launch Jessie as a container in a few simple steps.
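As a rough sketch of the steps behind Listing 1 (not the listing itself), the function below builds a Jessie tree with debootstrap and then starts a shell in it with systemd-nspawn. It must run as root and assumes debootstrap is installed and an x86-64 host. Because Jessie has since moved to archive.debian.org, debootstrap may also need to be pointed at a keyring that still contains Jessie's signing keys (via its --keyring option). make_jessie is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: create a Debian Jessie tree and enter it with systemd-nspawn.
# make_jessie is a hypothetical helper name. Must run as root; assumes
# debootstrap is installed and the host is x86-64 (amd64).
make_jessie() {
    [ $# -eq 1 ] || { echo "usage: make_jessie DIR" >&2; return 2; }
    dir=$1
    # Jessie is now served from the Debian archive mirror.
    debootstrap --arch=amd64 jessie "$dir" http://archive.debian.org/debian || return 1
    # Start a shell inside the new tree (-D names the container's root directory).
    systemd-nspawn -D "$dir"
}
```

Replacing the final command with systemd-nspawn -b -D "$dir" boots the container's own init system instead, which then presents a login prompt.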

Additionally, you need to clear root's password in the /home/redaktion/jessie/etc/passwd file before you can log in to Jessie. With rkt, the process looks very similar (Listing 2).

Listing 2

Rkt in Action

 

The commands shown in Listing 2 download an App Container Image (ACI) of etcd version 2.0.0 and launch it (Figure 3). In this scenario, rkt has set up the required filesystem in the directory, including a copy of systemd, which it calls using systemd-nspawn [...].
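A comparable sequence with rkt might look like the following sketch (not the original Listing 2). It assumes an rkt binary on the PATH, root privileges, and that image discovery for coreos.com/etcd still works; the rkt project has since been discontinued, so that last assumption may not hold. run_etcd_aci is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: fetch and run the etcd ACI with rkt.
# run_etcd_aci is a hypothetical helper name; requires root and rkt.
run_etcd_aci() {
    version=${1:-v2.0.0}
    # Trust the signing key published for images under coreos.com/etcd
    # (rkt asks for interactive confirmation of the key).
    rkt trust --prefix=coreos.com/etcd || return 1
    # stage0: discover, download, and verify the ACI into rkt's local store.
    rkt fetch "coreos.com/etcd:$version" || return 1
    # stage1 sets up the pod (using systemd-nspawn in the default flavor),
    # and etcd itself runs as the stage2 application.
    rkt run "coreos.com/etcd:$version"
}
```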

Figure 3: Rkt also retrieves and launches containers with a single command, relying on systemd-nspawn under the hood.

Conclusion

Systemd-nspawn is very much production ready. Many Linux users – on CoreOS and other platforms – are actively using both rkt and systemd-nspawn directly in production and seeing great success.

Having said that, the systemd developers are still careful about how they position systemd-nspawn. For example, the manpage states that systemd-nspawn is not suitable for secure container setups and explains that the intended use is more for debugging and testing.

Although systemd-nspawn is quite fully featured, it still needs some work. One of the areas that could use some improvement is user namespaces [13], which are not very usable in their current form.

With mature and configurable tools like rkt, Docker, and systemd-nspawn, developers and systems administrators have plenty of options for running application containers.

All of the projects described in this article are completely open source and have active, vibrant communities. Anyone interested in helping to define and implement the future of containers on Linux is encouraged to get involved!

The Author

Jonathan Boulle works at CoreOS on all things distributed and all things contained. He's contributed heavily to etcd and fleet and has led development work for the App Container (appc) specification and rkt, the first appc runtime. He also contributes code to the Kubernetes project. Prior to CoreOS, he worked at Twitter on their cluster management platform based on Mesos and Aurora. He's passionate about Linux, F/OSS, the Oxford comma, and developing well-defined systems that scale.
