Bringing Up Clouds

Core Technology

Article from Issue 204/2017

Author(s): Valentine Sinitsyn

VM instances in the cloud are different beasts, even if they start off as a single image. Discover how they get their configuration in this month's Core Technologies.

First as a buzzword and then as a commodity, the cloud lives the typical life of an IT industry phenomenon. This means that running something (but usually Linux) in a cloud is a thing you now do more often than not. From a user perspective, it's simple: You click a button on the cloud provider's dashboard and get your virtual machine (VM) running within a minute.

This is drastically different from what you do on your desktop. Here, you insert the DVD or plug in a USB pen drive and spawn the installer. Be it an old-school, text-based or a slick GUI installer, it typically asks you some questions (Figure 1). Which locale do you want to use? What's your computer's hostname? What's your time zone? How do you want your user account named? Which password do you want to use? You may not even notice these questions, because installation takes a quarter of an hour or more, and you spend most of this time sipping coffee or chatting with friends. Yet these questions are essential for the system's operation. Without a password, you won't be able to log in. Or, even worse, everyone will be able to.

Figure 1: The Anaconda installer makes you answer some questions before you can install anything.

A VM in a cloud starts in seconds because it's not really installed. Clouds host prebuilt hard disk images that are effectively cloned when you need a new VM. Although it's faster, it also means there is no "configure" stage, when you can adjust settings to your liking. Cloud-based VMs need some other mechanism to make these changes on the fly so that many instances can start off of a single image. One of these mechanisms is cloud-init [1].

cloud-init in a Nutshell

Initially an Ubuntu project, cloud-init has become "the standard for customising cloud instances," as the homepage says (Figure 2). The initial idea was to make Ubuntu easier to consume in Amazon Elastic Compute Cloud (EC2) instances. Today, cloud-init integrates with many popular Linux varieties (and even FreeBSD) and runs across different clouds, from Microsoft Azure to your company's private OpenStack deployment. Needless to say, cloud-init is free (GPLv3), and you can get it through your package manager, as well as in Python sources.

Figure 2: Cloud-init is no longer an Ubuntu-only thing, but the well-known palette readily reveals its origins.

As an end user, you don't think about cloud-init too much: It "just works." However, imagine that you want to build your own custom OS image for the cloud. Following the "least surprise" principle for your end user, you'd want to integrate cloud-init and make sure it is able to find the relevant settings and apply them.

You might think of cloud-init as an init [2] process replacement because of the name, but that's not true. Linux already has many init implementations, and it makes little sense to write another one just because your system now runs in the cloud. Instead, cloud-init is designed to interoperate with an existing init, which calls it at well-known points during system startup.

Like it or not, in most Linux flavors today init is equal to systemd, so I'll look at cloud-init from this perspective. With other init systems, the stages are almost the same, but the implementation varies per init, as you might expect.

Up and Running

It all starts with the generator (see the "What on Earth Is a Generator?" box), which decides whether cloud-init needs to run at all. To disable cloud-init, you either create an /etc/cloud/cloud-init.disabled file or pass cloud-init=disabled as a kernel command-line option. The KERNEL_CMDLINE="cloud-init=disabled" environment variable takes precedence over the latter, which is useful if you run cloud-init in a container rather than a VM. So, you can temporarily disable cloud-init to run an image outside the cloud in your local virt-manager, for instance. This step is the only systemd-specific one, because no other init system has a notion of a generator.
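The decision described above can be sketched in a few lines of Python. This is only an illustration of the checks, not the real generator (which is a shell script); the function name and its parameters are hypothetical, and the caller is assumed to have read /proc/cmdline and tested for the disable file already.

```python
DISABLE_FILE = "/etc/cloud/cloud-init.disabled"  # path named in the text


def cloud_init_enabled(proc_cmdline, environ=None, disable_file_exists=False):
    """Return True if cloud-init should run, following the checks in the text.

    proc_cmdline: contents of the kernel command line (hypothetical parameter).
    environ: mapping of environment variables, e.g. os.environ.
    disable_file_exists: whether DISABLE_FILE is present.
    """
    environ = environ or {}
    # KERNEL_CMDLINE, when set, takes precedence over the real command line.
    cmdline = environ.get("KERNEL_CMDLINE", proc_cmdline)
    if "cloud-init=disabled" in cmdline.split():
        return False
    if disable_file_exists:
        return False
    return True
```

In a container, for instance, you would pass os.environ as environ so that KERNEL_CMDLINE can stand in for a kernel command line you cannot change.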

What on Earth Is a Generator?

A generator is a systemd concept. It's a small binary whose name ends with -generator. Generators run very early in the system startup and were designed to convert non-systemd configuration files into native units (hence the name). They should write unit files or create symbolic links, introducing new dependencies between them. This makes generators a viable means to customize the systemd boot process.

There are different places where a generator can reside in your system, each with its own priority (consult the systemd.generator(7) man page [5] for details). For instance, on my Ubuntu 16.04 LTS system, cloud-init-generator resides in /lib/systemd/system-generators/. It's a small shell script spanning a bit more than a hundred lines of code. This script performs the checks I discussed in the main text, and if cloud-init is not disabled and the data source is found, it creates a symlink to cloud-init.target in the appropriate multi-user.target.wants directory.
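To make the symlink step concrete, here is a minimal Python sketch of what such a generator does once it decides cloud-init should run. It is not the actual cloud-init-generator code: the function name is hypothetical, the unit file location is an assumption, and normal_dir stands for the output directory that systemd passes to a generator as an argument.

```python
import os


def want_cloud_init_target(normal_dir,
                           unit_path="/lib/systemd/system/cloud-init.target"):
    """Create the .wants symlink that pulls cloud-init.target into multi-user.target.

    normal_dir: generator output directory supplied by systemd.
    unit_path: assumed location of the cloud-init.target unit file.
    """
    wants_dir = os.path.join(normal_dir, "multi-user.target.wants")
    os.makedirs(wants_dir, exist_ok=True)
    link = os.path.join(wants_dir, "cloud-init.target")
    # Only create the link once; a generator may be re-run on daemon-reload.
    if not os.path.islink(link):
        os.symlink(unit_path, link)
    return link
```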

If cloud-init is not disabled, the generator injects cloud-init.target into the multi-user.target. This target pulls in several units (.service files), which declare themselves WantedBy it and ultimately call /usr/bin/cloud-init, passing it different command-line options. Note that this executable is not a daemon: It does what was requested and then exits, and you won't find it running in your cloud instance. The wonderful cloud-init documentation [3] covers the boot process in greater detail, but here is a quick summary for your convenience.

First, there is cloud-init-local.service, which runs as soon as the root filesystem becomes writable and before the network is configured. It translates to /usr/bin/cloud-init init --local, and the main idea is to render the required network settings on the first boot.

Second, cloud-init.service already has access to the network; that is, it can use non-local data sources (I'll revisit this later). This stage runs /usr/bin/cloud-init init to finalize the initialization process. It can configure SSH or provision CA certificates, for instance.

Third, cloud-init modules come into play with /usr/bin/cloud-init modules. This happens in two stages (or substages): the "config" stage (cloud-config.service) and the "final" stage (cloud-final.service). The config stage modules may install packages or configure NTP and the time zone. At the final stage, you upgrade packages and run configuration management agents, such as Salt Minions [4]. This is a proper place for your custom setup. Think of it as an rc.local equivalent in a traditional init.
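The summary above boils down to a small table of units and commands, sketched here in Python. The unit names and the init commands come from the text; the --mode option on the modules subcommand and the short stage labels are additions of mine that the text does not spell out, so check them against your cloud-init version.

```python
# (unit, stage label, command line) in boot order.
# Stage labels are informal; "--mode=..." is not quoted from the article.
STAGES = [
    ("cloud-init-local.service", "local",
     ["/usr/bin/cloud-init", "init", "--local"]),
    ("cloud-init.service", "network",
     ["/usr/bin/cloud-init", "init"]),
    ("cloud-config.service", "config",
     ["/usr/bin/cloud-init", "modules", "--mode=config"]),
    ("cloud-final.service", "final",
     ["/usr/bin/cloud-init", "modules", "--mode=final"]),
]


def command_for(unit):
    """Return the command line a given unit is described as running."""
    for name, _stage, argv in STAGES:
        if name == unit:
            return " ".join(argv)
    raise KeyError(unit)
```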

Modules Galore

The problem with computers is that they do what you say, not what you mean. Before cloud-init can apply any configuration, it must be told exactly what you want to apply. Internally, cloud-init stores all settings in a large dict(), and while it's not a Perl hash, there is more than one way to fill it.

For starters, cloud-init comes with a bare minimum of defaults. They enable data sources for the most popular cloud providers and do some other tweaks. The main configuration file resides in /etc/cloud/cloud.cfg by default, and it's an appropriate place for settings shared between all instances. For instance, you can configure your company's package repository or add an internal CA certificate here. Additional config files may come from /run/cloud-init/cloud.cfg (this is known as a "run-time config") or in fact any location you supply via a CLOUD_CFG environment variable. Again, the latter is mostly useful in containers.

Many data sources would allow you to supply configuration bits via some type of data: metadata, user data, or vendor data. This is the way to go for all instance-specific settings, but I will leave the details until the next section.

This may sound weird, but you can also configure cloud-init via the kernel command line! The format is:

cc: config bits end_cc

Cloud-init translates a \n into a newline, so you can pass it a multiline text. Imagine you have spawned an instance and realized you've chosen the wrong SSH key. The warning shown in Figure 3 may look intimidating, but if you can get to the VM's GRUB prompt, things are easy to fix. Just append

cc: ssh_import_id: gh:johndoe end_cc

Figure 3: Supplying a valid key pair is essential in clouds: If you do it wrong, you have a hard time logging into your instance.

to the kernel command line, and it will make cloud-init import your (actually, John Doe's) public key from GitHub. That's exactly what the ssh_import_id module is for.
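To show how little magic is involved, here is a minimal Python sketch of pulling the config bits out of a kernel command line. It is an illustration under simplifying assumptions (a single cc: ... end_cc block), not cloud-init's own parser, and the function name is hypothetical.

```python
def extract_cloud_config(cmdline):
    """Return the text between 'cc:' and 'end_cc', or '' if there is none."""
    begin, end = "cc:", "end_cc"
    start = cmdline.find(begin)
    if start == -1:
        return ""
    start += len(begin)
    stop = cmdline.find(end, start)
    if stop == -1:
        # No closing marker: take everything up to the end of the line.
        stop = len(cmdline)
    body = cmdline[start:stop].strip()
    # A literal backslash-n on the command line becomes a real newline.
    return body.replace("\\n", "\n")
```

The resulting string is what would then be handed to a YAML parser, exactly as if it had come from a config file.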

/etc/cloud/cloud.cfg is actually a YAML file. This holds true for most other configuration means, such as the kernel command line above. Moreover, cloud-init supports putting configuration snippets in /etc/cloud/cloud.cfg.d to make it easier to extend configuration from scripts. Those snippets must also be valid YAML, and they are merged together in lexical order as if they were forming a single file [6].
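The lexical-order merge can be pictured with a short Python sketch. It assumes the snippets are already parsed into dictionaries and that a later file simply overrides top-level keys from an earlier one; cloud-init's real merging rules are richer and configurable [6], and the file names used below are hypothetical.

```python
def merge_snippets(base, snippets):
    """Merge parsed snippets into base, visiting file names in lexical order.

    base: dict parsed from the main config file.
    snippets: mapping of snippet file name -> parsed dict.
    Later files override top-level keys set by earlier ones (a simplification).
    """
    merged = dict(base)
    for name in sorted(snippets):
        merged.update(snippets[name])
    return merged


example = merge_snippets(
    {"disable_root": 1},
    {
        "90_local.cfg": {"ssh_pwauth": 1},
        "10_defaults.cfg": {"ssh_pwauth": 0, "timezone": "UTC"},
    },
)
```

Here 90_local.cfg sorts after 10_defaults.cfg, so its ssh_pwauth value is the one that survives.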

Now, I'll take a quick look at what's typically in /etc/cloud/cloud.cfg, which depends on the distribution and who built the image, so I opted for a generic CentOS 7 OpenStack image [7]. The config file is about 60 lines long, and you can find a heavily trimmed version in Listing 1.

Listing 1

/etc/cloud/cloud.cfg (Abbreviated)

First, you see some settings for the cloud-init modules. They prescribe creating a default user and disabling both root login and SSH password authentication, as recommended for networked setups. Three modules really take care of this: users-groups, ssh, and set-passwords, respectively. The first two come through cloud_init_modules, which lists modules to run at the init stage. The last one is from cloud_config_modules, so it runs at the config stage.

If you have a background in YAML, you may think options such as disable_root should be nested under - ssh, and you are probably right. Remember, however, that the same configuration bits may come through the kernel command line or other means. They have a different structure, and putting these settings at the top level makes handling module code more uniform.

Next is the system_info section. You may think of it as a read-only piece of information regarding the system itself and the distribution. You specify the latter via the distro key, which is important because different distributions provide different tools for common tasks, like applying network configuration. This is not to mention that modules such as yum_add_repo are naturally per distribution. The aforementioned ssh_import_id is also a good example. It runs on Ubuntu and Debian only, perhaps because the ssh-import-id command itself [8] is Ubuntu-specific.

The default_user key is another typical resident under system_info. It specifies the default user parameters that end up in /etc/passwd. The code in Listing 1 shows that you will be able to log in as centos, using the SSH key that you provided, thanks to the ssh module.
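As a rough picture of the structure discussed in this section, the following Python dictionary mirrors the keys mentioned in the prose. It is not a copy of Listing 1: the key names come from the text, but the values (in particular the distro value and the 1/0 flags) are assumptions for illustration, and the stage_of helper is hypothetical.

```python
# Illustrative structure only; values are assumed, not taken from Listing 1.
CLOUD_CFG = {
    "users": ["default"],          # create the default user
    "disable_root": 1,             # top-level, not nested under the ssh module
    "ssh_pwauth": 0,               # no SSH password authentication
    "cloud_init_modules": ["users-groups", "ssh"],
    "cloud_config_modules": ["set-passwords"],
    "system_info": {
        "distro": "rhel",          # assumed value for a CentOS image
        "default_user": {"name": "centos"},
    },
}


def stage_of(module, cfg=CLOUD_CFG):
    """Return the stage ('init', 'config', 'final') that lists a module."""
    for stage in ("init", "config", "final"):
        if module in cfg.get("cloud_%s_modules" % stage, []):
            return stage
    return None
```

Looking up ssh this way yields the init stage, while set-passwords lands in the config stage, matching the description above.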
