The Big Picture

Interview – The Xen Project

Article from Issue 235/2020

Lars Kurth of the Xen Project talks about trends, markets, and the project's various threads of development.

There's nothing like someone's passing to get one to reflect on the fickle nature of life. The Xen Project's [1] very lively chairperson, Lars Kurth, sadly passed away not long after our interview at the Open Source Summit in November 2019. Lars had been with the project for almost a decade and was instrumental in several pivotal moments in Xen's history, including its move to the Linux Foundation. He conceptualized and executed several key decisions and supervised the significant architectural changes that helped the project go beyond the realm of server virtualization and cloud computing. In his last interview, Lars talked about the project's various threads of development and how Xen is all set to disrupt the auto industry.

Linux Magazine: What's happening with the Xen Project?

Lars Kurth: The big picture, ultimately, right now, is [that] there's a number of different trends happening. And I'm trying to kind of condense this into kind of [a] coherent strategy where different stakeholders in the community benefit from it. First, we have the server virtualization and the cloud market segment. Second, we have a whole segment of users, such as Qubes OS [2], and then there's similar products, such as SecureView, which is used by the US military. So that [SecureView] is based on OpenEmbedded, Xen, and extra bits and pieces of OpenXT [3], and they also [are] kind of rethinking the way the platform is working and also rethinking the approach to Xen. So basically the idea there would be to take Xen as it is right now and just reduce it to the absolute core minimum, basically as a separation hypervisor, but use the same code base, [and] Kconfig it down to something really small.

In the embedded and automotive space, we're seeing quite a lot of traction as well. So, ARM has just announced their safety software reference architecture at ARM TechCon, and that is basically going to be a reference stack of open source components. But they're also working with proprietary vendors at various levels of the stack and at the bottom is going to be an RTOS and a hypervisor, and they basically chose Xen for that. There's going to be significant investment in this area. This means we have to deal with things like safety certification, and how to do this in an open source project. That's the last juicy community problem in open source.

LM: Are you talking about security?

LK: Well, security is easy. Safety certification is different. Let me take a step back. You see companies like Tesla and stuff today; they have self-driving car functionality, autopilot, and stuff. Google does the same. They don't have safety certification for this. When you certify your software to certain safety standards, you suddenly have the capability to basically delegate risk to insurers and stuff like that. If you don't have that – like Tesla – if something goes wrong, they're going to have to pay for everything. So if you don't have a lot of users, and it's still quite experimental, or you want to disrupt the market and have deep pockets, you kind of can go down that route; a whole industry cannot go down that route.

So if you look at the traditional open source software development, well, you write your code, you test your code, and there's basically a strong collaborative model around it. But it's totally different from how you would approach developing system software [with] safety in mind, because there you start with requirements, and then you break it down into designs, and so on and so forth. Then you reassemble; then you write all your tests, which prove all your claims; then you have a third party, which looks at all your paperwork and all the code and says "well, those guys are saying that they're doing this and are doing this"; and then you get the stamp of approval. This is not easy to implement in open source. But you can do it for a sufficiently small piece of code.

LM: Does this open you up to larger markets?

LK: If we manage to do this (and I'm more than 50 percent confident now that we can do this) and … do this using Xen to enable downstreams or vendors to then do this themselves in a cost efficient manner, then basically we have a huge new market open, where today open source isn't relevant.

Jim [Zemlin] kind of keeps on bringing this slide in his keynote about which market segments Linux and open source [have] penetrated. You notice, embedded is sort of 45 percent, and everyone else is like above 80 or near to 100, because the 55 percent are safety software. That can only be cracked from a community perspective, if we can adapt our development model to make it compatible with that world. So that's kind of the wide context; these are big themes, and it will take multiple years to follow.

LM: Moving on, what's happening on the cloud server virtualization side?

LK: On the cloud server virtualization side, I think we and everyone else [are] basically chasing hardware vulnerabilities in side-channel attacks. It is forcing us to rethink and rewrite some of the core components of the hypervisor, because we can't trust the hardware anymore. So either we could just go into this mode where we're just going to plug one hole after the next, which is not very efficient, or we can just take a step back and think if we can re-engineer this in some ways that should be generally more resilient towards these kinds of attacks. This is what is happening now.

LM: Any concrete steps you can point out?

LK: One of the things we as a project just did is [that] SUSE rewrote the scheduling framework. So we now have core scheduling in the next Xen release. It's not on its own sufficient to switch hyperthreading back on again, like you remember [in] MDS and L1TF [4]. We have the core-scheduling part in place, and then we either need what we call the secret-free Hypervisor or the more synchronized scheduling.

This is how it becomes interesting. We've seen an increase in participation from Amazon. We had our development event in June; they sent nine people. This was the biggest contingent of people from any vendor. I mean, Citrix had eight there. And then they just started presenting some of the ideas publicly, and they started to pick up things which we have to do but which were too slow. One of the things they picked up and have already sent patches for is a group of functionality which we call a secret-free Hypervisor [5].

LM: What is a secret-free Hypervisor?

LK: So the basic idea there is … [that] we restructure the way data is stored, how the memory is laid out, and so on, such that if there is a side-channel attack, you can't get access to anything useful. The first step is to remove the direct map. So that was something that Citrix started picking up. Then now Amazon is running this, and they're able to do this a lot quicker, because they just hired an awful lot of people to work on this. If you look at the top contributors to the project, the traditional sort of split used to be Citrix around 40 percent, SUSE around 20 percent, and then there would be hardware vendors, which were usually in the sort of 7 percent bracket. From what I've seen, I think we will see Amazon in our top three, probably displacing SUSE.

LM: Isn't that worrying? Doesn't Amazon have a tendency to fork?

LK: I think they [Amazon] have finally won the battle internally to say "actually we now can probably work with open source." All the people I've worked with are really professional. Also, the other thing is [that] they have started hiring the leadership from the community to continue doing this. So obviously there is a little bit of a worry there, because they are still figuring out their level of commitment [and] haven't proven to me yet that they are trustworthy. But they are going in the right direction, and that's a good thing. Looking at the KVM Forum schedule, I think they seem to be doing the same there. I don't know what this means in the long run, but it does look as if they are gearing up to follow a multitechnology strategy.

LM: Once the Xen Project gets that security certification, will the ball be in your court?

LK: They're [Amazon] really helping on all the security stuff. The most interesting piece [that] they announced – I mean, this is really, really complicated stuff; we had lots of discussions around this at the Developer Summit – is live updating [6]. So basically, the way they're designing this is [that] they can just update the Xen version without rebooting. I mean, if they can pull this off, wow! But you know, this is still in the planning stage, and this is not implemented yet. We are seeing some discussions around this now. And it's kind of really cool. Knowing them, I don't think they would be suggesting this if they couldn't pull it off.

LM: What are you excited about in the next release?

LK: The core piece we're seeing there in Xen 4.13 [7] is the piece around core scheduling. There is actually a talk at 11:30 [at Open Source Summit] by Dario [Faggioli] [8]. He's covering KVM as well, but the KVM folks are quite a bit behind on this. So you can see … lots of security-related stuff, not so much focused on features, with the exception that there's been quite a lot of focus on AMD hardware and stuff like that. Intel's also been doing interesting stuff in the area. For the side-channel attacks, you typically need to update your microcode, and normally you do this when you reboot, right? We've implemented functionality in Xen, which allows you to do this at run time, so you don't have to reboot. This whole theme around security is around two things. One is just trying to get ahead with some of the side channel – existing and [a] few potentially future side-channel related stuff. I think we also have some mitigations for Spectre-v1 now in this release as well. Then the second piece is around, basically, being able to just keep everything up to date without reboots.

LM: Any noteworthy developments on the hardware-support front?

LK: So there's support for the latest AMD Rome CPUs and stuff. Traditionally what would happen in the past is that when a new vendor would bring out a new CPU … it would actually work on Xen and KVM out of the box. But then we would add features to kind of improve the performance and start to exploit the new CPU features. What we're now starting to see is that actually new hardware breaks some of the old versions of hypervisors. So, that kind of also shows you some of the jumps of innovation which are happening all over the place. So like these AMD Rome CPUs, they don't work on Xen 4.12 or older, because they've just done really clever stuff with the hardware which requires code which wasn't there before. And it won't even run unless the stuff is there.

The embedded segment is really interesting. So, the history there is, like, Xen ARM [9] was originally written for servers. And then we always had this strong tie in with the sort of military space – like there are systems that use Xen-based virtualization for things like radar. And they started experimenting with embedded use cases initially on x86. Then when the ARM Quad came along, they started pushing us a little bit for secret little use cases into the ARM space as well. What has happened in the last three years is as virtualization extensions on ARM CPUs became relevant for some of these embedded use cases, we've seen interest from automotive and other market segments, and this came entirely as a surprise. I mean I was supporting this just as a cool marketable kind of project.

LM: When did you notice that change in gears, so to speak?

LK: Around the beginning of last year, there was a kind of step change. We had a number of fairly big companies like EPAM [Systems]; they are like a consultancy that [has] about 30,000 people working for them. They decided to speculatively – which consultancies don't normally do – build a reference platform for automotive that includes Xen at the bottom, and they have a number of different use cases where they can mix infotainment with cockpit control [10] and some other things. But then you have workloads of different criticality, [which] means you have to have a level of safety certification. But they were kind of just ignoring this at the time and just said, "we're just going to prototype that, and we're going to identify where the gaps are." They've thrown quite a lot of money into this just to get to the stage to … be able to show the demos. And they also started then taking this further. There's a whole kind of piece of this where you have a VM [virtual machine] where you can download container workloads, like, you know, basically microservices from a cloud provider.

LM: What practical purpose would this serve?

LK: The use case there would be, for example, if you have a taxi fleet, you would download a container, which kind of tells you about all your cars. Or, if you were interacting with a smart city, you could drive to the city [and] download something, and then your car can interact with it. The possibilities are endless. Or things like in the UK, in Europe, a lot of insurances offer to install a little black box in a car, and it [the car] sends telemetric data about your driving and that would already be pre-integrated into the platform. They could have all the insurance companies, which they can just tie on, and then they can just channel the workload for this specific use case and enable it in a car, and it stays updated.

So it [has] really lots of interesting potential. And they [EPAM] were the leaders in this, and then ARM, at the end of last year, started to build an automotive product group. I think the conclusion is that there will be, like, significant investment.

At the same time, we set up a working group where we have safety expert companies on it – where we have companies like ARM, EPAM, and then we will have other companies on that as well – which are driving this entire safety story forward. It's kind of a really interesting, interesting project. And it's challenging. But we're also working with the Linux Foundation, the ELISA [Enabling Linux in Safety Applications] project, and Zephyr, which is trying to do something similar as well. We're all facing quite similar issues.

LM: They're all planning for safety certification as well?

LK: Yes. It's a really tough problem, because basically the way to do safety software was conceptually designed in the 80s, and the way we do software now is very different. But actually, the more I look into this, the more I think this is actually possible, and most of the issues aren't as huge [11]. Do you remember like 20 years ago when everybody said, you can't do open source in enterprise? And now everybody says you can't do open source for safety. But actually, if we can change some of the tooling – the tooling which is generally used in this field is so antiquated and you have all these different tools, and you have to store documentation in one tool, [and] you store another piece of documentation in another one. They are all kinds of server-based things, and the code is somewhere else. And then you have to have a manual change process, which uses another tool where you keep all your artifacts in sync. You just look at this and break it down and see that actually this can't be that hard with a bit of support and a bit of will.

You basically need to build up enough confidence now, which gets us to the same point as we were in the 90s with Linux, where a couple of big vendors come out and support this, and then we're over the crucial barrier. We're close to this point now. I'm so excited, since it's such an interesting problem. I originally come from the embedded space. I don't know whether you're aware that in the 80s, Daimler had an EU-funded project called PROMETHEUS, and they were developing a self-driving car. The first variant was called Beta One, and I have worked a little bit on something called Beta Two. This was pre-GPS, so it was all image processing based. You could never productize that, because we had like a 500 CPU supercomputer in the boot of the car. But it was an interesting proof of concept, and some of the stuff which is being done now actually comes from some of these early ideas, which is why I'm kind of excited about that.
