Zack's Kernel News

Article from Issue 188/2016

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

Random Number Generation on Modern Systems

Stephan Müller recently pointed out that /dev/random has been showing signs of age relative to modern environments like embedded systems, solid-state drives, massively parallel systems, and virtualized systems. The problem is how to identify good sources of entropy on all systems, so that /dev/random really does produce random numbers that are equally random across all environments.

Stephan's approach, LRNG (Linux Random Number Generator), seeks to solve that problem and in particular to provide proper entropy sources at boot time. He also wanted LRNG to have a smaller performance impact on parallel systems and to allow the use of accelerated cryptographic primitives. Crypto primitives are simple, reliable tools that serve as building blocks for larger scale security systems. Massively parallel systems have to implement security protocols on all nodes, so fast cryptographic operations benefit them directly.

Stephan gave a link to a scholarly article he'd written that described his approach [1]. Beyond the technical details, Stephan chose to release his design under a dual license – either the GPL (version number unspecified) or a more BSD-ish license that allowed closed-source binary distribution.

In terms of implementation, Stephan explained, "The patches do not replace or even alter the legacy /dev/random implementation but allows the user to enable the LRNG at compile time. If it is enabled, the legacy /dev/random implementation is not compiled. On the other hand, if the LRNG support is disabled, the legacy /dev/random code is compiled unchanged. With this approach you see that the LRNG is API and ABI compatible with the legacy implementation."

Nikos Mavrogiannopoulos read the PDF and noticed that in both the traditional /dev/random implementation and Stephan's LRNG implementation, the random number generator would be "minimally" seeded with 112 bits of entropy. Nikos said, "Unfortunately one of the issues of the /dev/urandom interface is the fact that it may start providing random numbers even before the seeding is complete." And based on the text, Nikos concluded that LRNG suffered from this problem as well. He said, "That's a serious limitation [...], since most/all newly deployed systems from 'cloud' images generate keys using /dev/urandom (for sshd for example) on boot, and it is unknown to these applications whether they operate with uninitialized seed."

Nikos liked the rest of Stephan's implementation, but he felt that if /dev/random was going to be replaced, any new implementation should ensure "that the kernel seed buffer is fully seeded prior to switching to userspace."

Stephan reassured Nikos that the getrandom() system call would block until the appropriate amount of seed data had been obtained, after which, he said, getrandom() would behave like /dev/urandom. Alternatively, he said, "you may use the /proc/sys/kernel/random/drbg_minimally_seeded or drbg_fully_seeded booleans. If you poll on those, you will obtain the indication whether the secondary DRBG feeding /dev/random is seeded with 112 bits (drbg_minimally_seeded) or 256 bits (drbg_fully_seeded). Those two booleans are exported for exactly that purpose: allow user space to know about initial seeding status of the LRNG."
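On mainline kernels (getrandom() arrived in Linux 3.17), the same "has the pool been seeded yet?" question can be asked without blocking: passing the GRND_NONBLOCK flag makes getrandom() fail with EAGAIN instead of waiting. The sketch below is our own illustration, not part of Stephan's patches. It does not read the LRNG-specific drbg_*_seeded files, the helper name rng_is_seeded() is hypothetical, and it invokes the raw system call through syscall() because glibc did not yet provide a wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 0x0001 /* flag value from <linux/random.h> */
#endif

/* Hypothetical helper: returns 1 if the kernel RNG is initialized,
 * 0 if it is not yet seeded, and -1 if getrandom() is unavailable
 * (e.g. ENOSYS on a pre-3.17 kernel) or failed for another reason. */
int rng_is_seeded(void)
{
#ifdef SYS_getrandom
    unsigned char probe;
    long ret;

    do {
        ret = syscall(SYS_getrandom, &probe, sizeof probe, GRND_NONBLOCK);
    } while (ret < 0 && errno == EINTR);

    if (ret == (long)sizeof probe)
        return 1;   /* pool initialized: the byte was delivered */
    if (ret < 0 && errno == EAGAIN)
        return 0;   /* the call would have blocked: not seeded yet */
#endif
    return -1;      /* no getrandom() syscall, or some other error */
}
```

On a system that has been up for a while, rng_is_seeded() should report 1; early-boot code could poll it instead of blocking inside getrandom().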

Nikos pointed out that user code would need to have some way to tell whether getrandom() existed on a given system. "Today," he said, "due to libc not having the call, we can only use /dev/urandom and applications would most likely continue to do so long time after getrandom() is introduced to libc." Stephan explained:

Implement the syscall yourself with syscall(). If you get ENOSYS back, revert to your old logic of seeding from /dev/urandom.

If you know you are on kernels >= 3.14, you could use the following steps in your library:

1) poll /proc/sys/kernel/random/entropy_avail in spaces of, say, one second and block your seeding process until that value becomes non-zero

2) if you unblock, seed from /dev/urandom and you have the guarantee of having a /dev/urandom seeded with 128 bits.
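Stephan's advice combines two pieces: try the raw getrandom() system call, and on ENOSYS fall back to waiting until entropy_avail becomes non-zero before reading /dev/urandom. A minimal C sketch of that logic follows, assuming Linux headers that define SYS_getrandom. The function names are hypothetical, and the 128-bit seeding guarantee is Stephan's claim for the kernels he names, not something the code itself checks.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Step 1 of Stephan's recipe: re-read entropy_avail once per second
 * until it is non-zero. Returns 0 when it is, -1 if it can't be read. */
static int wait_for_entropy(void)
{
    for (;;) {
        FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
        int avail = 0;

        if (!f)
            return -1;
        if (fscanf(f, "%d", &avail) != 1) {
            fclose(f);
            return -1;
        }
        fclose(f);
        if (avail > 0)
            return 0;
        sleep(1);
    }
}

/* Step 2: read len bytes from /dev/urandom, retrying short reads. */
static int read_urandom(unsigned char *buf, size_t len)
{
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    size_t done = 0;

    if (fd < 0)
        return -1;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return 0;
}

/* Hypothetical helper: fill buf with len random bytes, preferring the
 * getrandom() syscall and falling back only when the kernel returns
 * ENOSYS. Returns 0 on success, -1 on failure. */
int get_seed_bytes(unsigned char *buf, size_t len)
{
#ifdef SYS_getrandom
    size_t done = 0;

    while (done < len) {
        long n = syscall(SYS_getrandom, buf + done, len - done, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == ENOSYS)
                break;          /* kernel predates getrandom() */
            return -1;
        }
        done += (size_t)n;
    }
    if (done == len)
        return 0;
#endif
    if (wait_for_entropy() < 0)
        return -1;
    return read_urandom(buf, len);
}
```

Nikos's objection is visible in the result: every application would need to carry something like this until its C library exported getrandom() directly.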

Nikos didn't like that explanation at all. He replied, "That's far from a solution and I wouldn't recommend to anyone doing that. We cannot expect each and every program to do glibc's job. The purpose of a system call like getrandom is to simplify the complex use of /dev/urandom and eliminate it, not to make code handling randomness in applications even worse."

Theodore Ts'o replied, "Yes, but if glibc is falling down on the job and refusing to export the system call (I think for political reasons; it's a Linux-only interface, so Hurd wouldn't have it), then the only solution is to either use syscall directly (it's not hard for getrandom, since we're not using 64-bit arguments which gets tricky for some architectures), or as Peter Anvin has suggested, maybe kernel developers will have to start releasing the liblinux library, and then teaching application authors to add -linux to their linker lines."

But Nikos felt that the "political" issue was significant. If the system call wasn't available on a given system, he said, the glibc developers would "have an almost impossible task to simulate getrandom() on kernels which do not support it. One may agree with their concerns, but the end result is that we have not available that system call at all, several years after it is there."

Ted rejoined, "The whole *point* of creating the getrandom(2) system call is that it can't be simulated/emulated in userspace. If it can be, then there's no reason why the system call should exist." He suggested a range of technical implementation possibilities. Or, he said, "you can let the application author specify some kind of 'I want to run in insecure mode', via some magic glibc setting. You could probably default this to 'true' without a huge net reduction of security, because most application authors weren't getting this right anyway."

Elsewhere, Ted had his own objections to Stephan's code. He pointed out that some of the entropy sources might not contain true entropy and would therefore lead to insecure random number generation. Stephan replied that any individual source of entropy, such as the "jitter" source Ted mentioned, could be removed to satisfy Ted's concerns.

Sandy Harris spoke out in favor of the jitter source, saying, "Jitter, havege and my maxwell(8) all claim to get entropy from variations in timing of simple calculations, and the docs for all three give arguments that there really is some entropy there." He gave a link to a PDF discussing the issue [2].
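The idea behind these timing-jitter sources can be illustrated in a few lines: time a small, fixed computation repeatedly with a high-resolution clock and keep the low-order bits of each duration, which tend to vary from run to run. This is a toy written for illustration, not the jitter entropy, HAVEGE, or maxwell(8) algorithm, and its output should not be treated as cryptographic entropy. It assumes POSIX clock_gettime(CLOCK_MONOTONIC) as a stand-in for a CPU cycle counter, and the function name jitter_samples() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Toy illustration only (hypothetical helper): time `count` runs of a
 * short busy loop and store the low 8 bits of each duration in out[].
 * Returns how many samples differed from the previous one, a crude
 * sign that the timings are not perfectly regular. */
size_t jitter_samples(uint8_t *out, size_t count)
{
    volatile uint64_t sink = 0;   /* keeps the loop from being optimized away */
    size_t changes = 0;

    for (size_t i = 0; i < count; i++) {
        uint64_t start = now_ns();

        for (int j = 0; j < 1000; j++)
            sink += (uint64_t)j * 2654435761u;
        out[i] = (uint8_t)((now_ns() - start) & 0xff);
        if (i > 0 && out[i] != out[i - 1])
            changes++;
    }
    return changes;
}
```

Whether those varying low bits are genuinely unpredictable, rather than merely irregular, is exactly the point Ted and Sandy were debating; the code cannot settle that.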

Pavel Machek also had some objections. He noticed that Stephan's code seemed to be dependent on the hardware having a high-resolution clock on board. He asked, "What goes on if high resolution timer is not available?" Stephan replied, "If there is no high-resolution timer, the LRNG will not produce good entropic random numbers." He listed the 14 architectures for which the Linux kernel did not implement a high-resolution timer and pointed out that none of those were large-scale architectures. He said, "Please note that also the legacy /dev/random will have hard time to obtain entropy for these environments. The majority of the entropy comes from high-resolution time stamps. If you do not have them and you rely on Jiffies, an attacker has the ability to predict the events mixed into the pools with a high accuracy. Please remember the outcry when MIPS was identified to have no get_cycles about two or three years back."
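A rough calculation shows why Stephan worries about jiffies. If an attacker can place an event within a known time window, the number of distinguishable timestamps in that window bounds how much the timestamp can surprise them. The sketch below computes that bound as whole bits, floor(log2(window × clock rate)). The 1-second window, the HZ=250 tick rate, and the 1GHz cycle counter in the usage note are illustrative assumptions, and real entropy estimation is more conservative than this upper bound.

```c
#include <stdint.h>

/* Upper bound, in whole bits (floor of log2), on the uncertainty of a
 * timestamp an attacker can only place within `window_ms` milliseconds,
 * on a clock that ticks `rate_hz` times per second. */
unsigned timestamp_bits(uint64_t window_ms, uint64_t rate_hz)
{
    uint64_t ticks = window_ms * rate_hz / 1000;  /* distinguishable values */
    unsigned bits = 0;

    while (ticks > 1) {
        ticks >>= 1;
        bits++;
    }
    return bits;
}
```

Under those assumptions, timestamp_bits(1000, 250) gives 7 for a jiffies clock, while timestamp_bits(1000, 1000000000) gives 29 for a nanosecond-scale counter, which is the gap Stephan describes between jiffies and high-resolution time stamps.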

But Stephan added, "the patch I offer leaves the legacy /dev/random in peace for those architectures to not touch the status quo." Pavel replied, "… that's the major problem – right? Makes it tricky to tell what changed, and we had two RNGs to maintain." Stephan said:

I would rather think that even the legacy /dev/random should not return any values in those environments. The random numbers that are returned on these systems are bogus, considering that the only noise source that could deliver some entropy excluding timestamps (if you trust the user) are the HID event values. And for those listed systems, I doubt very much that they are used in a desktop environment where you have a console.

If everybody agrees, I can surely add some logic to make the LRNG working on those systems. But those additions cannot be subjected to a thorough entropy analysis. Yet I feel that this is wrong.

My goal with the LRNG is to provide a new design using proven techniques that is forward looking. I am aware that the design does not work in circumstances where the high-res timer is not present. But do we have to settle on the least common denominator knowing that this one will not really work to begin with?

By the end of the discussion, most of the objections to Stephan's code seemed on track to finding decent solutions or workarounds. This is one of those situations where an older implementation of a kernel feature just isn't cutting it anymore because the industry has moved in directions that hadn't been predicted (massively parallel systems, etc.), and so the code needs to be updated to do the best it can to support what exists in the world. Because of that, even if Stephan's code ends up having missing pieces and other remaining problems, it's likely to still go into the kernel in one form or another, just as an improvement over what was there before. After that, future patches would continue to address the remaining problems where possible.

Randomizing Memory Locations to Secure Against Attack

Thomas Garnier implemented ASLR (Address Space Layout Randomization) for kernel memory on x86-64 systems. ASLR is used to prevent attackers from writing security exploits based on a known location of code in memory. A weak form of ASLR has existed in the Linux kernel since 2005 and has been supplemented by various patch sets for use in security-oriented Linux distributions ever since. Thomas wanted to bring proper ASLR to the main tree itself. Thomas explained, "This security feature mitigates exploits relying on predictable kernel addresses. These addresses can be used to disclose the kernel modules' base addresses or corrupt specific structures to elevate privileges." He went on, "Knowing the base address and physical memory size, an attacker can deduce the PDE virtual address for the vDSO memory page. This attack was demonstrated at CanSecWest 2016, in the 'Getting Physical Extreme Abuse of Intel Based Paged Systems' [3] (see second part of the presentation). Similar research was done at Google leading to this patch proposal. Variants exists to overwrite /proc or /sys objects' ACLs leading to elevation of privileges."
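The reason a known base address is so dangerous is that, without randomization, the x86-64 kernel maps all of physical memory at a fixed virtual offset. On 4-level-paging kernels of this era that direct mapping started at 0xffff880000000000 (per the kernel's x86-64 memory map documentation), so translating a physical address into its kernel virtual address is a single addition, as the minimal sketch below shows. The helper name is hypothetical, and the constant applies only to non-randomized x86-64 kernels with 4-level paging.

```c
#include <stdint.h>

/* Start of the x86-64 direct mapping of physical memory on
 * non-randomized 4-level-paging kernels. */
#define DIRECT_MAP_BASE 0xffff880000000000ull

/* Hypothetical helper: the kernel virtual address through which a
 * given physical address is visible in the direct mapping. */
uint64_t direct_map_virt(uint64_t phys)
{
    return DIRECT_MAP_BASE + phys;
}
```

For example, direct_map_virt(0x1000) returns 0xffff880000001000. Once the base itself is randomized, that addition no longer yields a usable address.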

To implement his solution, he explained, "Entropy is generated using the KASLR early boot functions now shared in the lib directory (originally written by Kees Cook). Randomization is done on PGD & PUD page table levels to increase possible addresses. The physical memory mapping code was adapted to support PUD level virtual addresses. An additional low memory page is used to ensure each CPU can start with a PGD aligned virtual address (for realmode)."
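Randomizing at the PUD level means each region's start can move in 1GiB steps, the span a single PUD entry maps with 4-level paging on x86-64. A simplified sketch of choosing such a base: count how many 1GiB-aligned positions leave room for the region inside the available virtual range, then pick one with a random value. This is an illustration under those assumptions, not the code from Thomas's patch; the boot-time entropy from the shared KASLR helpers is represented by a caller-supplied `rnd` argument, and the function name is hypothetical.

```c
#include <stdint.h>

#define PUD_SIZE (1ull << 30)   /* 1 GiB mapped per PUD entry (x86-64, 4-level) */

/* Hypothetical helper: pick a PUD-aligned start address for a region of
 * `size` bytes inside [start, end). `start` is assumed PUD-aligned and
 * `rnd` stands in for boot-time entropy. Returns 0 if it doesn't fit. */
uint64_t pick_region_base(uint64_t start, uint64_t end, uint64_t size,
                          uint64_t rnd)
{
    uint64_t slots;

    if (end <= start || size > end - start)
        return 0;
    /* Number of aligned positions p with p + size <= end. */
    slots = (end - start - size) / PUD_SIZE + 1;
    return start + (rnd % slots) * PUD_SIZE;
}
```

With a 4GiB range and a 1GiB region there are four possible bases, so rnd values 0 through 3 select offsets of 0 through 3GiB; larger ranges give correspondingly more choices, which is why randomizing at coarse page-table levels still leaves many possible addresses.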

There was no significant debate on the mailing list. H. Peter Anvin and others had some minor technical issues and bugs to report against Thomas's patch, but no one expressed any doubts about adding the feature itself.

Security has always been a central element of Linux development, but it has never been tested as intensely as it has in recent years. In the old days, the biggest threats came from spammers wanting to set up their own botnets or from individual hackers looking for thrills, and for many years Windows machines were the more tempting target of such attacks. Nowadays the United States, China, Russia, and many other countries devote significant resources to cyber warfare, and Linux presents a very tempting target because it is essentially the back end for nearly every significant service on the Internet. The Linux developers are having to shore up security features that, for many years, did not necessarily receive careful attention.

Eventually, the tremendous focus on world-wide cyber warfare will result in a much stronger and more secure Linux kernel in all respects. For now, the developers are having to play catch-up. Ultimately, the pace of kernel development will always leave it susceptible to new vectors of attack, but hopefully within a few years most existing attack vectors will be nailed down.

Tracking Removable Devices

Wade Mealing posted some patches to implement a new "audit subsystem" that would log when devices were added to or removed from a running system. Along with the subsystem, he included a set of user tools to sift through the audit logs and track specific devices or events. For his initial implementation, he included support for USB devices only, although he hoped to extend that to other subsystems as well.

Oliver Neukum felt that the project might not be worth it and should at least be publicly debated before anything serious was implemented. In terms of specific implementation, he suggested that Wade stick to generic functions rather than being quite so USB specific.

Bjørn Mork agreed that the project needed a public debate. Specifically, he pointed out that there had already been earlier discussions, with different conclusions from Wade's proposal. He said:

Greg has already asked the obvious questions and made the obvious 'do this in userspace using the existing uevents' proposal. I did not see any followup to his last message, so I assumed this audit thing would return to the drawing board with a userspace implementation [4].

It was quite surprising to instead see a USB specific kernel implementation duplicating existing device add/remove functionality. Why? The provided reason makes absolutely no sense at all. Userspace tools are as intelligent as you make them. And 'decoded, filtered or ignored' implies policy, which IMHO has no place in the kernel in any case.

Bjørn concluded, "I think the generic layer implementation is already there. The proposed USB specific solution adds nothing, as pointed out by Greg the last time this was discussed."

Greg Kroah-Hartman joined the chorus calling for Wade's feature to be implemented in userspace and to cover all device types rather than just USB.
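The userspace route builds on the kernel's existing uevent broadcast: a process can bind a NETLINK_KOBJECT_UEVENT socket and receive one message per device event, each an "action@devpath" header followed by NUL-separated KEY=VALUE fields such as ACTION= and SUBSYSTEM=. The minimal sketch below extracts those two fields and logs add/remove events for every subsystem, not just USB. The function names are hypothetical, and a production tool would more likely use libudev than a raw socket.

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy the ACTION and SUBSYSTEM values from a
 * NUL-separated uevent message into the caller's buffers.
 * Returns 0 if both keys were found, -1 otherwise. */
int parse_uevent(const char *msg, size_t len,
                 char *action, size_t action_len,
                 char *subsys, size_t subsys_len)
{
    int found = 0;
    size_t i = 0;

    while (i < len) {
        const char *field = msg + i;
        size_t flen = strnlen(field, len - i);

        if (flen >= 7 && strncmp(field, "ACTION=", 7) == 0) {
            snprintf(action, action_len, "%.*s", (int)(flen - 7), field + 7);
            found |= 1;
        } else if (flen >= 10 && strncmp(field, "SUBSYSTEM=", 10) == 0) {
            snprintf(subsys, subsys_len, "%.*s", (int)(flen - 10), field + 10);
            found |= 2;
        }
        i += flen + 1;   /* skip the field and its terminating NUL */
    }
    return found == 3 ? 0 : -1;
}

/* Hypothetical monitor loop: print every add/remove event the kernel
 * broadcasts. Stops on the first receive error and returns -1. */
int monitor_uevents(void)
{
    struct sockaddr_nl addr = { .nl_family = AF_NETLINK,
                                .nl_pid = 0,      /* kernel assigns an ID */
                                .nl_groups = 1 }; /* kernel uevent group */
    char buf[8192], action[32], subsys[64];
    int fd = socket(AF_NETLINK, SOCK_DGRAM | SOCK_CLOEXEC,
                    NETLINK_KOBJECT_UEVENT);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf - 1, 0);

        if (n <= 0)
            break;
        buf[n] = '\0';
        if (parse_uevent(buf, (size_t)n, action, sizeof action,
                         subsys, sizeof subsys) == 0 &&
            (strcmp(action, "add") == 0 || strcmp(action, "remove") == 0))
            printf("%s %s: %s\n", action, subsys, buf);  /* buf = header */
    }
    close(fd);
    return -1;
}
```

Because the message header ("add@/devices/...") is the first NUL-terminated string in the buffer, printing buf shows the device path alongside the parsed fields.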

Steve Grubb, however, spoke out partially in favor of a kernel-based implementation, saying, "The audit system has to do everything possible to make sure that an event is captured and logged. Does the uevent netlink protocol ever drop events because the user space queue is full? If the uevent interface drops events, then it's not audit quality in terms of doing everything possible to prevent the loss of a record. If this were to happen, how would userspace find out when a uevent gets dropped? I may have to panic the machine if that happens depending on the configured policy. So, we need to know when it happens. If on the other hand it doesn't ever drop events, then it might be usable."
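Steve's question has a partial answer in how netlink multicast delivery generally works: when a listener's receive buffer is full, the kernel discards the message and the listener's next recv() fails with ENOBUFS. Userspace thus learns that something was lost, but not what or how much, which is weaker than what audit requires. A minimal sketch of a receive helper that counts such overruns follows; the helper names and the overrun counter are hypothetical, and any buffer size a caller requests is capped by the kernel at net.core.rmem_max for ordinary processes.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ask for a larger receive buffer. The kernel silently caps the request
 * at net.core.rmem_max unless the caller uses SO_RCVBUFFORCE. */
int enlarge_rcvbuf(int fd, int bytes)
{
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes);
}

/* Hypothetical helper: receive one message. On ENOBUFS (messages were
 * dropped because the buffer overflowed) increment *overruns and retry,
 * so the caller can decide what a lost event means; an audit-style
 * consumer might treat any overrun as fatal. Returns bytes received,
 * or -1 on any other error. */
ssize_t recv_counting_overruns(int fd, void *buf, size_t len,
                               unsigned long *overruns)
{
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);

        if (n >= 0)
            return n;
        if (errno == ENOBUFS) {
            (*overruns)++;   /* at least one event was lost */
            continue;
        }
        if (errno != EINTR)
            return -1;
    }
}
```

The counter records only that an overflow happened, which is why Steve and Paul considered the uevent channel short of "audit quality".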

Paul Moore supported Steve's statements, saying:

Audit has some odd requirements placed on it by some of its users. I think most notable in this particular case is the need to take specific actions, including panicking the system, when audit records can't be sent to userspace and are 'lost'. Granted, it's an odd requirement, definitely not the norm/default configuration, but supporting weird stuff like this has allowed Linux to be used on some pretty interesting systems that wouldn't have been possible otherwise. Looking quickly at some of the kobject/uevent code, it doesn't appear that the uevent/netlink channel has this capability.

I also just noticed that it looks like userspace can send fake uevent messages; I haven't looked at it closely enough yet, but that may be a concern for users which restrict/subdivide root using a LSM … although it is possible that the LSM policy could help here. I'm thinking aloud a bit right now, but for SELinux the netlink controls aren't very granular and sysfs can be tricky so I can't say for certain about blocking fake events from user space using LSMs/SELinux.

Greg said he'd never seen uevent drop an event in 10 years of watching. He asked (several separate times, as it turned out) what the use case for Wade's code really was. Wade replied:

The goal of these message is to let a system administrator see in the audit logs, that a device has been plugged in and the basic details about this. Having this only in user space means that (and Greg alludes to this) that this will be for human eyes only and not be machine usable in the kernels. Without it being in kernel, it can't be extended for manipulation by auditctl at some point in the future.

Specifically I am trying to create a well formed audit trail when devices are added or removed from the system by the user space audit tools. The implementation at the moment does not do any filtering, but rather creates the raw audit events.

In some ways this is similar to a decorated class in say java. In this case the class is unaware it is being decorated yet we can monitor what is happening in that class without polluting the class code with messy log or trace information.

I don't see either kernel or user-space applications create add or remove events in the audit subsystem. I understand that some events are placed into uevents (To be intercepted by udevd), while this also exports the same information it is not in the audit subsystem in kernel.

Burn Alting also offered his own list of abilities that he hoped would be provided by Wade's code:

– when was a (possible) removable media device plugged into a system and what were the device details – perhaps my corporation has a policy on what devices are 'official' and hence one looks for alternatives, and/or,

– was it there at boot? (in case someone adds and removes such devices when powered off), and eventually

– has an open for write (or other system calls) occurred on designated removable media? (i.e. what may have been written to removable media – cooked or raw) – Yes, this infers a baseline of what's connected or an efficient means of working out if a device is 'removable' at system call time.

In essence, I need to know if and how removable media is being used on my systems. The definition of 'removable' is challenging, but my idea would be for one to be able to define it via the auditd interface.
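Burn notes that defining "removable" is hard. One crude starting point that already exists is the per-disk removable attribute in sysfs (/sys/block/<disk>/removable, which reads 1 for devices the kernel marks as removable). The sketch below takes a baseline of such disks, for instance at boot, so later audits could compare against it. Some USB-attached disks report 0, which illustrates why the definition is challenging; the function name is hypothetical, and the sysfs root is a parameter so the code can be pointed elsewhere.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: print the block devices under `sysfs_block`
 * (normally "/sys/block") whose "removable" attribute reads 1.
 * Returns how many were found, or -1 if the directory can't be opened. */
int list_removable(const char *sysfs_block)
{
    DIR *dir = opendir(sysfs_block);
    struct dirent *de;
    int count = 0;

    if (!dir)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        char path[512];
        int flag = 0;
        FILE *f;

        if (de->d_name[0] == '.')
            continue;
        snprintf(path, sizeof path, "%s/%s/removable",
                 sysfs_block, de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;   /* entry has no removable attribute: skip it */
        if (fscanf(f, "%d", &flag) == 1 && flag == 1) {
            printf("removable: %s\n", de->d_name);
            count++;
        }
        fclose(f);
    }
    closedir(dir);
    return count;
}
```

Calling list_removable("/sys/block") at boot and again later gives the kind of "was it there at boot?" comparison Burn describes, though only for devices the kernel itself flags as removable.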

Clearly, Burn's purpose would be to implement security features. In a later post he acknowledged that a determined hacker could get past these audits, but that it was important to thwart the less skilled hackers.

Greg didn't respond directly to these feature desires, but later he seemed to back off from his opposition to them, saying, "It's not an easy problem, good luck all!"

That seems to be far from the last word, however. The feature remains controversial, and one objection will inevitably come up: If the kernel can't stop a determined hacker, don't such features just amount to code clutter? For now, it seems that the immediate objections have been withdrawn, and a kernel-based audit trail is still in the works.


  1. "Linux Random Number Generator – A new approach" by Stephan Müller
  2. "The maxwell(8) random number generator" by Sandy Harris
  3. "Getting physical: Extreme abuse of Intel based paging systems" by Nicolas A. Economou and Enrique E. Nissim
  4. "Extending usb to do device auditing"

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
