Facebook releases its own OOM implementation

Facebook Not Amused

Compared to the apparently erratic way the badness() function worked until 2010, the current OOM implementation is easy to understand and sensibly designed. But not all companies that use Linux on a large scale see it that way.

One very prominent critic of the current OOM implementation is Facebook: The company is known to rely on Linux in its data centers, albeit in a heavily modified form.

The Facebook developers have summarized their problems with the kernel's own OOM killer online: The OOM killer reacts unpredictably, the possibilities for influencing its behavior are too limited, and the configuration options are inadequate. And because the OOM killer resides in the kernel, it is also extremely sluggish: A long time can pass between the moment the kernel notices a possible memory problem and the moment it actually does something about it and releases memory (Figure 4).

Figure 4: Maximum time saving: The OOM killer holds a system in a stranglehold for almost six minutes in an OOM scenario; OOMd only needs about three minutes.

The kernel kills the processes one after the other, re-evaluates the situation, and has to empty the page cache in the meantime before looking at the next process, so the cycle is continually restarted until the system is running stably once again.

In the worst case, according to Facebook, livelock situations lasting more than 30 minutes can occur. As a reminder: A deadlock is a situation in which every component of a system is waiting for another component of the same system, so that none of them can make progress. A livelock is similar, except that the relations between the waiting components keep changing: The components remain active, but the system as a whole still makes no progress. At one moment the kernel could be waiting for the remains of a program to disappear from the page cache; a short while later it might be waiting for the OOM score of a certain process to be calculated.

Facebook's annoyance with this situation is understandable: During the 30 minutes that such a scenario can reportedly last, the system is practically unusable. Only a hard reboot helps, but that is exactly what Facebook wants to avoid.

OOM as a Holistic Measure

OOMd [1], which Facebook presented in August 2018, breaks with tradition and takes a very bold approach to the subject of OOM. The biggest difference from the existing OOM implementation in Linux is probably that Facebook's OOM killer does not run in kernel space, but as a normal application in user space. This makes OOMd a real novelty, because OOM handling has indisputably been the kernel's domain until now.

Running in user space does not put OOMd at any big technical disadvantage: If it is running with administrative (root) privileges, it can kill a process simply by sending it a SIGKILL. At the end of the day, the OOM killer in the kernel doesn't do anything wildly different.

In return, the advantages of the userspace implementation are obvious: In particular, OOMd is far more flexible than the kernel implementation could ever be.
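To illustrate how little it takes to kill a process from user space, here is a minimal Python sketch. It is not OOMd's code; the function names kill_process() and demo() are hypothetical and exist only for this example. The sketch starts a harmless child process and terminates it with SIGKILL, which is the same signal the text above refers to.

```python
import os
import signal
import subprocess
import sys


def kill_process(pid: int) -> None:
    """Send SIGKILL to pid. The target cannot catch or ignore this signal."""
    os.kill(pid, signal.SIGKILL)


def demo() -> int:
    """Start a sleeping child, kill it, and return its exit status."""
    # Hypothetical victim: a child that would otherwise sleep for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    kill_process(child.pid)
    # wait() reaps the child. On POSIX systems, a negative return code
    # means the child was terminated by the signal with that number.
    return child.wait()


if __name__ == "__main__":
    print("child exit status:", demo())
```

A process may always signal its own children, so this demo runs without special privileges. Killing arbitrary processes owned by other users is what requires the administrative privileges mentioned above.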

Adding PSI

If you want to try out OOMd right after reading this article, be warned: You still need support from the kernel. Although OOMd's logic does not reside in the kernel, OOMd in user space depends on receiving as much information as possible about the current state of the system from the kernel. A suitable interface already exists in the form of Linux PSI (Pressure Stall Information) [2], a component that reports memory, CPU, and I/O pressure metrics. At the time of writing, Linux PSI is not yet in the kernel. If you want to use OOMd, you have to build PSI into your kernel, a task that is made easier because PSI is available as a kernel module.
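The following Python sketch shows how a userspace tool might read such pressure metrics. It rests on several assumptions that do not come from this article: that a PSI-enabled kernel exposes the file /proc/pressure/memory, and that the file contains lines of the form "some avg10=... avg60=... avg300=... total=...", as the mainline PSI documentation describes. The 10 percent threshold and the function names are purely hypothetical and do not reflect OOMd's actual policy.

```python
from pathlib import Path

# Assumption: PSI-enabled kernels expose memory pressure at this path.
PSI_MEMORY = Path("/proc/pressure/memory")

# Sample data in the assumed PSI format, used when the file is missing.
SAMPLE = (
    "some avg10=12.50 avg60=3.20 avg300=0.80 total=123456\n"
    "full avg10=4.00 avg60=1.00 avg300=0.25 total=65432\n"
)


def parse_psi(text: str) -> dict:
    """Parse PSI text into {"some": {"avg10": 12.5, ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        # Each remaining field is a key=value pair such as avg10=12.50.
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def memory_under_pressure(text: str, threshold: float = 10.0) -> bool:
    """Hypothetical policy: compare the 10-second 'some' average to a limit."""
    stats = parse_psi(text)
    return stats.get("some", {}).get("avg10", 0.0) >= threshold


if __name__ == "__main__":
    text = PSI_MEMORY.read_text() if PSI_MEMORY.exists() else SAMPLE
    print(parse_psi(text))
    print("under pressure:", memory_under_pressure(text))
```

On a kernel without PSI, the script falls back to the built-in sample so that the parsing logic can still be tried out.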
