Zack's Kernel News

Panic on OOM Timeout

Michal Hocko posted some patches implementing a "panic on OOM timeout" feature. In other words, when the system detected an out-of-memory condition, it would start a timer. If the OOM killer managed to kill the right process and free up enough memory in time, the system would continue running. But if it couldn't find the right process in the time allotted, the kernel would panic, producing some hopefully usable debugging information, and the system could be rebooted in an orderly fashion. Without the timer, Michal argued, the OOM killer could just go on killing the wrong processes, leaving the system unusable for an unpredictable amount of time. The timer, he said, added an important element of predictability to the situation.
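
The timer logic can be modeled in a few lines of C. This is a hypothetical userspace simulation, not code from Michal's or Tetsuo's patches: the `simulate_oom()` function, the `oom_outcome` enum, and the per-victim cost and success arrays are all invented for illustration. The sketch also assumes, for simplicity, that running out of candidate victims ends in a panic as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one OOM episode in this simulation. */
enum oom_outcome { OOM_RECOVERED, OOM_PANIC };

/*
 * Hypothetical model of "panic on OOM timeout" (not kernel code).
 * The timer starts when the OOM condition is detected (t = 0).
 * kill_frees[i]   - whether killing the i-th chosen victim frees enough memory
 * kill_cost_ms[i] - how long that kill attempt takes
 * If enough memory is freed before timeout_ms elapses, the system keeps
 * running; otherwise it panics so the machine can be rebooted in order.
 */
static enum oom_outcome simulate_oom(const bool *kill_frees,
                                     const unsigned *kill_cost_ms,
                                     size_t nvictims, unsigned timeout_ms)
{
    assert(nvictims == 0 || (kill_frees != NULL && kill_cost_ms != NULL));

    unsigned elapsed_ms = 0;

    for (size_t i = 0; i < nvictims; i++) {
        elapsed_ms += kill_cost_ms[i];
        if (elapsed_ms > timeout_ms)
            return OOM_PANIC;       /* timer expired before recovery */
        if (kill_frees[i])
            return OOM_RECOVERED;   /* the right victim was found in time */
    }

    /* Simplifying assumption: no victims left, so give up and panic. */
    return OOM_PANIC;
}
```

For example, with three victims that each take 300 ms to kill, of which only the third frees enough memory, a 500 ms timeout makes the sketch return `OOM_PANIC`, while a 1000 ms timeout lets it return `OOM_RECOVERED`. The length of the timeout is exactly the trade-off the two developers argued over: too short, and a recoverable system is shut down; too long, and the machine stays unusable.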

Tetsuo Handa, who had worked on a similar feature some months earlier, agreed with the feature in principle but had questions about the implementation. The two developers immediately launched into a technical comparison of their patch sets, discussing specific scenarios in which the timeout could take too long to fire or lead to other undesirable results.

Part of the complexity of the debate arose from the fact that, in an out-of-memory situation, the system is already ailing. The question then becomes which ailments are preferable to others, when a given code path is trying too hard to solve a problem that it can't meaningfully fix anyway, and whether a given code path gives up and shuts down the system while there is still hope of recovering it.

The two went back and forth for a while, each essentially defending his own implementation while also submitting additional patches that might either win the other over or address the other's concerns. Ultimately, their approaches grew closer together, but the discussion ended without a true resolution.

The whole question of how to handle out-of-memory conditions is very thorny. It's possible that a technically superior approach might be rejected by Linus Torvalds or someone else along the way just because of the maintenance burden it would create. To some extent, the proper behavior might depend on the most likely user, which can also be hard to identify.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
