Zack's Kernel News

Unlocking Memory Access

Thomas Gleixner recently posted a patch to provide a preemptible version of kmap_atomic() and similar interfaces. The kernel can't interrupt atomic functions to do things like give CPU time to other processes, but it can interrupt preemptible functions without problems. Preemption is the entire idea behind multitasking. The more preemptible the kernel can become, the smoother and faster the user experience can be, even when there are tons of things running all at once.

The kmap_atomic() function is used to access memory that is likely to be remapped soon. You access it, hang on tight while you need it, and only return control to the larger system when you've gotten what you needed. We're generally talking thousandths of a second or less. But still, more speed is good.
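To make the "map it, hang on tight, let go" pattern concrete, here is a minimal userspace sketch in C. It is a model only: the sim_* names, the page structure, and the preemption counter are all invented for illustration, and none of this is the kernel's actual implementation. The counter stands in for the implicit preempt disable that Thomas mentions.

```c
/* Userspace model only: none of these sim_* names are kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096

struct sim_page {
    unsigned char data[SIM_PAGE_SIZE];
};

/* Nonzero models "the scheduler must not switch tasks right now". */
int sim_preempt_count;

/* Open a short, non-preemptible access window onto a page. */
void *sim_kmap_atomic(struct sim_page *page)
{
    sim_preempt_count++;          /* models the implicit preempt disable */
    return page->data;
}

/* Close the window; the address must not be used afterwards. */
void sim_kunmap_atomic(void *addr)
{
    (void)addr;
    assert(sim_preempt_count > 0);
    sim_preempt_count--;          /* preemption is allowed again */
}

/* Copy len bytes out of a page: map, use, unmap, as briefly as possible. */
int sim_copy_from_page(void *dst, struct sim_page *page,
                       size_t offset, size_t len)
{
    if (offset > SIM_PAGE_SIZE || len > SIM_PAGE_SIZE - offset)
        return -1;                /* request does not fit in one page */

    unsigned char *vaddr = sim_kmap_atomic(page);
    memcpy(dst, vaddr + offset, len);
    sim_kunmap_atomic(vaddr);
    return 0;
}
```

In this model, everything between the map and unmap calls runs with the counter raised, which is the window a preemptible variant would try to open up to the scheduler.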

Like the infamous big kernel lock (BKL), kmap_atomic() will not go away all at once. There are various implementations, which Thomas's patch seeks to unite under a single umbrella that everyone would use. Then, for cases that can handle the preemptible versions, Thomas will split off separate implementations that avoid atomic locking. In this way, Thomas hopes to whittle away kmap_atomic() the same way the BKL was whittled away years ago.

Thomas explained, "This is not a wholesale conversion which makes kmap_atomic magically preemptible because there might be usage sites which rely on the implicit preempt disable. So this needs to be done on a case by case basis and the call sites converted to kmap_temporary." And he also added, "this is only lightly tested on X86 and completely untested on all other architectures." In other words, here's a flame thrower; go play.

Linus Torvalds was more than happy with this. Even aside from atomicity and preemptibility, he liked what Thomas was doing here. He even said the whole kit and caboodle might be a good replacement for kmap() itself, instead of only kmap_atomic().

Instead of Thomas's intended use, Linus suggested that "another solution might be to just use this new preemptible 'local' kmap(), and remove the old global one entirely. Yes, the old global one caches the page table mapping and that sounds really efficient and nice. But it's actually horribly horribly bad, because it means that we need to use locking for them. Your new 'temporary' implementation seems to be fundamentally better locking-wise, and only need preemption disabling as locking (and is equally fast for the non-highmem case)."

Thomas, along with Matthew Wilcox, had some concerns with Linus's idea. As Matthew put it, "people might use kmap() and then pass the address to a different task. So we need to audit the current users of kmap() and convert any that do that into using vmap() instead." To which Linus replied, "Ahh. Yes, I guess they might do that. It sounds strange, but not entirely crazy – I could imagine some 'PIO thread' that does IO to a page that has been set up by somebody else using kmap(). Or similar."

Thomas also replied to Linus's high hopes for a full-on kmap() replacement, saying, "I thought about it, but then I figured that kmap pointers can be handed to other contexts from the thread which sets up the mapping because it's 'permanent'. I'm not sure whether that actually happens, so we'd need to audit all kmap() users to be sure. If there is no such use case, then we surely can get of rid of kmap() completely. It's only 300+ instances to stare at and quite some of them are wrapped into other functions."

They went back and forth a bit on the implementation. At one point, though, Peter Zijlstra pointed out to Thomas that (as Thomas then explained to Linus), "If a task is migrated to a different CPU then the mapping address will change which will explode in colourful ways." And Linus replied, "Heh. Right you are. Maybe we really *could* call this new kmap functionality something like 'kmap_percpu()' (or maybe 'local' is good enough)."
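Why would the address change? One way to picture it, as a sketch under assumptions rather than a description of Thomas's patch, is that each CPU owns its own fixed window of mapping slots, so the address of "slot 3" depends on which CPU you ask. The base address, slot count, and sim_* names below are all made up for illustration.

```c
/* Userspace model only: the layout and the sim_* names are assumptions. */
#include <assert.h>
#include <stdint.h>

#define SIM_PAGE_SIZE     4096u
#define SIM_SLOTS_PER_CPU 16u
#define SIM_WINDOW_BASE   0xF0000000u

/* Address of mapping slot idx inside the window owned by cpu. */
uint32_t sim_local_slot_addr(unsigned cpu, unsigned idx)
{
    assert(idx < SIM_SLOTS_PER_CPU);
    return SIM_WINDOW_BASE +
           (cpu * SIM_SLOTS_PER_CPU + idx) * SIM_PAGE_SIZE;
}

/* Returns 1 if a pointer obtained on old_cpu is still the right
 * address for the same slot on new_cpu, and 0 if it has gone stale. */
int sim_pointer_survives_migration(unsigned old_cpu, unsigned new_cpu,
                                   unsigned idx)
{
    return sim_local_slot_addr(old_cpu, idx) ==
           sim_local_slot_addr(new_cpu, idx);
}
```

In this model, a task that mapped a page in slot 3 on CPU 0 and then woke up on CPU 1 would be holding an address inside CPU 0's window, which is the colourful explosion Peter was warning about. It also hints at why a name like 'percpu' or 'local' fits.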

The discussion continued, with various participants. The clearest thing to come out of it all was that the project had good reason to get much more ambitious, and that the resulting problems would be much more difficult to solve. The whole thing became a deep and dark implementation discussion with many interweaving tendrils, the ultimate goal of which was to eke out that extra little bit of high-speed multiprocessing from every running Linux system.

It's not at all clear where this will all go. At the very least, Thomas's final patch will be something better than what was there before. But ultimately, it's not obvious that any of the participants in the discussion fully understand what they're trying to do or how it will finally work. This is not unusual in kernel development. Often objections, solutions, and shifts in perspective come from unexpected places in the middle of heated debate, not unlike a merry-go-round whirling at top speed, with developers clinging to the various bobbing horses, trying to pull each other inward toward the center.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
