Continguous Memory Allocator in the Linux Kernel

Linux CMA (Contiguous Memory Allocator)Mark Veltzer

Why contiguous memory?

DMA

Sharing between operating systems

Why large contiguous memory?

Brain dead devices (200MB in one DMA transfer?).

Devices that uses large memory chunks.

Performance of scatter/gather.

kmalloc (4MB on x86, less on other platforms), gfp (8MB) limitations.

And again, sharing between operating systems...

Solutions to the contiguous memory problem

Use small hardware buffers (the most common).

Use scatter gather in hardware or in software (very common if small buffers are not available especially in general availability products because of the problems of the next methods and the unavailability of an IOMMU).

Reserve a large contiguous buffer (a.k.a the old way try not to use me if you can...).

CMA (you are here! - half way to nirvana).

IOMMU (a.k.a. the best thing since sliced bread pure bliss your company will build a temple in your honor).

Doing it on your own (no CMA)

First reserve a large buffer for your needs and tell the kernel to keep it's filthy grubbing hands off your buffer...

Find an free large area by looking at /proc/iomem. I found 200M free buffer at address 800M in this example.

Pass mem=800M memmap=200M$800M to the kernel at boot (this actually depends on the kernel version, for some kernels only memmap is needed, ugh!!!).

Check it by passing bootmem_debug

See that the allocation worked by looking at:dmesg | grep user (no guarantee, kernel log changes all the time)

cat /proc/iomem (even better than previous).

/sys/firmware/memmap/*/* (the raw data can be machine parsed).

Doing it on your own (no CMA). Continued...

Now you want your driver/kernel code to access this buffer

request_mem_region with the same physical address you passed as a kernel parameter.

ioremap

And just you the buffer you get from ioremap.

Dont forget to iounmap and release_mem_region...

Lets see an example

What's wrong with this approach?

You need to find a free spot from /proc/iomem. This space can change between board to board, chipset to chipset, at hotplug time, pci plug&play non determinacy, between architectures and more.

The space will not be used for anything, ever. So that space is lost even you never do anything with it.

You need to know the size of RAM you want in advance, and be accurate about it. This is because every byte you allocate is a byte you can't use for regular system operations.

CMA

What is it? A piece of code, now in the standard kernel.

Enabled in the default Linux configuration and by most current Linux distributions.

A result of a long line of patches with various approaches intended to solve the large contiguous memory problem.

Well integrated, both with the Linux memory allocator (gfp) and with the Linux DMA subsystem.

How do you use it?

Make sure it's in your kernel (CONFIG_CMA)

You can make the default CMA context larger or smaller via various kernel configuration options (CMA_AREAS, CMA_DEBUG, CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE, ...)

Use the regular DMA API to allocate DMA buffers. That's it.

This means that drivers do not need to be changed.

Allocation time will be a little longer if you ask for large buffers (see next slide). But not by much.

You can use the special CMA APIS (dma_contiguous_reserve, dma_contiguous_early_fixup, dma_alloc_from_contiguous, dma_release_from_contiguous, dma_declare_contiguous) but you don't have too. Try to avoid it.

How does it work?

Pages in the kernel are marked as movable and non movable.

Pages which are movable are pages where the physical addresses don't matter. This is true for most pages (especially user space pages and the page cache which take up most of the space in the system).

When you alloc via cma (actually when the DMA api allocs and CMA is enabled) it migrated pages pages which are migratable (movable) to make physical room for you buffer.

While the migration is taking place the CMA uses a feature of the page allocator which allow to mark a page block as MIGRATE_ISOLATE which prevents the page allocator from allocating these pages while the moving is taking place.

Allocating is from CMA contexts/pool. By default there are 7 (controllable from kernel configuration) of those and one is default. You can create your own and have a private pool for you driver if you are willing to use the special API. Usually you don't need these but for special considerations (realtime etc) this may be an option.

How does this solve my problems?

Well, with CMA all of your RAM is used and if you never ask for that large contiguous space then you can use all of your systems RAM.

In addition CMA is integrated with the page allocator so when you return pages (simply dealloc via the DMA API) the pages actually go back to being used by whomever.

CMA is transparent to drivers while the previous solution is not.

Issues with CMA

get_user_pages() is an issue which is still to be dealt with efficiently. get_user_pages() locks pages and makes the unmovable. This is dealt with today by moving pages away as soon as they are locked which creates a bit of overhead.

IOMMU is better (see next slide).

IOMMU

IOMMU solves all of your large contiguous woes (at least as they pertain to hardware, not to operating system sharing).

It also has a security advantage.

And much better debugging and stability.

Really, try to use a board with IOMMU It's well worth the extra $$$.

Maybe you already have an iommu? Check /sys/class/iommu/*...

References

$KERNEL/doc/kernel-parameters.txt (cma=)

$KERNEL/drivers/base/dma-contiguous.c

$KERNEL/include/linux/dma-contiguous.h

https://lwn.net/Articles/486301/ A deep dive into CMA

https://lwn.net/Articles/474819/ DMA buffer sharing

https://lwn.net/Articles/396657/ Older CMA article some of it irrelevant.

https://lwn.net/Articles/480055/ The Android ION memory allocator (replacement of pmem).

Continguous Memory Allocator in the Linux Kernel

Software

Transcript of Continguous Memory Allocator in the Linux Kernel