Continguous Memory Allocator in the Linux Kernel
-
Upload
kernel-tlv -
Category
Software
-
view
478 -
download
2
Transcript of Continguous Memory Allocator in the Linux Kernel
Linux CMA (Contiguous Memory Allocator)Mark Veltzer
Why contiguous memory?
DMA
Sharing between operating systems
Why large contiguous memory?
Brain dead devices (200MB in one DMA transfer?).
Devices that uses large memory chunks.
Performance of scatter/gather.
kmalloc (4MB on x86, less on other platforms), gfp (8MB) limitations.
And again, sharing between operating systems...
Solutions to the contiguous memory problem
Use small hardware buffers (the most common).
Use scatter gather in hardware or in software (very common if small buffers are not available especially in general availability products because of the problems of the next methods and the unavailability of an IOMMU).
Reserve a large contiguous buffer (a.k.a the old way try not to use me if you can...).
CMA (you are here! - half way to nirvana).
IOMMU (a.k.a. the best thing since sliced bread pure bliss your company will build a temple in your honor).
Doing it on your own (no CMA)
First reserve a large buffer for your needs and tell the kernel to keep it's filthy grubbing hands off your buffer...
Find an free large area by looking at /proc/iomem. I found 200M free buffer at address 800M in this example.
Pass mem=800M memmap=200M$800M to the kernel at boot (this actually depends on the kernel version, for some kernels only memmap is needed, ugh!!!).
Check it by passing bootmem_debug
See that the allocation worked by looking at:dmesg | grep user (no guarantee, kernel log changes all the time)
cat /proc/iomem (even better than previous).
/sys/firmware/memmap/*/* (the raw data can be machine parsed).
Doing it on your own (no CMA). Continued...
Now you want your driver/kernel code to access this buffer
request_mem_region with the same physical address you passed as a kernel parameter.
ioremap
And just you the buffer you get from ioremap.
Dont forget to iounmap and release_mem_region...
Lets see an example
What's wrong with this approach?
You need to find a free spot from /proc/iomem. This space can change between board to board, chipset to chipset, at hotplug time, pci plug&play non determinacy, between architectures and more.
The space will not be used for anything, ever. So that space is lost even you never do anything with it.
You need to know the size of RAM you want in advance, and be accurate about it. This is because every byte you allocate is a byte you can't use for regular system operations.
CMA
What is it? A piece of code, now in the standard kernel.
Enabled in the default Linux configuration and by most current Linux distributions.
A result of a long line of patches with various approaches intended to solve the large contiguous memory problem.
Well integrated, both with the Linux memory allocator (gfp) and with the Linux DMA subsystem.
How do you use it?
Make sure it's in your kernel (CONFIG_CMA)
You can make the default CMA context larger or smaller via various kernel configuration options (CMA_AREAS, CMA_DEBUG, CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE, ...)
Use the regular DMA API to allocate DMA buffers. That's it.
This means that drivers do not need to be changed.
Allocation time will be a little longer if you ask for large buffers (see next slide). But not by much.
You can use the special CMA APIS (dma_contiguous_reserve, dma_contiguous_early_fixup, dma_alloc_from_contiguous, dma_release_from_contiguous, dma_declare_contiguous) but you don't have too. Try to avoid it.
How does it work?
Pages in the kernel are marked as movable and non movable.
Pages which are movable are pages where the physical addresses don't matter. This is true for most pages (especially user space pages and the page cache which take up most of the space in the system).
When you alloc via cma (actually when the DMA api allocs and CMA is enabled) it migrated pages pages which are migratable (movable) to make physical room for you buffer.
While the migration is taking place the CMA uses a feature of the page allocator which allow to mark a page block as MIGRATE_ISOLATE which prevents the page allocator from allocating these pages while the moving is taking place.
Allocating is from CMA contexts/pool. By default there are 7 (controllable from kernel configuration) of those and one is default. You can create your own and have a private pool for you driver if you are willing to use the special API. Usually you don't need these but for special considerations (realtime etc) this may be an option.
How does this solve my problems?
Well, with CMA all of your RAM is used and if you never ask for that large contiguous space then you can use all of your systems RAM.
In addition CMA is integrated with the page allocator so when you return pages (simply dealloc via the DMA API) the pages actually go back to being used by whomever.
CMA is transparent to drivers while the previous solution is not.
Issues with CMA
get_user_pages() is an issue which is still to be dealt with efficiently. get_user_pages() locks pages and makes the unmovable. This is dealt with today by moving pages away as soon as they are locked which creates a bit of overhead.
IOMMU is better (see next slide).
IOMMU
IOMMU solves all of your large contiguous woes (at least as they pertain to hardware, not to operating system sharing).
It also has a security advantage.
And much better debugging and stability.
Really, try to use a board with IOMMU It's well worth the extra $$$.
Maybe you already have an iommu? Check /sys/class/iommu/*...
References
$KERNEL/doc/kernel-parameters.txt (cma=)
$KERNEL/drivers/base/dma-contiguous.c
$KERNEL/include/linux/dma-contiguous.h
https://lwn.net/Articles/486301/ A deep dive into CMA
https://lwn.net/Articles/474819/ DMA buffer sharing
https://lwn.net/Articles/396657/ Older CMA article some of it irrelevant.
https://lwn.net/Articles/480055/ The Android ION memory allocator (replacement of pmem).