Understanding of linux kernel memory model


Understanding of Linux Kernel Memory Model

SeongJae Park <sj38.park@gmail.com>

Great To Meet You

● SeongJae Park <sj38.park@gmail.com>
● Has contributed to the Linux kernel, just for fun, since 2012
● Developing the Guaranteed Contiguous Memory Allocator
  ○ Source code is available: https://lwn.net/Articles/634486/

● Maintaining the Korean translation of the Linux kernel memory barrier document
  ○ The translation has been in mainline since v4.9-rc1
  ○ https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/ko_KR/memory-barriers.txt?h=v4.9-rc1



Programmers in Multi-core Land

● Processor vendors shifted to increasing the number of cores instead of clock speed about a decade ago
  ○ Now, multi-core systems are prevalent
  ○ Maybe even an octa-core portable bomb in your pocket?
● As a result, the free lunch is over; parallel programming is essential for high performance and scalability

[Figure: CPU trends chart (http://www.gotw.ca/images/CPU.png); clock speed growth has flattened while core counts increase.]


Writing Correct Parallel Program is Hard

● Compilers and processors are optimized for Instructions Per Cycle (IPC), not for programmer-perspective goals such as response time or throughput of meaningful (in people's context) progress
● The nature of parallelism is counter-intuitive
  ○ Time is relative; "before" and "after" are ambiguous, and even "simultaneous" is possible
● The C language was developed under a uniprocessor assumption
  ○ "Et tu, C?"

CPU 0: A = 1; B = 1;
CPU 1: A = 2; B = 2; assert(B == 2 && A == 1)

The CPU 1 assertion can be true in most parallel programming environments; a user-space sketch follows.
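As a concrete illustration, here is a hedged user-space sketch (not from the slides) of the example above, using C11 relaxed atomics and pthreads; the thread and variable names are illustrative assumptions:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int A, B;

static void *cpu0_thread(void *arg)
{
	/* Relaxed stores may become visible to other CPUs in either order. */
	atomic_store_explicit(&A, 1, memory_order_relaxed);
	atomic_store_explicit(&B, 1, memory_order_relaxed);
	return NULL;
}

static void *cpu1_thread(void *arg)
{
	atomic_store_explicit(&A, 2, memory_order_relaxed);
	atomic_store_explicit(&B, 2, memory_order_relaxed);
	/* Counter-intuitive but legal: B may still hold this thread's 2
	 * while A has already been overwritten by cpu0_thread's 1. */
	if (atomic_load_explicit(&B, memory_order_relaxed) == 2 &&
	    atomic_load_explicit(&A, memory_order_relaxed) == 1)
		printf("observed B == 2 && A == 1\n");
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, cpu0_thread, NULL);
	pthread_create(&t1, NULL, cpu1_thread, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return 0;
}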

TL;DR

● Memory operations can be reordered, merged, or discarded in any way, as long as the behavior defined by the memory model is not violated
  ○ In short, "We're all mad here" in parallel land
● Knowing the memory model is important for writing correct, fast, and scalable parallel programs


Reordering for Better IPC[*]

[*] IPC: Instructions Per Cycle


Simple Program Execution Sequence

● Programmer writes the program in a C-like, human-readable language
● Compiler translates the human-readable code into assembly language
● Assembler generates an executable binary from the assembly code
● Processor executes the instruction sequence in the binary
  ○ The execution result is guaranteed to be the same as that of sequential execution; in other words, the execution itself is not guaranteed to be sequential

C code:

#include <stdio.h>

int main(void)
{
	printf("hello world\n");
	return 0;
}

Assembly (from the compiler):

main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp

Binary (from the assembler):

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 3004 4000 0000 0000  ..>.....0.@.....
00000020: 4000 0000 0000 0000 d819 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0900 4000


Instruction Level Parallelism (ILP)

● Pipelining introduces instruction-level parallelism
  ○ Each instruction is split into a sequence of steps; each step can be executed in parallel, so instructions can be processed concurrently

[Figure: instructions 1-3 overlapping in a fetch/decode/execute pipeline, each offset by one stage.]

If not pipelined: 3 cycles per instruction. A 3-deep pipeline can retire 3 instructions in 5 cycles: 1.7 cycles per instruction.


Dependent Instructions Harm ILP

● If an instruction depends on the result of a previous instruction, it must wait until the previous one finishes execution
  ○ E.g., a = b + c; d = a + b;

[Figure: instruction 2 stalls in the fetch/decode/execute pipeline until instruction 1 completes.]

In this case, instruction 2 depends on the result of instruction 1 (e.g., the first instruction modifies the opcode of the next instruction): 7 cycles for 3 instructions, or 2.3 cycles per instruction.


Instruction Reordering Helps Performance

● By reordering dependent instructions so that they sit farther apart, total execution time can be shortened
● As long as the reordering is guaranteed not to change the result of the instruction sequence, it helps performance

[Figure: instructions 2 and 3 swapped in the pipeline so that the dependent instruction no longer stalls.]

Instruction 2 depends on the result of instruction 1 (e.g., the first instruction modifies the opcode of the next instruction). By reordering instructions 2 and 3, total execution time is shortened: 6 cycles for 3 instructions, or 2 cycles per instruction.


Reordering is Legal, Encouraged Behavior, But...

● If program causality is preserved, any reordering is legal
● Processors and compilers may reorder instructions for better IPC
● Program causality, however, is defined in terms of a single-processor environment
● IPC-focused reordering is unaware of programmer-perspective performance goals such as throughput or latency
● On multi-processor systems, reordering can harm not only correctness but also performance


Counter-intuitive Nature of Parallelism

Time is Relative (E = mc²)

● Each CPU generates events in its own time and observes the effects of events in relative time
● It is impossible to define an absolute order of two concurrent events; only a relative observation order is possible

[Figure: four CPUs on an event bus. CPU 1 generates event 1 and CPU 2 generates event 2; CPU 3 observes event 1 followed by event 2, while CPU 4 observes event 2 followed by event 1.]


Relative Event Propagation of Hierarchical Memory

● Most systems employ hierarchical memory for better performance and capacity
● The speed at which an event propagates to a given core can be influenced by a specific sub-layer of the memory hierarchy

If CPU 0's message queue is busy, CPU 2 can observe an event from CPU 0 (event A) after an event from CPU 1 (event B), even though CPU 1 observed event A before generating event B.

[Figure: CPUs 0-3; CPUs 0/1 and CPUs 2/3 each share a cache, with per-CPU message queues between the caches and memory over a bus. CPU 0 generates event A; CPU 1 sees event A and generates event B; CPU 2 sees event B first and, because CPU 0's message queue is busy, sees event A only later.]

Cache Coherency is Eventual

● It is widely believed that the cache coherency protocol gives the system memory consistency
● In reality, cache coherency guarantees only eventual consistency
● Every effect of each CPU will eventually become visible on all CPUs, but there is no guarantee that the effects become apparent in the same order on those other CPUs


System with Store Buffer and Invalidation Queue

● The store buffer and the invalidation queue deliver the effects of events, but do not guarantee the order in which each CPU observes them (see the sketch after the figure)

[Figure: each CPU sits behind its own store buffer and invalidation queue, between its cache and the shared bus and memory.]
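To make the problem concrete, here is a minimal kernel-style sketch (an assumed example; WRITE_ONCE(), READ_ONCE(), and smp_mb() are the primitives documented in memory-barriers.txt) of the classic store-buffer pattern. Without the smp_mb() calls, each CPU's store can sit in its local store buffer while the subsequent load executes, so both functions can return 0:

int x, y;

int cpu0(void)
{
	WRITE_ONCE(x, 1);
	smp_mb();	/* order the store to x before the load from y */
	return READ_ONCE(y);
}

int cpu1(void)
{
	WRITE_ONCE(y, 1);
	smp_mb();	/* order the store to y before the load from x */
	return READ_ONCE(x);
}

With the barriers in place, at least one of the two return values is guaranteed to be 1.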

C-language and Multi-Processor

C-language Doesn't Know Multi-Processor

● At the time C was initially developed, multi-processor systems were rare
● As a result, C gives only a few guarantees about memory operations on multi-processor systems
● Undefined behavior is allowed for the undefined cases


Compiler Optimizes Code

● Clever compilers try hard (really hard) to optimize code for high IPC (again, not for programmer-perspective goals)
  ○ Convert small, private functions into inline code
  ○ Reorder memory accesses to minimize dependencies
  ○ Simplify unnecessarily complex loops, ...
● Optimizations exploit `undefined behavior` as they see fit
  ○ It's legal, but sometimes does insane things from the programmer's perspective
● Compiler reordering of memory accesses, based on a C standard that is unaware of multi-processor systems, can produce unintended programs
● The Linux kernel uses compiler directives and the volatile keyword to enforce memory ordering (see the sketch below)
● C11 has improved things considerably, though
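As a small illustration (an assumed example; READ_ONCE() and cpu_relax() are documented kernel primitives), consider a flag-wait loop. With a plain load, the compiler may legally hoist the load of the flag out of the loop and spin forever on a stale register value:

int flag;

void wait_for_flag(void)
{
	/* READ_ONCE() forces a fresh load of flag on every iteration;
	 * a plain `while (!flag)` could be compiled into a single load
	 * followed by an infinite loop. */
	while (!READ_ONCE(flag))
		cpu_relax();
}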

Memory Models

Each Environment Provides Own Memory Model

● A memory model defines how memory operations are generated, what effects they have, and how those effects are propagated
● Each programming environment (instruction set architecture, programming language, operating system, etc.) defines its own memory model
  ○ Most modern language memory models (e.g., Golang, Rust, ...) are aware of multi-processors


Each ISA Provides Specific Memory Model

● Some architectures have stricter ordering-enforcement rules than others
● PA-RISC CPUs are the strictest, Alpha is the weakest
● Because the Linux kernel supports multiple architectures, it defines its memory model based on the weakest one, Alpha

https://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.2015.01.31a.pdf

Synchronization Primitives

● Though reordering and asynchronous effect propagation are legal, synchronization primitives are necessary for writing humanly intuitive programs
● Most memory models provide synchronization primitives such as atomic instructions and memory barriers


Atomic Operations

● An atomic operation is composed of multiple sub-operations
  ○ E.g., compare-and-swap, fetch-and-add, test-and-set
● Atomic operations are mutually exclusive
  ○ An intermediate state of an atomic operation cannot be seen by others
  ○ It can be thought of as a small critical section protected by a global lock
● Almost all hardware supports basic atomic operations
● In general, atomic operations are expensive
  ○ Misuse of atomic operations can severely degrade performance and scalability (see the sketch below)

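For instance, a minimal sketch of kernel atomic counter usage (an assumed example; atomic_inc() and atomic_read() are real kernel APIs, the counter name is illustrative):

#include <linux/atomic.h>

static atomic_t nr_events = ATOMIC_INIT(0);

void record_event(void)
{
	/* The whole read-modify-write is one indivisible operation;
	 * no one can observe a half-done increment. */
	atomic_inc(&nr_events);
}

int events_so_far(void)
{
	return atomic_read(&nr_events);
}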

Memory Barriers

● To allow synchronization of memory operations, memory models provide enforcement primitives, namely memory barriers
● In general, a memory barrier guarantees that the effects of the memory operations issued before it are propagated to the other components of the system (e.g., other processors) before the memory operations issued after it
● In general, a memory barrier is an expensive operation

CPU 1 executes: READ A; WRITE B; <BARRIER>; READ C;
CPU 2 observes: READ A, WRITE B, then READ C
CPU 3 observes: WRITE B, READ A, then READ C

READ A and WRITE B can be reordered with each other, but READ C is guaranteed to be ordered after both {READ A, WRITE B}. A kernel-style sketch of the usual barrier pairing follows.
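The usual pairing looks like the following kernel-style sketch (an assumed example; the variable names are illustrative). The writer's smp_wmb() pairs with the reader's smp_rmb(), so a reader that sees the flag is guaranteed to also see the data:

int data;
int ready;

void writer(void)
{
	data = 42;
	smp_wmb();		/* publish the data before the flag */
	WRITE_ONCE(ready, 1);
}

void reader(void)
{
	while (!READ_ONCE(ready))
		cpu_relax();
	smp_rmb();		/* see the flag before reading the data */
	BUG_ON(data != 42);	/* cannot fire with the paired barriers */
}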

Linux Kernel Memory Model

● Defined by the weakest architecture, Alpha
  ○ Almost every combination of reordering is possible
● Provides a rich set of atomic instructions
  ○ atomic_xchg(), atomic_inc_return(), atomic_dec_return(), ...
● Provides CPU-level barriers, compiler-level barriers, and semantic-level barriers
  ○ Compiler barriers: WRITE_ONCE(), READ_ONCE(), barrier(), ...
  ○ CPU barriers: mb(), wmb(), rmb(), smp_mb(), smp_wmb(), smp_rmb(), ...
  ○ Semantic barriers: ACQUIRE operations, RELEASE operations, ... (see the lock sketch below)
  ○ For details, refer to https://www.kernel.org/doc/Documentation/memory-barriers.txt
● Because different barriers have different overheads, only the necessary barriers should be used, and only where necessary, for high performance and scalability
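As a sketch of where ACQUIRE/RELEASE semantics commonly come from, consider lock operations (an assumed example; DEFINE_SPINLOCK(), spin_lock(), and spin_unlock() are real kernel APIs):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(counter_lock);
static int shared_counter;

void bump_counter(void)
{
	spin_lock(&counter_lock);	/* ACQUIRE: accesses below cannot move up */
	shared_counter++;		/* stays within the critical section */
	spin_unlock(&counter_lock);	/* RELEASE: accesses above cannot move down */
}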

Case Studies

Memory Operation Reordering

● Memory operation reordering is totally LEGAL unless it breaks causality
● Both the CPU and the compiler can do it, even on a single processor

CPU 0: A = 1; B = 1;
CPU 1: while (B == 0) {} C = 1;
CPU 2: Z = C; X = A; assert(Z == 0 || X == 1)

If everything executes exactly as written, the assertion holds. :)

Memory Operation Reordering

● Memory operation reordering is totally LEGAL unless it breaks causality
● Both the CPU and the compiler can do it, even on a single processor

CPU 0: B = 1; A = 1; (the two stores reordered)
CPU 1: while (B == 0) {} C = 1;
CPU 2: X = A; Z = C; assert(Z == 0 || X == 1) (the two loads reordered)

With CPU 0's stores and CPU 2's loads reordered, the assertion can fail. ?????

Memory Operation Reordering

● Memory operation reordering is totally LEGAL unless it breaks causality
● Both the CPU and the compiler can do it, even on a single processor
● A memory barrier enforces that the operations specified before it appear as already having happened, to the operations specified after it
● On some architectures, even transitivity is not guaranteed
  ○ Transitivity: B happened after A, and C happened after B; then C happened after A

CPU 0: A = 1; wmb(); B = 1;
CPU 1: while (B == 0) {} mb(); C = 1;
CPU 2: Z = C; rmb(); X = A; assert(Z == 0 || X == 1)

Transitivity for Scheduler and Workers

The scheduler and each worker reach a consensus about ordering.

Scheduler: "What time is it now?"
Worker A: "Night!"
Worker B: "Night!"
Worker Z: "Night!"

Transitivity between Scheduler and Worker

The scheduler and each worker reached a consensus about ordering.

Scheduler: "Worker Z, all workers agreed that it's night. Do bedmaking!"
Worker Z: "Yay!"

Transitivity between Scheduler and Worker

The scheduler and each worker reached a consensus about ordering, but workers B and Z reached no consensus between themselves.

Worker B: "Worker Z, I'm in the afternoon! I didn't tell you it's night!"
Worker Z: "!!??"

Compiler Memory Barrier Code Example

Compiler Reordering Avoidance

● The compiler can remove the loop entirely

C code:

static int the_var;

void loop(void)
{
	int i;

	for (i = 0; i < 1000; i++)
		the_var++;
}

Assembly language code:

loop:
.LFB106:
	.cfi_startproc
	addl	$1000, the_var(%rip)
	ret
	.cfi_endproc
.LFE106:

Compiler Reordering Avoidance

● ACCESS_ONCE() is a compiler memory barrier implementation in the Linux kernel
● Still, the store to `the_var` is not issued on every iteration, so intermediate values of `the_var` cannot be seen by others

C code:

static int the_var;

void loop(void)
{
	int i;

	for (i = 0; ACCESS_ONCE(i) < 1000; i++)
		the_var++;
}

Assembly language code:

loop:
	...
	movl	the_var(%rip), %ecx
.L175:
	...
	addl	$1, %eax
	...
	cmpl	$999, %edx
	jle	.L175
	movl	%esi, the_var(%rip)
.L170:
	rep ret

Compiler Reordering Avoidance

● volatile forces the compiler to issue the memory operations as the programmer wrote them (note: it does not force actual DRAM accesses)
● However, the repetitive loads may harm performance

C code:

static volatile int the_var;

void loop(void)
{
	int i;

	for (i = 0; ACCESS_ONCE(i) < 1000; i++)
		the_var++;
}

Assembly language code:

loop:
	...
.L174:
	movl	the_var(%rip), %edx
	...
	addl	$1, %edx
	movl	%edx, the_var(%rip)
	...
	cmpl	$999, %edx
	jle	.L174
.L170:
	rep ret
	.cfi_endproc

Compiler Reordering Avoidance

● A full compiler memory barrier can also handle this case
● The store to the_var is issued on each iteration, while the loop condition check uses a register

C code:

static int the_var;

void loop(void)
{
	int i;

	for (i = 0; i < 1000; i++) {
		the_var++;
		barrier();
	}
}

Assembly language code:

loop:
.LFB106:
	...
.L172:
	addl	$1, the_var(%rip)
	subl	$1, %eax
	jne	.L172
	rep ret
	.cfi_endproc

CPU Memory Barrier Code Example

Progress Perception

● The code does issue the LOADs and STOREs, but...
● see_progress() can still see no progress, because a change made by one processor propagates to other processors eventually, not immediately

C code:

static int prgrs;

void do_progress(void)
{
	prgrs++;
}

void see_progress(void)
{
	static int last_prgrs;
	static int seen;
	static int nr_seen;

	seen = prgrs;
	if (seen > last_prgrs)
		nr_seen++;
	last_prgrs = seen;
}

Assembly language code:

do_progress:
	...
	addl	$1, prgrs(%rip)
	ret
	...
see_progress:
	...
	movl	prgrs(%rip), %eax
	...
	jle	.L193
	addl	$1, nr_seen.5542(%rip)
.L193:
	movl	%eax, last_prgrs.5540(%rip)
	ret
	.cfi_endproc

Progress Perception

● A write barrier on the writer side and a read barrier on the reader side help the situation

C code:

static int prgrs;

void do_progress(void)
{
	prgrs++;
	smp_wmb();
}

void see_progress(void)
{
	static int last_prgrs;
	static int seen;
	static int nr_seen;

	smp_rmb();
	seen = prgrs;
	if (seen > last_prgrs)
		nr_seen++;
	last_prgrs = seen;
}

Assembly language code:

do_progress:
	...
	addl	$1, prgrs(%rip)
	...
	sfence
	ret
see_progress:
	...
	lfence
	...
	movl	prgrs(%rip), %eax
	...
	jle	.L193
	addl	$1, nr_seen.5542(%rip)
.L193:
	movl	%eax, last_prgrs.5540(%rip)

Memory Ordering of X86

Neither Loads Nor Stores Are Reordered with Likes

CPU 0: STORE 1 X; STORE 1 Y
CPU 1: R1 = LOAD Y; R2 = LOAD X

R1 == 1 && R2 == 0 is impossible.

Stores Are Not Reordered With Earlier Loads

CPU 0: R1 = LOAD X; STORE 1 Y
CPU 1: R2 = LOAD Y; STORE 1 X

R1 == 1 && R2 == 1 is impossible.

Loads May Be Reordered with Earlier Stores to Different Locations

CPU 0: STORE 1 X; R1 = LOAD Y
CPU 1: STORE 1 Y; R2 = LOAD X

R1 == 0 && R2 == 0 is possible.

Intra-Processor Forwarding Is Allowed

CPU 0: STORE 1 X; R1 = LOAD X; R2 = LOAD Y
CPU 1: STORE 1 Y; R3 = LOAD Y; R4 = LOAD X

R2 == 0 && R4 == 0 is possible.

Stores Are Transitively Visible

CPU 0: STORE 1 X
CPU 1: R1 = LOAD X; STORE 1 Y
CPU 2: R2 = LOAD Y; R3 = LOAD X

R1 == 1 && R2 == 1 && R3 == 0 is impossible.

Stores Are Seen in a Consistent Order by Others

CPU 0: STORE 1 X
CPU 1: STORE 1 Y
CPU 2: R1 = LOAD X; R2 = LOAD Y
CPU 3: R3 = LOAD Y; R4 = LOAD X

R1 == 1 && R2 == 0 && R3 == 1 && R4 == 0 is impossible.

X86 Memory Ordering Summary

● A LOAD after a LOAD is never reordered
● A STORE after a STORE is never reordered
● A STORE after a LOAD is never reordered
● STOREs are transitively visible
● STOREs are seen in a consistent order by others
● Intra-processor STORE forwarding is possible
● A LOAD from a different location after a STORE may be reordered
● In short, x86 ordering is quite reasonably strong; a user-space sketch of the one permitted reordering follows
● For more detail, refer to the Intel Architecture Software Developer's Manual
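The one reordering x86 does allow can be demonstrated with a hedged user-space sketch using C11 atomics (an assumed example, not from the slides). With only the relaxed operations, R1 == 0 && R2 == 0 is observable even on x86; the seq_cst fences compile to a full barrier (an MFENCE or a locked instruction) and forbid that outcome:

#include <stdatomic.h>

static atomic_int X, Y;

int thread0(void)	/* returns R1 */
{
	atomic_store_explicit(&X, 1, memory_order_relaxed);
	/* Without this fence, the load below may complete before the
	 * store above becomes globally visible. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&Y, memory_order_relaxed);
}

int thread1(void)	/* returns R2 */
{
	atomic_store_explicit(&Y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&X, memory_order_relaxed);
}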

Summary

● The nature of Parallel Land is counter-intuitive
  ○ The order of events cannot be defined without interaction
  ○ The ordering rules differ from environment to environment
  ○ A memory model defines the ordering rules of its environment
  ○ In short, they're all mad here
● For human-intuitive, correct programs, interaction is necessary
  ○ Every memory model provides synchronization primitives such as atomic instructions and memory barriers
  ○ Such interaction is usually expensive
● The Linux kernel memory model is based on the weakest memory model, Alpha
  ○ Kernel programmers should assume Alpha when writing architecture-independent code
  ○ Because synchronization primitives are expensive, programmers should use only the necessary primitives, and only in the necessary locations

Thanks

This work by SeongJae Park is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.

These Slides Have Been Presented At

● SW Maestro 100+ Conference 2016 (http://onoffmix.com/event/57240)
● KOSSCON 2016 (https://kosscon.kr/2016)