Improving IPC by Kernel Design Jochen Liedtke Shane Matthews Portland State University.
Improving IPC by Kernel Design
Jochen Liedtke
Shane Matthews
Portland State University
3/12/2004 Portland State University
Summary
• Review
• Performance improved
– Architecture Level
– Algorithmic Level
– Interface Level
– Coding Level
Micro-kernels
• Minimal OS, providing a set of primitives used to implement thread/address space management and IPC [1]
• Everything else is moved to user-space (servers)
Terminology (L3)
• Dataspace – Memory object mapped into an address space
• Task – Composed of threads, dataspaces, and an address space
• Message – String or memory object
L3 Architecture & IPC
• Active components communicate via messages
• Applies to:
– Device drivers
• Implemented as user-level tasks
– Hardware interrupts
• Interrupt message from micro-kernel to thread
L3 Redesign Principles
• IPC performance is the master
– Security and performance must not be affected
• Synergetic effects must be taken into consideration
– Think of combined effects
– May lead to reinforcement or diminution
• Design must aim at a concrete performance goal
– Per short message transfer: 350 cycles (7 microseconds)
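As a quick check on the units of the goal, the conversion below turns the 350-cycle budget into time. It is a minimal sketch that assumes the 50 MHz clock of the 486 DX-50 listed under Results; `cycles_to_us` is a hypothetical helper name.

```python
def cycles_to_us(cycles: int, clock_mhz: float) -> float:
    """Return how many microseconds `cycles` take at a clock of `clock_mhz` MHz."""
    # 1 MHz means one cycle per microsecond, so divide cycles by the clock in MHz.
    return cycles / clock_mhz


# 350-cycle goal on an assumed 50 MHz 486 DX-50
print(cycles_to_us(350, 50))  # 7.0
```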
Architectural Level
• Messages
• Process Structure
• Control Blocks
Compound Messages
• Multiple sends/receives combined into one send/receive
• Messages consist of direct strings, indirect strings, and memory objects
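A minimal Python sketch of a compound message, assuming a simple layout. The class and field names are hypothetical and do not reproduce L3's actual message format; the sketch only illustrates the slide's point that one IPC operation carries every part instead of one send/receive per part.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DirectString:
    data: bytes      # hypothetical: bytes carried inline in the message body


@dataclass
class IndirectString:
    address: int     # hypothetical: where the string lives in the sender's space
    length: int


@dataclass
class MemoryObject:
    base: int        # hypothetical: region referenced by the message
    size: int


@dataclass
class CompoundMessage:
    direct: List[DirectString] = field(default_factory=list)
    indirect: List[IndirectString] = field(default_factory=list)
    objects: List[MemoryObject] = field(default_factory=list)

    def part_count(self) -> int:
        return len(self.direct) + len(self.indirect) + len(self.objects)


def ipc_operations(msg: CompoundMessage, compound: bool) -> int:
    # Without compound messages every part needs its own send/receive;
    # with them the whole message travels in a single operation.
    return 1 if compound else msg.part_count()


msg = CompoundMessage(
    direct=[DirectString(b"hdr")],
    indirect=[IndirectString(address=0x1000, length=4096)],
    objects=[MemoryObject(base=0x400000, size=0x1000)],
)
print(ipc_operations(msg, compound=False), ipc_operations(msg, compound=True))  # 3 1
```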
Twofold message copy
• [A space] -> [kernel] -> [B space]
• Cost ≈ 20 + 0.75n cycles, where n is the message size in bytes
• Good for small messages
• Need something better as n grows
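Evaluating the slide's cost model makes "need something better as n grows" concrete. This sketch simply plugs sizes into 20 + 0.75n and solves for the largest message whose copy alone still fits the 350-cycle goal; the helper names are hypothetical.

```python
GOAL_CYCLES = 350  # short-message goal from the redesign principles


def twofold_copy_cycles(n: int) -> float:
    """Approximate cost of copying n bytes A -> kernel -> B, per the slide's model."""
    return 20 + 0.75 * n


def largest_size_within(budget: float) -> int:
    """Largest message size n (bytes) whose copy cost stays within `budget` cycles."""
    # Solve 20 + 0.75n <= budget for integer n.
    return int((budget - 20) / 0.75)


for n in (8, 64, 512, 4096):
    print(n, twofold_copy_cycles(n))
print(largest_size_within(GOAL_CYCLES))  # 440
```

Beyond about 440 bytes, the copy cost alone exceeds the whole per-message budget, before any kernel entry or scheduling work is counted.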
LRPC and SRC RPC
• Client and server share user-level memory
– Sender writes the message into a shared buffer
• Problems
– When one server serves many clients, shared regions of the address space become critical resources
– Shared regions require explicit opens (unlike L3)
– The sender can change a message during or after the receiver checks it
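The last problem is a check-then-use race. The sketch below is a generic illustration, not LRPC's or L3's code: the function names are hypothetical, and a validator that rewrites the buffer right after checking stands in for a concurrent sender. The usual remedy, validating a private snapshot, reintroduces exactly the copy that shared buffers were meant to save.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def receive_shared_unsafe(shared: bytearray, validate: Callable[[bytes], bool],
                          process: Callable[[bytes], T]) -> T:
    # Validates the shared buffer, then reads it again to process it.
    # A sender that rewrites `shared` in between slips unchecked data through.
    if not validate(bytes(shared)):
        raise ValueError("invalid request")
    return process(bytes(shared))


def receive_shared_safe(shared: bytearray, validate: Callable[[bytes], bool],
                        process: Callable[[bytes], T]) -> T:
    # Copy first: only the private snapshot is validated and processed, so
    # later writes by the sender cannot change what was checked.
    snapshot = bytes(shared)
    if not validate(snapshot):
        raise ValueError("invalid request")
    return process(snapshot)
```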
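Direct Message Copy Via Windows
• L3's method
– Receiver's destination region mapped into a window in the sender's address space
– Message copied directly into the window (one copy instead of two)
• Window
– One per address space
– Accessed exclusively by the kernel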
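A toy contrast between the twofold copy and the window copy, assuming address spaces can be modelled as flat byte arrays. The `AddressSpace` class and both functions are hypothetical; a `memoryview` aliasing B's destination region stands in for the mapped window, so writing through it lands directly in B with a single copy.

```python
class AddressSpace:
    """Toy address space: a flat bytearray standing in for a task's user memory."""

    def __init__(self, size: int):
        self.mem = bytearray(size)


def twofold_copy(a: AddressSpace, src: int, b: AddressSpace, dst: int, n: int) -> int:
    kernel_buffer = bytes(memoryview(a.mem)[src:src + n])  # copy 1: A -> kernel
    b.mem[dst:dst + n] = kernel_buffer                     # copy 2: kernel -> B
    return 2  # number of copies performed


def window_copy(a: AddressSpace, src: int, b: AddressSpace, dst: int, n: int) -> int:
    # Assumption: this view plays the role of the communication window,
    # an alias of B's destination region reachable while running in A.
    window = memoryview(b.mem)[dst:dst + n]
    window[:] = memoryview(a.mem)[src:src + n]             # single copy: A -> window
    return 1
```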
Communication Windows
• Problems
– Must be fast
– Different threads coexisting within the same address space
• L3 Implementation
– Copy one page-directory word from B's page directory to A's
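Why one word suffices: on 32-bit x86 without PAE, each page-directory entry maps a 4 MiB region through one page table. This minimal sketch assumes a hypothetical 4 MiB-aligned window slot `WINDOW_BASE` in A and models page directories as Python lists; copying B's entry into A's window slot makes the window alias everything B has mapped in that 4 MiB region. A destination that straddles a 4 MiB boundary would need a second entry in this model.

```python
PDE_SHIFT = 22                # x86 32-bit (non-PAE): each PDE maps 4 MiB
PDE_SPAN = 1 << PDE_SHIFT
WINDOW_BASE = 0xE0000000      # hypothetical 4 MiB-aligned window slot in A's space


def pde_index(vaddr: int) -> int:
    return vaddr >> PDE_SHIFT


def open_window(pd_a: list, pd_b: list, dest_vaddr: int) -> None:
    """Alias the 4 MiB region of B containing dest_vaddr at WINDOW_BASE in A."""
    # One word copied: A's window entry now points at B's page table.
    pd_a[pde_index(WINDOW_BASE)] = pd_b[pde_index(dest_vaddr)]


def window_address(dest_vaddr: int) -> int:
    """Address inside A's window that aliases dest_vaddr in B."""
    return WINDOW_BASE + (dest_vaddr & (PDE_SPAN - 1))


pd_a = [0] * 1024
pd_b = [0] * 1024
pd_b[pde_index(0x08048123)] = 0x00123007   # hypothetical PDE: page-table frame | flags
open_window(pd_a, pd_b, 0x08048123)
print(hex(window_address(0x08048123)))     # 0xe0048123
```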
Process Structure
• Each thread has its own kernel stack (one kernel stack per thread)
– Efficient, since interrupts, page faults, and IPC already save state on the kernel stack
• Continuations
– Pro: • Reduce kernel stack space
– Cons: • Require additional copies between kernel stack and continuation
• Interfere with other optimizations
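The "additional copies" drawback can be seen in a toy model. In this sketch, all names are hypothetical: with a per-thread kernel stack, the values saved at kernel entry simply stay on that stack while the thread blocks; with a continuation, the shared stack is reused by the next thread, so everything needed for resumption must first be copied into per-thread continuation state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Thread:
    name: str
    kernel_stack: List[int] = field(default_factory=list)  # private stack (L3 model)
    continuation: Optional[Callable[["Thread"], str]] = None
    cont_state: Dict[str, int] = field(default_factory=dict)


def block_per_thread_stack(t: Thread, partner: int, timeout: int) -> None:
    # Models kernel entry pushing state onto the thread's own stack;
    # blocking leaves it there with no further copy.
    t.kernel_stack += [partner, timeout]


def resume_receive(t: Thread) -> str:
    return f"{t.name} resumes receive from thread {t.cont_state['partner']}"


def block_with_continuation(t: Thread, partner: int, timeout: int) -> None:
    # The shared kernel stack will be reused, so the needed values are
    # copied into the continuation state: the extra copy.
    t.cont_state = {"partner": partner, "timeout": timeout}
    t.continuation = resume_receive
```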
Thread Control Blocks
• Implemented as a large array in the kernel
– Fast TCB access
• Address = array base + TCB number × TCB size
– Saves TLB misses (IPC)
• Kernel stacks of sender and receiver located in their TCB pages
– Locking done by unmapping the TCB
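A minimal sketch of the array addressing, assuming a hypothetical base address, a hypothetical 1 KiB slot per thread holding the TCB plus its kernel stack, and 4 KiB pages. Because slots are 1 KiB-aligned, a TCB and its stack never straddle a page, so one TLB entry covers both.

```python
TCB_ARRAY_BASE = 0xF0000000   # hypothetical virtual address of the TCB array
TCB_SIZE = 1024               # hypothetical: TCB plus kernel stack, 1 KiB per thread
PAGE_SIZE = 4096


def tcb_address(thread_no: int) -> int:
    # Array indexing: base + number * size, no lookup table to walk.
    return TCB_ARRAY_BASE + thread_no * TCB_SIZE


def kernel_stack_top(thread_no: int) -> int:
    # Assumption: the kernel stack grows down from the end of the thread's slot.
    return tcb_address(thread_no) + TCB_SIZE


def page_of(addr: int) -> int:
    return addr // PAGE_SIZE


n = 5
print(hex(tcb_address(n)), page_of(tcb_address(n)) == page_of(kernel_stack_top(n) - 1))
```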
Algorithmic Level
• Thread Identifier
• Lazy Scheduling
• Short Messages Via Registers
Thread Identifier
• In user mode, a thread is addressed by a 64-bit UID
• Thread number in the lower 32 bits of the UID
– AND with a bit mask, add to the TCB array base
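One AND and one ADD suffice if the thread-number field already sits at the bit position that equals log2(TCB size), so masking yields the byte offset into the TCB array directly. The layout below is a hypothetical one chosen to make that work (1 KiB TCBs, 12 thread-number bits); it is not L3's documented UID format.

```python
TCB_SHIFT = 10                 # hypothetical: TCB size 1 KiB = 2**10
THREAD_BITS = 12               # hypothetical: up to 4096 threads
TCB_ARRAY_BASE = 0xF0000000    # hypothetical TCB array base
TCB_MASK = ((1 << THREAD_BITS) - 1) << TCB_SHIFT


def make_uid(thread_no: int, high: int, low_other: int = 0) -> int:
    """Build a 64-bit UID with thread_no stored at bit TCB_SHIFT of the low word."""
    if not 0 <= thread_no < (1 << THREAD_BITS):
        raise ValueError("thread number out of range")
    # Other low-word bits (e.g. a version field) are kept outside the mask.
    low = (thread_no << TCB_SHIFT) | (low_other & ~TCB_MASK & 0xFFFFFFFF)
    return (high << 32) | low


def uid_to_tcb(uid: int) -> int:
    low = uid & 0xFFFFFFFF
    return TCB_ARRAY_BASE + (low & TCB_MASK)   # one AND, one ADD
```

Multiplying the thread number by the TCB size is folded into the encoding, which is why no shift or multiply appears on the lookup path.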
Lazy Scheduling
• Each IPC operation (call, or reply & receive next) would require:
– Delete sending thread from ready queue
– Insert it into waiting queue
– Delete receiving thread from waiting queue
– Insert it into ready queue
• Too many queue operations!
Lazy Scheduling cont.
• L3 queue invariants
– Ready queue contains at least all ready threads
– Waiting queue contains at least all waiting threads
• TCB holds each thread's state (ready/waiting)
• Scheduler removes threads that do not belong in a queue while parsing that queue
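A small Python model of the weakened invariant, under simplifying assumptions: only the ready queue is modelled, the class and method names are hypothetical, and the `queued` flag stands in for whatever bookkeeping tells the kernel that a TCB is still linked into the queue. Blocking and waking only flip the state in the TCB; stale entries are dropped when the scheduler next parses the queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)
class Thread:
    name: str
    ready: bool = True    # state kept in the TCB
    queued: bool = True   # is the TCB still linked into the ready queue?


class LazyScheduler:
    """Ready queue holds at least all ready threads; stale entries go lazily."""

    def __init__(self, threads):
        self.ready_queue = deque(threads)
        self.queue_ops = 0   # explicit insert/delete operations performed

    def ipc_block(self, t: Thread) -> None:
        t.ready = False      # only the TCB state changes; no dequeue

    def ipc_wake(self, t: Thread) -> None:
        t.ready = True
        if not t.queued:     # re-insert only if the scheduler already purged it
            self.ready_queue.append(t)
            t.queued = True
            self.queue_ops += 1

    def pick_next(self):
        # While parsing the queue, drop threads that no longer belong there.
        while self.ready_queue:
            t = self.ready_queue[0]
            if t.ready:
                self.ready_queue.rotate(-1)   # round robin: move to the back
                return t
            self.ready_queue.popleft()
            t.queued = False
            self.queue_ops += 1
        return None
```

A thread that blocks and is woken again before the scheduler runs, as in a short call/reply exchange, costs no queue operations at all.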
Short Messages Via Registers
• A high proportion of messages are short
– E.g. driver acks/errors, hardware interrupts
• 486
– 7 general registers
– 3 needed: sender ID (64 bits, so 2 registers) and result code
– 4 available
• 8-byte messages via registers using a coding scheme
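The register arithmetic can be sketched as below. This does not reproduce L3's coding scheme, which the slide does not describe; it only shows, under the assumption of little-endian 32-bit registers, how a 64-bit sender UID, a result code, and an 8-byte message fit within the 486's 7 general registers. All function names are hypothetical.

```python
import struct
from typing import Tuple

GENERAL_REGS = 7   # 486 general registers, per the slide
REG_BYTES = 4      # 32-bit registers


def split_uid(uid: int) -> Tuple[int, int]:
    """A 64-bit thread UID needs two 32-bit registers: (low word, high word)."""
    return uid & 0xFFFFFFFF, uid >> 32


def pack_short_message(msg: bytes) -> Tuple[int, int]:
    """Place an 8-byte message into two 32-bit register values (little-endian)."""
    if len(msg) != 8:
        raise ValueError("this sketch handles exactly 8-byte messages")
    lo, hi = struct.unpack("<II", msg)
    return lo, hi


def unpack_short_message(lo: int, hi: int) -> bytes:
    return struct.pack("<II", lo, hi)


def registers_needed(msg_bytes: int = 8) -> int:
    sender_id = 2                            # 64-bit UID
    result_code = 1
    message = -(-msg_bytes // REG_BYTES)     # ceiling division
    return sender_id + result_code + message
```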
Interface Level
• Simple RPC stubs
– Load registers, system call, check success
– Compiler generates stubs inline
• Parameter Passing
– Use registers when possible
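The stub body is only three steps, which is what makes it cheap enough to expand inline at every call site. Python cannot show compiler inlining, so this sketch only models the shape of such a stub: `ipc_call`, `IPC_OK`, and `fake_kernel` are hypothetical stand-ins for the kernel entry, its success code, and a server, and the call arguments play the role of registers.

```python
from typing import Callable, Tuple

# Hypothetical kernel entry: (dest, reg0, reg1) -> (status, reply0, reply1).
IpcCall = Callable[[int, int, int], Tuple[int, int, int]]
IPC_OK = 0  # hypothetical success code


def make_stub(ipc_call: IpcCall, dest: int) -> Callable[[int, int], Tuple[int, int]]:
    """Build an RPC stub whose body is just: load registers, trap, check status."""

    def stub(arg0: int, arg1: int) -> Tuple[int, int]:
        # Parameters go straight into "registers" (the call arguments).
        status, r0, r1 = ipc_call(dest, arg0, arg1)
        if status != IPC_OK:
            raise RuntimeError(f"IPC to {dest} failed with status {status}")
        return r0, r1

    return stub


def fake_kernel(dest: int, a: int, b: int) -> Tuple[int, int, int]:
    # Stand-in server on thread 1 returning sum and product; others fail.
    return (IPC_OK, a + b, a * b) if dest == 1 else (5, 0, 0)


add_mul = make_stub(fake_kernel, dest=1)
print(add_mul(3, 4))  # (7, 12)
```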
Coding Level
• Reduce cache and TLB misses
– Short kernel code
• Short jumps, registers, short address displacements
– IPC kernel code in one page
– Save/restore coprocessor state lazily
• Delayed until a different thread needs the coprocessor
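A toy model of lazy coprocessor switching, assuming the FPU state is a single float accumulator per thread; the class and method names are hypothetical. A thread switch does no FPU work at all, and state is saved and restored only when a thread other than the current owner actually uses the FPU.

```python
class LazyFPU:
    """Toy model: the FPU state is one float accumulator per thread."""

    def __init__(self):
        self.current = None   # thread currently running
        self.owner = None     # thread whose state is live in the FPU
        self.live = 0.0       # live FPU contents
        self.saved = {}       # thread -> saved FPU contents
        self.switches = 0     # actual save/restore operations

    def context_switch(self, thread) -> None:
        # No FPU work on a thread switch. (On x86 a kernel would instead set
        # CR0.TS so that the next FPU instruction traps with #NM.)
        self.current = thread

    def fpu_add(self, value: float) -> float:
        if self.owner != self.current:
            self._switch_owner()   # first FPU use by a different thread
        self.live += value
        return self.live

    def _switch_owner(self) -> None:
        if self.owner is not None:
            self.saved[self.owner] = self.live          # save previous owner
        self.live = self.saved.get(self.current, 0.0)   # restore (fresh on first use)
        self.owner = self.current
        self.switches += 1
```

When only one thread uses floating point, as is typical, the save/restore cost is paid once rather than on every switch.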
Results
• A value of 100% means the IPC time doubles
• Removing all optimizations increases IPC time by 134% for an 8-byte message
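The percentages are relative increases over the optimized IPC time. The relation below makes that explicit; the symbols are labels introduced here, with T_with the IPC time with the optimizations and T_without the time after removing them.

```latex
\Delta = \frac{T_{\text{without}} - T_{\text{with}}}{T_{\text{with}}}
\quad\Longrightarrow\quad
T_{\text{without}} = (1 + \Delta)\,T_{\text{with}}

\Delta = 100\% \;\Rightarrow\; T_{\text{without}} = 2\,T_{\text{with}},
\qquad
\Delta = 134\% \;\Rightarrow\; T_{\text{without}} = 2.34\,T_{\text{with}}
```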
Results
• L3 vs. Mach
• System
– Intel 486 DX-50
– 256 KB external cache
– 16 MB memory
Results cont.
Conclusions
• IPC improved by applying
– Performance-based reasoning
– Synergetic effects
– Design at all levels, from architecture down to coding
References
• [1] Microkernel. Wikipedia. http://en.wikipedia.org/wiki/Micro_kernel
• [2] J. Liedtke. Improving IPC by Kernel Design.