Improving IPC by Kernel Design (Jochen Liedtke)
Shane Matthews, Portland State University
3/12/2004 Portland State University
Summary
• Review
• Performance improved
  – Architecture Level
  – Algorithmic Level
  – Interface Level
  – Coding Level
Micro-kernels
• Minimal OS providing a set of primitives used to implement thread management, address-space management, and IPC [1]
• Everything else is moved to user space (servers)
Terminology (L3)
• Dataspace
  – Memory object, mapped into an address space
• Task
  – Composed of threads, dataspaces, and an address space
• Message
  – String/memory object
L3 Architecture & IPC
• Active components communicate via messages
• Applies to:
  – Device drivers
    • Implemented as user-level tasks
  – Hardware interrupts
    • Interrupt message from the micro-kernel to a thread
L3 Redesign Principles
• IPC performance is the master
  – Performance and security of other components must not be seriously impaired
• Synergetic effects must be taken into consideration
  – Think of the combined effects of individual decisions
  – May lead to reinforcement or diminution
• Design must aim at a concrete performance goal
  – 350 cycles (7 µs) per short-message transfer
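As a worked check of the budget, assume the 50 MHz clock of the Intel 486 DX-50 used for the measurements. Under that assumption, 350 cycles correspond to 7 µs:

```latex
t = \frac{350\ \text{cycles}}{50 \times 10^{6}\ \text{cycles/s}} = 7 \times 10^{-6}\ \text{s} = 7\ \mu\text{s}
```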
Architectural Level
• Messages
• Process Structure
• Thread Control Blocks
Compound Messages
• Multiple sends/receives -> 1 send/receive
• A message consists of direct strings, indirect strings, and memory objects
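The following is a minimal C sketch of what a compound message might look like. It assumes a fixed-size in-line (direct) string, a small array of indirect strings, and opaque memory-object handles. The names `compound_msg`, `indirect_string` and `compound_payload_bytes` are hypothetical and do not reproduce L3's actual message format. The point is only that all parts travel in a single send/receive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not L3's real message format. */
typedef struct {
    const void *addr;   /* where the string lives in the sender's space */
    size_t      len;    /* length in bytes */
} indirect_string;

typedef struct {
    uint32_t        direct_len;    /* bytes used in direct[] */
    uint8_t         direct[64];    /* direct string: carried in-line */
    uint32_t        n_indirect;    /* entries used in indirect[] */
    indirect_string indirect[4];   /* indirect strings: gathered by the kernel */
    uint32_t        n_objects;     /* entries used in objects[] */
    uint32_t        objects[2];    /* hypothetical memory-object handles */
} compound_msg;

/* Bytes the kernel must copy for the whole message in ONE send/receive.
   Memory objects are assumed to be mapped, not copied, so they add 0. */
static size_t compound_payload_bytes(const compound_msg *m) {
    size_t total = m->direct_len;
    for (uint32_t i = 0; i < m->n_indirect && i < 4; i++)
        total += m->indirect[i].len;
    return total;
}
```

Counting memory objects as zero bytes reflects the sketch's assumption that they are transferred by mapping.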
Twofold message copy
• [A space] -> [kernel] -> [B space]
• About 20 + 0.75n cycles, where n is the message size in bytes
• Good for small messages
• Need something better as n grows
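The twofold copy path and the slide's cost estimate can be sketched in C. So that the code runs in user space, both address spaces and the kernel buffer are assumed to be ordinary arrays. The names `twofold_copy` and `twofold_copy_cycles` are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Twofold copy: sender space -> kernel buffer -> receiver space.
   Returns bytes delivered, or 0 if the message does not fit the buffer. */
static size_t twofold_copy(void *dst_b, const void *src_a, size_t n,
                           uint8_t *kbuf, size_t kbuf_size) {
    if (n > kbuf_size)
        return 0;
    memcpy(kbuf, src_a, n);       /* copy 1: A space -> kernel */
    memcpy(dst_b, kbuf, n);       /* copy 2: kernel -> B space */
    return n;
}

/* Cost estimate from the slide: about 20 + 0.75n cycles for n bytes. */
static double twofold_copy_cycles(size_t n) {
    return 20.0 + 0.75 * (double)n;
}
```

With these numbers, an 8-byte message costs about 26 cycles. A 4096-byte message costs about 3,092 cycles, almost nine times the 350-cycle budget for a whole short-message IPC.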
LRPC and SRC RPC
• Client and server share user-level memory
  – Sender writes into the shared buffer
• Problems
  – When one server serves many clients, shared regions of the address space become critical resources
  – Shared regions require explicit opens (unlike L3)
  – The message can change during or after the receiver checks it
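One common defence against a message changing during or after checking is to snapshot the request into private memory before validating it. The sketch assumes a hypothetical `shared_request` layout and `server_handle` function. Note that the defensive copy reintroduces the very copy that shared buffers were meant to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout in a buffer shared by client and server. */
typedef struct {
    uint32_t len;        /* payload length claimed by the client */
    uint8_t  data[60];
} shared_request;

/* Risky pattern (not shown as code): check shared->len, then read
   shared->len again. A client thread can rewrite it in between.
   Defensive pattern: copy first, then check and use only the copy. */
static int server_handle(const shared_request *shared,
                         uint8_t *out, size_t out_size) {
    shared_request local;
    memcpy(&local, shared, sizeof local);           /* snapshot */
    if (local.len > sizeof local.data || local.len > out_size)
        return -1;                                  /* reject bad length */
    memcpy(out, local.data, local.len);             /* uses checked value */
    return (int)local.len;
}
```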
Direct Message Copy via Windows
• L3's method
  – Destination mapped into a window
  – Message copied into the window
• Window
  – One per address space
  – Accessed exclusively by the kernel
Communication Windows
• Problems
  – Mapping the window must be fast
  – Different threads coexist within the same address space
• L3 implementation
  – Copy one page-directory word from B's page directory into A's
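The one-word trick can be simulated with two x86-32-style page directories of 1024 entries, each covering 4 MB. The window slot number, `WINDOW_BASE` and `open_window` are hypothetical. The sketch only updates an in-memory table. It leaves out what a real kernel must also do, such as flushing stale TLB entries for the window and handling messages that cross a 4 MB boundary.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_COUNT   1024u                       /* x86-32: 1024 PDEs x 4 MB */
#define PDE_SHIFT   22u
#define OFFSET_MASK ((1u << PDE_SHIFT) - 1u)

/* Hypothetical: page-directory slot reserved as the communication window,
   located in the kernel part of every address space. */
#define WINDOW_SLOT 1000u
#define WINDOW_BASE ((uint32_t)WINDOW_SLOT << PDE_SHIFT)

typedef struct {
    uint32_t pde[PDE_COUNT];                    /* simulated page directory */
} page_directory;

/* Map the 4 MB region of B that contains dest_in_b into A's window by
   copying one page-directory word. Returns the address, valid while running
   in A's space, through which the kernel can write into B's buffer. */
static uint32_t open_window(page_directory *a, const page_directory *b,
                            uint32_t dest_in_b) {
    a->pde[WINDOW_SLOT] = b->pde[dest_in_b >> PDE_SHIFT];
    return WINDOW_BASE + (dest_in_b & OFFSET_MASK);
}
```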
Process Structure
• Threads running in kernel mode have one kernel stack per thread
  – Efficient, since interrupts, page faults, and IPC already save state on the kernel stack
• Continuations
  – Pro:
    • Reduce kernel stack usage
  – Cons:
    • Require additional copies between the kernel stack and the continuation
    • Interfere with other optimizations
Thread Control Blocks
• Implemented as a large array in the kernel
  – Fast TCB access
    • Array base + TCB number × TCB size
  – Saves TLB misses (IPC)
    • Kernel stacks of sender and receiver are located in their TCB pages
  – Locking done by unmapping the TCB
Algorithmic Level
• Thread Identifier
• Lazy Scheduling
• Short Messages Via Registers
Thread Identifier
• In user mode, a thread is addressed by a 64-bit UID
• The thread number is held in the lower 32 bits of the UID
  – AND with a bit mask, add to the TCB array base
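This C sketch of the lookup also covers the "array base + TCB number × TCB size" addressing from the Thread Control Blocks slide. It rests on three assumptions:
• TCBs are 1 KB and the table has 256 entries
• The UID stores the thread number pre-shifted by the TCB size, so the AND directly yields a byte offset
• The lookup compares against the UID stored in the TCB to reject stale identifiers

`make_uid` and `tcb_from_uid` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCB_SHIFT 10u                         /* assumed TCB size: 1 KB */
#define TCB_SIZE  (1u << TCB_SHIFT)
#define MAX_TCBS  256u                        /* assumed table size */

/* Assumed UID encoding: thread number stored in the low 32 bits, already
   shifted left by TCB_SHIFT, so masking yields a byte offset. */
#define TNO_MASK  ((uint64_t)(MAX_TCBS - 1u) << TCB_SHIFT)

typedef union {
    struct {
        uint64_t uid;                         /* full 64-bit thread UID */
        uint32_t state;                       /* e.g. ready/waiting */
    } hdr;
    uint8_t raw[TCB_SIZE];                    /* remainder: kernel stack */
} tcb;

_Static_assert(sizeof(tcb) == TCB_SIZE, "TCB must be exactly TCB_SIZE bytes");

static tcb tcb_array[MAX_TCBS];               /* the TCB array */

/* Build a UID; thread_no must be < MAX_TCBS. */
static uint64_t make_uid(uint32_t version, uint32_t thread_no) {
    return ((uint64_t)version << 32) | ((uint64_t)thread_no << TCB_SHIFT);
}

/* AND with the bit mask, add the array base: equivalent to
   base + thread_no * TCB_SIZE, without a multiplication. */
static tcb *tcb_from_uid(uint64_t uid) {
    tcb *t = (tcb *)((uint8_t *)tcb_array + (size_t)(uid & TNO_MASK));
    /* Assumed check: reject UIDs of threads that no longer exist. */
    return t->hdr.uid == uid ? t : NULL;
}
```

Because the TCB size is a power of two and the number is pre-shifted, the lookup itself is one AND and one ADD.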
Lazy Scheduling
• An IPC call, or a reply & receive next, requires:
  – Deleting the sending thread from the ready queue
  – Inserting it into the waiting queue
  – Deleting the receiving thread from the waiting queue
  – Inserting it into the ready queue
• Too many queue operations!
Lazy Scheduling cont.
• L3 queue invariants
  – Ready queue contains at least all ready threads
  – Waiting queue contains at least all waiting threads
• The TCB contains the thread's state (ready/waiting)
• While parsing a queue, the scheduler removes all threads that do not belong in it
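The following minimal C sketch shows lazy scheduling for the ready queue. It assumes a singly linked queue and a `queued` flag in each TCB, both hypothetical, and omits the waiting queue. IPC only rewrites the state words. The scheduler deletes stale entries when it walks the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { READY, WAITING } thread_state;

typedef struct thread {
    thread_state   state;      /* authoritative state lives in the TCB */
    bool           queued;     /* currently linked into the ready queue? */
    struct thread *next;
} thread;

static thread *ready_head = NULL;
static int     queue_ops  = 0; /* counts queue insertions and deletions */

static void ready_insert(thread *t) {
    if (t->queued) return;     /* still in the queue: nothing to do */
    t->next = ready_head;
    ready_head = t;
    t->queued = true;
    queue_ops++;
}

/* IPC: sender blocks, receiver runs. Only the state words change;
   the sender is NOT removed from the ready queue here. */
static void lazy_ipc_switch(thread *sender, thread *receiver) {
    sender->state = WAITING;
    receiver->state = READY;
    ready_insert(receiver);    /* no-op if receiver was never removed */
}

/* Scheduler: walk the ready queue, lazily dropping threads whose TCB
   says they are no longer ready; return the first ready thread. */
static thread *pick_next(void) {
    thread **pp = &ready_head;
    while (*pp) {
        thread *t = *pp;
        if (t->state == READY)
            return t;
        *pp = t->next;         /* lazy deletion */
        t->queued = false;
        queue_ops++;
    }
    return NULL;
}
```

Suppose A is already in the ready queue, A calls B, and B replies to A. That costs one insertion (for B). Later, when the scheduler walks the queue, it performs one lazy deletion of B. The eager scheme would perform four queue operations per IPC.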
Short Messages Via Registers
• A high proportion of messages are short
  – E.g. driver acks/errors, hardware interrupts
• 486
  – 7 general registers
  – 3 needed: sender ID (64 bits, 2 registers), result code (1 register)
  – 4 available
• 8-byte messages passed in registers using a coding scheme
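Passing an 8-byte message in two 32-bit registers can be sketched as follows. The register roles in `short_ipc_regs` are hypothetical. The plain little-endian packing stands in for L3's own coding scheme, which the slides do not describe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated subset of the 486's general registers used for a short IPC.
   The assignment of registers to roles is hypothetical. */
typedef struct {
    uint32_t sender_lo, sender_hi;   /* 64-bit sender ID: 2 registers */
    uint32_t result;                 /* result code: 1 register */
    uint32_t w0, w1;                 /* 8-byte message body: 2 registers */
} short_ipc_regs;

/* Pack 8 bytes into two 32-bit words, little-endian as on x86. */
static void pack_short_msg(short_ipc_regs *r, const uint8_t msg[8]) {
    r->w0 = (uint32_t)msg[0] | (uint32_t)msg[1] << 8 |
            (uint32_t)msg[2] << 16 | (uint32_t)msg[3] << 24;
    r->w1 = (uint32_t)msg[4] | (uint32_t)msg[5] << 8 |
            (uint32_t)msg[6] << 16 | (uint32_t)msg[7] << 24;
}

/* Inverse of pack_short_msg. */
static void unpack_short_msg(const short_ipc_regs *r, uint8_t msg[8]) {
    for (int i = 0; i < 4; i++) {
        msg[i]     = (uint8_t)(r->w0 >> (8 * i));
        msg[i + 4] = (uint8_t)(r->w1 >> (8 * i));
    }
}
```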
Interface Level
• Simple RPC stubs
  – Load registers, system call, check success
  – Compiler generates stubs inline
• Parameter passing
  – Use registers when possible
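This is a sketch of such an inline stub in C. The real trap instruction and register conventions are not given here, so `fake_ipc_syscall` simulates the kernel and a trivial server. `ipc_regs`, `rpc_add_mul` and the result codes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register-level IPC interface. */
typedef struct { uint32_t dest, w0, w1, result; } ipc_regs;

enum { IPC_OK = 0, IPC_ENOPARTNER = 1 };

/* Stand-in for the trap into the kernel plus a trivial server that
   replies with the sum and product of the two words. */
static void fake_ipc_syscall(ipc_regs *r) {
    if (r->dest == 0) { r->result = IPC_ENOPARTNER; return; }
    uint32_t a = r->w0, b = r->w1;
    r->w0 = a + b;
    r->w1 = a * b;
    r->result = IPC_OK;
}

/* Inline stub: load registers, one system call, one success check. */
static inline int rpc_add_mul(uint32_t server, uint32_t a, uint32_t b,
                              uint32_t *sum, uint32_t *prod) {
    ipc_regs r = { server, a, b, 0 };   /* load registers */
    fake_ipc_syscall(&r);               /* call + receive reply */
    if (r.result != IPC_OK)             /* check success */
        return -(int)r.result;
    *sum = r.w0;
    *prod = r.w1;
    return 0;
}
```

Because the stub is `static inline` and its parameters fit in registers, a compiler can expand it at each call site with no extra marshalling buffer.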
Coding Level
• Reduce cache and TLB misses
  – Short kernel code
    • Short jumps, use registers, short address displacements
  – IPC kernel code in one page
  – Handle save/restore of the coprocessor lazily
    • Delayed until a different thread needs to use it
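Lazy coprocessor handling can be sketched as an ownership check: the FPU state is saved only when a thread other than the current owner needs the FPU. The simulated `hw_fpu`, `fpu_owner` and `fpu_acquire` are hypothetical. On x86, a real kernel would typically set CR0.TS at context switch, so that the next FPU instruction traps into code like `fpu_acquire`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { double regs[8]; } fpu_state;   /* simplified FPU context */

typedef struct {
    fpu_state fpu;                  /* saved FPU context in the thread's TCB */
} thread_ctx;

static fpu_state   hw_fpu;          /* simulated hardware FPU */
static thread_ctx *fpu_owner = NULL;/* whose state is currently in hw_fpu */
static int         fpu_saves = 0;   /* number of real save operations */

/* Called when 'cur' uses the FPU (in a real kernel: from the trap handler).
   Context switches themselves do no FPU work at all. */
static fpu_state *fpu_acquire(thread_ctx *cur) {
    if (fpu_owner != cur) {
        if (fpu_owner != NULL) {
            memcpy(&fpu_owner->fpu, &hw_fpu, sizeof hw_fpu);  /* save old */
            fpu_saves++;
        }
        memcpy(&hw_fpu, &cur->fpu, sizeof hw_fpu);            /* restore new */
        fpu_owner = cur;
    }
    return &hw_fpu;
}
```

If a thread switches to a server via IPC and back without the server touching the FPU, no save or restore happens.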
Results
• Percentages give the increase in IPC time when an optimization is removed; 100% would indicate double the time
• Removing all optimizations increases IPC time by 134% for an 8-byte message
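The percentages can be read with the following definition. Here $T_{\text{with}}$ is the IPC time with the optimizations in place and $T_{\text{without}}$ is the time with them removed:

```latex
\text{increase} = \frac{T_{\text{without}} - T_{\text{with}}}{T_{\text{with}}} \times 100\%
\quad\Longrightarrow\quad
T_{\text{without}} = \left(1 + \frac{\text{increase}}{100\%}\right) T_{\text{with}}
```

An increase of 100% therefore gives $T_{\text{without}} = 2\,T_{\text{with}}$. An increase of 134% gives $T_{\text{without}} = 2.34\,T_{\text{with}}$.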
Results
• L3 vs. Mach
• System
  – Intel 486 DX-50
  – 256 KB external cache
  – 16 MB memory
Results cont.
Conclusions
• IPC improved by applying
  – Performance-based reasoning
  – Synergetic effects
  – Design at all levels, from architecture down to coding
References
• [1] http://en.wikipedia.org/wiki/Micro_kernel
• [2] Jochen Liedtke, Improving IPC by Kernel Design