Improving IPC by Kernel Design. Jochen Liedtke. Proceedings of the 14th ACM Symposium on Operating Systems Principles, Asheville, North Carolina, 1993

Improving IPC by Kernel Design

Jochen Liedtke

Proceedings of the 14th ACM Symposium on Operating Systems Principles

Asheville, North Carolina

1993

The Performance of μ-Kernel-Based Systems

H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter

Proceedings of the 16th Symposium on Operating Systems Principles

October 1997, pp. 66-77

Jochen Liedtke (1953 – 2001)

• 1977 – Diploma in Mathematics from the University of Bielefeld.

• 1984 – Moved to GMD (German National Research Center). Built L3. Known for overcoming IPC performance hurdles.

• 1996 – IBM T.J. Watson Research Center. Developed L4, a 12 KB second-generation microkernel.

The IPC Dilemma

• IPC is a core paradigm of μ-kernel architectures

• Most IPC implementations perform poorly

• Really fast message-passing systems are needed to run device drivers and other performance-critical components at the user level.

• Result: programmers circumvent IPC, co-locating device drivers in the kernel and defeating the main purpose of the microkernel architecture

What to Do?

• Optimize IPC performance above all else!

• Results: L3 and L4, second-generation microkernel-based operating systems

• Many clever optimizations, but no single “silver bullet”

Summary of Techniques

Seventeen Total

Standard System Calls (Send/Recv)

• Client (sender) calls send( ): system call, enter kernel, exit kernel

• Server (receiver) calls receive( ): system call, enter kernel, exit kernel

• Server calls send( ) for the reply: system call, enter kernel, exit kernel

• Client calls receive( ) for the reply: system call, enter kernel, exit kernel

• Client is not blocked

• Kernel entered/exited four times per call!

New Call/Response-based System Calls

Client (Sender):

– call( ): system call, enter kernel

– Allocate CPU to server; client is suspended

– Re-allocate CPU to client, exit kernel

Server (Receiver):

– reply_and_recv_next( ): waiting for a message

– Resume from being suspended, exit kernel

– Handle message

– reply_and_recv_next( ): enter kernel, send reply, wait for next message

• Special system calls for RPC-style interaction

• Kernel entered and exited only twice per call!
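The difference between the two interfaces can be illustrated by counting kernel entries per round trip. This is a minimal sketch, not L3 code: the `Kernel` class and the two `rpc_with_*` helpers are hypothetical names, and the model counts system calls only, with no real context switching.

```python
class Kernel:
    """Toy model that only counts kernel entries (system calls)."""

    def __init__(self):
        self.entries = 0

    def _enter(self):
        self.entries += 1

    # Conventional interface: one system call per operation.
    def send(self):
        self._enter()

    def receive(self):
        self._enter()

    # Combined interface: each call sends and then waits in one kernel entry.
    def call(self):
        self._enter()

    def reply_and_recv_next(self):
        self._enter()


def rpc_with_send_receive(kernel):
    """One request/reply round trip using separate send and receive calls."""
    kernel.send()      # client sends the request
    kernel.receive()   # server receives the request
    kernel.send()      # server sends the reply
    kernel.receive()   # client receives the reply
    return kernel.entries


def rpc_with_call_reply(kernel):
    """One round trip using the combined calls (server already waiting)."""
    kernel.call()                  # client: send request and wait for reply
    kernel.reply_and_recv_next()   # server: send reply and wait for next request
    return kernel.entries
```

Under these assumptions, `rpc_with_send_receive(Kernel())` counts four entries and `rpc_with_call_reply(Kernel())` counts two, matching the figures on the two slides.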

Complex Message Structure

Batching IPC

Combine a sequence of send operations into a single operation by supporting complex messages

• Benefit: reduces the number of send operations.
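The batching idea can be sketched by bundling several buffers into one message and counting send operations. Everything here is an illustrative assumption: `ComplexMessage`, `Channel`, and the helper functions are hypothetical and do not reflect the actual L3 message layout.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexMessage:
    """Hypothetical container bundling several buffers into one message."""
    parts: list = field(default_factory=list)

    def add(self, data: bytes) -> "ComplexMessage":
        self.parts.append(data)
        return self


class Channel:
    """Toy channel that counts send operations (stand-ins for system calls)."""

    def __init__(self):
        self.sends = 0
        self.delivered = []

    def send(self, message: ComplexMessage) -> None:
        self.sends += 1                       # one send operation per message
        self.delivered.extend(message.parts)  # all parts arrive together


def send_separately(channel, buffers):
    """Send each buffer as its own message; returns the number of sends."""
    for buf in buffers:
        channel.send(ComplexMessage([buf]))
    return channel.sends


def send_batched(channel, buffers):
    """Bundle all buffers into one complex message; returns the number of sends."""
    msg = ComplexMessage()
    for buf in buffers:
        msg.add(buf)
    channel.send(msg)
    return channel.sends
```

With three buffers, the separate path performs three sends and the batched path performs one, while the receiver sees the same data in the same order.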

Direct Transfer by Temporary Mapping

• Naïve message transfer: copy from sender to kernel, then from kernel to receiver (two copies)

• Optimizing transfer by sharing memory between sender and receiver is not secure

• L3 supports single-copy transfers by temporarily mapping a communication window into the sender's address space.
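The copy counts behind these bullets can be illustrated with ordinary buffers. This is only a user-level analogy under stated assumptions: a `memoryview` stands in for the temporary communication window, and both function names are hypothetical. Real kernels do this with page-table manipulation, which is not modelled here.

```python
def two_copy_transfer(sender_buf: bytes, receiver_buf: bytearray) -> int:
    """Naive path: sender -> kernel buffer -> receiver. Returns copies made."""
    kernel_buf = bytearray(sender_buf)            # copy 1: sender to kernel
    receiver_buf[:len(kernel_buf)] = kernel_buf   # copy 2: kernel to receiver
    return 2


def single_copy_transfer(sender_buf: bytes, receiver_buf: bytearray) -> int:
    """Window path: the receiver's buffer is made visible through a temporary
    window, and the data is copied once, directly into it."""
    window = memoryview(receiver_buf)        # stands in for the temporary mapping
    window[:len(sender_buf)] = sender_buf    # the only copy
    window.release()                         # window is dropped after the transfer
    return 1
```

Both functions leave the same bytes in the receiver's buffer; the second one simply skips the intermediate kernel buffer.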

Scheduling

• Conventionally, the IPC operations call and reply & receive require scheduling actions:

– Delete the sending thread from the ready queue.

– Insert the sending thread into the waiting queue.

– Delete the receiving thread from the waiting queue.

– Insert the receiving thread into the ready queue.

• These operations, together with 4 expected TLB misses, will take at least 1.2 μs (23% T).

Solution, Lazy Scheduling

• Don’t bother updating the scheduler queues!

• Instead, delay the movement of threads among queues until the queues are queried.

• Why?

– A sending thread that blocks will soon unblock again, and maybe nobody will ever notice that it blocked

• Lazy scheduling is achieved by setting state flags (ready / waiting) in the Thread Control Blocks
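The lazy approach described above can be sketched as follows. This is a simplified model under assumptions, not the L3 implementation: the `Thread` and `LazyScheduler` classes, the `queued` flag, and the `queue_updates` counter are all hypothetical, and only a single ready queue is modelled.

```python
from collections import deque

READY, WAITING = "ready", "waiting"


class Thread:
    """Minimal thread control block: a name, a state flag, a queue marker."""

    def __init__(self, name):
        self.name = name
        self.state = READY
        self.queued = False


class LazyScheduler:
    def __init__(self, threads):
        self.ready_queue = deque()
        self.queue_updates = 0   # insertions/removals caused by state changes
        for t in threads:
            t.queued = True
            self.ready_queue.append(t)

    def ipc_block(self, thread):
        # IPC fast path: flip the flag only, leave the queue untouched.
        thread.state = WAITING

    def ipc_unblock(self, thread):
        thread.state = READY
        if not thread.queued:    # only re-insert if it was lazily removed
            self.ready_queue.append(thread)
            thread.queued = True
            self.queue_updates += 1

    def pick_next(self):
        # The queue is queried: only now drop entries that are not ready.
        while self.ready_queue:
            thread = self.ready_queue[0]
            if thread.state == READY:
                self.ready_queue.rotate(-1)   # round-robin: move to the back
                return thread
            self.ready_queue.popleft()        # stale entry, removed lazily
            thread.queued = False
            self.queue_updates += 1
        return None
```

In this sketch, a thread that blocks and unblocks again before the scheduler looks at the queue causes no queue update at all, which is the case the slide argues is common.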

Pass Short Messages in Registers

• Most messages are very short: 8 bytes (plus 8 bytes of sender ID)

– E.g. ack/error replies from device drivers, or hardware-initiated interrupt messages.

• Transfer short messages via CPU registers.

• Performance gain of 2.4 μs or 48% T.
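The size-based choice of transfer path can be sketched like this. It is an illustration only: the 8-byte threshold is taken from the slide, while the function names and the representation of register contents as a pair of integers are assumptions made for the example.

```python
REGISTER_PAYLOAD_BYTES = 8   # short-message size quoted on the slide


def choose_transfer_path(payload: bytes) -> str:
    """Return which path a message of this size would take in this model."""
    if len(payload) <= REGISTER_PAYLOAD_BYTES:
        return "registers"   # payload travels in registers, no memory copy
    return "memory"          # longer messages are copied through memory


def pack_short_message(sender_id: int, payload: bytes) -> tuple:
    """Model the register contents as (sender id, payload as an integer)."""
    if len(payload) > REGISTER_PAYLOAD_BYTES:
        raise ValueError("payload does not fit in registers")
    padded = payload.ljust(REGISTER_PAYLOAD_BYTES, b"\x00")  # zero-pad to 8 bytes
    return (sender_id, int.from_bytes(padded, "little"))
```

A three-byte acknowledgement would take the register path in this model, whereas a nine-byte payload would fall back to the memory path.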

Impact on IPC Performance

• For an eight-byte message, IPC time for L3 is 5.2 μs compared to 115 μs for Mach, a 22-fold improvement.

• For large messages (4 KB), a 3-fold improvement is seen.
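The 22-fold figure follows directly from the two timings quoted for the eight-byte message. A short check, where `speedup` is a hypothetical helper name:

```python
def speedup(baseline_us: float, improved_us: float) -> float:
    """How many times faster the improved time is than the baseline."""
    return baseline_us / improved_us


mach_us = 115.0   # Mach, 8-byte message (from the slide)
l3_us = 5.2       # L3, 8-byte message (from the slide)
factor = speedup(mach_us, l3_us)
print(f"{factor:.1f}x")   # prints "22.1x"
```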

Relative Importance of Techniques

• Quantifiable impact of techniques

– 49% means that removing that item would increase IPC time by 49%.

OS and Application-Level Performance

OS-Level Performance

Application-Level Performance

Conclusion

• Use a synergistic approach to improve IPC performance

– A thorough understanding of hardware/software interaction is required

– No “silver bullet”

• IPC performance can be improved by a factor of 10

• … but even so, a microkernel-based OS will not be as fast as an equivalent monolithic OS

– L4-based Linux outperforms Mach-based Linux, but not monolithic Linux