Lightweight Remote Procedure Call
Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy
Presented by Alana Sweat
Outline
• Introduction
  • RPC refresher
  • Monolithic OS vs. micro-kernel OS
• Use and Performance of RPC in Systems
  • Cross-domain vs. cross-machine
  • Problems with traditional RPC used for cross-domain RPC
• Lightweight RPC (LRPC)
  • Implementation
  • Performance
• Conclusion
Introduction
What is an RPC?
http://www-01.ibm.com/software/network/dce/library/publications/appdev/html/APPDEV20.HTM
An inter-process communication mechanism that allows a computer program to cause a subroutine or procedure to execute in another address space without the programmer explicitly coding the details of the remote interaction (http://en.wikipedia.org/wiki/Remote_procedure_call)
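The definition above can be illustrated with a minimal sketch of the stub pattern. Everything named here (`add`, `PROCEDURES`, `server_dispatch`, `client_stub`) is hypothetical, JSON stands in for argument marshalling, and a direct function call stands in for the message transfer between address spaces; it is not taken from any particular RPC system.

```python
import json

# "Server" side: the real procedure lives in the other address space.
def add(a, b):
    return a + b

PROCEDURES = {"add": add}  # hypothetical dispatch table


def server_dispatch(request_bytes):
    """Unmarshal a request, run the named procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def client_stub(proc, *args):
    """Looks like a local call to the caller; hides marshalling and transfer."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = server_dispatch(request_bytes)  # stand-in for the message transfer
    return json.loads(reply_bytes.decode("utf-8"))["result"]


if __name__ == "__main__":
    print(client_stub("add", 2, 3))
```

The caller only ever sees `client_stub`; the marshalling, transfer, and dispatch steps are the parts a real RPC system generates or provides.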
Monolithic kernel & Micro-kernel OSs
http://en.wikipedia.org/wiki/Monolithic_kernel
Monolithic kernel OS
• Advantages
  • All parts of the kernel have easy access to hardware
  • Easy communication between kernel threads due to the shared address space
• Disadvantages
  • Code becomes increasingly complex as the kernel grows, making it difficult to isolate problems and to add, remove, or modify code
  • The large amount of code with direct hardware access makes the hardware more vulnerable
Micro-kernel OS
• Advantages
  • Since modules are in user space, it is relatively easy to add, remove, or modify operating system functionality
  • Hardware is accessed directly only by a small amount of protected kernel code
  • Completely separate modules help with isolating problems and debugging
  • Each module is in its own “protection domain”, since it can access only its own address space
• Disadvantages
  • User-level modules must interact with each other across separate address spaces, which makes good performance difficult to achieve
Use and Performance of RPC in Systems
Cross-domain RPC (local RPC)
• Local remote procedure call
  • “Remote” because it accesses a “remote” address space; “local” because it is a procedure call on the same machine
• General RPC model used for inter-process communication (IPC) in micro-kernel systems
Comparatively, how often does a system execute cross-machine RPC vs. cross-domain RPC?
*Measured over 5-hr period on work day for Taos, over 4 days for Sun workstation
Size and complexity of cross-domain RPCs
• Survey includes 28 RPC services defining 366 procedures with 1000+ parameters, over a four-day period using SRC RPC on the Taos OS
Why not just use standard RPC implementation for cross-domain calls?
Overhead in cross-domain RPC
• Stub overhead
  • Execution path is general, but much of the code in the path is not needed for cross-domain calls
• Message buffer management
  • Allocate buffers; copy messages to the kernel and back
• Access validation
  • Kernel validates the message sender on call and again on return
• Message transfer
  • Enqueue/dequeue messages
• Scheduling
  • Programmer sees one abstract thread crossing domains; the kernel has threads fixed in their own domains, signaling each other
• Context switch
  • Swap virtual memory from the client’s domain to the server’s domain and back
• Dispatch
  • Receiver thread in the server domain interprets the message and dispatches a thread to execute the call
Lightweight RPC (LRPC)
What is LRPC?
• Modified implementation of RPC optimized for cross-domain calls
• Execution model borrowed from protected procedure call
  • Call to the server procedure is made by a kernel trap
  • Kernel validates the caller, creates a linkage, and dispatches the client’s thread directly to the server domain
  • Client provides the server with an argument stack along with the thread
• Programming semantics borrowed from RPC
  • Servers execute in a private protection domain and export one or more interfaces
  • Client binds to a server interface before starting to make calls
  • Server authorizes the client by allowing the binding to occur
Implementation Details
• Binding
  • For each procedure in the interface, the kernel allocates A-stacks (argument stacks) that are shared, read/write, between the client and server domains
  • Procedures can share A-stacks (if of similar size) to reduce storage needs
  • Kernel creates a linkage record for each A-stack allocated, to record the caller’s return address (accessible only to the kernel)
  • Kernel returns to the client a Binding Object containing the key for accessing the server’s interface and the A-stack list (for each procedure)
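The state created at bind time can be sketched as a few small data structures. This is a toy model under stated assumptions, not the Taos implementation: the class names, the `bind` signature, the `stacks_per_proc` parameter, and the random key are all hypothetical, and sharing of A-stacks between procedures of similar size is not modelled.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class AStack:
    size: int                                 # bytes reserved for arguments/results
    data: list = field(default_factory=list)  # shared read/write by client and server


@dataclass
class Linkage:
    return_address: object = None  # filled in by the kernel on each call
    stack_pointer: object = None


@dataclass
class BindingObject:
    key: str       # presented to the kernel on every call
    astacks: dict  # procedure name -> list of A-stacks for that procedure


class Kernel:
    def __init__(self):
        self.bindings = {}  # key -> interface name
        self.linkages = {}  # id(A-stack) -> Linkage; never handed to clients

    def bind(self, interface_name, procedures, stacks_per_proc=2):
        """procedures maps each procedure name to its A-stack size."""
        astacks = {}
        for proc, size in procedures.items():
            queue = [AStack(size) for _ in range(stacks_per_proc)]
            for astack in queue:
                self.linkages[id(astack)] = Linkage()  # one linkage per A-stack
            astacks[proc] = queue
        key = secrets.token_hex(8)  # assumption: any hard-to-guess value
        self.bindings[key] = interface_name
        return BindingObject(key, astacks)


if __name__ == "__main__":
    kernel = Kernel()
    binding = kernel.bind("FileServer", {"read": 64, "write": 64})
    print(sorted(binding.astacks), len(kernel.linkages))
```

The client receives only the `BindingObject`; the linkage records stay inside the kernel object, mirroring the "kernel accessible only" point above.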
Implementation Details
• Client calls into its stub, which:
  • Takes an A-stack off the stub-managed A-stack queue and pushes the client’s arguments onto it
  • Puts the address of the A-stack, the binding object, and the procedure ID into registers
  • Traps to the kernel
• Kernel then:
  • Verifies the binding object, procedure ID, A-stack, and linkage
  • Records the caller’s return address and stack pointer in the linkage
  • Updates the thread’s user stack pointer to run off an Execution stack (E-stack) in the server’s domain and reloads the processor’s virtual memory registers with those of the server domain
  • Does an upcall into the server’s stub to execute the procedure
Implementation Details
• Returning
  • Server procedure returns through its own stub
  • No need to verify the Binding Object, procedure identifier, and A-stack (already in the linkage and not changed by the server’s return call)
  • A-stack contains the procedure’s return values
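The call and return steps can be traced in a toy model. It is a sketch only: Python lists stand in for A-stacks, a string stands in for the return address, a `current_domain` field stands in for reloading the virtual memory registers and switching to an E-stack, and the return is driven by the upcall finishing rather than by a separate trap from the server stub. All names are hypothetical.

```python
class Kernel:
    def __init__(self):
        self.valid_bindings = {}  # binding key -> {procedure id: server stub}
        self.linkages = {}        # id(A-stack) -> saved caller state (kernel only)
        self.current_domain = "client"

    def register(self, key, procedures, astacks):
        """Stand-in for bind time: remember the interface and its A-stacks."""
        self.valid_bindings[key] = procedures
        for astack in astacks:
            self.linkages[id(astack)] = None

    def trap_call(self, key, proc_id, astack, return_address):
        # Verify binding object, procedure ID, and A-stack/linkage.
        if key not in self.valid_bindings:
            raise PermissionError("invalid binding object")
        if proc_id not in self.valid_bindings[key]:
            raise PermissionError("invalid procedure ID")
        if id(astack) not in self.linkages:
            raise PermissionError("unknown A-stack")
        # Record the caller's return address in the linkage.
        self.linkages[id(astack)] = return_address
        # Stand-in for the switch into the server's domain.
        self.current_domain = "server"
        # Upcall into the server's stub; it reads and writes the shared A-stack.
        self.valid_bindings[key][proc_id](astack)
        return self.trap_return(astack)

    def trap_return(self, astack):
        # No re-verification: the linkage already identifies the caller.
        return_address = self.linkages[id(astack)]
        self.linkages[id(astack)] = None
        self.current_domain = "client"
        return return_address


def server_add_stub(astack):
    a, b = astack[0], astack[1]
    astack[:] = [a + b]  # return value goes back on the same A-stack


def client_add_stub(kernel, key, astack_queue, a, b):
    astack = astack_queue.pop()  # take an A-stack off the queue
    astack[:] = [a, b]           # push the arguments onto it
    kernel.trap_call(key, "add", astack, return_address="client_add_stub")
    result = astack[0]           # results come back on the A-stack
    astack_queue.append(astack)
    return result


if __name__ == "__main__":
    kernel = Kernel()
    queue = [[], []]
    kernel.register("key-1", {"add": server_add_stub}, queue)
    print(client_add_stub(kernel, "key-1", queue, 2, 3))
```

Note that the arguments and the result are written once each, directly on the shared A-stack, with no intermediate message buffers.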
Optimizations
• Separate code paths for cross-machine vs. cross-domain calls, with the distinction made from the first instruction executed in the stub
• Keep E-stacks allocated and associated with A-stacks; allocate a new E-stack only when no unassociated one is available
• Each A-stack queue (per procedure) has its own lock, so contention is minimal in multi-threaded scenarios
• In multiprocessor systems, the kernel caches domain contexts on idle processors
  • After an LRPC call is made, the kernel checks for a processor idling in the context of the server domain
  • If one is found, the kernel exchanges the processors of the calling and idling threads, and the server procedure can execute without requiring a context switch
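The E-stack optimization above amounts to a lookup with two fallbacks. A minimal sketch, assuming a hypothetical `EStackPool` class, a 4096-byte E-stack size chosen only for illustration, and a `reclaim` helper added so the reuse path can be exercised:

```python
class EStackPool:
    def __init__(self, estack_size=4096):
        self.estack_size = estack_size  # assumption: size picked for illustration
        self.free = []                  # E-stacks not associated with any A-stack
        self.by_astack = {}             # A-stack id -> associated E-stack
        self.allocated = 0              # how many E-stacks were ever created

    def estack_for(self, astack_id):
        # 1. Reuse the E-stack already associated with this A-stack.
        if astack_id in self.by_astack:
            return self.by_astack[astack_id]
        # 2. Otherwise take an unassociated E-stack if one exists.
        if self.free:
            estack = self.free.pop()
        else:
            # 3. Only now allocate a new one.
            estack = bytearray(self.estack_size)
            self.allocated += 1
        self.by_astack[astack_id] = estack
        return estack

    def reclaim(self, astack_id):
        """Break an association so the E-stack can serve another A-stack."""
        estack = self.by_astack.pop(astack_id, None)
        if estack is not None:
            self.free.append(estack)


if __name__ == "__main__":
    pool = EStackPool()
    pool.estack_for("A1")
    pool.estack_for("A1")
    print(pool.allocated)
```

Repeated calls through the same A-stack hit the first branch, so the common case costs a dictionary lookup rather than an allocation.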
A Note about A-stacks and E-stacks
• The Modula-2+ language has the convention that procedure calls use a separate argument pointer instead of requiring the arguments to be pushed onto the execution stack
• Different threads cannot share E-stacks, but because of this convention it is safe to share A-stacks
• If LRPC were implemented in a language where E-stacks have to contain arguments (such as C), the optimization of shared A-stacks would not be possible (thus arguments would need extra copies)
Performance of LRPC
• Ran on Firefly using LRPC and Taos RPC
• 100,000 cross-domain calls in a tight loop; times averaged
• LRPC/MP uses idle-processor domain caching; LRPC does a context switch on every call on a single processor
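The measurement method (many calls in a tight loop, then an average) can be sketched as below. This only times a local Python function call, not LRPC or Taos RPC, and the names `null_call` and `average_call_time` are hypothetical; it shows the shape of the benchmark, not its results.

```python
import time


def null_call():
    """Stand-in for a null procedure with no arguments or results."""
    pass


def average_call_time(fn, calls=100_000):
    """Run fn in a tight loop and return the mean time per call in seconds."""
    start = time.perf_counter()
    for _ in range(calls):
        fn()
    return (time.perf_counter() - start) / calls


if __name__ == "__main__":
    per_call = average_call_time(null_call)
    print(f"{per_call * 1e6:.3f} microseconds per call")
```

Averaging over a large number of calls amortizes timer resolution and loop start-up cost, which matters when a single call takes only microseconds.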
Conclusion
• Cross-domain RPC calls are significantly more common than cross-machine RPC calls
• Significant amount of extra overhead in the standard RPC execution path when used for cross-domain calls
• LRPC eliminates many sources of overhead by creating a separate version of RPC that is optimized for cross-domain calls (arguably the common case of RPC)
• LRPC was shown to improve cross-domain RPC performance by a factor of 3 (in the Firefly/Taos system) over Taos RPC