Fast Communication

Transcript of Fast Communication

  • Fast Communication
    Firefly RPC
    Lightweight RPC
    CS 614
    Tuesday, March 13, 2001
    Jeff Hoy

  • Why Remote Procedure Call?
    Simplify building distributed systems and applications
    Looks like a local procedure call
    Transparent to the user
    Balance between semantics and efficiency
    Universal programming tool
    Secure inter-process communication

  • RPC Model
    Client side: Client Application → Client Stub → Client Runtime
    Server side: Server Application → Server Stub → Server Runtime
    Call and Return travel across the Network between the two runtimes

  • RPC in Modern Computing
    CORBA and the Internet Inter-ORB Protocol (IIOP): each CORBA server object exposes a set of methods
    DCOM and Object RPC: built on top of RPC
    Java and the Java Remote Method Protocol (JRMP): an interface exposes a set of methods
    XML-RPC and SOAP: RPC over HTTP and XML

  • Goals
    Firefly RPC: inter-machine communication; maintain security and functionality; speed
    Lightweight RPC: intra-machine communication; maintain security and functionality; speed

  • Firefly RPC
    Hardware: DEC Firefly multiprocessor, 1 to 5 MicroVAX CPUs per node (concurrency considerations)
    10 megabit Ethernet
    Takes advantage of the 5 CPUs

  • Fast Path in an RPC
    Transport mechanisms: IP/UDP, DECNet byte stream, or shared memory (intra-machine only)
    The transport is determined at bind time
    The fast path lives inside the transport procedures: Starter, Transporter, and Ender, plus Receiver for the server

  • Caller Stub
    Gets control from the calling program
    Calls Starter for a packet buffer
    Copies arguments into the buffer
    Calls Transporter and waits for the reply
    Copies result data into the caller's result variables
    Calls Ender and frees the result packet (see the sketch below)
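
As a concrete illustration, here is a minimal C sketch of that sequence. Starter, Transporter, and Ender are the transport procedures the slides name, but their signatures, the Packet layout, and the in-process toy definitions are invented so the sketch runs standalone.

    #include <stdio.h>
    #include <string.h>

    /* Invented packet layout; the real buffer format differs. */
    typedef struct { unsigned char data[64]; } Packet;

    Packet *Starter(void);              /* get a packet buffer       */
    Packet *Transporter(Packet *call);  /* send call, wait for reply */
    void    Ender(Packet *result);      /* free the result packet    */

    /* Caller stub for a procedure int incr(int): the steps above in order. */
    int incr_stub(int x)
    {
        Packet *call = Starter();                   /* 1. packet buffer      */
        memcpy(call->data, &x, sizeof x);           /* 2. marshal argument   */
        Packet *result = Transporter(call);         /* 3. send and wait      */
        int r;
        memcpy(&r, result->data, sizeof r);         /* 4. copy result out    */
        Ender(result);                              /* 5. free result packet */
        return r;
    }

    /* Toy in-process definitions so the sketch runs without a network. */
    static Packet pool;
    Packet *Starter(void) { return &pool; }
    Packet *Transporter(Packet *call)
    {
        int x;                                      /* toy "server": x + 1 */
        memcpy(&x, call->data, sizeof x);
        x += 1;
        memcpy(call->data, &x, sizeof x);
        return call;                                /* reply reuses packet */
    }
    void Ender(Packet *result) { (void)result; }

    int main(void)
    {
        printf("incr(41) = %d\n", incr_stub(41));
        return 0;
    }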

  • Server Stub
    Receives the incoming packet
    Argument data is copied onto the stack, copied into a new data block, or left in the packet
    Calls the server procedure
    Copies the result into the call packet and transmits it

  • Transport Mechanism
    Transporter procedure:
    Completes the RPC header
    Calls Sender to complete the UDP, IP, and Ethernet headers (Ethernet is the chosen means of communication)
    Invokes the Ethernet driver via a kernel trap and queues the packet (sketched below)
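
A C sketch of that header layering. The struct layouts are simplified stand-ins for the real Ethernet, IP, UDP, and RPC headers (byte order and most fields are ignored), and build_packet is an invented name.

    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    /* Simplified stand-ins; real protocol headers have more fields. */
    struct eth_hdr { uint8_t dst[6], src[6]; uint16_t type; };
    struct ip_hdr  { uint8_t ver_ihl, tos; uint16_t total_len; uint32_t src, dst; };
    struct udp_hdr { uint16_t sport, dport, len, csum; };
    struct rpc_hdr { uint32_t call_id, proc_index; };

    /* The stub fills the RPC header and arguments; Sender's job is to
       complete the UDP, IP, and Ethernet headers that wrap them. */
    size_t build_packet(uint8_t *buf, const void *args, size_t arg_len)
    {
        struct eth_hdr *eth = (struct eth_hdr *)buf;
        struct ip_hdr  *ip  = (struct ip_hdr  *)(eth + 1);
        struct udp_hdr *udp = (struct udp_hdr *)(ip + 1);
        struct rpc_hdr *rpc = (struct rpc_hdr *)(udp + 1);

        rpc->call_id    = 1;
        rpc->proc_index = 0;
        memcpy(rpc + 1, args, arg_len);           /* arguments follow RPC header */

        udp->len      = (uint16_t)(sizeof *udp + sizeof *rpc + arg_len);
        ip->total_len = (uint16_t)(sizeof *ip + udp->len);
        eth->type     = 0x0800;                   /* IPv4 (byte order ignored)   */
        /* ...remaining UDP/IP/Ethernet fields elided for brevity...             */

        return sizeof *eth + ip->total_len;       /* total bytes to queue        */
    }

    int main(void)
    {
        uint8_t buf[256];
        int arg = 7;
        return build_packet(buf, &arg, sizeof arg) > 0 ? 0 : 1;
    }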

  • Transport Mechanism
    Receiver procedure:
    A server thread awakens in Receiver
    Receiver calls the interface stub identified in the received packet, and the interface stub calls the procedure stub (see the dispatch sketch below)
    The reply path is similar
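
A toy model of that two-level dispatch: the packet names an interface and a procedure, and two function-pointer hops route the call. All names and the packet layout here are invented for illustration.

    #include <stdio.h>
    #include <stdint.h>

    typedef void (*proc_stub_t)(const void *args);
    typedef void (*iface_stub_t)(uint32_t proc_index, const void *args);

    static void add_proc_stub(const void *args)
    {
        const int32_t *a = args;                 /* unmarshal, call server proc */
        printf("add(%d, %d) = %d\n", a[0], a[1], a[0] + a[1]);
    }

    static proc_stub_t math_procs[] = { add_proc_stub };

    static void math_iface_stub(uint32_t proc, const void *args)
    {
        math_procs[proc](args);                  /* second hop: procedure stub */
    }

    static iface_stub_t interfaces[] = { math_iface_stub };

    /* Receiver: a server thread wakes up here holding the packet. */
    static void Receiver(uint32_t iface, uint32_t proc, const void *args)
    {
        interfaces[iface](proc, args);           /* first hop: interface stub */
    }

    int main(void)
    {
        int32_t args[2] = { 2, 3 };
        Receiver(0, 0, args);
        return 0;
    }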

  • Threading
    Client application creates an RPC thread
    Server application creates a call thread
    Call threads operate in the server application's address space, so there is no need to spawn an entire process (see the sketch below)
    Threads must consider locking of shared resources
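
A minimal sketch of that point: each call runs on a thread inside the server's own address space rather than in a freshly spawned process, which is cheap but makes locking the server's shared state the caller-thread's problem. This uses POSIX threads; the call descriptor and handler names are invented.

    #include <pthread.h>
    #include <stdio.h>

    /* Invented call descriptor: what a call thread works on. */
    struct call { int arg; int result; };

    static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
    static int calls_served;                    /* shared server state */

    static void *serve_call(void *p)
    {
        struct call *c = p;
        c->result = c->arg + 1;                 /* the "server procedure" */
        pthread_mutex_lock(&stats_lock);        /* shared state needs locking */
        calls_served++;
        pthread_mutex_unlock(&stats_lock);
        return NULL;
    }

    int main(void)
    {
        struct call c = { .arg = 41 };
        pthread_t t;
        pthread_create(&t, NULL, serve_call, &c);  /* call thread, not a process */
        pthread_join(t, NULL);
        printf("result=%d served=%d\n", c.result, calls_served);
        return 0;
    }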

  • Threading

  • Performance Enhancements
    Over traditional RPC:
    Stubs marshal arguments rather than leaving argument handling to library functions
    RPC procedures are called through procedure variables rather than found by table lookup (see the sketch below)
    Server retains the call packet for results
    Buffers reside in shared memory
    Sacrifices abstract structure
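
The procedure-variable point can be shown in C: binding resolves the procedure once and stores a function pointer, so each call is a single indirect jump instead of a per-call table search. The names below are illustrative, not Firefly's.

    #include <stdio.h>
    #include <string.h>

    typedef int (*rpc_proc_t)(int);

    static int null_proc(int x) { return x; }

    /* Lookup-table style: every call pays a search. */
    struct table_entry { const char *name; rpc_proc_t proc; };
    static struct table_entry table[] = { { "null", null_proc } };

    static int call_by_lookup(const char *name, int arg)
    {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcmp(table[i].name, name) == 0)
                return table[i].proc(arg);
        return -1;
    }

    int main(void)
    {
        /* Procedure-variable style: resolve once at bind time... */
        rpc_proc_t bound = null_proc;
        printf("%d\n", bound(7));               /* ...each call is one indirect jump */
        printf("%d\n", call_by_lookup("null", 7));
        return 0;
    }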

  • Performance Analysis
    Null() procedure: no arguments or return value
    Measures the base latency of the RPC mechanism (a measurement sketch follows)
    Multi-threaded caller and server
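
A sketch of how such a measurement is typically taken: time a tight loop of 10,000 calls and divide. The rpc_null() stub here is only a placeholder for the bound Null() caller stub.

    #include <stdio.h>
    #include <time.h>

    /* Placeholder for the bound Null() caller stub. */
    static void rpc_null(void) { }

    int main(void)
    {
        enum { N = 10000 };
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            rpc_null();                 /* no arguments, no results: pure overhead */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double total_ms = (t1.tv_sec - t0.tv_sec) * 1e3
                        + (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("base latency: %.4f ms per call\n", total_ms / N);
        return 0;
    }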

  • Time for 10,000 RPCs
    Base latency: 2.66 ms
    MaxResult latency (1500 bytes): 6.35 ms

  • Send and Receive Latency

  • Send and Receive Latency
    With larger packets, transmission time dominates
    Overhead becomes less of an issue
    Good for Firefly RPC, assuming large transfers over the network
    Is the overhead acceptable for intra-machine communication?

  • Stub Latency
    Significant overhead for small packets

  • Fewer Processors
    Seconds for 1,000 Null() calls

  • Fewer Processors
    Why the slowdown with one processor?
    The fast path can be followed only in a multiprocessor environment
    Lock conflicts and scheduling problems
    Why so little speedup past two processors?

  • Future Improvements
    Hardware:
    A faster network will help larger packets
    Tripling CPU speed would reduce Null() time by 52% and MaxResult time by 36%
    Software:
    Omit IP and UDP headers for Ethernet datagrams: 2-4% gain
    Redesign the RPC protocol: ~5% gain
    Busy-wait in idle threads: 10-15% gain
    Write more in assembler: 5-10% gain

  • Other Improvements
    Firefly RPC handles intra-machine communication through the same mechanisms as inter-machine communication
    Firefly RPC also has very high overhead for small packets
    Does this matter?

  • RPC Size Distribution
    The majority of RPC transfers are under 200 bytes

  • Frequency of Remote Activity
    Most calls are to the same machine

  • Traditional RPC
    Most calls are small messages that take place between domains on the same machine
    Traditional RPC contains unnecessary overhead, such as:
    Scheduling
    Copying
    Access validation

  • Lightweight RPC (LRPC)
    Also written for the DEC Firefly system
    A mechanism for communication between different protection domains on the same system
    Significant performance improvements over traditional RPC

  • Overhead Analysis
    Theoretical minimum to invoke Null() across domains: a kernel trap and context change to call, plus a trap and context change to return
    Theoretical minimum on the Firefly: 109 us
    Actual cost: 464 us

  • Sources of Overhead
    355 us added (464 us actual minus the 109 us minimum):
    Stub overhead
    Message buffer overhead (not so much in Firefly RPC)
    Message transfer and flow control
    Scheduling and abstract threads
    Context switch

  • Implementation of LRPC
    Similar to RPC
    The call to the server is made through a kernel trap, and the kernel validates the caller
    Servers export interfaces
    Clients bind to server interfaces before making a call

  • Binding
    Servers export interfaces through a clerk, which registers the interface
    Clients bind to the interface through a call to the kernel
    The server replies with an entry address and the size of its A-stack
    The client gets a Binding Object from the kernel (modelled in the sketch below)
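
A toy model of that handshake. The types and calls below (clerk_register, kernel_bind, BindingObject's fields) are invented to make the steps concrete; they are not the LRPC API.

    #include <stdio.h>
    #include <stddef.h>

    typedef void (*entry_t)(void);

    typedef struct {
        entry_t entry;         /* server entry address                   */
        size_t  astack_size;   /* size of the A-stack for this interface */
    } InterfaceRecord;

    typedef struct {
        unsigned id;           /* unforgeable kernel token in the real system */
        InterfaceRecord iface;
    } BindingObject;

    static InterfaceRecord registry[16];   /* the clerk's registration table */
    static unsigned registered;

    static void server_null(void) { }      /* the exported procedure */

    /* Server side: the clerk registers the interface. */
    static unsigned clerk_register(entry_t entry, size_t astack_size)
    {
        registry[registered] = (InterfaceRecord){ entry, astack_size };
        return registered++;
    }

    /* Client side: bind through the kernel and receive a Binding Object
       carrying the entry address and A-stack size the server replied with. */
    static BindingObject kernel_bind(unsigned iface_id)
    {
        return (BindingObject){ iface_id, registry[iface_id] };
    }

    int main(void)
    {
        unsigned id = clerk_register(server_null, 32);
        BindingObject b = kernel_bind(id);
        printf("bound interface %u, A-stack %zu bytes\n", b.id, b.iface.astack_size);
        return 0;
    }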

  • Calling
    Each procedure is represented by a stub
    The client makes a call through the stub, which manages the A-stacks and traps to the kernel (sketched below)
    The kernel switches context to the server
    The server returns through its own stub; no verification is needed on return
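
Continuing the toy model, a call stub might push arguments onto the shared A-stack and trap. kernel_trap_call here is a stand-in for the real trap and context switch; in actual LRPC the A-stack is a region mapped into both protection domains.

    #include <stdio.h>
    #include <string.h>

    /* Toy shared A-stack: in LRPC this is mapped into both domains. */
    static unsigned char astack[128];

    /* Stand-in for the kernel trap + context switch into the server domain. */
    static void kernel_trap_call(void (*entry)(void)) { entry(); }

    /* Server-side entry stub: reads its argument from, and writes its
       result to, the same A-stack, so no message copies are needed. */
    static void server_incr_entry(void)
    {
        int x;
        memcpy(&x, astack, sizeof x);
        x += 1;
        memcpy(astack, &x, sizeof x);
    }

    /* Client call stub for int incr(int): push arg, trap, pop result. */
    static int incr_stub(int x)
    {
        memcpy(astack, &x, sizeof x);          /* argument goes on the A-stack  */
        kernel_trap_call(server_incr_entry);   /* trap; kernel switches domains */
        memcpy(&x, astack, sizeof x);          /* result comes back the same way */
        return x;
    }

    int main(void)
    {
        printf("incr(41) = %d\n", incr_stub(41));
        return 0;
    }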

  • Stub Generation
    Procedure representation: a call stub for the client and an entry stub for the server
    LRPC merges protocol layers
    The stub generator creates run-time stubs in assembly language
    Portability is sacrificed for performance; it falls back on Modula2+ for complex calls

  • Multiple Processors
    LRPC caches domains on idle processors
    The kernel checks for an idling processor already in the server domain
    If one is found, the caller thread can execute on that processor without switching context

  • Argument Copying
    Traditional RPC copies arguments four times for intra-machine calls:
    client stub → RPC message → kernel's message → server's message → server's stack
    In many cases LRPC needs to copy the arguments only once: client stub → A-stack (contrasted in the sketch below)
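
The difference is easy to see in C. The buffers below are stand-ins for the real message structures; only the copy counts matter.

    #include <string.h>

    enum { ARG_BYTES = 64 };

    /* Stand-in buffers for the intermediate message structures. */
    static unsigned char rpc_msg[ARG_BYTES], kernel_msg[ARG_BYTES],
                         server_msg[ARG_BYTES], server_stack[ARG_BYTES];

    /* Shared A-stack, mapped into both client and server domains. */
    static unsigned char astack[ARG_BYTES];

    /* Traditional RPC: four copies between caller and server. */
    void traditional_pass(const void *args)
    {
        memcpy(rpc_msg, args, ARG_BYTES);            /* 1. stub -> RPC message */
        memcpy(kernel_msg, rpc_msg, ARG_BYTES);      /* 2. -> kernel's message */
        memcpy(server_msg, kernel_msg, ARG_BYTES);   /* 3. -> server's message */
        memcpy(server_stack, server_msg, ARG_BYTES); /* 4. -> server's stack   */
    }

    /* LRPC: one copy, because the A-stack is visible to both domains. */
    void lrpc_pass(const void *args)
    {
        memcpy(astack, args, ARG_BYTES);             /* 1. stub -> A-stack, done */
    }

    int main(void)
    {
        unsigned char args[ARG_BYTES] = { 42 };
        traditional_pass(args);
        lrpc_pass(args);
        return server_stack[0] == astack[0] ? 0 : 1;
    }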

  • Performance Analysis
    LRPC is roughly three times faster than traditional RPC
    Null() LRPC cost: 157 us, close to the 109 us theoretical minimum
    The additional overhead comes from the stubs and kernel execution

  • Single-Processor Null() LRPC

  • Performance Comparison
    LRPC versus traditional RPC (in us)

  • Multiprocessor Speedup

  • Inter-machine Communication
    LRPC is best for messages between domains on the same machine
    The first instruction of the LRPC stub checks whether the call is cross-machine; if so, the stub branches to conventional RPC (see the sketch below)
    Larger messages are handled well: LRPC scales linearly with packet size, like traditional RPC
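
That first-instruction check might look like the sketch below. The binding record's remote flag and both path functions are invented stand-ins.

    #include <stdio.h>

    /* Hypothetical binding record with a cross-machine flag set at bind time. */
    struct binding { int remote; };

    /* Toy stand-ins for the two paths. */
    static int rpc_call_remote(struct binding *b, int x) { (void)b; return x + 1; }
    static int lrpc_fast_path(struct binding *b, int x)  { (void)b; return x + 1; }

    /* The stub's first action is the cross-machine check. */
    static int incr_stub(struct binding *b, int x)
    {
        if (b->remote)
            return rpc_call_remote(b, x);   /* fall back to conventional RPC */
        return lrpc_fast_path(b, x);        /* same machine: LRPC trap       */
    }

    int main(void)
    {
        struct binding local = { 0 }, far = { 1 };
        printf("%d %d\n", incr_stub(&local, 1), incr_stub(&far, 1));
        return 0;
    }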

  • Cost
    LRPC avoids needless scheduling, copying, and locking by integrating the client, kernel, server, and message protocols
    Abstraction is sacrificed for functionality
    RPC is built into operating systems (Linux DCE RPC, MS RPC)

  • Conclusion
    Firefly RPC is fast compared to most RPC implementations; LRPC is even faster. Are they fast enough?
    The performance of Firefly RPC is now good enough that programmers accept it as the standard way to communicate (1990)
    Is speed still an issue?