Customizing Middleware to Improve Performance and Footprint
Transcript of Customizing Middleware to Improve Performance and Footprint
Arvind S. Krishna
Institute for Software Integrated Systems
Vanderbilt University, Nashville, Tennessee
Motivation (1/2)
Where are we right now?
• Maturation of Distributed Object Computing (DOC) middleware
• ACE+TAO middleware
  • Open-source implementation of CORBA and Real-time CORBA
  • Highly optimized implementation of almost all CORBA features
• From stovepiped to reusable architectures
[Figure: layered stack with Applications, Middleware Services, Middleware, Operating Systems & Protocols, Hardware & Networks. Caption: Functionality factored into middleware]
Product Line Architectures
• Set of systems that share common “core features”
• Families of systems are then built using the core features
• Reduce time-to-market pressure and cost, improve productivity, etc.
• Example: Boeing Bold Stroke architecture
Product line architectures minimize the cost of building variants
Motivation (2/2)
Model Driven Development (MDD) Paradigm
• Reduces the cost of building new families of systems
• Compose different systems at the modeling level
• Model-check for correctness
• Code generators synthesize artifacts: XML deployment information, configuration information, benchmarking code, …
Models capture System properties: structure and behavior
What do we need?
Optimizations that customize middleware based on system invariants
Information propagation
Middleware for Product Lines
• Still general-purpose and layered
• Enables different variants to be hosted by different configurations
• However, not optimized for each variant
Customizing Middleware via Partial Evaluation
Partial Evaluation
• Technique for automatically specializing programs based on parameters known ahead of time
• Two-level mechanism:
  • First level: annotating information
  • Second level: synthesizing code
• Templates and template metaprogramming
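As a rough illustration of partial evaluation with templates, the textbook sketch below contrasts a general-purpose function, whose exponent is a run-time option, with a template whose exponent is a compile-time option. It is not TAO code; the names `power` and `Power` are made up for this example.

```cpp
#include <cassert>

// General-purpose version: the exponent is a run-time option,
// so the loop and its test execute on every call.
inline long power(long base, unsigned exp) {
    long result = 1;
    for (unsigned i = 0; i < exp; ++i) {
        result *= base;
    }
    return result;
}

// Specialized version: the exponent is known ahead of time.
// The recursion is resolved during compilation, so a compiler can
// reduce the call to straight-line multiplications.
template <unsigned Exp>
struct Power {
    static constexpr long apply(long base) {
        return base * Power<Exp - 1>::apply(base);
    }
};

// Base case that ends the compile-time recursion.
template <>
struct Power<0> {
    static constexpr long apply(long) { return 1; }
};
```

Both versions must agree on every input; for instance, `assert(power(3, 4) == Power<4>::apply(3));` holds.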
[Figure: General Purpose Layered Architecture vs. Optimized Implementation Stack. Labels: run-time options (configuration parameters); compile-time options (type of stub, skeleton, …)]
Research will examine
• How techniques used in programming languages can be applied to middleware
• Moving from a general-purpose to a more specialized architecture
Optimize the “known knowns”, leave the “known unknowns” to the middleware, and use exceptions for the “unknown unknowns”
Existing Middleware Optimizations
• Footprint Reduction Optimizations
  • Micro ORB architecture: Virtual Component pattern
  • Micro POA architecture: pluggable components
• Request Demux/Dispatch Optimizations
  • Connection management: Acceptor-Connector pattern, Reactor
  • Buffer management strategies
  • Request demultiplexing: active demultiplexing & perfect hashing
Aren’t these optimizations enough?
• They have worked really well for applications in different domains
• General-purpose middleware is still layered
• Techniques are needed that fold layers (code and run-time checks) to improve performance
• These will add to the general-purpose optimizations
Capturing System Invariants in Models (1/2)
Example System
• Basic Simple (BasicSP): three-component Distributed Real-time Embedded (DRE) application scenario
• Timer Component – triggers periodic refresh rates
• GPS Component – generates periodic position updates
• Airframe Component – processes input from the GPS component and feeds it to the Navigation Display
• Navigation Display – displays GPS position updates
ACE_wrappers/TAO/CIAO/DaNCE/examples/BasicSP
CoSMIC/examples/BasicSP
Hypothesis / Solution Approach
Use early-binding parameters to tailor middleware
Techniques applied could range from:
• Conditional compilation
• Optimized stub/skeleton generation
• Strategy pattern to handle alternatives
Program Specialization Invariants
Must hold for all specializations:
• output(p_orig) = output(p_spl)
• speed(p_spl) > speed(p_orig)
Boeing product-line scenario – representative DRE application: rate based
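The first specialization invariant, output(p_orig) = output(p_spl), can be spot-checked on sample inputs. The sketch below is one possible harness, not part of TAO; `same_output`, `scale_orig`, and `scale_spl` are hypothetical names. The speed invariant needs measurement on the target platform and is not shown.

```cpp
#include <cassert>
#include <vector>

// Returns true when the specialized program agrees with the original
// on every sample input, i.e. output(p_orig) = output(p_spl).
template <typename Orig, typename Spl, typename In>
bool same_output(Orig p_orig, Spl p_spl, const std::vector<In>& samples) {
    for (const In& x : samples) {
        if (!(p_orig(x) == p_spl(x))) {
            return false;
        }
    }
    return true;
}

// Hypothetical "original" program: tests a run-time flag on every call.
inline int scale_orig(int x, bool doubled) { return doubled ? 2 * x : x; }

// Hypothetical specialization for the invariant doubled == true.
inline int scale_spl(int x) { return 2 * x; }
```

A check of this kind only covers the inputs it is given, so it complements rather than replaces an argument that the invariant really holds for the deployed system.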
Capturing System Invariants in Models (2/2)
Mapping Ahead of Time (AOT) System Properties to Specializations
• Periodicity → pre-create marshaled request
• Single-interface operations → pre-fetch POA, servant, skeleton servicing the request
• Same endianness → avoid de-marshaling (byte-order swapping)
• Collocated components → specialize for target location (remove remoting)
• Same operation invoked → cache CORBA request header, update arguments only
[Figure: Component Interactions and Component Deployment views, annotated with: collocated components, same endianness, periodic timer, single-method interfaces]
Specializations Implemented in TAO
[Figure: CORBA ORB architecture (Client, OBJ REF, IDL Stubs, ORB Interface, IDL Skeleton, Object Adapter, ORB Core, GIOP/IIOP/ESIOPs, Object (Servant)) with five numbered specialization points: Specialization on Location, Request Header Caching, Eliminate Unnecessary Checks, Pre-create Request, Optimize for Target Location]
Client-Side Specialization
• Request header caching
• Pre-creating requests
• Marshaling checks
• Target location
Server-Side Specialization
• Specialize request processing
• Avoid demarshaling checks
Cumulative Effect
• More-than-additive improvement from combining specializations
• For example:
  • Client side – request caching
  • Server side – specialize request processing
  • 1 + 1 = 3?
Specialize for Target Location (1/2)
Intent
Specialize a path based on the knowledge that objects are collocated
Model Invariants
• All communication between the GPS, Airframe and Display components is collocated
• All invocations are local
• Remoting code is not needed (connection code not required)
Transformations to TAO (footprint)
• Eliminate connection-handling code
  • Connection strategies, flushing strategies
• Eliminate invocation classes
  • Remote invocation classes
  • One-way and two-way invocation classes
Transformations to TAO (performance)
• Eliminate remoting checks
  • Object proxy checks for remoting
  • Invocation adapter checks for remoting on each invocation
  • Checks for one-way or two-way invocation
[Figure: Client and Server shown within one address space and across the network]
Specialize for Target Location (2/2)
Configuration
• Red Hat kernel 2.4.21-27.0.1.ELsmp #1 SMP
• Dual-processor Athlon, 2 GHz
• 1 GB RAM and 256 KB cache per processor
• Test run: TAO’s performance-tests/Latency/Collocation
TAO Implementation & Automation
• All implementations are present in the branch “TAO_PE_Collocation”
• Specialization implemented via conditional compilation: the TAO_HAS_COLLOCATION flag removes remoting
• Profiled the optimistic case of absolutely no remoting (i.e., no code to handle requests and replies)
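The conditional-compilation idea can be sketched as follows. This is a simplified stand-in, not TAO source: the macro `SKETCH_COLLOCATED_ONLY` and the types `Servant` and `ObjectProxy` are hypothetical and only play the role the slides assign to the collocation flag.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag for this sketch only. Defined here so that the
// specialized (collocated-only) path is the one that gets built.
#define SKETCH_COLLOCATED_ONLY 1

// Stand-in for a servant living in the client's address space.
struct Servant {
    int get_data() const { return 42; }
};

#if !SKETCH_COLLOCATED_ONLY
// Remoting path: compiled only into the general-purpose build.
inline int remote_invoke(const std::string& /*endpoint*/) {
    // Connection setup, marshaling and reply handling would live here.
    return -1;
}
#endif

struct ObjectProxy {
    Servant* local;        // set when the target is collocated
    std::string endpoint;  // used only by the remoting path

    int get_data() const {
#if SKETCH_COLLOCATED_ONLY
        // Model invariant: every target is collocated, so the
        // per-invocation remoting check is folded away.
        assert(local != nullptr);
        return local->get_data();
#else
        // General-purpose path: the check runs on every invocation.
        if (local != nullptr) {
            return local->get_data();
        }
        return remote_invoke(endpoint);
#endif
    }
};
```

With the flag set, the remoting function is never compiled (footprint) and the branch is never executed (performance), which mirrors the two kinds of transformation listed for this specialization.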
Optimization
• Code subsetting – removed connection-related code
• Performance – elimination of remoting checks
Performance Improvements
• libTAO: ~6% (100 kB reduction)
• Application: ~15%
• Improved by 10% over and above Thru_POA collocation
CORBA Compliance & Automation
• Compliant with the CORBA specification
• Realized by macros
• Invocation classes can be separated out as libraries
Specialize CORBA Request Header (1/4)
Intent
Avoid the considerable overhead of creating new CORBA requests and replies for each call in a series of requests
Model Invariants
• Timer Component periodically sends the same event
• Operations to retrieve data from the models are also the same
Update Rather than Create
• Do not create a new request each time
• Re-use the old request and its Request Header
• Various levels of re-use are possible
  • Reuse only the Request Header
  • Reuse both the Request Header and the Message Specific Header
  • Reuse the entire request
This approach is similar to TCP header prediction
Specialize CORBA Request Header (2/4)
TAO Implementation
• First request creates the entire request (code flow same as the normal path)
• Cache the (marshaled) header information
• On subsequent messages, update only the total size and request ID
• Implemented via conditional compilation
Request Header Caching
• First-level specialization – cache only the Request Header part
• Everything else in the request is variable
• Avoid marshaling/de-marshaling costs for the header part alone
• Implemented on the client side
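A minimal sketch of the header-caching flow described above, under loud assumptions: the wire layout is invented for the example and is not real GIOP, and `HeaderCachingClient` and `read_u32` are hypothetical names rather than TAO classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical wire layout (NOT real GIOP):
//   bytes 0..3  total message size
//   bytes 4..7  request ID
//   then        1-byte operation-name length + operation name
//   then        marshaled arguments
class HeaderCachingClient {
public:
    explicit HeaderCachingClient(std::string operation)
        : operation_(std::move(operation)) {
        assert(operation_.size() < 256);  // length must fit in one byte
    }

    // Builds one request; only the first call marshals the header.
    std::vector<std::uint8_t> make_request(const std::vector<std::uint8_t>& args) {
        if (cached_header_.empty()) {
            marshal_header();  // normal path, taken once
        }
        std::vector<std::uint8_t> msg = cached_header_;  // re-use cached bytes
        msg.insert(msg.end(), args.begin(), args.end());
        patch_u32(msg, 0, static_cast<std::uint32_t>(msg.size()));  // total size
        patch_u32(msg, 4, next_id_++);                              // request ID
        return msg;
    }

    int header_marshal_count() const { return marshal_count_; }

private:
    void marshal_header() {
        cached_header_.assign(8, 0);  // placeholders for size and ID
        cached_header_.push_back(static_cast<std::uint8_t>(operation_.size()));
        cached_header_.insert(cached_header_.end(), operation_.begin(), operation_.end());
        ++marshal_count_;
    }

    // Writes a 32-bit value in host byte order, for brevity.
    static void patch_u32(std::vector<std::uint8_t>& buf, std::size_t at, std::uint32_t v) {
        std::memcpy(&buf[at], &v, sizeof v);
    }

    std::string operation_;
    std::vector<std::uint8_t> cached_header_;
    std::uint32_t next_id_ = 1;
    int marshal_count_ = 0;
};

// Helper for inspecting a 32-bit field of a built message.
inline std::uint32_t read_u32(const std::vector<std::uint8_t>& buf, std::size_t at) {
    std::uint32_t v;
    std::memcpy(&v, &buf[at], sizeof v);
    return v;
}
```

However many requests are built, `header_marshal_count()` stays at 1; only the size and ID fields are rewritten per message.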
Optimization
• Cache GIOP Request Header part
Performance Improvements
• Round-trip throughput improved by ~50–100 calls/sec
• Not much gain from doing this
CORBA Compliance & Automation
• Compliant with the CORBA specification
• Realized by macros
Specialize CORBA Request Header (3/4)
Message Specific Header Caching
• Cache both the Request Header and the Message Specific Header
  • Object key is the same
  • Service context information is the same
  • Operation name is the same, e.g., get_data
Server side: only when thread-per-connection is used
GIOP formats: only for GIOP 1.2, as in GIOP 1.0 and 1.1 the service contexts are written first
TAO Implementation
• Move the buffer pointer to the start of the data segment
• Write out the arguments for the call
• Update the total size of the request (SIZE) and the REQUEST_ID field in the request
Optimization
• Cache Request Header + Request Message
Performance Improvements
• Round-trip throughput improved by ~300–350 calls/sec (~5%)
• Latency: ~3 µs (~5%)
CORBA Compliance & Automation
• Compliant with the CORBA specification (service contexts)
• Realizable using object-level policies on the client side
Specialize CORBA Request Header (4/4)
Intent
• Instead of caching only the headers (Request + Message Specific), pre-create the entire CORBA request
Model Invariants
• Timer component sends “trigger” events (heartbeats) to the recipient component; timeouts are a similar situation
• Request and data contents are the same
Proposed TAO Implementation
• Special IDL flag that will pre-create (marshal) the request
• Each time, the same request is sent to the client
• Update only the request ID of the request
• Save the cost of request construction and marshaling
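The proposal can be pictured with the sketch below, which marshals a request once and then patches only the request ID before each send. The layout (ID in the first four bytes) and the class name `PrecreatedRequest` are assumptions made for illustration; they do not describe real GIOP or existing TAO code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout for this sketch (not real GIOP): the request ID
// occupies the first four bytes; everything after it never changes.
class PrecreatedRequest {
public:
    // Marshal once, up front. In the proposal, IDL-generated code
    // would play this role.
    explicit PrecreatedRequest(const std::vector<std::uint8_t>& fixed_body)
        : buffer_(sizeof(std::uint32_t), 0) {
        buffer_.insert(buffer_.end(), fixed_body.begin(), fixed_body.end());
    }

    // Per-send work: overwrite the request ID in place, nothing else.
    const std::vector<std::uint8_t>& next(std::uint32_t request_id) {
        std::memcpy(buffer_.data(), &request_id, sizeof request_id);
        return buffer_;
    }

private:
    std::vector<std::uint8_t> buffer_;
};
```

Compared with header caching, no arguments are marshaled per call at all, which is only safe while the model invariant (identical request and data contents) holds.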
Optimization
• Entire CORBA request
Performance Improvements
• Avoids marshaling data completely
• Can eliminate multiple layers by sending the request directly
CORBA Compliance & Automation
• Not compliant with the specification
• IDL compiler can pre-create and generate the entire request
Specialized Request Processing (1/2)
Intent
• Resolve the mapping of incoming requests to the POA, servant, skeleton, and operation to which they are dispatched only once, then use these precomputed results to optimize the dispatch of subsequent requests
Model Invariants
• The get_data operation is always invoked on the same component, located in the same POA and serviced by the same servant and operation
Once-Per-Connection Resolution of Dispatch
• TAO provides active demultiplexing + perfect hashing for an O(1) lookup time bound
• Caching just the POA may not give much performance improvement
Specialized Request Processing (2/2)
TAO Implementation
• As the operation names are the same, the skeleton is cached directly and the current buffer pointer is advanced to the beginning of the arguments
• The length is calculated only for the first request and re-used; the cost is amortized over the number of operations
• Implemented via the TAO_CACHE_SERVANT_REF conditional compilation macro
• $TAO_ROOT/performance-tests/Latency/Single-Threaded
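The once-per-connection resolution can be sketched as below. The map lookup merely stands in for the POA, servant, skeleton and operation resolution; `ConnectionDispatcher` and `Skeleton` are hypothetical names, not TAO types.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Stand-in for a skeleton upcall; takes and returns an int for brevity.
using Skeleton = std::function<int(int)>;

// Hypothetical per-connection dispatcher.
class ConnectionDispatcher {
public:
    void register_operation(const std::string& object_key,
                            const std::string& operation, Skeleton skel) {
        table_[std::make_pair(object_key, operation)] = std::move(skel);
    }

    int dispatch(const std::string& object_key,
                 const std::string& operation, int arg) {
        if (!cached_) {
            // Full resolution, performed only for the first request.
            auto it = table_.find(std::make_pair(object_key, operation));
            assert(it != table_.end());
            cached_ = it->second;
            ++lookups_;
        }
        // Model invariant: later requests on this connection target the
        // same servant and operation, so the cached skeleton is reused.
        return cached_(arg);
    }

    int lookups() const { return lookups_; }

private:
    std::map<std::pair<std::string, std::string>, Skeleton> table_;
    Skeleton cached_;
    int lookups_ = 0;
};
```

Once the cache is filled, the object key and operation name of later requests are ignored, so a request that violates the invariant would be misrouted. That trade-off is the price of folding the lookup away.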
Optimization
• Cache skeleton directly
Performance Improvements
• Round-trip latency: ~6 µs (5%)
• Throughput: ~300 calls/sec (~5%)
CORBA Compliance & Automation
• Caching skeletons is not compliant
• Cannot be used with Default Servant and Servant Locator classes
• Provide policies at the POA (now that it is refactored) to implement this layer folding
• Implemented as a separate IIOPConnection handler class
Specialize Marshaling/De-marshaling
Intent
• To mask endianness, the GIOP Request header contains a flag that indicates the endianness of the request
• If the endianness differs, do byte swapping
Model Invariants
• The two machines on which the components are hosted have the same endianness (byte order)
• No checks for byte order are required
ACE Implementation
• ACE_CDR streams provide the ACE_SWAP_ON_WRITE and ACE_DISABLE_SWAP_ON_READ macros, which can be used to eliminate checks for byte ordering
• These macros are not set by default; model interpreters could generate configuration settings to enable them
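To make the eliminated check concrete, the sketch below contrasts a general-purpose read, which compares byte orders for every value, with a specialized read that compiles the comparison away. It does not use ACE; the macro `SKETCH_SAME_BYTE_ORDER` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag for this sketch; set when the model guarantees that
// sender and receiver share a byte order.
#define SKETCH_SAME_BYTE_ORDER 1

// Reverses the byte order of a 32-bit value.
inline std::uint32_t swap_u32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// General-purpose read: compares the sender's byte-order flag with the
// host's for every value and swaps when they differ.
inline std::uint32_t read_u32_general(std::uint32_t wire,
                                      bool sender_little, bool host_little) {
    return sender_little == host_little ? wire : swap_u32(wire);
}

// Specialized read: with the invariant in force, the check disappears.
inline std::uint32_t read_u32(std::uint32_t wire,
                              bool sender_little, bool host_little) {
#if SKETCH_SAME_BYTE_ORDER
    (void)sender_little;
    (void)host_little;
    return wire;
#else
    return read_u32_general(wire, sender_little, host_little);
#endif
}
```

The saving per value is one comparison, but a single request reads many values, which is where the repeated checks add up.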
Optimization
• Demarshaling check elimination
Performance Improvements
• Removes more than ~10 if-conditions for a normal CORBA request
• Improvements on both the client and server side
• Used in conjunction with header caching optimizations
CORBA Compliance & Automation
• Compliant with the CORBA specification
• Conditional compilation techniques
Concluding Remarks & Future Work
• Specialization techniques can be used for “folding layers” based on system invariants
• The current “first cut” implementation uses conditional compilation strategies. Examine more appropriate strategies for implementing these specializations:
  • Request header caching – strategies controlled by svc.conf
  • Specialize request processing – POA request processing policy
  • Marshaling/de-marshaling – ACE level
  • Pre-create request – IDL-generated code
  • Collocation specialization – macros + strategies (invocation classes)
[Figure: component middleware architecture with Client, OBJ REF, DII, IDL Stubs, ORB Interface, IDL Skeleton, Object Adapter, ORB Core, GIOP/IIOP/ESIOPs, Container, Component (Servant), Services]
Examine specialization at the Component Middleware level and the Infrastructural Middleware level