Post on 03-Dec-2018
DDSS: A Low-Overhead Distributed Data Sharing
Substrate for Cluster-Based Data-Centers over Modern
Interconnects
K. Vaidyanathan, S. Narravula and D. K. Panda
Network Based Computing Laboratory (NBCL)
The Ohio State University
Presentation Outline
• Introduction and Motivation
• Proposed DDSS Framework
• Experimental Results
• Conclusions and Future Work
Introduction and Motivation
WANWAN
Clients
Web-server(Apache)
DatabaseServer
(MySQL)
Storage
• Internet growth– Number of Users, Type of Service, Amount of data– E-Commerce, online-banking, stocks, airline reservations
• Data-centers enable such services– Process data and reply to queries– Need for services like caching, resource adaptation for performance,
scalability
ProxyServer
Caching,load
balancing
Application Server (PHP)
CGI, PHP
Multi-TierData-Centers
High-Performance Networks
• InfiniBand, 10 GigE– High Bandwidth– Low Latency
• Provides rich features– RDMA semantics, Atomic operations, Protocol offload
• OpenFabrics stack– Single interface for InfiniBand, iWARP/10 GigE, etc
• Targeted for Multi-Tier Data-Centers• Can the data-center processes coordinate
better?
Information-Sharing is common• Applications typically employ their own
– Data placement and management protocols– Synchronization protocols
• Data-Center services– Active Resource Adaptation
• Maintain Server state information
• Locking requirements– Caching
• Coherency & Consistency requirements
– Resource Monitoring (IBM Websphere)
• Load information shared across several servers
– Critical decisions based on shared information
ProxyModule M1(S1, S2
S3, S4 load)
ProxyModule M2
(S1, S2S3, S4 load)
ProxyModule M3
(S1, S2S3, S4 load)
Load ofServer S1
Load ofServer S2
Load ofServer S3
Load ofServer S4
Resource Monitoring Service
Problems with Existing approaches
• Ad-hoc messaging protocols for exchanging data• May have high overheads• Performance may depend on the system load• May not use the advanced features• May not be scalable
Objective
• Can we design a load resilient substrate (DDSS) for data-center applications and services utilizing advanced features such as RDMA, remote atomic operations?
Presentation Outline
• Introduction and Motivation
• Proposed DDSS Framework
• Experimental Results
• Conclusions and Future Work
Distributed Data Sharing Mechanism
Shared Data
Data-CenterApplication
ResourceAdaptationServices
LoadBalancingServices
Data-CenterApplication
ResourceMonitoringServices
ResourceMonitoringServices
Get
Get load
Get load
Put
Put load
Put load
Lock Data
Provide an effective mechanism to share data across the data-center
Proposed DDSS Framework
InfiniBand 10 GigE High-Speed Interconnects
ProtocolOffload
RDMA Atomic Multicast
�P� Ma�a�e�e��
�o��e���o�M���
Me�oryM���
a�aM���
!a"��Lo�#"
�o$ere��y,�o�"�"�e��yMa���e�a��e
Data-CenterApplications
Data-CenterServices
High-SpeedNetworks
AdvancedNetworkFeatures
DistributedData-Sharing
SubstrateComponents
Proposed Framework Contd…
• Data Management– Local vs Remote, for load
balancing
• Basic Locking– Through atomic operations
(IBA)
• Coherency and Consistency Maintenance– Strict, Write/Read, Null, Delta,
Version– Use of RDMA and atomic
operations
�P� Ma�a�e�e��
�o��e���o�M���
Me�oryM���
a�aM���
!a"��Lo�#"
�o$ere��y,�o�"�"�e��yMa���e�a��e
ProtocolOffload
RDMAAtomic
Proposed Framework Contd…
• Connection Management– Takes care of connection-setup and
teardown for nodes participating in DDSS
• Memory Management– Allocates a pool of memory for
DDSS on each node– Manages allocation, release
operations
• IPC Management– Access for multiple threads– Message Queues
a�a%�e��er&''l��a��o�"
(Serv��e"
Module
IPC
OpenFabricsStack
OtherApplications
OtherModules
TCP/IPStack
DDSS Interface
DDSS Interface• allocate_ss(…)• release_ss(…)• get(…)• put(…)• acquire_lock_ss(…)• release_lock_ss(…)• …
Key = allocate_ss(1024, NONCOHERENT_SS, 5000);
put(key, data, 10);compute();get(key,data, 10);release_ss(key);
Key = allocate_ss(1024, WRITE_COHERENT_SS, 5000);
acquire_lock_ss(key);
put(key, data, 10);release_lock_ss(key);
compute();get(key,data, 10);release_ss(key);
Presentation Outline
• Introduction and Motivation
• Proposed Framework
• Experimental Results
• Conclusions and Future Work
Experimental Testbed
• InfiniBand– Cluster with dual Intel Xeon 3.4 GHz, 1GB memory– MT25128 Mellanox HCA
• iWARP/GigE– Cluster with Intel dual Xeon 3.0 GHz, 512 MB
memory– Ammasso 1100 Gigabit Ethernet NIC
• OpenFabrics stack– IB, Ammasso (iWARP)
Experimental Results Outline
• Microbenchmarks– Performance of put() and get() operations
• Distributed Applications– Distributed STORM– Checkpointing Application
• Data-Center Services– Active Resource Adaptation
– Active Caching
Microbenchmarks
• performance of put() and get() operation for small messages is less than 65 usecs for all coherence models
0
2 0
4 0
6 0
8 0
1 0 0
1 2 0
1 4 0
1 1 6 2 5 6 4 0 9 6 6 5 5 3 6
M e s s a g e S i z e ( b y t e s )
Lat
ency
(u
secs
)
N u l lR e a dW r i t eS t r i c tV e r s i o nD e l t a
0
2 0
4 0
6 0
8 0
1 0 0
1 2 0
1 4 0
1 1 6 2 5 6 4 0 9 6 6 5 5 3 6
M e s s a g e S i z e (b y t e s )
Lat
ency
(u
secs
)
N u l lR e a dW r i t eS t r i c tV e r s i o nD e l t a
put() performance get() performance
Distributed STORM
• Select data of interest and transfer from storage to compute nodes
• Same dataset is processed by multiple STORM applications
� this shared dataset is placed in DDSS
• STORM using DDSS shows close to 19% improvement
01 0 0 02 0 0 03 0 0 04 0 0 05 0 0 06 0 0 07 0 0 08 0 0 0
Q u e r y E x e c u t i o n
T i m e (u s e c s )
1 K 5 K 1 0 K 1 0 0 K
# R e c o r d s
S T O R M S T O R M -D D S S
CR Coordination
• Checkpoint at random time
• Simulates restart from a consistent checkpoint
• Checkpoint uses DDSS for maintaining checkpoint information, locks, versions, etc
• Check-pointing applications using DDSS are highly scalable
050
100150200250300350
2 3 4 5 6 7 8 9 10 11 12
Number of cl ients
Tim
e (u
secs
)Avg Sync Time Avg Total Time
Active Resource Adaptation
• Monitors the load of different websites
• If a website is loaded, shift under-utilized servers to loaded websites
• Software Overhead of DDSS is < 2%
0
5 0
1 0 0
1 5 0
5 1 0 2 0 4 0 6 0 8 0
L o a d (%)
Tim
e (u
secs
)
05 0 01 0 0 01 5 0 02 0 0 02 5 0 03 0 0 0
R e c o n f i g u r a t i o n T i m e s o f t w a r e -o v e r h e a dN o o f R e c o n f i g u r a t i o n s
Active Caching
• Supports Strong Coherency for cached dynamic data
• Checks the back-end for current version using RDMA
• Active cache using DDSS is load-resilient
01 0 02 0 03 0 04 0 05 0 06 0 07 0 0
1 2 4 8 1 6 3 2
N u m b e r o f C o m p u t e /C o m m u n i c a t i o n T h r e a d s
Tim
e (u
secs
)
V e r s i o n C h e c k - D D S S V e r s i o n C h e c k - T C P
Conclusions & Future Work
• Proposed a distributed data sharing substrate• Using DDSS, data-center applications and
services, with very little modification, can get significant benefits in performance and scalability
• Implemented over OpenFabrics – applicable across InfiniBand, iWARP-capable adapters
• Future work on Fault-tolerance, support for large file sizes, advanced resource management schemes.
Acknowledgements
Our research is supported by the following organizations
• Current Funding support by
• Current Equipment support by
Web Pointers
Group Homepage: http://nowlab.cse.ohio-state.edu
Emails: {vaidyana, narravul, panda}@cse.ohio-state.edu
NBC-LAB
Backup Slides
High-Performance Networks in Data-Centers
• InfiniBand, iWARP-capable adapters– Offer several features like RDMA, atomic operations (IB), iWARP
(Ammasso, 10 GigE)
Cluster-Based Data-Center Environment
(InfiniBand, iWARP-capable Ammasso, 10 GigE)
WideArea
Network
Distributed Data-Center Environment
iWARPCluster
iWARPCluster
iWARPCluster
Active Resource Adaptation Design
ServerWebsite A
LoadBalancer
ServerWebsite B
Not Loaded Loaded
Load QueryLoad Query
Successful Atomic (Lock)
Successful Atomic (Update Counter)
Reconfigure Node
Successful Atomic (Unlock)
Load Shared Load Shared
RDMARDMA
P. Balaji, K. Vaidyanathan, S. Narravula and D.K. Panda “Exploiting Remote Memory Operations to Design Efficient Reconfiguration forShared Data-Centers over InfiniBand” presented at RAIT 2004
RDMA based Client Polling Design
Front-End Back-End
Request
Cache Hit
Cache Miss
Response
Version Read
Response
S. Narravla, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy, .. /u and D.K. Panda “Supporting Strong Coherency for Active Caches in Multi-Tier Data-
Centers over InfiniBand” presented at SAN 2004
Microbenchmarks
• performance of put() and get() operation is less than 50 usecs
0
1 0
2 0
3 0
4 0
5 0
6 0
1 5 9
N u m b e r o f C l i e n t s
Lat
ency
(u
secs
)
N u l lR e a dW r i t eS t r i c tV e r s i o nD e l t a
0
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
0 .1 0 .8
L o c k C o n t e n t i o n (% )
Lat
ency
(u
secs
)
N u l lR e a dW r i t eS t r i c tV e r s i o nD e l t a