Post on 20-Nov-2021
BigData Express: Toward Predictable, Schedulable, and High-Performance Data Transfer
BigData Express Research Team November 10, 2018
Many people’s hard work
FNAL: Qiming Lu, Liang Zhang, Sajith Sasidharan,
Wenji Wu, Phil DeMar
ESnet: Chin Guok, John Macauley, Inder Monga
iCAIR/StarLight: Se-young Yu, Jim-Hao Chen, Joe Mambretti
KISTI: Jin Kim, Seo-Young Noh,
Univ. of Maryland: Xi Yang, Tom Lehman
ORNL: Gary Liu
Acknowledgments
This work was supported by the U.S. DOE Office of Science ASCR network research program
Data Transfer Challenges in Big Data Era
10 Gbps40 Gbps
100 Gbps
1 Tbps
400 Gbps
Throughput
High-Performance Challenge
Time
Real-time data transfer(200-500ms)
Time Constraints on Data Transfer Challenge
Deadline-bounddata transfer
(application specific)
Backgrounddata transfer
(no explicit deadline)
• High-performance challenges
• Time-constraint challenges
Data Transfer – State of the Art
• Advanced data transfer tools and services developed– GridFTP, BBCP
– PhEDEx, LIGO Data Replicator, Globus Online
• Numerous enhancements– Parallelism at all levels• Multi-stream, Multicore, Multi-path parallelism
– Science DMZ architecture
– Terabit networks
Problems with Existing Data Transfer Tools & Services
• Disjoint end-to-end data transfer loop
• Cross-interference between data transfers
• Oblivious to user requirements (e.g., deadlines and Qos requirements)
• Inefficiencies arise with existing data transfer tools running on DTNs
• Distributed resource management model– Resource contention– Performance mismatch
Problem 1 – Disjoint end-to-end data transfer loop
a
High-endDTN transfer 1
LAN
b
c
WANSite 1 Site 2DTNsLAN
Low-endDTN
High-endDTN
a
High-endDTN
transfer 1
LAN
b
c
WANSite 1 Site 2DTNsLAN
a. without coordination
b. with coordination
Performance Mismatch !
Low-endDTN
High-endDTN
a
b
DTNs
s1
s2
br
LAN
flow 1
bandwidth bottleneck
br1
r1
r3
r2
r4
br2
WAN
flow 1
bandwidth bottleneck
a. Network congestion in LANwithout coordination
b. Network congestion in WANwithout coordination
a
b
DTNs
s1
s2
br
LANflow 1
bandwidth bottleneck
c. No network congestion in LANwith coordination
br1
r1
r3
r2
r4
br2
WAN
flow 1
bandwidth bottleneck
d. No network congestion in WANwith coordination
Problem 2 – Cross-interference between data transfers
Severe Cross-interference
Resource Contention
Poor Performance
• Degraded performance• High variability in data transfer performance
Problem 3 – Oblivious to user requirements
Time
Tasks
Task1
Task2
Time
Tasks
Task1
Task2
Deadline1 Deadline2 Deadline1 Deadline2
a. without deadline awareness b. with deadline awareness
• Data transfer jobs are scheduled on a first-come, first serve basis– Without deadline awareness
• Resources are shared fairly among data transfer jobs
Problem 4 – Inefficiencies arises when existing data transfer tools run on DTNs
• I/O locality on NUMA systems• Cache thrashing• Scheduling overheads…. 100GE NIC
NUMA NODE 1 NUMA NODE 2
Local I/Os StorageRemote I/Os
Cores Cores
I/O locality problem on NUMA systems
Need high-performance data transfer tool!
Our Solution - BigData Express
• BigData Express: a schedulable, predictable, and high-performance data transfer service– A peer-to-peer, scalable, and extensible data transfer model– A visually appealing, easy-to-use web portal– A high-performance data transfer engine– A time-constraint-based scheduler– On-demand provisioning of end-to-end network paths with guaranteed QoS– Robust and flexible error handling– CILogon-based security
BigData Express Major Components
• BigData Express Web Portal– Access to BigData Express services
• BigData Express Scheduler– Time-constraint-based scheduler– Co-scheduling DTN, storage, & network
• AmoebaNet– Network as a service– Rate control
• mdtmFTP– High-performance data transfer engine– http://mdtm.fnal.gov
• DTN Agent– Manage and configure DTNs– Collect & report DTN configuration and
status
• Storage Agent– Manage and configure storage systems– I/O estimation
• Data Transfer Launching Agent– Launch data transfer jobs– Support different data transfer
protocols
DTNs
Networks
DTNs
DTNs
DTNs
DTNs
…
Data Transfer Federation 1
Data Transfer Federation 3
…
…
Data Transfer Federation 2
BigData Express -- Flexible
• Flexible to set up data transfer federations
• Providing inherent support for incremental deployment
BDE SchedulerSDN Agent
Storage Agent
DTN AgentDTN Agent
SDN Agent
BDE Web Portal
Data Transfer Launching AgentData Transfer
Launching AgentData Transfer Launching Agent
Message Queue
SDN Agent(AmoebaNet)
DTN Agent
Storage AgentStorage Agent
BDE Scheduler
BigData Express -- Scalable
• BigData Express scheduler manages site resources through agents• Use MQTT as message bus
BDE SchedulerSDN Agent
Storage Agent
DTN AgentDTN Agent
SDN Agent
BDE Web Portal
Data Transfer Launching AgentData Transfer Launching Agent
Data Transfer Launching Agent
mdtmFTPPlugin
GridFTPPlugin
XrootDPlugin
…
Message Queue
SDN Agent(AmoebaNet)
DTN Agent
Storage AgentStorage Agent
BDE Scheduler
BigData Express -- Extensible
• Extensible Plugin framework to support various data transfer protocols• mdtmFTP, GridFTP, XrootD, …
BigData Express -- End-to-End Data Transfer Model
• Application-aware network serviceo On-demand programming
• Fast-provisioning of end-to-end network paths with guaranteed QoS
• Distributed resource negotiation & brokeringLAN WAN LAN
mdtmFTP
AmoebaNet AmoebaNetSENSE
Edge DTNStorage Edge DTN Storage
DTN Agent
Storage Agent
mdtmFTP
DTN Agent
Storage Agent
Web Portal
Scheduler
Data Transfer Launching Agent
Web Portal
Scheduler
Data Transfer Launching Agent
Site A - Smart E2EData Transfer Orchestrator
Site B - Smart E2EData Transfer Orchestrator
A End-to-End Data Transfer Loop with Guaranteed QoS
Resource negotiation & brokering
Resource
negotiation & brokering
Resource
negotiation & brokering
BigData Express – High Performance Data Transfer (I)
mdtmFTP is faster than existing data transfer tools, ranging from 8% to 9500%!@ESnet 100GE SDN Testbed,
Note 1: “-” indicates inability to get transfer to workNote 2: BBCP performance is very poor, we do not list its results hereNote 3: BBCP and FDT support 3rd party data transfer. But BBCP and FDT couldn’t run 3rd party data transfer on ESNET testbed due to testbed limitation
mdtmFTP FDT GridFTP BBCPLarge file data transfer (1 X 100G) 74.18 79.89 91.18 Poor performanceFolder data transfer (30 x 10G) 192.19 217 320.17 Poor performanceFolder data transfer (Linux 3.12.21) 10.51 - 1006.02 Poor performance
Time-to-completion (Seconds) – Client/Server mode
mdtmFTP FDT GridFTP BBCPLarge file data transfer (1 X 100G) 34.976 N/A 106.84 N/AFolder data transfer (30 x 10G) 95.61 N/A - N/AFolder data transfer (Linux 3.12.21) 9.68 N/A - N/A
Time-to-completion (Seconds) – 3rd party mode
Lower is better
Lower is better
BigData Express – High Performance Data Transfer (II)
mdtmFTP is faster than GridFTP, ranging from 40% to 114%!@StarLight 100GE Testbed
StarLight 100GE Testbed
BigData Express -- Three Types of Data Transfer
• Real-time data transfer
• Deadline-bound data transfer
• Best-effort data transfer
BigData Express – Mechanism SummaryProblems with existing data transfer tools BigData Express Solutions
• Disjoint end-to-end data transfer loop
• Cross-interference between data transfers
• Oblivious to user requirements
• Inefficiencies arises when existing data transfer tools run on DTNs
• Distributed resource negotiation & brokering• Co-scheduling of DTN, storage, & networking• On-demand provisioning of end-to-end
network path with guaranteed QoS
• Time-constraint-based scheduler
• Three classes of data transfer
• mdtmFTP – A high-performance data transfer engine
• Admission control• Rate control
• Time-constraint-based scheduler
BigData Express vs. Globus OnlineFeatures BigData Express Globus Online
Architecture • Distributed service• Flexible to set up data transfer federations • Centralized service
Supported Protocols
• Extensible plugin framework to support multiple protocols: o mdtmFTPo GridFTP, XrootD, SRM (coming soon)
• GridFTP
SDN Support• Yes, Network as a service• Fast-provisioning end-to-end network paths with
guaranteed QoS• Not in production
Supported Data Transfers• Real-time data transfer• Deadline-bound data transfer• Best-effort data transfer
• Best-effort data transfer
Error Handling • Checksum• Retransmit
• Checksum• Retransmit
BigData Express SC18 DEMO
4/3
4/6
FNAL Border router
ESNET
40GE
vlan 3619
49
50
BDE4
Pica8 P3930
40GE
FNAL
SENSE Service AmoebaNet
bde-hp2.fnal.gov
yosemite.fnal.gov
BDE Web Protal
BDE Scheduler
47
48
65
66
Testbed Switch
100GE
Border Router
vlan 3619
DTN
UMD
SENSE Service
180-147.research.maxgigapop.net
BDE Web Protal
BDE Scheduler
STP
10.36.19.15
vlan 3619SENSE Path
180-149.research.maxgigapop.net
DTN
180-148.research.maxgigapop.net
BDE3
ProductionSwitch
Production Switch
40GE
HP Z91000
KISTI
AmoebaNet
BDE Web Protal
BDE Scheduler
DTN3DTN2
134.75.125.77
134.75.125.78 134.75.125.79
192.2.2.8 192.2.2.910GE 10GE
StarLight
vlan 1662STP
4/1
4/2
4/3
4/4
4/5
4/6
BDE1 BDE2
Pica8 P5101
40GE 40GE40GE
7374
192.2.2.1 192.2.2.2
77
KREONET
4/1
vlan 1662
STP
STP
StarLight
165.124.33.157
BDE Web Protal
BDE Scheduler
DTN 165.124.33.142
DTN
CENI162.244.229.52Ottawa
DTN
162.244.229.116Hanover
100GE 100GE
UVA
145.100.132.188
BDE Web Protal
BDE Scheduler
DTN 145.100.132.187
vlan
203
8
40GE
KSTAR
BDE Web Protal
BDE Scheduler
DTN3DTN2
203.230.120.130
203.230.120.127
100GE 100GE
10.36.19.11
203.230.120.128
203.230.120.227 203.230.120.228
10.250.38.107
10.250.38.53
BigData Express -- Deployment• Asia– KISTI, South Korea
• https://sc-demo-01.sdfarm.kr:2888/– KSTAR, South Korea
• Https://203.230.120.130:8080
• Europe– University of Amsterdam, Netherlands
• https://bde-01.lab.uvalight.net/
• North America– Fermilab
• https://Yosemite.fnal.gov:5000– StarLight, Northwestern University
• https://starlight.bigdataexpress.website/– UMD/MAX, University of Maryland, College Park
• https://180-147.research.maxgigapop.net/
Next Stage R&D Plan – Functional Perspective
DTN Agent
SDN Agent
Storage Agent mdtmFTP
PluginGridFTP
PluginXrootD Plugin
Data Transfer Launching Agent
DataBase
Configuration managementWebPortal Account
ManagementBigData Express
REST APIs
Error handlingScheduler
Rucio, Adios-based scientific applications, other scientific workflows
…
More information about BigData Express
http://bigdataexpress.fnal.gov
Contact: wenji@fnal.gov
This document was prepared by BigData Express using the resources of the Fermi National Accelerator Laboratory (Fermilab), a U.S. Department of Energy, Office of Science, HEP User Facility. Fermilab is managed by Fermi Research Alliance, LLC (FRA), acting under Contract No. DE-AC02-07CH11359.