ESnet New Architecture Customer Empowered Fibre Networks (CEF) Prague, May 17, 2005
1
ESnet New Architecture
Customer Empowered Fibre Networks (CEF) Prague, May 17, 2005
William E. Johnston ESnet Manager and Senior Scientist
Lawrence Berkeley National Laboratory, [email protected]
2
ESnet Serves DOE Office of Science Sites
• Office of Science (OSC) has 10 National Labs (blue)
• 7 other DOE Labs also have major OSC programs
3
ESnet
• ESnet's mission to support the large-scale science of the U.S. DOE Office of Science results in a unique network
o ESnet currently transports about 400-450 Terabytes/month
o The top 100 data flows each month account for about 25-40% of the total monthly network traffic
o These top 100 flows represent massive data flows from science experiments to analysis sites and back
• At the same time ESnet supports all of the other DOE collaborative science and the Lab operations
o The other 60-75% of the ESnet monthly traffic is in 6,000,000,000 flows
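A quick back-of-envelope check makes the contrast in these numbers concrete. The figures (430 TB/month, 6 billion small flows, top 100 flows carrying 25-40%) come from the slides; taking the 65% midpoint for the "other" traffic share is my assumption for illustration:

```python
# Sanity check of the traffic mix described above. Slide numbers: ~430 TB/month
# total, ~6e9 "other" flows carrying 60-75% of it; 65% midpoint is assumed.
monthly_total_tb = 430
other_fraction = 0.65                      # assumed midpoint of 60-75%
other_flows = 6_000_000_000

other_tb = monthly_total_tb * other_fraction
avg_other_flow_bytes = other_tb * 1e12 / other_flows   # bytes per ordinary flow

top100_tb = monthly_total_tb * (1 - other_fraction)
avg_top_flow_tb = top100_tb / 100                      # TB per top-100 flow

print(f"average 'other' flow: ~{avg_other_flow_bytes / 1e3:.0f} kB")
print(f"average top-100 flow: ~{avg_top_flow_tb:.2f} TB")
```

So an average ordinary flow is on the order of tens of kilobytes, while an average top-100 science flow is roughly 1.5 TB: about eight orders of magnitude apart, which is why the two traffic classes need different network treatment.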
4
ESnet Provides a High-Speed, Full Internet Services Network for DOE Facilities and Collaborators (Summer 2005 status)
[Figure: U.S. map of the ESnet IP core (packet over SONET optical ring and hubs, ~2600 miles / 4200 km) and the ESnet Science Data Network (SDN) core, with hubs at SEA, SNV, SNV SDN, CHI, CHI-SL, NYC, DC, ATL, ALB, ELP, and SDSC; 42 end user sites – Office of Science sponsored (22), NNSA sponsored (12), joint sponsored (3), other sponsored (NSF LIGO, NOAA), and laboratory sponsored (6) – including LBNL, SLAC, NERSC, JGI, LLNL, SNLL, ANL, FNAL, BNL, ORNL, PNNL, LANL, and others; MAN rings and peering points including MAE-E, PAIX-PA, Equinix, Starlight, Chi NAP, MAX GPoP, SoX GPoP, and PNW GPoP; Abilene high-speed peering points; international links to CERN (LHCnet, part DOE funded), GEANT (Germany, France, Italy, UK, etc.), SINet (Japan), Japan-Russia (BINP), CA*net4, GLORIAD, Kreonet2, MREN, Netherlands, StarTap, Taiwan (ASCC, TANet2), Australia, and Singaren; link legend: International (high speed), 10 Gb/s SDN core, 10 Gb/s IP core, 2.5 Gb/s IP core, MAN rings (≥10 Gb/s), OC12 ATM (622 Mb/s), OC12 / GigEthernet, OC3 (155 Mb/s), 45 Mb/s and less]
5
ESnet Logical Connectivity: Peering and Routing Infrastructure
[Figure: ESnet peering points (connections to other networks) – hubs at SEA, SNV, NYC, DC, and ATL; peerings at Starlight, MAE-E, MAE-W, FIX-W, PAIX-W (16 peers), NY-NAP, CHI NAP (Distributed 6TAP, 18 peers), EQX-ASH, EQX-SJ, MAX GPOP (14 peers, NGIX), PNW-GPOP, CENIC/SDSC, CalREN2, and LANL TECHnet; university, international, and commercial peerings via Abilene (including a direct core-core connection and 6 universities); international peers include CA*net4, CERN, France, GLORIAD, Kreonet2, MREN, Netherlands, StarTap, Taiwan (ASCC, TANet2), SINet (Japan)/KEK, Japan-Russia (BINP), GEANT (Germany, France, Italy, UK, etc.), Australia, and Singaren]
ESnet supports science collaboration by providing full Internet access
• manages the full complement of Global Internet routes (about 160,000 IPv4 routes from 180 peers) at 40 general peering points in order to provide DOE scientists access to all Internet sites
• high-speed peerings with Abilene and the international R&E networks
This is a lot of work and is very visible.
6
Observed Drivers for the Evolution of ESnet
ESnet Monthly Accepted Traffic, Feb. 1990 – Feb. 2005
ESnet is currently transporting about 430 Terabytes/mo. (= 430,000 Gigabytes/mo. = 430,000,000 Megabytes/mo.), and this volume is increasing exponentially
[Figure: monthly accepted traffic, TBytes/month, 1990-2005]
7
Observed Drivers for the Evolution of ESnet
[Figure: traffic growth curve with 10x milestones at Aug. 1990, Oct. 1993, Jul. 1998, and Dec. 2001 – intervals of 39, 57, and 42 months; TBytes/month]
ESnet traffic has increased by 10X every 46 months, on average, since 1990
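The "10X every 46 months" average can be turned into per-month and per-year growth factors with a little arithmetic; the projection function below is purely illustrative:

```python
# Implied growth factors for "10X every 46 months" (the slide's average rate).
monthly = 10 ** (1 / 46)          # ~1.051, i.e. ~5.1% growth per month
annual = monthly ** 12            # ~1.82x per year

def projected_tb(months_ahead, base_tb=430):
    """Hypothetical projection from the Feb. 2005 volume (~430 TB/month)."""
    return base_tb * monthly ** months_ahead

print(f"monthly factor: {monthly:.3f}, annual factor: {annual:.2f}x")
print(f"46 months out: ~{projected_tb(46):.0f} TB/month")  # 10x by construction
```

At that rate the network's load roughly doubles every 14 months, which is the planning pressure behind the architecture changes described later in the talk.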
8
Source and Destination of the Top 30 Flows, Feb. 2005
[Bar chart, Terabytes/month (scale 0-12); recoverable flow pairs include: Fermilab (US)-WestGrid (CA); SLAC (US)-INFN CNAF (IT); SLAC (US)-RAL (UK); Fermilab (US)-MIT (US); SLAC (US)-IN2P3 (FR); IN2P3 (FR)-Fermilab (US); SLAC (US)-Karlsruhe (DE); Fermilab (US)-Johns Hopkins; LIGO (US)-Caltech (US); LLNL (US)-NCAR (US); Fermilab (US)-SDSC (US); Fermilab (US)-Karlsruhe (DE); LBNL (US)-U. Wisc. (US); Fermilab (US)-U. Texas, Austin (US); BNL (US)-LLNL (US) (x4); Fermilab (US)-UC Davis (US); Qwest (US)-ESnet (US); Fermilab (US)-U. Toronto (CA); CERN (CH)-BNL (US); NERSC (US)-LBNL (US) (x4); DOE/GTN (US)-JLab (US); U. Toronto (CA)-Fermilab (US); CERN (CH)-Fermilab (US). Flow categories: DOE Lab-International R&E, Lab-U.S. R&E (domestic), Lab-Lab (domestic), Lab-Commercial (domestic)]
9
Science Requirements for Networking
The network and middleware requirements to support DOE science were developed by the OSC science community representing major DOE science disciplines:
o Climate simulation
o Spallation Neutron Source facility
o Macromolecular Crystallography
o High Energy Physics experiments
o Magnetic Fusion Energy Sciences
o Chemical Sciences
o Bioinformatics
o The major supercomputing facilities and Nuclear Physics were considered separately
Available at www.es.net/#research
Conclusions: the network is essential for
o long term (final stage) data analysis and collaboration
o "control loop" data analysis (influence an experiment in progress)
o distributed, multidisciplinary simulation
August 2002 Workshop Organized by the Office of Science: Mary Anne Scott (Chair), Dave Bader, Steve Eckstrand, Marvin Frazier, Dale Koelling, Vicky White
Workshop Panel Chairs: Ray Bair, Deb Agarwal, Bill Johnston, Mike Wilde, Rick Stevens, Ian Foster, Dennis Gannon, Linda Winkler, Brian Tierney, Sandy Merola, and Charlie Catlett
10
Evolving Quantitative Science Requirements for Networks
Science Areas considered in the Workshop (not Nuclear Physics and Supercomputing):

Science Area                 | Today End2End Throughput          | 5 years End2End Documented Throughput Requirements | 5-10 Years End2End Estimated Throughput Requirements | Remarks
High Energy Physics          | 0.5 Gb/s                          | 100 Gb/s                          | 1000 Gb/s                            | high bulk throughput
Climate (Data & Computation) | 0.5 Gb/s                          | 160-200 Gb/s                      | N x 1000 Gb/s                        | high bulk throughput
SNS NanoScience              | Not yet started                   | 1 Gb/s                            | 1000 Gb/s + QoS for control channel  | remote control and time critical throughput
Fusion Energy                | 0.066 Gb/s (500 MB/s burst)       | 0.198 Gb/s (500 MB/20 sec. burst) | N x 1000 Gb/s                        | time critical throughput
Astrophysics                 | 0.013 Gb/s (1 TBy/week)           | N*N multicast                     | 1000 Gb/s                            | computational steering and collaborations
Genomics Data & Computation  | 0.091 Gb/s (1 TBy/day)            | 100s of users                     | 1000 Gb/s + QoS for control channel  | high throughput and steering
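The "today" rates quoted for Astrophysics and Genomics are derived from data volumes (1 TBy/week and 1 TBy/day), and the conversion is easy to cross-check (decimal units, 1 TByte = 1e12 bytes):

```python
# Cross-check of the table's derived line rates: convert a sustained data
# volume per period into Gb/s.
def tbytes_per_period_to_gbps(tbytes, seconds):
    return tbytes * 1e12 * 8 / seconds / 1e9

astro = tbytes_per_period_to_gbps(1, 7 * 86400)   # 1 TBy/week
genomics = tbytes_per_period_to_gbps(1, 86400)    # 1 TBy/day

print(f"1 TBy/week ≈ {astro:.3f} Gb/s")    # table: 0.013 Gb/s
print(f"1 TBy/day  ≈ {genomics:.3f} Gb/s") # table: 0.091 Gb/s (rounded)
```

Both values agree with the table to within rounding, confirming the table's entries are sustained averages rather than burst rates.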
CERN / LHC High Energy Physics Data Provides One of Science's Most Challenging Data Management Problems
(CMS is one of several experiments at LHC)
CERN LHC CMS detector: 15m X 15m X 22m, 12,500 tons, $700M
[Figure: the tiered LHC/CMS data model – the online system sends ~PByte/sec from the detector into event reconstruction at the CERN Tier 0+1 center; 2.5-40 Gbits/sec links carry data to the Tier 1 regional centers (French, German, and Italian Regional Centers and the Fermilab, USA Regional Center); ~0.6-2.5 Gbps links feed the Tier 2 centers (~0.25 TIPS each); 100-1000 Mbits/sec links reach Tier 3 institutes and Tier 4 workstations, with physics data caches, event simulation, and human analysis throughout. Courtesy Harvey Newman, Caltech]
• 2000 physicists in 31 countries are involved in this 20-year experiment in which DOE is a major player.
• Grid infrastructure spread over the US and Europe coordinates the data analysis.
The DOE Participation in the LHC is the ImmediateSource of Requirements for Changes to ESnet
• Both LHC tier 1 data centers in the U.S. are at DOE Office of Science Labs – Fermilab (Chicago) and Brookhaven Lab (Long Island, New York)
• Data from the two major LHC experiments – CMS and Atlas – will be stored at these centers for analysis by groups at US universities
• As LHC (CERN high energy physics accelerator) data starts to move, the large science flows in ESnet will increase dramatically (by 200-2000 times)
The DOE Participation in the LHC is the ImmediateSource of Requirements for Changes to ESnet
• CERN and DOE will bring 10G circuits from CERN to Chicago/Starlight and MAN LAN (New York) for moving LHC data to these centers
o Each path will grow to 20+ Gb/s by 2008
• Full bandwidth backup must be provided
• Similar aggregate bandwidth will be required out of the Tier 1 centers to the 15 U.S. Tier 2 (analysis) sites (universities)
• Progress in the network configuration is driven by progressively more realistic experiments – "service challenges" (formerly "mock data challenges")
14
SC2 met its throughput targets
Kors Bos, NIKHEF, Amsterdam
• Service Challenge 2
o Throughput test from Tier-0 to Tier-1 sites
o Started 14th March
• Set up infrastructure to 7 sites
o BNL (Upton, NY), CCIN2P3 (Lyon), CNAF (Bologna), FNAL (Chicago), GridKa (Karlsruhe), RAL (Didcot, UK), SARA (Amsterdam)
• ~100 MB/s to each site
o At least 500 MB/s combined out of CERN at the same time
o 500 MB/s to a few sites individually
• Two weeks sustained 500 MB/s out of CERN
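To put the SC2 target in perspective, the sustained rate translates into a substantial total volume (decimal MB = 1e6 bytes; the two-week duration is the figure quoted above):

```python
# Total volume implied by "two weeks sustained 500 MB/s out of CERN".
def sustained_tb(rate_mb_s, days):
    return rate_mb_s * 1e6 * days * 86400 / 1e12   # bytes -> TB

two_weeks_at_500 = sustained_tb(500, 14)
print(f"~{two_weeks_at_500:.0f} TB out of CERN over two weeks")
```

That is roughly 600 TB, i.e. more than an entire month of total ESnet traffic at the time, from this one service challenge alone.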
15
SC2 Tier0/1 Network Topology
[Figure: SC2 Tier-0/Tier-1 network topology – the CERN Tier-0 connects through GEANT, UKLight, NetherLight, StarLight, and ESnet to the Tier-1 sites RAL, CNAF, IN2P3, GridKa, SARA, BNL, and FNAL, over links ranging from 1G (shared) and 2x1G / 3x1G bundles up to 10G]
Kors Bos, NIKHEF, Amsterdam
16
SC2 met its throughput targets
• >600 MB/s daily average for 10 days was achieved – midday 23rd March to midday 2nd April
o Not without outages, but the system showed it could recover its rate again after outages
o Load was reasonably evenly divided over sites (given the network bandwidth constraints of Tier-1 sites)
Kors Bos, NIKHEF, Amsterdam
17
LHC high-level network architecture
Erik-Jan Bos, Director of Network Services, SURFnet, The Netherlands
• Optical Private Network, consisting of dedicated 10G paths between T0 and each T1, in two flavors:
o "Light path T1"
o "Routed T1"
• Special measures for back-up for T0-T1, to be filled in later
• T0 preferred interface is 10 Gbps Ethernet LAN-PHY
T0/T1 network meeting
NIKHEF/SARA, Amsterdam, The Netherlands; April 8, 2005
18
A proposed high-level architecture (2)
19
A proposed high-level architecture (3)
20
ESnet Approach for LHC Requirements
[Figure: ESnet's proposed LHC connectivity – CERN circuits land at Starlight (Chicago, near the Qwest Chicago hub) and at MAN LAN (32 AoA or 60 Hudson, New York); the ESnet IP and SDN cores (with a Sunnyvale hub) carry the traffic to FNAL and BNL, with TRIUMF reached via CANARIE and peering with GEANT; all paths are 10 Gb/s]
21
U.S. LHC Tier 2 Sites (1-5 Gb/s each)
Brookhaven Lab, Upton, NY, USA – Atlas:
o UTA (University of Texas at Arlington), Arlington, TX, USA
o University of Oklahoma, Norman, OK, USA
o Univ. of New Mexico, Albuquerque, NM, USA
o Langston University, Langston, OK, USA
o Univ. of Chicago, Chicago, IL, USA
o Indiana Univ., Bloomington, IN, USA
o Boston Univ., Boston, MA, USA
o Harvard Univ., Cambridge, MA, USA
Fermilab, Batavia, Ill, USA – CMS:
o MIT, Cambridge, MA, USA
o Univ. of Florida, Gainesville, FL, USA
o Univ. of Nebraska, Lincoln, NE, USA
o Univ. of Wisconsin, Madison, WI, USA
o Caltech, Pasadena, CA, USA
o Purdue Univ., Purdue, IN, USA
o Univ. of California, San Diego, San Diego, CA, USA
22
ESnet Evolution
• With the old architecture (to 2004) ESnet cannot meet the new requirements
• The current core ring cannot handle the anticipated large science data flows at affordable cost
• The current point-to-point tail circuits to sites are neither reliable nor scalable to the required bandwidth
[Figure: the old ESnet core ring – New York (AOA), Chicago (CHI), Sunnyvale (SNV), Atlanta (ATL), Washington, DC (DC), and El Paso (ELP) – with point-to-point tail circuits out to the DOE sites]
23
ESnet’s Evolution – The Requirements
• In order to accommodate the growth, and the change in the types of traffic, the architecture of the network must change to support the general requirements of:
1) High-speed, scalable, and reliable production IP networking
- University and international collaborator and general science connectivity
- Highly reliable site connectivity to support Lab operations
- Global Internet connectivity
2) Support for the high bandwidth data flows of large-scale science
- Very high-speed network connectivity to specific sites
- Scalable, reliable, and very high bandwidth site connectivity
- Also, provisioned circuits with guaranteed quality of service (e.g. dedicated bandwidth) and for traffic isolation
24
ESnet’s Evolution – The Requirements
• The general requirements, then, are
o Fully redundant connectivity for every site
o High-speed access to the core for every site
- at least 20 Gb/s, generally, and 40-100 Gb/s for some sites
o 100 Gbps national core/backbone bandwidth by 2008, in two independent backbones
25
ESnet Strategy For A New Architecture
Three part strategy:
1) Metropolitan Area Network (MAN) rings to provide
- dual site connectivity for reliability
- much higher site-to-core bandwidth
- support for both production IP and circuit-based traffic
2) A Science Data Network (SDN) core for
- provisioned, guaranteed bandwidth circuits to support large, high-speed science data flows
- very high total bandwidth
- multiply connecting MAN rings for protection against hub failure
- an alternate path for production IP traffic
3) A high-reliability IP core (e.g. the current ESnet core) to address
- general science requirements
- Lab operational requirements
- backup for the SDN core
- a vehicle for science services
26
ESnet Target Architecture: IP Core + Science Data Network + MANs
[Figure: national map showing the ESnet IP core, the ESnet Science Data Network (2nd core, on NLR), and Metropolitan Area Rings, with IP core hubs, SDN/NLR hubs, and possible new hubs at Seattle, Sunnyvale, San Diego/LA, Albuquerque (ALB), El Paso (ELP), Atlanta (ATL), Washington, DC, New York, and Chicago; primary DOE Labs connect via the MAN rings; lab-supplied and international connections to CERN, GEANT (Europe), Asia-Pacific, and Australia]
27
ESnet New Architecture - Tactics
• How does ESnet get to the 100 Gbps backbone and the 20-40 Gbps redundant site connectivity that the OSC community needs in the 3-5 yr time frame?
• Only a hybrid approach is affordable
o The core IP network that carries the general science and Lab enterprise traffic should be provided by a commercial telecom carrier in the wide area, in order to get the >99.9% reliability that certain types of science use and the Lab CIOs demand
o Part, or even most, of the wide area bandwidth for the high impact science networking will be provided by National Lambda Rail (NLR) – an R&E network that is much less expensive than commercial telecoms (98% reliable)
o The Metropolitan Area Networks that get the Labs to the ESnet cores are a mixed bag and somewhat opportunistic – a combination of R&E networks, dark fiber networks, and commercial managed lambda circuits will be used
28
ESnet New Architecture – Risk Mitigation
• NLR today is about 98% reliable*, which is not sufficient for some applications; however, risk mitigation is provided by the new architecture
• For bulk data transfer the requirement is typically that enough data reach the analysis systems to keep them operating at full speed, because if they fall behind they cannot catch up
o The risk mitigation strategy is
- enough buffering at the analysis site to tolerate short outages
- over-provisioning the network so that data transfer bandwidth can be increased after a network failure in order to refill the buffers
* Estimate based on observation by Steve Corbato, Internet2/Abilene
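The buffering-plus-overprovisioning logic above can be sketched with two small formulas. The numbers in the example are illustrative, not from the slides:

```python
# Sketch of the risk-mitigation sizing: the buffer must cover the outage,
# and catch-up time depends only on the over-provisioning headroom.
def buffer_tb(analysis_rate_gbps, outage_hours):
    """Data (TB) the analysis-site buffer must hold to ride out an outage."""
    return analysis_rate_gbps / 8 * outage_hours * 3600 / 1000

def catchup_hours(outage_hours, overprovision_factor):
    """Time to refill the buffer when the network offers factor x nominal rate."""
    assert overprovision_factor > 1, "no headroom means the buffer never refills"
    return outage_hours / (overprovision_factor - 1)

print(buffer_tb(10, 2))        # 10 Gb/s analysis feed, 2 h outage -> 9.0 TB
print(catchup_hours(2, 1.5))   # 50% headroom -> 4 h to refill the buffer
```

Note the catch-up time does not depend on the transfer rate itself, only on the ratio of spare to nominal bandwidth, which is why over-provisioning is the lever here.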
29
ESnet New Architecture – Risk Mitigation
• For experiments requiring "real time" guarantees – e.g. Magnetic Fusion experiment data analysis during an experiment (as described in ref. 1) – the requirement is typically for high reliability
o The risk mitigation strategy is a "hot" backup path via the production IP network
- The backup path would be configured so that it did not consume bandwidth unless it was brought into use by failover from the primary path
- The general strategy would work for any application whose network connectivity does not require a significant fraction of the production IP network as backup
– this is true for all of the real-time applications examined in the workshop
– it might not be true for a large-scale, Grid based workflow system
30
ESnet Strategy: MANs
• The MAN (Metropolitan Area Network) architecture is designed to provide
o At least 2 x 10 Gb/s access to every site
- 10 Gb/s production IP traffic and backup for large science data
- 10 Gb/s for circuit-based transport services for large-scale science
o At least one redundant path from sites to the ESnet core
o Scalable bandwidth options from sites to the ESnet core
o The first step in point-to-point provisioned circuits
• Tactics
o Build MAN rings from managed lambda services
o The 10 Gb/s Ethernet ring for virtual circuits and the 10 Gb/s IP ring are not commercially available services
31
ESnet MAN Architecture (e.g. Chicago)
[Figure: a MAN ring of 2-4 x 10 Gbps channels, built on switches managing multiple lambdas, connects site gateway routers and site equipment at FNAL and ANL to ESnet core routers (T320s) at Starlight and Qwest; the ring carries the ESnet production IP service and ESnet managed λ / circuit services (tunneled through the IP backbone), connects to the ESnet production IP core and the ESnet SDN core, includes R&E and international peerings, and has monitors at each site tied into ESnet management and monitoring]
32
San Francisco Bay Area – the First ESnet MAN
[Figure: Bay Area map; the MAN ring is ~46 miles (74 km) in diameter]
33
The First ESnet MAN: SF Bay Area (Sept. 2005)
[Figure: the MAN ring connects LBNL, NERSC, the Joint Genome Institute, LLNL, SNLL, and SLAC, plus NASA Ames, to the Qwest/ESnet hub and a Level 3 hub, carrying λ1 production IP, λ2 SDN/circuits, and λ3/λ4 future; the ring joins the Qwest-ESnet national core ring (Chicago and El Paso) and National Lambda Rail circuits (Seattle and Chicago, LA and San Diego)]
• 2 λs (2 x 10 Gb/s channels) in a ring configuration, delivered as 10 GigEther circuits
• Dual site connection (independent "east" and "west" connections) to each site
• Will be used as a 10 Gb/s production IP ring and 2 x 10 Gb/s paths (for circuit services) to each site
• Qwest contract signed for two lambdas 2/2005, with options on two more
• One link every month – completion date is 9/2005
34
35
ESnet New Architecture - Tactics
• Science Data Network (SDN)
o Most of the bandwidth is needed along the West and East coasts and across the northern part of the country
o Use multiple National Lambda Rail* (NLR) lambdas to provide 30-50 Gbps by 2008
o Close the SDN ring in the south to provide resilience at 10 Gbps
o Funding has been requested for this upgrade
* NLR is a consortium of US R&E institutions that operate a national, optical fiber network
36
ESnet Goal – 2007/2008
[Figure: target national topology – the Production IP ESnet core (≥10 Gbps) and the ESnet Science Data Network core (2nd core, 30-50 Gbps, National Lambda Rail) interconnect hubs at SEA, SNV, SDG, ALB, ELP, DEN, CHI, ATL, DC, and NYC, with core link capacities of 10-40 Gb/s; Metropolitan Area Rings reach the major DOE Office of Science sites (10 Gbps enterprise IP traffic, 40-60 Gbps circuit-based transport); high-speed cross connects with Internet2/Abilene; new ESnet hubs marked; lab-supplied and major international links to CERN, Europe, Japan, Asia-Pacific, and Australia]
37
Proposed ESnet Lambda Infrastructure Based on National Lambda Rail – FY08
[Figure: NLR footprint showing wavegear sites and regeneration / OADM sites – Seattle, Boise, Denver, Sunnyvale, LA, San Diego, Phoenix, Albuquerque, El Paso - Las Cruces, Dallas, San Antonio, Houston, Tulsa, KC, Baton Rouge, Pensacola, Jacksonville, Atlanta, Raleigh, Wash DC, Pitts, Clev, New York, and Chicago – with links to CERN and Europe]
38
New Network Services
• New network services are also critical for ESnet to meet the needs of large-scale science
• The most important new network service is dynamically provisioned virtual circuits that provide
o Traffic isolation
- will enable the use of non-standard transport mechanisms that cannot co-exist with TCP-based transport
o Guaranteed bandwidth
- the only way we currently have to address deadline scheduling – e.g. where fixed amounts of data have to reach sites on a fixed schedule so that the processing does not fall so far behind that it could never catch up – very important for experiment data analysis
• Control plane is being jointly developed with Internet2/HOPI
39
OSCARS: Guaranteed Bandwidth Service
[Figure: OSCARS architecture – user systems at site A and site B; resource managers at the sites and in the network; a bandwidth broker with an allocation manager and an authorization component; policers and shapers enforcing the reserved bandwidth along the path]
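The central decision a bandwidth broker in a scheme like OSCARS must make is admission control: accept a reservation only if, at every instant of the requested window, existing reservations plus the new one fit within the link capacity. The interfaces below are invented for illustration, not OSCARS's actual API:

```python
# Minimal sketch of bandwidth-broker admission control (hypothetical interface).
from dataclasses import dataclass

@dataclass
class Reservation:
    start: float   # seconds
    end: float
    gbps: float

def admissible(link_capacity_gbps, existing, req):
    """True if `req` fits alongside `existing` reservations on one link."""
    # Load only changes at reservation boundaries, so checking every boundary
    # that falls inside the requested window covers the whole interval.
    points = {req.start} | {r.start for r in existing} | {r.end for r in existing}
    for t in points:
        if req.start <= t < req.end:
            load = req.gbps + sum(r.gbps for r in existing if r.start <= t < r.end)
            if load > link_capacity_gbps:
                return False
    return True

booked = [Reservation(0, 3600, 6.0)]
print(admissible(10.0, booked, Reservation(1800, 7200, 3.0)))  # True:  6+3 <= 10
print(admissible(10.0, booked, Reservation(1800, 7200, 5.0)))  # False: 6+5 > 10
```

In a real deployment this check sits behind the authorization and allocation-manager steps shown in the figure, and an accepted reservation is then enforced in the data plane by the policers and shapers.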
References – DOE Network Related Planning Workshops
1) High Performance Network Planning Workshop, August 2002 – http://www.doecollaboratory.org/meetings/hpnpw
2) DOE Science Networking Roadmap Meeting, June 2003 – http://www.es.net/hypertext/welcome/pr/Roadmap/index.html
3) DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale Science Applications, April 2003 – http://www.csm.ornl.gov/ghpn/wk2003
4) Science Case for Large Scale Simulation, June 2003 – http://www.pnl.gov/scales/
5) Workshop on the Road Map for the Revitalization of High End Computing, June 2003 – http://www.cra.org/Activities/workshops/nitrd ; http://www.sc.doe.gov/ascr/20040510_hecrtf.pdf (public report)
6) ASCR Strategic Planning Workshop, July 2003 – http://www.fp-mcs.anl.gov/ascr-july03spw
7) Planning Workshops – Office of Science Data-Management Strategy, March & May 2004 – http://www-conf.slac.stanford.edu/dmw2004