A Possible Approach for Big Data Access to Support Climate Science
Mark Foster, Hugh LaMaster
NASA Ames Research Center
ESNet/Internet2 Focused Technical Workshop:
Improving Mobility & Management for International Climate Science July 15, 2014
• This presentation is intended to facilitate the exchange of ideas on Big Data access and the constraints that can arise:
  • Trusted Internet Connection (TIC)
  • Security
  • Bandwidth
• This presentation does not represent any Agency policy, project, or endorsement
• Diagrams and notes within this presentation are not planned for implementation; they are for discussion within this workshop
Workshop Presentation Context
• NASA Supercomputing – NAS and NCCS
  • resources
  • select transfer characteristics
  • existing challenges
• TIC – Trusted Internet Connection
  • goals, motivation (driven by DHS for all federal agencies)
  • what does this mean for current and near-term science data transfers?
• Science DMZ and Data Transfer Nodes
  • friction-free transfers for large datasets
  • sit at the inside/outside boundary
  • express path for approved traffic, regular path by default
  • static: use/user designations known in advance (proactive)
  • dynamic: traffic types (reactive)
  • an opportunity for dynamic flow management with SDN
• Futures
  • clouds with clear skies
  • internal clusters, external clusters
  • constrained/specific user community vs. unrestricted access
Summary/Overview
• Growing performance of Wide Area Networks (WANs) – 10/40/100 Gbps
  • WAN host-to-host performance has consistently exceeded firewall (FW) appliance performance for the last 10 years
• TIC mandate specifies required elements of the border
  • Requires SBU data processing/storage elements to be inside/behind the TIC
• Growing sophistication of security threats
  • The threat environment requires Defense-in-Depth and hardening of user hosts and servers; firewall appliances can’t protect against all threats
• OMB mandate to use commercial cloud computing and storage where possible for low/moderate-security data
  • Cloud resources are available over the WAN; external cloud use for internal computing increases pressure on LAN/WAN border security elements
  • FedRAMP-compliant commercial services brought inside the NASA authorization boundary still have monitoring/border-protection requirements
Computing, Communications Environment Evolving
NASA major supercomputing facilities: NAS and NCCS
• Distributed access:
  • Earth and Space Science datasets come from widely distributed sources
  • Results are transferred back to widely distributed sites
  • Some data resides at the supercomputing facilities for processing; many datasets are stored elsewhere
• NCCS facility at NASA Goddard Space Flight Center:
  • major weather/climate/oceanographic modeling and data assimilation
  • worldwide climate research
  • approx. 590 TeraFLOPS of computing, 4 PetaBytes of online storage
• NAS facility at NASA Ames Research Center:
  • premier NASA supercomputing facility since 1983; focus on simulation for aerospace (CFD) and science (weather, climate, space science/solar dynamics/astrophysics)
  • approx. 4 PetaFLOPS of computing, 14 PetaBytes of online storage
Climate Related Data
• Remote Sensing Data
• Assimilated Datasets (validation data)
• Model Output
• Climate Projections
Web portals: access to this data is provided by tools and distributed systems that hold the datasets. A useful start.
Growth in types and sizes presents access challenges.
EOSDIS Portal
NASA Earth Exchange Portal
Science/High Performance Computing Requirements in a Nutshell
• Science datasets moving over WANs are often 10s to 100s of TeraBytes
• Large science flows are typically earth science, astro- and solar physics; these flows are sometimes referred to as “elephant flows”
• Network Round Trip Time (RTT) ranges from 1-2 ms (UC Berkeley, Stanford) and 8 ms (JPL) to 68 ms (NCSA) and 200 ms (University of Oslo)
• Good network performance over large RTT requires end-to-end network and host tuning, zero packet loss, optimizations like Jumbo Frames
• Consumer and commercially oriented desktop/laptop/handheld device networks and security appliances are engineered for a massive number of tiny to small flows (“mouse flows”)
• Consumer/commercial switches/appliances often drop packets or have far too small, ill-behaved buffers to work well with elephant flows
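The tuning point above can be made concrete with the bandwidth-delay product (BDP): the amount of unacknowledged data a single TCP connection must keep in flight to fill a path, and hence the socket buffer size required. A minimal sketch, using link speeds and RTTs quoted in this talk:

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bytes of in-flight data (and socket buffer) needed to keep a
    path of rate_bps full at round-trip time rtt_s."""
    return rate_bps * rtt_s / 8.0

# 10 Gbps to NCSA (68 ms RTT): ~85 MB of buffer per connection
print(f"NCSA @ 10G: {bdp_bytes(10e9, 0.068) / 1e6:.0f} MB")
# 10 Gbps to University of Oslo (200 ms RTT): ~250 MB
print(f"Oslo @ 10G: {bdp_bytes(10e9, 0.200) / 1e6:.0f} MB")
```

Default OS buffer limits are typically far below these values, which is why untuned hosts cannot fill a high-RTT WAN path, and why a single lost packet (forcing a window collapse) is so costly on elephant flows.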
Example Elephant Flows
• Top: all traffic (2 days) via NREN => NAS
• Bottom: same 2-day period, NCSA => NAS
– 700 Mbps average over 48 hours
– 5 minute peaks to ~2.4 Gbps
– Roughly 14 TB dataset in ~32 hours
– Elephant flow was ~70% of total volume during that 2 day interval
– Network has necessary headroom to handle these peaks (of roughly 5 Gbps)
– Application: astrophysics/solar physics
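The figures above are easy to sanity-check: 14 TB moved in about 32 hours corresponds to roughly 1 Gbps sustained, consistent with the 700 Mbps 48-hour average and the ~2.4 Gbps 5-minute peaks. A quick check:

```python
def avg_gbps(bytes_moved: float, seconds: float) -> float:
    """Average throughput in Gbps for a transfer of bytes_moved bytes."""
    return bytes_moved * 8 / seconds / 1e9

# ~14 TB dataset in ~32 hours => roughly 0.97 Gbps sustained
print(f"{avg_gbps(14e12, 32 * 3600):.2f} Gbps")
```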
Example Elephant Flows (2)
• NCSA => NAS
  – 8 hours at 2.0-4.2 Gbps
  – 9000-byte packets
• NAS => UCSC
  – About 40 minutes at 2.0-2.8 Gbps
  – 1500-byte packets
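The two frame sizes above make a large difference in per-packet load on hosts and middleboxes. A rough comparison, ignoring header and wire overhead:

```python
def pps(rate_bps: float, frame_bytes: int) -> float:
    """Approximate packets per second at rate_bps with frame_bytes frames
    (header/wire overhead ignored)."""
    return rate_bps / (frame_bytes * 8)

# 4.2 Gbps with 9000-byte jumbo frames: ~58 kpps
print(f"jumbo:    {pps(4.2e9, 9000):.0f} pps")
# 2.8 Gbps with 1500-byte frames: ~233 kpps
print(f"standard: {pps(2.8e9, 1500):.0f} pps")
```

Even at lower throughput, the 1500-byte flow generates roughly four times as many packets per second, which is one reason Jumbo Frames appear among the optimizations listed earlier.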
DHS TIC Architecture Requirements
• SBU data processing/storage elements must be inside/behind the TIC
• All traffic monitored (e.g., via optical splitter)
• Limited WAN border/TIC locations
• Science external connectivity is unusual to DHS
  • Most civilian Federal agency connectivity looks similar to business IT
• Ingress and egress flows of all (TCP/UDP) connections must be routed through the same physical TIC location (symmetric routing through TICs)
• TIC links leading to local client-computer LANs must be configured so that a stateful firewall appliance or “stack” (with IDS, IPS, web proxy, VPN, etc.) can be inserted in the path
• Packet capture and retention requirements
  – 24-hour full packet capture at link capacity is required
  – access to the previous 24 hours is required
• Centralized response management
  • Ability of a centralized agency directive to block an address (or address range) and have it take effect immediately
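The 24-hour full-packet-capture requirement has a steep storage cost at the link speeds discussed earlier. A back-of-the-envelope estimate for a fully loaded link (actual volumes depend on utilization):

```python
def capture_tb(link_bps: float, hours: float = 24) -> float:
    """TB of storage to retain hours of full packet capture at link_bps."""
    return link_bps / 8 * hours * 3600 / 1e12

# Worst case: link saturated for the full retention window
print(f"10G link:  {capture_tb(10e9):.0f} TB/day")    # 108 TB
print(f"100G link: {capture_tb(100e9):.0f} TB/day")   # 1080 TB/day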
DHS TIC Architecture Requirements (continued)
[Notional enterprise routing diagram: external peers connect through an External Peering Network to TIC-1 … TIC-n with symmetric ingress/egress at the TIC boundary; each TIC includes Center Border Protection services (BP: firewall, IDS, content filter) and connects to center LANs via the Internal Wide Area Network.]
Science Border/WAN Architectural Goals and Designs
• DTN – Science DMZ
  • Special border-DMZ data transfer hosts optimized for WAN performance
  • Many supercomputer/big-data centers implement this now
  • Requires close cooperation with Security to get both performance and security
• On-demand path reservation
  • ESnet OSCARS provides VLAN-based reservations today within ESnet
  • Goal: signal an end-to-end path from the DTN host across LAN, Internet2, ESnet, and transport networks
  • An OSCARS connection via NREN provides a path across ESnet for augmented access for NEX today
• Improved ease of data access among partners
  • Integrated Globus access with the DTN/Science DMZ; integrate PIV/token authentication
  • Improved data exportation (Who can read data? Who can change it? Re-exportation?)
  • Cloud storage architecture and high-speed access: both external commercial and FedRAMP-compliant inside the authorization perimeter
Reference Science DMZ Architecture
Site: http://fasterdata.es.net/science-dmz/science-dmz-architecture/
A Possible Science DMZ Architecture within the TIC context
[Notional diagram: external partners reach a science-net exchange fabric over the WAN; a SciDMZ switch/router with its own FW/IDS and a perfSONAR node serves DTNs and science project resources, alongside TIC-n (with FW) at the TIC boundary between the External Peering Network and the Internal Wide Area Network.]
This diagram does not reflect a NASA plan or architecture. It is for discussion purposes only.
Science DMZ/Data Transfer Node
• Operational problems it solves:
  • Inability to control features and defaults that supercomputing vendors support
  • Inability to control the end-user environment, both network and host
  • Effort required to coordinate all system configurations and parameters in the supercomputing environment
• Science DMZ border nodes can be configured for optimal WAN transfers
  • Improved utilization of the underlying WAN (end-to-end Jumbo Frames, big buffers)
  • May also integrate easier external user authentication (Globus, PIV)
  • May also integrate end-to-end reservations and additional security features
Desired access among partners
• Globus Online/GridFTP users would like to use their Globus credentials
• PIV card users would like to use PIV single-sign-on capability
• Users would like easier data sharing between supercomputers and the other facilities they use
• Security issues to be resolved:
  • Re-exportation of data
  • Third-party control of sharing of semi-confidential data
  • Trust among Globus user communities
• Implementation on the Science DMZ would allow limited trust of credentials without extending trust to high-value internal resources
• Establish coordination via the Identity, Credential, and Access Management (ICAM) group
On demand path reservation
• Multiple approaches:
  • Software Defined Networking (SDN) with OpenFlow, ESnet OSCARS (assisted setup of VLAN paths), manually provisioned VLANs, policy-based routing
• OSCARS used to support the NEX <-> EDC path
• The NASA Ames/CET lab has access to experimental 40/100G capabilities but is not yet equipped to provide SDN switching at those speeds
  • Possible test partners include CENIC, Internet2, NSF CC-NIE recipients, and ESnet
• Establish how to provision paths without endangering operational traffic
• Integrate with the end-user system (probably a Science DMZ server)
• Enable Science DMZ users to easily establish a more optimal end-to-end path
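The static "express path for approved traffic, regular path by default" designation described earlier can be sketched as a trivial source-prefix classifier; in practice this would be an OpenFlow rule or policy-based route rather than host code, and the prefix and path names below are hypothetical:

```python
import ipaddress

# Hypothetical pre-approved (proactively designated) DTN subnet
APPROVED_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]

def choose_path(src_ip: str) -> str:
    """Steer traffic from approved DTN prefixes onto the express
    (SciDMZ) path; all other traffic takes the default border stack."""
    addr = ipaddress.ip_address(src_ip)
    if any(addr in net for net in APPROVED_PREFIXES):
        return "express"   # SciDMZ switch/router, bypasses stateful FW
    return "default"       # normal TIC border path (FW/IDS/proxy)

print(choose_path("198.51.100.7"))  # express
print(choose_path("203.0.113.9"))   # default
```

The dynamic (reactive) variant would install such rules on the fly when an elephant flow is detected, which is the opportunity for SDN flow management noted in the outline.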
[MyESnet screenshot: OSCARS circuit es.net-4003, “GPN - NASA, VLAN 3025, 200M”, 08-01-2013 to 08-01-2014; traffic delivered A-to-Z and Z-to-A as of 2014-01-24 19:09, traversing ESnet routers sunn-cr5, sacr-cr5, denv-cr5, and kans-cr5.]
Existing SDN in the WAN supports NASA Earth Exchange
• Existing static OSCARS VLAN path: NAS – NREN – (ESnet VLAN) – EDC
  – NEX data fetch EDC => HEC
  – “200 Mbps”, occasionally 650M/1000M
  – Avoids the low-performance, long-RTT default route
• SDN goal for the WAN: allow project DTN host-to-host signaling through multiple domains
• ESnet OSCARS traffic – EDC => NAS: 14 TB in 2 days
  – 650 Mbps average
  – RTT 43 ms
Possible Futures – Clouds, etc.
• Internal vs. external clusters; clustered Science DMZ DTNs
• Cluster federation (identity, authorization, access) among participating organizations
• Virtualized network services on VM clouds
• SDX – software-defined exchange: coordinated access to clusters and distributed storage capabilities