Computer networks
• Funded projects with graduate research assistant (GRA) openings
  – NSF SDCI: 2 years left
  – DOE HNTES: 4 years left (new grant awarded)
  – NSF CC-NIE (new): 3 years
  – NSF SCRP: 2 years left
  – NSF JUNO: 3 years (just starting)
• Applied orientation
Malathi Veeraraghavan Univ. of Virginia
[email protected] 2013 (updated Jan. 2014)
Outline
• Big picture
• Five projects
  – What is the problem?
  – Why solve it? (Motivation)
• Methods used
  – As a GRA, what would I do?
• Processes & style
Big picture
• Networks to support the scientific research community
  – High-speed
  – Low-latency
• Who is in the science community?
  – DOE Office of Science
    • Basic energy sciences, high-energy physics, fusion energy sciences, biological & environmental research
  – NSF Office of Cyberinfrastructure (OCI)
Both agencies (NSF OCI and DOE) support
• Supercomputing centers
  – nersc.gov
  – olcf.gov
  – alcf.gov
  – XSEDE (NSF OCI)
• High-speed networks
  – Backbone: ESnet, Internet2
  – Campus and regional networks: DYNES
NSF Software Development for Cyberinfrastructure (SDCI)
• Problem & motivation (what & why):
  1. Climate scientists run simulations that require more than 5,000 cores
     • The intra-datacenter network has been identified as the bottleneck (InfiniBand cluster: 72K cores)
     • MPI communications: need to reduce latency and the variance in latency
  2. Scientists move terabyte- to petabyte-sized files and need to move them fast
     • 100 Gbps is the current state of the art in link speed, but not in achieved throughput (the limit is software!)
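A back-of-the-envelope calculation shows why the gap between link speed and achieved throughput matters. The sketch below computes ideal transfer times, ignoring protocol overhead and startup delays; the 10 Gbps figure is an assumed achieved throughput chosen only for illustration, not a measurement.

```python
TB = 10**12  # bytes (decimal units)
PB = 10**15  # bytes

def transfer_time_hours(size_bytes: float, throughput_bps: float) -> float:
    """Ideal transfer time in hours: bits to move divided by sustained throughput."""
    return size_bytes * 8 / throughput_bps / 3600

# 100 Gbps = full link rate; 10 Gbps = hypothetical achieved throughput.
for label, size in (("1 TB", TB), ("1 PB", PB)):
    for gbps in (100, 10):
        hours = transfer_time_hours(size, gbps * 10**9)
        print(f"{label} at {gbps} Gbps: {hours:.2f} h")
```

Even at full line rate, a petabyte takes about 22 hours; if software limits throughput to a tenth of the link rate, the same transfer takes over nine days.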
DOE Hybrid Network Traffic Engineering System (HNTES)
• Problem & motivation:
  – Find high-rate, large-sized ("alpha") flows within a network and isolate them
  – Why?
    • As link rates increase, the spread between the fastest and slowest flows increases
    • Fast flows can delay slow flows (users see poor quality in real-time flows)
    • On links to providers, Service Level Agreements (SLAs) can be violated when fast flows appear
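One simple way to picture alpha-flow identification is to filter flow records (summaries of the kind routers export) on both size and rate. The sketch below is a hedged illustration, not the HNTES method: the record fields, the 1 GB / 1 Gbps thresholds, and the choice to return (source, destination) pairs as isolation candidates are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

# Hypothetical thresholds, chosen for illustration only.
SIZE_THRESHOLD_BYTES = 10**9   # 1 GB
RATE_THRESHOLD_BPS = 10**9     # 1 Gbps

@dataclass
class FlowRecord:
    src: str
    dst: str
    size_bytes: int
    duration_s: float

    @property
    def rate_bps(self) -> float:
        """Average rate over the flow's lifetime (0 if the duration is zero/unknown)."""
        return self.size_bytes * 8 / self.duration_s if self.duration_s > 0 else 0.0

def is_alpha(flow: FlowRecord) -> bool:
    """Alpha = both large-sized and high-rate."""
    return flow.size_bytes >= SIZE_THRESHOLD_BYTES and flow.rate_bps >= RATE_THRESHOLD_BPS

def alpha_pairs(flows: Iterable[FlowRecord]) -> Set[Tuple[str, str]]:
    """(src, dst) pairs that carried at least one alpha flow: candidates for isolation."""
    return {(f.src, f.dst) for f in flows if is_alpha(f)}

flows = [
    FlowRecord("10.0.0.1", "10.1.0.9", 50 * 10**9, 100.0),  # 50 GB at 4 Gbps: alpha
    FlowRecord("10.0.0.2", "10.1.0.9", 2 * 10**6, 0.5),     # 2 MB: too small
    FlowRecord("10.0.0.3", "10.1.0.7", 5 * 10**9, 400.0),   # 5 GB at 0.1 Gbps: too slow
]
print(alpha_pairs(flows))  # {('10.0.0.1', '10.1.0.9')}
```

Returning address pairs rather than individual flows reflects one plausible isolation strategy (steering later traffic between the same endpoints onto a separate path); whether that works depends on how often such pairs recur in operational data.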
NSF Campus Cyberinfrastructure – Network Infrastructure & Engineering (CC-NIE)
• Problem & motivation
  – Design protocols/applications to multicast data reliably to hundreds of receivers
  – Save network & computing resources compared with unicast delivery from one sender to hundreds of receivers
• Application: weather data distribution
  – UCAR sends real-time weather data almost continuously to 170 institutions
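The resource savings can be quantified on the sender's access link: unicast sends one copy of the feed per receiver, while multicast sends a single copy that the network replicates. The sketch assumes a hypothetical 20 Mbps feed rate and an optional retransmission overhead for reliability; neither number comes from UCAR.

```python
def sender_link_load_bps(feed_rate_bps: float, receivers: int, multicast: bool,
                         retransmit_fraction: float = 0.0) -> float:
    """Load on the sender's access link.

    Unicast sends one copy per receiver; multicast sends one shared copy.
    retransmit_fraction is an assumed extra overhead for reliable delivery.
    """
    copies = 1 if multicast else receivers
    return feed_rate_bps * copies * (1 + retransmit_fraction)

FEED_RATE_BPS = 20e6  # hypothetical feed rate, for illustration only
RECEIVERS = 170       # number of institutions from the slide

unicast = sender_link_load_bps(FEED_RATE_BPS, RECEIVERS, multicast=False)
multicast = sender_link_load_bps(FEED_RATE_BPS, RECEIVERS, multicast=True,
                                 retransmit_fraction=0.05)
print(f"unicast:   {unicast / 1e9:.2f} Gbps")
print(f"multicast: {multicast / 1e6:.1f} Mbps (with 5% assumed retransmissions)")
```

This counts only the sender's link; savings across the whole network depend on the shape of the distribution tree, and reliable multicast still spends sender and receiver processing on loss recovery.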
NSF Scheduled Circuit Routing Protocol (SCRP)
• Problem & motivation
  – The scientific networking community has been building out a new type of internetwork with circuits and virtual circuits (analogy: airlines)
    • Why: service guarantees (think FedEx)
  – Contrast with the Internet (analogy: roadways)
  – Routing problem: what should one organization’s network tell another to enable path computation for circuits?
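To make the path-computation question concrete, the sketch below assumes the easy case that a single organization enjoys: full knowledge of every link's capacity and scheduled reservations. It finds a fewest-hop path whose links all have the requested rate free during the requested time window. Across organizations, this detailed per-link state is exactly what may not be shared, which is what makes the routing problem hard. The topology, reservation format, and conservative availability check are hypothetical; this is not the SCRP algorithm.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Reservation = Tuple[float, float, float]  # (start_s, end_s, reserved_bps)

# Hypothetical topology: undirected link -> (capacity_bps, scheduled reservations)
links: Dict[Tuple[str, str], Tuple[float, List[Reservation]]] = {
    ("A", "B"): (10e9, [(0, 3600, 8e9)]),
    ("B", "D"): (10e9, []),
    ("A", "C"): (10e9, [(7200, 9000, 5e9)]),
    ("C", "D"): (10e9, []),
}

def available_bps(capacity: float, reservations: List[Reservation],
                  start: float, end: float) -> float:
    """Conservative: subtract every reservation that overlaps [start, end)."""
    overlapping = sum(bw for s, e, bw in reservations if s < end and start < e)
    return capacity - overlapping

def find_path(links, src: str, dst: str, rate_bps: float,
              start: float, end: float) -> Optional[List[str]]:
    """Fewest-hop path whose links all have rate_bps free during [start, end)."""
    adj: Dict[str, List[str]] = {}
    for (u, v), (cap, resv) in links.items():
        if available_bps(cap, resv, start, end) >= rate_bps:  # prune busy links
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:  # breadth-first search over the pruned graph
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

print(find_path(links, "A", "D", 5e9, 0, 1800))     # A-B busy: ['A', 'C', 'D']
print(find_path(links, "A", "D", 5e9, 3600, 5400))  # both routes free: ['A', 'B', 'D']
```

Summing all overlapping reservations can overstate usage when those reservations do not overlap each other; a real scheduler would compute the peak concurrent reservation within the window.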
NeTS: JUNO: Collaborative Research: ACTION: Applications Coordinating with Transport, IP, and Optical Networks
• This project is a collaboration with U. Texas at Dallas and two universities in Japan
• The UVA portion of the project will develop application and transport protocols for optical networks
• Starting Feb. 1, 2014
Outline
• Big picture
• Five projects
  – What is the problem?
  – Why solve it? (Motivation)
• Methods used
  – As a GRA, what would I do?
• Processes & style
Methods used: Stats
• Science before engineering:
  – Theodore von Karman:
    • "Scientists study the world as it is; engineers create the world that never has been"
  – Data collection & statistics
    • Rely on contacts at DOE labs, universities, and network operators for operational data
    • Write R programs to analyze the procured data
    • Use the fir research cluster for parallel computing
• Skills needed: statistics, R language, parallel programming
Methods used: run experiments
• Run existing software used by scientists to obtain measurements
• Use national supercomputers and network testbeds
  – NCAR-Wyoming Supercomputing Center: MPI programs (climate)
  – U. Utah Emulab
  – ESnet 100G network testbed
  – U. New Mexico: PROBE
  – ExoGENI racks: OpenFlow switches
  – DYNES: 10 high-performance hosts/switches across the US
• Skills needed: learn and run new software programs; write shell scripts and cron jobs; use rigorous scientific methods when executing experiments
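One element of rigorous experiment execution is to repeat each configuration several times and interleave the runs in random order, so that slow changes in background load (for example, time-of-day traffic on a shared testbed) do not bias one configuration. The sketch below assumes a caller-supplied `measure` function (hypothetical; in practice it might launch a transfer or MPI job and return the measured throughput or latency) and reports the mean and standard deviation per configuration.

```python
import random
import statistics
from typing import Callable, Dict, Hashable, List, Tuple

def run_trials(configs: List[Hashable],
               measure: Callable[[Hashable], float],
               repetitions: int = 5,
               seed: int = 1) -> Dict[Hashable, Tuple[float, float]]:
    """Run each config `repetitions` times in shuffled order; return (mean, stdev) per config."""
    if repetitions < 2:
        raise ValueError("need at least 2 repetitions to estimate variability")
    schedule = [c for c in configs for _ in range(repetitions)]
    random.Random(seed).shuffle(schedule)  # interleave configs so drift affects all of them
    samples: Dict[Hashable, List[float]] = {c: [] for c in configs}
    for config in schedule:
        samples[config].append(measure(config))
    return {c: (statistics.mean(v), statistics.stdev(v)) for c, v in samples.items()}

# Stand-in measurement (hypothetical): fake throughput in Gbps with random noise.
noise = random.Random(42)

def fake_measure(streams: int) -> float:
    return min(9.5, 2.0 * streams) + noise.gauss(0, 0.1)

for streams, (mean, sd) in sorted(run_trials([1, 2, 4, 8], fake_measure).items()):
    print(f"{streams} streams: {mean:.2f} +/- {sd:.2f} Gbps")
```

A fixed seed makes the run order reproducible, so a surprising result can be rerun under the same schedule.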
Methods used: simulations
• For the NSF SCRP project
  – The problem requires large-scale thinking
  – Cannot implement
  – Cannot collect data, as the system does not yet exist
  – Therefore, simulate
• Skills needed: C++ programming, parallel programming, probability & statistics, rigorous scientific methods
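A small example of the simulate-then-validate approach: the sketch below simulates circuit requests arriving at a link with a fixed number of circuits (Poisson arrivals, exponential holding times, blocked requests lost) and compares the simulated blocking probability with the Erlang B formula, which is exact for this simple model. Checking a simulator against a closed-form result before scaling it up is one way to apply rigorous methods to simulation. The model and its parameters are illustrative assumptions, not the SCRP simulator.

```python
import heapq
import random

def erlang_b(offered_load: float, circuits: int) -> float:
    """Erlang B blocking probability via the standard recursion."""
    b = 1.0
    for k in range(1, circuits + 1):
        b = offered_load * b / (k + offered_load * b)
    return b

def simulate_blocking(arrival_rate: float, mean_holding: float, circuits: int,
                      num_requests: int = 200_000, seed: int = 7) -> float:
    """Event-driven simulation of a loss system; returns the fraction of blocked requests."""
    rng = random.Random(seed)
    departures: list = []  # min-heap of release times for circuits in use
    t = 0.0
    blocked = 0
    for _ in range(num_requests):
        t += rng.expovariate(arrival_rate)          # next request arrival
        while departures and departures[0] <= t:    # release finished circuits
            heapq.heappop(departures)
        if len(departures) < circuits:
            heapq.heappush(departures, t + rng.expovariate(1.0 / mean_holding))
        else:
            blocked += 1                            # all circuits busy: request lost
    return blocked / num_requests

# Offered load = arrival_rate * mean_holding = 8 Erlangs on 10 circuits.
sim = simulate_blocking(arrival_rate=8.0, mean_holding=1.0, circuits=10)
print(f"simulated: {sim:.4f}  Erlang B: {erlang_b(8.0, 10):.4f}")
```

Once the simple case matches theory, the same event loop can be extended (multiple links, advance reservations) to cases that have no closed form.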
Methods used: engineering
• Develop engineering solutions to problems identified through scientific analysis of operational and experimentally collected data
• Implement software
• Evaluate solutions on testbeds
• Two key points
  – Exploratory, not confirmatory (watch out for bias)
  – Always quantify the negative!
Methods: Write papers
• Conference first, then journal
• Collab Web site for grad students
  – How to organize a paper
  – Hierarchical structure
  – Think of reviewers
  – Know your community’s work
  – Literature search (when?)
Outline
• Big picture
• Five projects
  – What is the problem?
  – Why solve it? (Motivation)
• Methods used
  – As a GRA, what would I do?
• Processes & style
Processes
• Goals as a graduate student
  – Focus on the next step
    • Qualifying exams
    • Proposal defense
    • Dissertation
  – Master's degree en route: MCS or MS
  – Career goal: academia or industry
  – Community, community, community
  – Ask the process question for each step
Advising style
• Close collaboration with the GRA
  – Research grants have milestones/deliverables
  – Generate ideas/papers/software that others use
  – Who is the customer? What is the product?
• New ideas from the GRA
  – Develop proposals: security for DHS; vehicular
• Communicate: be open
• Full-time access (no substitute for hard work): a two-way commitment
Summary
• High-speed, low-latency networking for
  – Scientific applications: scientists
  – Network utilization: providers, campus, datacenter
  – Bottom-up: new optical communication technologies
• Techniques used
  – Obtain operational data/experimental measurements and analyze their statistics to find the real problem
  – Develop an engineering solution
  – Evaluate it through experiments or simulations