OptIPuter System Software


Page 1: OptIPuter System Software

System Software

OptIPuter System Software

Andrew A. Chien

Computer Science and Engineering, UCSD

January 2005

OptIPuter All-Hands Meeting

Page 2: OptIPuter System Software

System Software

OptIPuter Software Architecture for Distributed Virtual Computers v1.1

• January 2003, OptIPuter All Hands Meeting

Layer 4: XCP

Node Operating Systems

λ-configuration, Net Management

Grid and Web Middleware – (Globus/OGSA/WebServices/J2EE)

Physical Resources

DVC #1

OptIPuter Applications

DVC #2 DVC #3

Layer 5: SABUL, RBUDP, Fast, GTP

Real-Time Objects

Security Models

Data Services: DWTP

Higher Level Grid Services

Visualization

DVC/Middleware

High-Speed Transport

Optical Signaling/Mgmt

Page 3: OptIPuter System Software

System Software

OptIPuter Software Architecture

Distributed Applications/ Web Services

Telescience

GTP XCP UDT

LambdaStream CEP RBUDP

Vol-a-Tile

SAGE JuxtaView

Visualization

DVC Configuration

DVC API

DVC Runtime Library

Data Services

LambdaRAM

Globus

XIO

PIN/PDC

DVC Services

DVC Core Services

DVC Job Scheduling

DVC Communication

Resource Identify/Acquire

Namespace Management

Security Management

High Speed Communication

Storage Services

GRAM GSI RobuStore

Page 4: OptIPuter System Software

System Software

System Software/Middleware Progress

• Significant Progress in Key Areas!

• A unified Vision of Application Interface to the OptIPuter Middleware
– Distributed Virtual Computer: Simpler Application Models, New Capabilities
– 3-Layer Demonstration: JuxtaView/LambdaRAM Tiled Viz on DVC on Transports

• Efficient Transport Protocols to exploit High Speed Optical Networks
– RBUDP/LambdaStream, XCP, GTP, CEP, SABUL/UDT
– Single Streams, Converging Streams, Composite Endpoint Flows
– Unified Presentation under XIO (single application API)

• Performance Modeling
– Characterization of Vol-a-tile Performance on Small-scale Configurations

• Real-time
– Definition of a Real-time DVC, Components for Layered RT Resource Management: IRDRM, RCIM

• Storage
– Design and Initial Simulation Evaluation of LT Code-based Techniques for Distributed Robust (low variance of access, guaranteed bandwidth) Storage

• Security
– Efficient Group Membership Protocols to support Broadcast and Coordination across OptIPuters

Page 5: OptIPuter System Software

System Software

Cross Team Integration and Demonstrations

• TeraBIT Juggling, 2-layer Demo [SC2004, November 8-12, 2004]
– Distributed Virtual Computer, OptIPuter Transport Protocols (GTP)
– Move data between OptIPuter Network Endpoints (UCSD, UIC, Pittsburgh)
– Share efficiently; Good Flow Behavior, Maximize Transfer Speeds (saturate all receivers)
– Configuration: 10 endpoints, 40+ nodes, 1000s of miles
– Achieved 17.8 Gbps, a TeraBIT in less than one minute!

• 3-layer Demo [AHM2005, January 26-27, 2005]
– Visualization, Distributed Virtual Computer, OptIPuter Transport Protocols

• 5-layer Demo [iGrid, September 26-28, 2005 ??]
– Biomedical/Geophysical, Visualization, Distributed Virtual Computer, OptIPuter Transport Infrastructure, Optical Network Configuration

Page 6: OptIPuter System Software

System Software

OptIPuter Software “Stack”

Optical Network Configuration

Novel Transport Protocols

Distributed Virtual Computer (Coordinated Network and Resource Configuration)

Visualization

Applications (Neuroscience, Geophysics)

3-layer Demo

5-layer Demo

Page 7: OptIPuter System Software

System Software

Year 3 Goals

• Integration and Demonstration of Capability
– All Five Layers (Application, Visualization, DVC, Transport Protocols, Optical Network Control)
– Across a Range of Testbeds
– With Neuroscience and Geophysical Applications

• Distributed Virtual Computer
– Integrate with Network Configuration (e.g. PIN)
– Deploy as persistent OptIPuter Testbed Service
– Alpha Release of DVC as a Library

• Efficient Transport Protocols
– LambdaStream: Implement, Analyze Effectiveness, Integrate with XIO
– GTP: Release and Demonstrate at Scale; Analytic Stability Modeling
– CEP: Implement and Evaluate Dynamic N-to-M Communication
– SABUL/UDT: Integrate with XIO; Flexible Prototyping Toolkit
– Unified Presentation under XIO (single application API)

• Performance Modeling
– Characterization of Vol-a-tile, JuxtaView Performance on Wide-Area OptIPuter

• Real-time
– Prototype RT DVC, Experiment: remote device control within Campus Scale OptIPuter

• Storage
– Prototype RobuSTore, Evaluate using OptIPuter Testbeds and Applications

• Security
– Develop and Evaluate High Speed / Low Latency Network Layer Authentication and Encryption

Page 8: OptIPuter System Software

System Software

10Gig WANs: Terabit Juggling

[Network map. Sites shown: NetherLight Amsterdam, U of Amsterdam, and NIKHEF (Netherlands); PNWGP Seattle, StarLight Chicago, UI at Chicago, CENIC Los Angeles, CENIC San Diego, UCI, ISI/USC, UCSD/SDSC (CSE, SIO, SDSC, JSOE), and SC2004 Pittsburgh (United States). Links are labeled 10 GE, 2 GE, or 1 GE and include a Trans-Atlantic Link.]

SC2004: 17.8 Gbps, a TeraBIT in < 1 minute!

SC2005: Juggle Terabytes in a Minute
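The two headline figures can be checked with a little arithmetic. The sketch below assumes decimal units (1 terabit = 10^12 bits, 1 terabyte = 8 x 10^12 bits) and a sustained rate over the whole transfer; the function names are illustrative and not part of any OptIPuter software.

```python
TERABIT_BITS = 1e12    # assumed decimal terabit
TERABYTE_BITS = 8e12   # assumed decimal terabyte, in bits


def transfer_seconds(bits: float, rate_gbps: float) -> float:
    """Time to move `bits` at a sustained rate given in Gbps."""
    return bits / (rate_gbps * 1e9)


def required_gbps(bits: float, seconds: float) -> float:
    """Sustained rate in Gbps needed to move `bits` within `seconds`."""
    return bits / seconds / 1e9


# SC2004 figure: one terabit at 17.8 Gbps.
sc2004_seconds = transfer_seconds(TERABIT_BITS, 17.8)

# SC2005 goal: one terabyte in one minute.
sc2005_gbps = required_gbps(TERABYTE_BITS, 60.0)

print(f"1 Tbit at 17.8 Gbps takes {sc2004_seconds:.1f} s")
print(f"1 TByte in 60 s needs {sc2005_gbps:.1f} Gbps")
```

Under these assumptions the SC2004 rate moves a terabit in about 56 seconds, and the SC2005 goal needs roughly 133 Gbps per terabyte moved per minute.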

Page 9: OptIPuter System Software

System Software

3-layer Integrated Demonstration

1. Visualization Application (JuxtaView + LambdaRAM)

2. System SW Framework (Distributed Virtual Computer)

3. System SW Transports (GTP, UDT, etc.)

Nut Taesombut, Venkat Vishwanath, Ryan Wu, Freek Dijkstra, David Lee, Aaron Chin, Lance Long

UCSD/CSAG, UIC, UvA, UCSD/NCMIR, etc.

January 2005, OptIPuter All Hands Meeting

Page 10: OptIPuter System Software

System Software

3-Layer Demo Configuration

SDSC/San Diego

NCMIR/San Diego

EVL/Chicago

UvA/Amsterdam

Campus GE: 10G / 0.5 msec

NLR/CAVEWAVE: 10G / 70 msec

Transatlantic Link: 4G / 100 msec

[Diagram labels: Audiences; Output; Video Streaming; GTP Flows]

• Configuration
– JuxtaView at NCMIR
– LambdaRAM Client at NCMIR
– LambdaRAM Server at EVL, UvA

• High Bandwidth (2.5 Gbps, ~7 streams)

• Long Latencies, Two Configurations
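To see why these latencies matter for a transport protocol, it helps to compute the bandwidth-delay product: the amount of data in flight when the path is full. The sketch below treats the 70 msec and 100 msec figures as round-trip times and divides the 2.5 Gbps aggregate evenly over 7 streams; both are assumptions, since the slide does not say how the latencies were measured or how the load was split.

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: data in flight on a full path."""
    return rate_bps * rtt_s / 8


AGGREGATE_BPS = 2.5e9   # aggregate rate quoted on the slide
STREAMS = 7             # approximate stream count quoted on the slide

# Latency labels from the slide, assumed here to be round-trip times.
paths = {"NLR/CAVEWAVE": 0.070, "Transatlantic": 0.100}

for name, rtt in paths.items():
    total = bdp_bytes(AGGREGATE_BPS, rtt)
    per_stream = total / STREAMS
    print(f"{name}: {total / 1e6:.1f} MB in flight, "
          f"{per_stream / 1e6:.2f} MB per stream")
```

Under these assumptions each stream needs a few megabytes of buffering to keep its share of the path full, which is far more than default TCP window sizes of that period typically allowed.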

Page 11: OptIPuter System Software

System Software

Distributed Virtual Computers

Nut Taesombut and Andrew Chien

University of California, San Diego

January 2005

OptIPuter All-Hands Meeting

Page 12: OptIPuter System Software

System Software

Distributed Virtual Computer (DVC)

• Application Request: Grid Resources AND Network Connectivity
– Redline-style Specification, 1st Order Constraint Language

• DVC Broker Establishes DVC
– Binds End Resources, Switching, Lambdas
– Leverages Grid Protocols for Security, Resource Access

• DVC <-> Private Resource Environment, Surface thru WSRF

DVC

Page 13: OptIPuter System Software

System Software

Distributed Virtual Computer (DVC)

• Key Features
– Single Distributed Resource Configuration Description and Binding
– Simple use of Optical Network Configuration and Grid Resource Binding
– Single Interface to Diverse Communication Capabilities
– Transport Protocols, Novel Communication Capabilities

• Using a DVC
– Application presents Resource Specification
– Requests Grid Resources and Lambda Connectivity
– DVC Broker Selects Resources and Network Configuration
– DVC Broker Binds Resources, Configures Network, and Returns a List of Bound Resources and Their Respective (Newly Created) IPs
– Application Uses These IPs to Access Created Network Paths
– Application Selects Communication Protocols and Mechanisms amongst Bound Resources
– Application Executes
– Application Releases the DVC

[Taesombut & Chien, UCSD]
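The "Using a DVC" steps can be sketched as a small request, bind, use, release cycle. Everything below is hypothetical: `MockDVCBroker`, `create_dvc`, `release_dvc`, the host names, and the address range are stand-ins invented for illustration and are not the actual DVC API.

```python
from dataclasses import dataclass, field


@dataclass
class BoundResource:
    name: str     # name used in the application's specification
    host: str     # physical resource selected by the broker
    dvc_ip: str   # newly created address inside the DVC


@dataclass
class MockDVCBroker:
    """Hypothetical in-memory stand-in for a DVC broker."""
    inventory: dict                  # host -> attribute dict
    subnet: str = "192.168.85."      # illustrative private range
    next_octet: int = 12
    in_use: set = field(default_factory=set)

    def create_dvc(self, spec):
        """Select a free host for each named requirement, then bind all."""
        chosen = {}
        for name, wants in spec.items():
            host = next(
                (h for h, attrs in self.inventory.items()
                 if h not in self.in_use
                 and h not in chosen.values()
                 and wants(attrs)),
                None,
            )
            if host is None:
                raise LookupError(f"no free resource satisfies {name!r}")
            chosen[name] = host
        bound = []
        for name, host in chosen.items():
            ip = f"{self.subnet}{self.next_octet}"
            self.next_octet += 1
            self.in_use.add(host)
            bound.append(BoundResource(name, host, ip))
        return bound

    def release_dvc(self, bound):
        """Return every bound host to the free pool."""
        for resource in bound:
            self.in_use.discard(resource.host)


broker = MockDVCBroker(inventory={
    "viz-host": {"type": "vizcluster"},
    "store-a": {"type": "storage", "free_memory": 2048},
    "store-b": {"type": "storage", "free_memory": 1024},
})
spec = {
    "viz": lambda a: a["type"] == "vizcluster",
    "str1": lambda a: a.get("free_memory", 0) > 1700,
}
dvc = broker.create_dvc(spec)             # request, select, bind
ips = {r.name: r.dvc_ip for r in dvc}     # application uses these addresses
hosts = {r.name: r.host for r in dvc}
broker.release_dvc(dvc)                   # application releases the DVC
```

Selection happens before any binding, so a request that cannot be fully satisfied leaves no resources held. A real broker would also configure network paths, which this sketch omits.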

Page 14: OptIPuter System Software

System Software

JuxtaView and LambdaRAM on DVC Example

(1) Requests a Viz Cluster, Storage Servers, and High-Bandwidth Connectivity

DVC Manager

Resource/Network Information Services

(Globus MDS)

Application Requirements and Preference (communication + end resources)

[ viz ISA [type =="vizcluster"; InSet(special-device, "tiled display")];
  str1 ISA [free-memory>1700; InSet(dataset, "rat-brain.rgba")];
  str2 ISA [free-memory>1700; InSet(dataset, "rat-brain.rgba")];
  str3 ISA [free-memory>1700; InSet(dataset, "rat-brain.rgba")];
  str4 ISA [free-memory>1700; InSet(dataset, "rat-brain.rgba")];
  Link1 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str1>; bandwidth > 940; latency <= 100];
  Link2 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str2>; bandwidth > 940; latency <= 100];
  Link3 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str3>; bandwidth > 940; latency <= 100];
  Link4 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str4>; bandwidth > 940; latency <= 100] ]

Physical Resources and Network Configuration

viz1: ncmir.ucsd.sandiego
str1: rembrandt0.uva.amsterdam
str2: rembrandt1.uva.amsterdam
str3: rembrandt2.uva.amsterdam
str4: rembrandt6.uva.amsterdam

(rembrandt0,yorda0.uic.chicago) --- BW 1, LambdaID 3
(rembrandt1,yorda0.uic.chicago) --- BW 1, LambdaID 4
(rembrandt2,yorda0.uic.chicago) --- BW 1, LambdaID 5
(rembrandt6,yorda0.uic.chicago) --- BW 1, LambdaID 17
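The request above repeats one storage clause and one link clause per server, so it lends itself to generation. The helper below is a hypothetical convenience, not part of the DVC software; it only reproduces the textual pattern shown on the slide. The slide does not state units for `bandwidth` and `latency`, so the defaults are copied as bare numbers.

```python
def storage_spec(n_servers: int, dataset: str,
                 min_free_memory: int = 1700,
                 min_bandwidth: int = 940,
                 max_latency: int = 100) -> str:
    """Build a request for one viz cluster plus n storage servers and links."""
    clauses = [
        'viz ISA [type =="vizcluster"; '
        'InSet(special-device, "tiled display")]'
    ]
    # One storage clause per server holding the dataset.
    for i in range(1, n_servers + 1):
        clauses.append(
            f'str{i} ISA [free-memory>{min_free_memory}; '
            f'InSet(dataset, "{dataset}")]'
        )
    # One connection clause from the viz cluster to each storage server.
    for i in range(1, n_servers + 1):
        clauses.append(
            f'Link{i} ISA [restype = "conn"; ep1 = <viz>; ep2 = <str{i}>; '
            f'bandwidth > {min_bandwidth}; latency <= {max_latency}]'
        )
    return "[ " + ";\n  ".join(clauses) + " ]"


spec_text = storage_spec(4, "rat-brain.rgba")
print(spec_text)
```

Generating the text keeps the endpoint names in the link clauses consistent with the storage clauses when the server count changes.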

Page 15: OptIPuter System Software

System Software

JuxtaView and LambdaRAM on DVC Example

(2) Allocates End Resources and Communication

• Resource Binding (GRAM)

• Lambda Path Instantiation (PIN) (Current Demo doesn’t yet include this)

• DVC IP Allocation

DVC Manager

PIN Server

192.168.85.13

192.168.85.14

192.168.85.15

192.168.85.16

192.168.85.12

UvA/Amsterdam

NCMIR/San Diego

Page 16: OptIPuter System Software

System Software

JuxtaView and LambdaRAM on DVC Example

(3) Create Resource Groups

• Storage Group

• Viz Group

DVC Manager

192.168.85.13

192.168.85.14

192.168.85.15

192.168.85.16

192.168.85.12

UvA/Amsterdam

NCMIR/San Diego

Viz Group

Storage Group

Page 17: OptIPuter System Software

System Software

JuxtaView and LambdaRAM on DVC Example

(4) Launch Applications

• Launch LambdaRAM Servers

• Launch JuxtaView/LambdaRAM Clients

DVC Manager

192.168.85.13

192.168.85.14

192.168.85.15

192.168.85.16

192.168.85.12

UvA/Amsterdam

NCMIR/San Diego

Viz Group

Storage Group

Page 18: OptIPuter System Software

System Software

OptIPuter Component Technologies

1. Real-time DVCs

2. Application Performance Analysis

3. High Speed Transports (CEP, LambdaStream, XCP, GTP, UDT)

4. Storage

5. Security

Page 19: OptIPuter System Software

System Software

Vision – Real-Time Tightly Coupled Wide-Area Distributed Computing

Real-Time

Object network

Goals

• High-precision Timings of Critical Actions

• Tight Bounds on Response Times

• Ease of Programming

– High-Level Prog

– Top-Down Design

• Ease of Timing Analysis

Dynamically formed Distributed Virtual Computer

Source: Kim, UCI

Page 20: OptIPuter System Software

System Software

Real-Time DVC Architecture

Real-time Application

TMO Real-Time Middleware

Distributed Virtual Machine

High Speed Protocols/Network Management/Basic Resource Management

Application expressed as real-time objects and links w/ various latency constraints

Schedules and manages underlying resources to achieve desired RT

Collection of Resources with known performance and security capabilities, and control & management

Provides simple resource and management abstractions, hides detailed resource management (i.e. network provisioning, machine reservation)

Real-Time Object Network

Libraries that realize initial configuration and ongoing management

Controls and Manages “single” resources

Page 21: OptIPuter System Software

System Software

Real-Time: from LAN to WAN

• RT grid (or subgrid) ::= A grid (or subgrid) facilitating

(RG1) Message communications with easily determinable tight latency bounds and

(RG2) Computing node operations enabling easy guaranteeing of timely progress of threads toward computational milestones

• RG1 realized via
– Dedicated optical-path WAN
– Campus networks, the LAN part of the RT grid, equipped with Time-Triggered (TT) Ethernet switches (a new research task in collaboration with Hermann Kopetz)

Source: Kim, UCI

Page 22: OptIPuter System Software

System Software

Real-Time DVC

(RD1) Message paths with easily determinable tight latency bounds.

(RD2) In each computing or sensing-actuating site within the RT DVC, computing nodes must exhibit timing behaviors which do not differ from those of computing nodes in an isolated site by more than a few percent.

Also, computing nodes in an RT DVC must enable easy procedures for assuring the very high probability of application processes and threads reaching important milestones on time.

=> Computing nodes must be equipped with appropriate infrastructure software, i.e., OS kernel & middleware with easily analyzable QoS.

(RD3) If representative computing nodes of two RT DVCs are connected via RT message paths, then the ensemble consisting of the two DVCs and the RT message paths is also an RT DVC.

Source: Kim, UCI

Page 23: OptIPuter System Software

System Software

Middleware for Real-Time DVC

Acq of λ’s; Alloc of Virtual λ’s; Coord of msg-send timings

Source: Kim, UCI

data

data

data

" Let us start a chorus at 2pm " " e-Science "

Basic Infrastructure Services

Globus System λ-Configuration Net Management

RCIM: RT comm infrastr mgt

IRDRM: Intra-RT-DVC res mgt

RGRM: RT grid resource management

RCIM agent

IRDRM agent

On-demand creation of DVCs

Support exec of appls via Alloc of comp & comm resources within DVC

Page 24: OptIPuter System Software

System Software

Progress

• RCIM (RT comm infrastructure mgt)
– Study of TT Ethernet began with the help of Hermann Kopetz
– The 1st unit is expected to become available to us by June 2005.

• IRDRM (Intra-RT-DVC resource mgt)
– TMO (Time-triggered Message-triggered Object) Support Middleware (TMOSM) adopted as a starting base
– A significantly redesigned version (4.1) of TMOSM (for improved modularity, concurrency, and portability) has been developed. It runs on Linux, WinXP, and WinCE.
– An effort for extending the TMOSM to fit into the Jenks’ cluster began.

[Diagram: Components of a C++ object. Labels: var; TT Method 1; TT Method 2; AAC; Service Method 1; Service Method 2; Deadlines]

• No thread, No priority

High-level Programming Style

Source: Kim, UCI

Page 25: OptIPuter System Software

System Software

Progress (cont.)

• Programming model
– An API wrapping the services of the RT middleware enables high-level RT programming (TMO) without a new compiler.
– The notion of Distance-Aware (DA) TMO, an attractive building-block for RT wide-area DC applications, was created and a study for its realization began.

• Application development experiments
– Fair and efficient Distributed On-Line Game Systems and LAN-based feasibility demonstration
– Application of the global-time-based coordination principle
– A step towards OptIPuter environment demonstration

• Publication
– A paper on distributed on-line game systems in IDPT2003 proc.
– A paper on distributed on-line game systems to appear in ACM-Springer Journal on Multimedia Systems
– A keynote paper on RT DVC at AINA2004 proc.
– A paper on RT DVC middleware to appear in WORDS2005 proc.

Source: Kim, UCI

Page 26: OptIPuter System Software

System Software

Year 3 Plan

• RCIM (RT comm infrastructure mgt)
– Development of middleware support for TT Ethernet
– The 1st unit of TT Ethernet switch is expected to become available to us by June 2005.

• IRDRM (Intra-RT-DVC resource mgt)
– Extension of TMOSM to fit into clusters
– Interfacing TMOSM to the Basic Infrastructure Services of OptIPuter

Source: Kim, UCI

Page 27: OptIPuter System Software

System Software

Year 3 Plan

• Application development experiments
– An experiment for remote access and control within the UCI or UCSD campus
– A step toward preparation of an experiment for remote access and control of electron microscopes at UCSD-NCMIR

Source: Kim, UCI

Page 28: OptIPuter System Software

Performance Analysis and Monitoring of VolaTile

Xingfu Wu <wuxf@cs.tamu.edu>, http://prophesy.cs.tamu.edu

• Use Prophesy system to Instrument and Study VolaTile on 5-node System

• Evaluate Performance Impact of Configuration (data servers, clients, network)

[Chart: Data access time on 1+4 nodes; y-axis Time (secs), 0 to 20; x-axis Scenario 1, Scenario 2, Scenario 3; datasets protein 64x64x64, fuel 64x64x64, foot 256x256x256, geo 256x256x256, geo 440x290x198, furdave 160x255x75]

[Wu & Taylor, TAMU]

Page 29: OptIPuter System Software

Comparison of VolaTile Configuration Scenarios

[Chart: Data access time on 1+4 nodes; y-axis Time (secs), 0 to 20; x-axis Scenario 1, Scenario 2, Scenario 3; datasets protein 64x64x64, fuel 64x64x64, foot 256x256x256, geo 256x256x256, geo 440x290x198, furdave 160x255x75]

Page 30: OptIPuter System Software

Year 3+ Plans

• Port the instrumented VolaTile to a large-scale OptIPuter testbed for analysis (3/2005)

• Analyze the performance of JuxtaView and LambdaRAM applications (6/2005)

• Where possible, develop models of data accesses for the different visualization applications (9/2005)

• Continue collaborating with Jason’s group about viz applications (12/2005)

Page 31: OptIPuter System Software

System Software

High Speed Protocols

Page 32: OptIPuter System Software

System Software

High Performance Transport Problem

• OptIPuter is Bridging the Gap Between High Speed Link Technologies and Growing Demands of Advanced Applications

• Transport Protocols Are the Weak Link
– TCP Has Well-Documented Problems That Militate Against its Achieving High Speeds
   – Slow Start Probing Algorithm
   – Congestion Avoidance Algorithm
   – Flow Control Algorithm
   – Operating System Considerations
   – Friendliness and Fairness Among Multiple Connections
– These Problems Are the Foci of Much Ongoing Work
– OptIPuter is Pursuing Four Complementary Avenues of Investigation
   – RBUDP Addresses Problems of Bulk Data Transfer
   – SABUL Addresses Problems of High Speed Reliable Communication
   – GTP Addresses Problems of Multiparty Communication
   – XCP Addresses Problems of General Purpose, Reliable Communication

Page 33: OptIPuter System Software

System Software

OptIPuter Transport Protocols

[Diagram: protocol design space. Axis and region labels: Allocated Lambda; Shared, Routed; E2e Path; Unicast; Managed Group; Enhanced Routers; Standard Routers. Protocols placed: RBUDP/λ-stream, GTP, SABUL/UDT, XCP]

• Composite Endpoint Protocol (Efficient N-to-M Communication)

Page 34: OptIPuter System Software

System Software

Composite Endpoint Protocol (CEP)

Eric Weigle and Andrew A. Chien

Computer Science and Engineering

University of California, San Diego

OptIPuter All Hands Meeting, January 2005

Page 35: OptIPuter System Software

System Software

Composite-EndPoint Protocol (CEP)

• Network Transfers Faster than Individual Machines
– A Terabit flow? A 100Gbit flow? A 10Gbps flow w/ 1Gbps NICs
– Clusters are Cost-effective means to terminate Fast transfers
– Support Flexible, Robust, General N-to-M Communication
– Manage Heterogeneity, Multiple Transfers, Data Accessibility

Uh-oh!

[Weigle & Chien, UCSD]

Page 36: OptIPuter System Software

System Software

Example

• Move Data from a Heterogeneous Storage Cluster (N)

• Exploit Heterogeneous network structure and Dedicated Lambdas

• Terminate in a Visualization Cluster (M)

• Render for a Tiled Display Wall (M)

– Data flow is not easy for the application to handle.

– May want to locally to the storage cluster to offload checksum/buffering requirements or avoid a contested link.

Page 37: OptIPuter System Software

System Software

Composite Endpoint Approach

• Transfers Move Distributed Data
– Provides hybrid memory/file namespace for any transfer request

• Choose Dynamic Subset of Nodes to Transfer Data
– Performance Management for Heterogeneity, Dynamic Properties Integrated with Fairness

• API and Scheduling
– API enables easy use
– Scheduler handles performance, fairness, adaptation

• Exploit Many Transport Protocols
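One way to picture what a composite-endpoint scheduler has to do with heterogeneous nodes is to compare an even split of the data against a split proportional to each node's rate. The sketch below is a static illustration under assumed node rates; it is not the CEP scheduler, whose actual policy is not described on these slides.

```python
def uniform_split(total_bytes, rates_mbps):
    """Give every node the same share, ignoring its speed."""
    share = total_bytes / len(rates_mbps)
    return [share] * len(rates_mbps)


def proportional_split(total_bytes, rates_mbps):
    """Give each node a share proportional to its rate."""
    total_rate = sum(rates_mbps)
    return [total_bytes * r / total_rate for r in rates_mbps]


def completion_seconds(shares, rates_mbps):
    """The transfer finishes when the slowest node finishes its share."""
    return max(8 * s / (r * 1e6) for s, r in zip(shares, rates_mbps))


def aggregate_mbps(total_bytes, shares, rates_mbps):
    """Effective flow bandwidth seen by the application."""
    return 8 * total_bytes / completion_seconds(shares, rates_mbps) / 1e6


rates = [1000, 500, 250, 250]   # assumed per-node rates in Mbps
total = 1e9                     # assumed transfer size: 1 GB

uniform_bw = aggregate_mbps(total, uniform_split(total, rates), rates)
proportional_bw = aggregate_mbps(total, proportional_split(total, rates), rates)

print(f"uniform split:      {uniform_bw:.0f} Mbps")
print(f"proportional split: {proportional_bw:.0f} Mbps")
```

With an even split the slowest node dictates the finish time, so the flow runs at the node count times the slowest rate. A proportional split lets all nodes finish together and reaches the sum of the node rates, which is the ideal for a static configuration.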

Page 38: OptIPuter System Software

System Software

CEP Efficiently Composes Heterogeneous and Homogeneous Cluster Nodes

[Chart: Flow BW (Mbps), 0 to 7000; x-axis Heterogeneous Nodes, 1 to 8; series Uniform, CEP, Ideal]

[Chart: Flow BW (Mbps), 0 to 50000; x-axis Uniform Nodes; series Ideal, CEP]

• Seamless Composition of Performance, widely varying node performance

• High Composition efficiency, demonstrated 32 Gbps from 1 Gbps nodes!
– Efficiency increasing as implementation improves
– Scaling suggests 1000 node Composites => Terabit Flows

• Next Steps: Wide Area, Dynamic Network Performance

Page 39: OptIPuter System Software

System Software

Summary and Year 3 Plans

• Current Scheduling Mechanism is Static
– Selects nodes to move data
– Handles static heterogeneity: node/link capabilities
– 32 Gbps in LAN

• Simple API Specification
– Ease of use; scheduler takes care of transfer
– Allows Scatter/Gather with arbitrary constraints on data

• Plans: 1H2005
– XIO implementation: Use GTP, TCP, other transports
– Tuned WAN Performance
– Dynamic Transfer Scheduling (adapt to network and node conditions)

• Plans: 2H2005
– Security, code stabilization, optimization
– Initial Public Release
– 5-layer Demo Participation
– Better Dynamic Scheduling
– De-centralization
– Fault Tolerance

Page 40: OptIPuter System Software

Electronic Visualization Laboratory University of Illinois at Chicago

LambdaStream

Chaoyue Xiong, Eric He, Venkatram Vishwanath,

Jason Leigh, Luc Renambot, Tadao Murata, Thomas A. DeFanti

January 2005

OptIPuter All Hands Meeting

Page 41: OptIPuter System Software

Electronic Visualization Laboratory University of Illinois at Chicago

LambdaStream (Xiong)

Applications Need High BW with low jitter

Idea

• Combine loss-based and rate-based techniques

• Loss type prediction, respond appropriately

• => Good BW and Low Jitter

[Chart: Throughput of TCP and LS on the 1Gbps Link; x-axis Time (s), 0 to 4; y-axis Throughput (Mbps), 0 to 1800; annotations 172Mbps, 1720Mbps, 983Mbps; series label TCP]

[Chart: Jitter of TCP and LS Flow with 2MB Payload; x-axis Round, 0 to 500; y-axis Time (ms), 0 to 120; series TCP, LS]

Page 42: OptIPuter System Software

Electronic Visualization Laboratory University of Illinois at Chicago

Loss Type Prediction

When packet loss occurs, Average receiving interval = [expression lost in extraction]

Loss Types:

•Continuous decrease in receiving capability

•Occurrence of congestion in the link

•Sudden decrease in receiving capability or random loss

Page 43: OptIPuter System Software

Electronic Visualization Laboratory University of Illinois at Chicago

Incipient undesirable situations avoidance (1)

• When there is no loss, longer receiving packet interval indicates link congestion or lower receiving capability.

[Diagram: Sender, Bottleneck router, Receiver; packets w_i and w_i+1, with sending interval ∆ts and receiving interval ∆tr]

Page 44: OptIPuter System Software

Electronic Visualization Laboratory University of Illinois at Chicago

Incipient undesirable situations avoidance (2)

• Metric:
– Ratio between the sending interval and the average receiving interval during one epoch.

• Methods to improve precision
– Use weighted addition of receiving intervals from the previous three epochs.
– Exclude unusual samples.
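The metric and its two refinements can be sketched in a few lines. The slide gives neither the weights nor the rule for "unusual" samples, so the values below (weights 0.5, 0.3, 0.2 with the newest epoch first, and a cutoff at three times the epoch median) are assumptions chosen only to make the example concrete. The sketch also assumes the three most recent epochs are the ones combined.

```python
from statistics import median


def epoch_average(intervals_ms, outlier_factor=3.0):
    """Mean receiving interval for one epoch, skipping unusual samples.

    A sample counts as unusual here if it exceeds outlier_factor times
    the epoch median (an assumed rule, not taken from the slide).
    """
    cutoff = outlier_factor * median(intervals_ms)
    kept = [x for x in intervals_ms if x <= cutoff]
    return sum(kept) / len(kept)


def smoothed_receiving_interval(epoch_averages, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of up to three most recent epoch averages, newest first."""
    recent = epoch_averages[-3:][::-1]
    used = weights[:len(recent)]
    return sum(w * a for w, a in zip(used, recent)) / sum(used)


def interval_ratio(sending_interval_ms, epoch_averages):
    """Sending interval divided by the smoothed receiving interval."""
    return sending_interval_ms / smoothed_receiving_interval(epoch_averages)


# One epoch with a single stray 9 ms gap among roughly 1 ms gaps.
clean_avg = epoch_average([1.0, 1.1, 0.9, 9.0])

# Three epochs where the newest shows packets arriving further apart.
ratio = interval_ratio(1.0, [1.0, 1.0, 1.2])
print(f"epoch average {clean_avg:.2f} ms, ratio {ratio:.3f}")
```

A ratio below 1 means packets are arriving further apart than they were sent, which the preceding slide reads as a sign of link congestion or reduced receiving capability before any loss occurs.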

Page 45: OptIPuter System Software

Electronic Visualization Laboratory University of Illinois at Chicago

Single Stream Experiment Result (1)

[Chart: Throughput of TCP and LS on the 1Gbps Link; x-axis Time (s), 0 to 4; y-axis Throughput (Mbps), 0 to 1800; annotations 172Mbps, 1720Mbps, 983Mbps; series label TCP]

Page 46: OptIPuter System Software

Electronic Visualization Laboratory University of Illinois at Chicago

Single Stream Experiment Result (2)

[Chart: Jitter of TCP and LS Flow with 2MB Payload; x-axis Round, 0 to 500; y-axis Time (ms), 0 to 120; series TCP, LS]

Page 47: OptIPuter System Software

Electronic Visualization Laboratory University of Illinois at Chicago

Year 3 Plans

• Development of XIO driver

• Experiments with multiple streams

• Integrate with TeraVision and SAGE.

• Use formal modeling (Petri Net) to improve the scalability of the algorithm.

Page 48: OptIPuter System Software

Information Sciences Institute

Joe Bannister

Aaron Falk

Jim Pepin

Joe Touch

OptIPuter Project Progress

January 18, 2005

Page 49: OptIPuter System Software

OptIPuter XCP Progress

Design of Linux XCP port

Net100 tweaks

Makes most sense for end-systems only; little benefit by changing OS for XCP routers

Strategy is to put XCP in generic Linux 2.6 kernel; then port to Net100 (Net100 optimizations are largely orthogonal to XCP)

Technical challenges exist in extending Linux kernel to handle 64-bit arithmetic needed for XCP

Linux port is pending conclusion of on-going design work to eliminate line-rate divide operations from router

[Bannister, Falk, Pepin, Touch ISI]

Page 50: OptIPuter System Software

OptIPuter XCP Activities

Workshops

Aaron Falk, Ted Faber, Eric Coe, Aman Kapoor, and Bob Braden. Experimental Measurements of the eXplicit Control Protocol. Second Annual Workshop on Protocols for Fast Long Distance Networks. February 16, 2004. http://www.isi.edu/isi-xcp/docs/falk-pfld04-slides-2-16-04.pdf

Aaron Falk. NASA Optical Network Testbeds Workshop. August 9-11, 2004, NASA Ames Research Center. User Application Requirements, Including End-to-end Issues. http://duster.nren.nasa.gov/workshop7/report.html

Papers

Aaron Falk and Dina Katabi. Specification for the Explicit Control Protocol (XCP), draft-falk-xcp-00.txt (work in progress), October 2004. http://www.isi.edu/isi-xcp/docs/draft-falk-xcp-spec-00.txt

Aman Kapoor, Aaron Falk, Ted Faber, and Yuri Pryadkin. Achieving Faster Access to Satellite Link Bandwidth. Submitted to Global Internet 2005. December 2004. http://www.isi.edu/isi-xcp/docs/kapoor-pep-gi2005.pdf

Page 51: OptIPuter System Software

OptIPuter Network Infrastructure

Deployed GBE link between CENIC I2 cloud and ISI

Operational for NSF site visit

Used extensively by viz and Globus groups

Page 52: OptIPuter System Software

System Software

Group Transport Protocol (GTP)

Ryan Wu and Andrew A. Chien

Computer Science and Engineering

University of California, San Diego

OptIPuter All Hands Meeting, January 2005

Page 53: OptIPuter System Software

System Software

Optical Network Cores Shift Contention to Network Edge

• Lambda-Grid: Dedicated Optical Connections Provide Plentiful Core Bandwidth

• Driving Applications Access Many High Data Rate Sources – Multipoint-to-point communication

• => Congestion moves to the endpoints

• Group Transport Protocol: Rate-based + Receiver Based Management

[Diagram: senders S1, S2, S3 and receiver R, shown for (a) Shared IP Network and (b) Dedicated lambda connections]

[Wu & Chien, UCSD]

Page 54: OptIPuter System Software

System Software

GTP: Receiver-based Congestion Management

• Request-response for Reliable Data Transfer

• Receiver-based Flow Co-scheduling for Fairness and Low Loss Rate
– Balance Concurrent Data Fetching from Multiple Sources
– Fair across Varied Sender RTTs
– Efficient Transitions under Rapid Changes

• Single Flow Adaptation and Capacity Estimation

[Diagram: Multipoint-to-point contention at receivers R1, R2]

[Diagram: GTP Receiver Architecture. Labels: Applications; GTP (Centralized Rate Allocation; Single Flow Control and Monitoring); UDP (data flow) / TCP (control flow); IP]

Page 55: OptIPuter System Software

System Software

Quick Single Flow Rate Adaptation

Single GTP flow (flow 1) is able to quickly probe the available bandwidth.

GTP flow 1 starts at t=0, with capacity 1000Mbps; flow 2 starts at time t=2s, and its maximum transmission rate is 300Mbps.
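A receiver that allocates rates centrally has to decide how to divide its capacity when a rate-limited flow joins. The sketch below uses max-min fair allocation as one plausible policy. That choice is an assumption for illustration: the slide does not state GTP's allocation rule, and the function name is hypothetical.

```python
def max_min_allocation(capacity_mbps, demands_mbps):
    """Max-min fair split of receiver capacity among flows.

    demands_mbps maps a flow name to the most that flow can send.
    Flows that want less than an equal share get their full demand;
    the leftover is shared equally among the remaining flows.
    """
    allocation = {flow: 0.0 for flow in demands_mbps}
    active = dict(demands_mbps)
    remaining = float(capacity_mbps)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        limited = {f: d for f, d in active.items() if d <= share}
        if not limited:
            # Every remaining flow can use a full equal share.
            for flow in active:
                allocation[flow] = share
            return allocation
        for flow, demand in limited.items():
            allocation[flow] = float(demand)
            remaining -= demand
            del active[flow]
    return allocation


# Before t=2s: only flow 1, which can use the full 1000 Mbps.
before = max_min_allocation(1000, {"flow1": 1000})

# After t=2s: flow 2 joins with a maximum rate of 300 Mbps.
after = max_min_allocation(1000, {"flow1": 1000, "flow2": 300})
print(before, after)
```

Under this assumed policy flow 2 receives its full 300 Mbps and flow 1 backs off to the remaining 700 Mbps, so the receiver stays fully used without oversubscription.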

Page 56: OptIPuter System Software

System Software

Group Transport Protocol (GTP)

• Multipoint Performance in NS2 Simulations– Four GTP flows with RTT 20, 40, 60 and 80ms starting at time 0, 2, 3, and 4s.

• GTP uses Receiver-based Management to achieve Rapid Convergence and Fair Allocation

[Diagram: Converging Flows. Senders S1, S2, S3, S4 converge on receiver R; path RTT labels 20ms, 40ms, 60ms, 80ms]

[Wu & Chien, UCSD]

Page 57: OptIPuter System Software

System Software

Quick Adaptation to Flow Transition

[Diagram: Converging Flows. Senders S1, S2 converge on receiver R; path RTT labels 25ms, 50ms]

• GTP Simulation, Emulation, TCP Simulation

• Second Flow begins at t=10 seconds

• GTP Utilizes Network Efficiently through Flow Transitions

Page 58: OptIPuter System Software

System Software

Benefits of Receiver-Based Control

• SDSC -- NCSA, 10GB transfer (1Gbps link capacity), 58ms RTT

• Convergent Flows

• GTP outperforms the other Rate-based Protocols due to Receiver-oriented management

[Diagram: Converging flows from senders S1, S2, S3 to receiver R, between NCSA and SDSC]

                     RBUDP   UDT    GTP
Throughput (Mbps)    443     811    865
Loss Ratio (%)       53.3    8.7    0.06

Page 59: OptIPuter System Software

System Software

Year 3 Plan

1H2005

• GTP Implementation and Testing
– Release a reliable version of GTP with XIO driver

• Comprehensive comparison studies between GTP and other transport protocols

• Demonstrations with OptIPuter System Software

2H2005

• Formal stability proofs for GTP will be Developed
– Proof of stability and convergence properties of GTP
– Networking conference publication

• Extend GTP to Sender Capacity Management
– Sender side contention managed to achieve good global performance and fairness
– From single M-to-1 to Multiple M-to-1 (senders to multiple receivers)

Page 60: OptIPuter System Software

System Software

UDP Data Transport (UDT)

Robert L Grossman, Yunhong Gu, Xinwei Hong, & David Hanley

National Center for Data Mining

University of Illinois at Chicago

OptIPuter All Hands Meeting, January 2005

Page 61: OptIPuter System Software

System Software

Composable Protocol Toolkit (CPT) (UIC-LAC)

• Concept / Goals:
– Some Applications will send multiple high volume flows (teraflows) over a single lambda
– Application interface to OptIPuter Communication is via XIO interface
– Specialized congestion control (CC) algorithms may be needed for these teraflows.
– Idea: Accelerate development of new congestion control algorithms with toolkit
– New congestion control implementation <-> different CPT CC functions.
– Project co-funded by NSF & DOE

• Accomplishments:
– Developed prototype Composable Protocol Toolkit
– Interpreted UDT as new type of AIMD protocol called Decreasing Increases AIMD
– Conducted initial experimental studies.

• Future:
– Continue development and testing of Composable Protocol Toolkit (CPT).
– Use CPT to explore congestion control algorithms

[Grossman, UIC]
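The idea of an AIMD scheme with decreasing increases can be illustrated with a toy rate controller whose additive step shrinks as the rate rises. The increase function below (a tenth of the remaining headroom, with a small floor) and the decrease factor of 1/9 are assumptions picked for readability. They are not the CPT congestion control functions, and the sketch does not claim to reproduce UDT's actual formula.

```python
def increase_step(rate_mbps, capacity_mbps, floor_mbps=0.01):
    """Additive step that gets smaller as the rate approaches capacity.

    Assumed shape: one tenth of the remaining headroom, never below
    a small floor so the controller keeps probing.
    """
    headroom = max(capacity_mbps - rate_mbps, 0.0)
    return max(0.1 * headroom, floor_mbps)


def aimd_decreasing_increase(capacity_mbps, steps, start_mbps=1.0,
                             decrease=1.0 / 9.0):
    """Trace of sending rates; exceeding capacity stands in for a loss."""
    rate = start_mbps
    trace = []
    for _ in range(steps):
        if rate > capacity_mbps:
            rate *= (1.0 - decrease)   # multiplicative decrease on loss
        else:
            rate += increase_step(rate, capacity_mbps)
        trace.append(rate)
    return trace


trace = aimd_decreasing_increase(capacity_mbps=1000.0, steps=20)
steps_taken = [b - a for a, b in zip([1.0] + trace, trace)]
print([round(r, 1) for r in trace[:5]])
```

In this toy model the early steps are large, so the flow ramps up quickly on an empty high-capacity path, while later steps shrink so the rate settles near capacity instead of overshooting by a fixed increment.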

Page 62: OptIPuter System Software

System Software

Storage Research Activities

Huaxia Xia, Justin Burke, and Andrew Chien

University of California, San Diego

January 2005

OptIPuter All Hands Meeting

Page 63: OptIPuter System Software

System Software

RobuSTore: Robust Performance (Gigabytes/Second) from Geographically Distributed Storage

• RobuSTore: Statistical Storage
– Systematic Introduction of Redundancy, High Efficiency LDPC Codes across Distributed Storage
– Improve Aggregate Statistical Properties of Access => Guaranteed, High Performance
– Predictable Access Latency, Isolatable Performance in Shared Environments

• Goals
– Distributed RobuSTore System
– Support Flexible Distributed Storage Sharing

Page 64: OptIPuter System Software

Storage Progress

• High Performance File System Survey

– Study existing parallel/distributed file systems

– GPFS, Lustre, PVFS, Galley, DASF, Vesta, Armada, FAB, MPIO, Zebra, etc.

– No existing system meets needs of OptIPuter environment!

– => Selected Lustre (emerging Open Source Standard) as Prototyping Environment

• Key Question: Can Erasure Codes be Applied in a High Performance System?

– Best previous performance: ~150 Mb/s (Luigi Rizzo)

– New Memory Hierarchy Tuned, Tiled Implementation Achieves 300+ MByte/s (about 16 times faster) on a 2 GHz Xeon

– Fast enough to keep up with OptIPuter network

• RobuSTore Design: Complete at High Level

– Detailed Analytical Modeling and Simulation is underway

– There are MANY (millions of) ways to apply the idea

– Initial Performance Results
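For readers unfamiliar with erasure codes, the simplest possible instance is a single XOR parity block, sketched below. This is only an illustration of the encode/recover idea: it tolerates one lost block, whereas RobuSTore uses LDPC codes, and it makes no attempt at the memory-hierarchy tuning or tiling mentioned above. All function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks plus one parity block (tolerates one erasure)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, missing_index):
    """Rebuild the block at missing_index by XOR-ing all the other blocks."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)


# Three data blocks spread over four disks; any one disk may be lost.
stored = encode([b"opti", b"pute", b"r---"])
rebuilt = recover(stored, 1)
```

The encoding cost here is one pass of XORs over the data, which is why memory bandwidth rather than arithmetic tends to dominate performance for codes of this family.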

Page 65: OptIPuter System Software

Preliminary RobuSTore Simulation Results

• Read 1 GB Data: Simple Striping versus Erasure-Coded Striping

– Improvement from RobuSTore use of Erasure Codes

– 3-5x Average Performance

– 3x Standard Deviation

Disks: Same Type, Different Layout

Simple Striping: 1-16x Storage Overhead

Erasure Code: 3x Storage Overhead
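The intuition behind the comparison above can be sketched with a small Monte Carlo model. This is not the RobuSTore simulator: it assumes an idealized code in which any k of the n stored blocks suffice to reconstruct the data (LDPC codes need slightly more than k), and the log-normal per-disk latency distribution, the function name `read_times`, and the parameter values are all hypothetical.

```python
import random
import statistics


def read_times(n_trials, k, n, seed=0):
    """Simulate read completion times: striping (all k disks) vs coding (any k of n)."""
    rng = random.Random(seed)
    striped, coded = [], []
    for _ in range(n_trials):
        # Hypothetical per-disk latency drawn from a log-normal distribution.
        lat = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        striped.append(max(lat[:k]))      # must wait for every one of k disks
        coded.append(sorted(lat)[k - 1])  # done once the fastest k of n respond
    return striped, coded


# k = 4 data blocks, n = 12 stored blocks corresponds to 3x storage overhead.
striped, coded = read_times(1000, k=4, n=12)
summary = {
    "striped_mean": statistics.mean(striped),
    "coded_mean": statistics.mean(coded),
    "striped_stdev": statistics.pstdev(striped),
    "coded_stdev": statistics.pstdev(coded),
}
```

Striping is limited by its slowest disk, while the coded read only waits for the k-th fastest response, so both the mean and the spread of completion time shrink in this model.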

Page 66: OptIPuter System Software

Year 3 Plans

• Extensive Simulations of RobuSTore Design and Testbed Configurations

– Evaluate Alternatives

– Provide Configuration Guidelines for Layout, Striping Algorithms

• Prototype Implementation on Lustre

– Experiments on UCSD Testbeds

– Exploit high speed OptIPuter Transport Protocols (GTP, CEP, etc.)

– Efficient Name Space Management and Metadata Service

– Evaluation Using Benchmarks and Neuroscience and Geophysical Application Workloads

Page 67: OptIPuter System Software

Security

Mike Goodrich

University of California, Irvine

January 2005

OptIPuter All Hands Meeting

Page 68: OptIPuter System Software

Broadcast Encryption

• Group controller (GC) broadcasts messages

• A set S of n devices receive every message

• A subset R of r devices from S are revoked

• The group controller should encrypt messages so that only non-revoked devices can decrypt them, even if the revoked devices collude

[Figure: GC broadcasting to Valid Devices and Revoked Devices]

Page 69: OptIPuter System Software

Efficient Secure Broadcast Encryption

• Tree-based Membership Revocation (the hard part)

• Invented the first zero-state broadcast encryption scheme to achieve O(r) messages per broadcast and O(log n) keys per device, with r revoked devices

– Small number of keys / member

– Small number of messages (few round trips!)

• The constants are small and the schemes are practical

The n devices

[Goodrich, Sun, Tamassia, UCI]
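To show what tree-based revocation looks like in practice, the sketch below computes the key cover for a well-known baseline, often called the complete-subtree method. It is not the scheme described on this slide: the baseline needs roughly r log(n/r) messages per broadcast rather than O(r). The function names and the heap-style node numbering are assumptions made for the example.

```python
def subtree_cover(n, revoked):
    """Nodes whose subtree keys cover exactly the non-revoked devices.

    Devices 0..n-1 (n a power of two) are leaves of a complete binary tree.
    Node 1 is the root, node i has children 2i and 2i+1, device d is leaf n + d.
    """
    if not revoked:
        return [1]  # one message under the root key reaches everyone
    tainted = set()
    for d in revoked:
        node = n + d
        # Mark the path from the revoked leaf up to the root.
        while node >= 1 and node not in tainted:
            tainted.add(node)
            node //= 2
    cover = []
    for node in sorted(tainted):
        if node < n:  # internal node
            for child in (2 * node, 2 * node + 1):
                if child not in tainted:
                    cover.append(child)  # clean subtree hanging off a tainted path
    return cover


def devices_under(n, node):
    """Device ids stored in the leaves below node."""
    lo = hi = node
    while lo < n:
        lo, hi = 2 * lo, 2 * hi + 1
    return list(range(lo - n, hi - n + 1))


cover = subtree_cover(8, [0, 7])
```

Each device holds the keys on its leaf-to-root path, so a device can decrypt exactly when one of its ancestors (or its own leaf) appears in the cover; revoked devices never do, even if they pool their keys.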

Page 70: OptIPuter System Software

Deterministic Sampling and Range Counting in Geometric Data Streams

Bagchi, Chaudhary, Eppstein, Goodrich

A Data Stream is a massive data set which is revealed one item at a time.

Several data stream settings involve spatial data:

– Sensor data, e.g. for air quality measurement

– Traffic or herd monitoring, e.g. location information for mobile phones

– Scientific data

The challenge is to perform useful computations on these data streams while maintaining a small memory footprint.

Page 71: OptIPuter System Software

New Results for Data Streams

Deterministic epsilon-Approximations for data streams can be computed in polylogarithmic time and space.

These have many applications, including solving iceberg queries and in robust statistics.
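As a concrete example of a deterministic, small-memory stream summary that answers iceberg-style queries (which items exceed a frequency threshold), the sketch below implements the classical Misra-Gries frequent-items algorithm. It is a different and much simpler technique than the geometric epsilon-approximations described above, and is included only to illustrate the one-pass, bounded-memory setting.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k - 1 counters.

    Any item occurring more than len(stream) / k times is guaranteed to
    survive in the returned dict; its stored count may undercount by at
    most len(stream) / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# 'a' occurs 5 times out of 10, above the 10 / 3 threshold for k = 3.
summary = misra_gries("aababcabad", k=3)
```

A second pass over the data (or a tolerance for the bounded undercount) is needed to turn the surviving candidates into exact answers.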

Page 72: OptIPuter System Software

Secure Biometric Authentication for Weak Computational Devices

Atallah, Frikken, Goodrich, Tamassia

Computationally "lightweight" schemes for performing biometric authentication without revealing information that can later be used to impersonate the user.

The client and server need only perform cryptographic hash computations on the feature vectors, and do not perform any expensive public-key encryption operations.

Appealing even in a framework of powerful devices capable of public-key signatures and encryptions.

Page 73: OptIPuter System Software

Secure Biometric Authentication for Weak Computational Devices, cont.

Atallah, Frikken, Goodrich, Tamassia

Our schemes make it computationally infeasible for an attacker to impersonate a user even if the attacker completely compromises the information stored at the server.

Likewise, our schemes make it computationally infeasible for an attacker to impersonate a user even if the attacker completely compromises the information stored at the client device.

Page 74: OptIPuter System Software

Year 3 Plans

• UCI: Uncheatable Grid Computing

• [Touch & Bannister, USC/ISI] Transec: High Speed Transport Security for OptIPuter

– Scalable defenses to protect TCP against SYN attacks, RST/data window attacks, etc., and UDP against port overload

– Applies FASTsec (IPsec++ for perf.)

Page 75: OptIPuter System Software

Information Sciences Institute

Joe Bannister, Aaron Falk, Jim Pepin, Joe Touch

OptIPuter Project Year 3 Plans

January 18, 2005

Page 76: OptIPuter System Software

OptIPuter TranSec

Scalable defenses

• Protect TCP against SYN attacks, RST/data window attacks, etc.

• Protect UDP against port overload

Applies FASTsec (IPsec++ for perf.)

• Pipelining, parallelism support

• Partial protection variants

Merges per-packet w/per-data security

• Decouple header security from data security

Page 77: OptIPuter System Software

FASTSec for OptIPuter

Pipelining support

• Reduces per-packet latency

• Multiple IPsec headers with chunked data

Parallelism support

• Multiple IPsec headers using different keys on a single stream, to enable parallel hardware

Partial / delayed protection

• Protect header with IPsec on-line

• Protect data with CRC elsewhere if needed
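The partial / delayed protection idea above, authenticating headers per packet while checking data integrity per chunk, can be sketched as follows. This is not FASTsec or IPsec: HMAC-SHA256 stands in for the header protection, the key and the function names are hypothetical, and a CRC detects accidental corruption only, not deliberate tampering.

```python
import hashlib
import hmac
import zlib

KEY = b"demo-key"  # hypothetical shared key, for illustration only


def protect_packet(header: bytes, payload: bytes):
    """Authenticate only the header on the per-packet path; payload is untouched."""
    tag = hmac.new(KEY, header, hashlib.sha256).digest()
    return header, tag, payload


def header_ok(header: bytes, tag: bytes) -> bool:
    """Per-packet check: cheap because it covers only the short header."""
    expected = hmac.new(KEY, header, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


def chunk_crc(payloads) -> int:
    """Per-chunk check: one running CRC over the payloads of many packets."""
    crc = 0
    for p in payloads:
        crc = zlib.crc32(p, crc)
    return crc


packets = [protect_packet(b"seq=%d" % i, b"data-%d" % i) for i in range(3)]
all_headers_valid = all(header_ok(h, t) for h, t, _ in packets)
chunk_check = chunk_crc(p for _, _, p in packets)
```

Because the header check never touches the payload, its cost per packet is independent of packet size, and the data check can run later or on different hardware.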

Page 78: OptIPuter System Software

Goals

Coordinated but diverse protection:

• SYN protection during connection establishment

• RST / data window protection after

• Port protection throughout

Scales with performance

• Enables parallel, offloaded pre-validation

Protect header differently than data

• Different strength

• Different time (per packet vs. per data chunk)

>> lower latency, higher-throughput transport security

Page 79: OptIPuter System Software

Summary

• Lots of progress!

• Integrated demonstrations: 3-layer to full 5-layer with applications!

• Increasing in size, scale, and performance!

• Broad Range of Activities driving Core Technologies forward

– DVC

– Real-Time (TMO)

– Performance Analysis (Prophesy)

– High Speed Protocols (CEP, LambdaStream, XCP, GTP, UDT)

– Storage (RobuSTore)

– Security

• Come and Join the fun!

• Questions?