
Transcript of aks249 - cs.cornell.edu · Parallel Metropolis-Hastings-Walker Sampling for LDA (Xanda Schofield)

Page 1:

aks249

Page 2:

Parallel Metropolis-Hastings-Walker Sampling for LDA

Xanda Schofield

Topic: a probability distribution over words (e.g., P(“how”) = 0.05, P(“cow”) = 0.001).

Document: a list of tokens (“how now brown cow”).

Topic model: a way of describing how a set of topics could generate documents (e.g. Latent Dirichlet Allocation [Blei et al., 2003]).

Inferring topic models is HARD, SLOW, and DIFFICULT TO PARALLELIZE.

Goal: to parallelize an optimized inference algorithm (MHW for LDA) efficiently for one consumer-grade computer.

Page 3:


Gibbs Sampling [Griffiths et al., 2004]:

O(number of iterations × number of tokens × number of topics)

Metropolis-Hastings-Walker Sampling [Li et al., 2014]:

O(number of iterations × number of tokens × number of topics in a token’s document)

Needed for computations:

Nkd: tokens in document d assigned to topic k

Nwk: tokens of word w assigned to topic k

qw: cached sampled topics for word w

A few user-set parameters
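The “Walker” in MHW refers to Walker’s alias method, which lets the sampler draw from a fixed discrete distribution in O(1) after O(K) setup; the cached topics q_w are stale alias-table draws reused as Metropolis-Hastings proposals. A minimal sketch of alias-table construction and sampling, not the project’s code (names are illustrative):

```python
import random

def build_alias_table(weights):
    """Precompute an alias table for a discrete distribution (Vose's method)."""
    K = len(weights)
    total = sum(weights)
    prob = [w * K / total for w in weights]       # scaled so the average is 1
    alias = [0] * K
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                              # excess mass of l fills s's slot
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                       # leftovers are exactly full slots
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    """Draw one topic in O(1): pick a slot, then keep it or take its alias."""
    k = random.randrange(len(prob))
    return k if random.random() < prob[k] else alias[k]

# Example: cache a handful of topic draws for one word (a q_w bucket).
prob, alias = build_alias_table([0.5, 0.2, 0.2, 0.1])
q_w = [alias_draw(prob, alias) for _ in range(8)]
```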

Page 4:


How we do it:

Split documents across processors (Nkd)

Keep updated Nwk / Share Nwk:
  - Synchronize Nwk each iteration
  - Gossip Nwk to a random processor each iteration

Keep valid qw:
  - Share
  - Make per-processor

Measuring comparative performance and held-out likelihood as the number of processors varies

Page 5:

ers273

Page 6:

An Analysis of the CPU/GPU Unified Memory Model

Eston Schweickart

Page 7:

Unified Memory

Page 8:

3 Contexts

• Multi-stream Cross-device Mapping

• Basic, intended use case for UM

• Big integer addition

• Linked lists: hard to transfer

• Nonlinear Exponential Time Integration (Michels et al., 2014)

• Both GPU and CPU bound computation, nontrivial implementation

Page 9:

Analysis

• Ease of implementation

• Lines of code

• Required concepts

• Performance

• Memory Transfer Optimizations?

Page 10:

Results

• UM is best as an introductory concept

• Removes burden of explicit memory transfer

• UM is hard to optimize

• No control over data location

• Recommend: compiler hints, better profiling tools

Page 11:

fno2

Page 12:
Page 13:

Provide insight + maximize privacy

Page 14:

Lifestreams (format) → Bolt (chunk) → CryptDB (encrypt)

Page 15:

• Built Lifestreams DPU
• Encrypted database & queries
• Demoed visualizations
• Developed 3rd party API

Page 16:

fz84

Page 17:

Timing Channel Mitigation in Scheduler: a Case Study of GHC

Fan Zhang

Dept. of Computer Science, Cornell University

December 4, 2014


Page 18:

Timing Channel in Scheduler

A timing channel is a covert channel for passing unauthorized information, encoded in timing behavior.

E.g., cache timing: the response time of a memory access can reveal whether a page is in the cache or not; by observing the running time of an AES encryption thread, one can guess the AES key.

In an OS, the scheduler is a main source of secret information leakage.


Page 19:

Timing Channel in Scheduler

Consider round-robin scheduling with epoch T

Ideally, a context switch happens at nT.

However, for many reasons, the context switch happens at nT + δ (where δ > 0 is a random variable), because at nT:
  - the thread is performing an atomic operation (uninterruptible)
  - interrupts are disabled (so the timer interrupt is ignored)

δ is exploitable to pass secret information (even without knowing how).

H = −∑ p_i log(p_i)


Page 20:

Problem

How to measure δ in a real-world scheduler (GHC)?

The Glasgow Haskell Compiler is a state-of-the-art, open-source compiler and interactive environment for the functional language Haskell.

How to mitigate this timing channel, i.e. eliminate δ?


Page 21:

Problem I

How to measure δ in a real-world scheduler (GHC)?

Probe GHC → break the GHC scheduler down → read GHC code.

Workload dependent? Three samples:

calc: an encryption thread which is computation intensive

get: a networking thread retrieving files via HTTP

file: a thread reading and writing files on disk


Page 22:

Results

[Figure (a): CDF of δ (time in ms, 0-20) for the file, get, and calc workloads.]
[Figure (b): Power law fit to the δ CDF, original data vs. fit p = a·x^(−b) + c.]

           calc    file    get
Delay(δ)   6.05    14.52   49.82

Table: Timing channel capacity H = −∑ p_i log(p_i) (bit/s)
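One way to turn measured δ samples into a capacity figure like the one above is to histogram the delays and compute the entropy of the resulting distribution. A sketch of that calculation; the bin width and the epochs-per-second scaling are assumptions, not taken from the slides:

```python
import math
from collections import Counter

def channel_capacity_bits(delta_samples_ms, bin_width_ms=1.0, epochs_per_second=10):
    """Entropy of the binned delay distribution, scaled to bits per second."""
    bins = Counter(int(d / bin_width_ms) for d in delta_samples_ms)
    n = sum(bins.values())
    entropy_bits = -sum((c / n) * math.log2(c / n) for c in bins.values())
    # Assume the scheduler leaks one delta per scheduling epoch.
    return entropy_bits * epochs_per_second

print(channel_capacity_bits([0.3, 1.2, 0.4, 5.0, 2.2, 0.9]))
```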


Page 23:

Problem

How to mitigate this timing channel, i.e. eliminate δ

Algorithm 1: Incremental round-robin schedule

Data: initial time slice T0, incremental value b, timer t
tc = t.expiration
if current thread can be switched ∨ tc ≥ T0/2 then
    set context_switch = 1
    reset t.expiration = T0
else
    reset t.expiration = tc + b
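A direct transcription of Algorithm 1 into runnable form, assuming the timer object simply exposes a mutable expiration field (the variable names are mine):

```python
def on_timer_interrupt(timer, can_switch, T0, b):
    """Incremental round-robin: switch on schedule when possible,
    otherwise extend the slice by a small fixed increment b."""
    tc = timer.expiration
    if can_switch or tc >= T0 / 2:
        timer.expiration = T0      # reset for the next epoch
        return True                # perform the context switch now
    else:
        timer.expiration = tc + b  # retry shortly, keeping delta small
        return False
```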


Page 24:

Results

[Figure: capacity decrease (%) vs. increment b (ms, 0-5) for the file, get, and calc workloads; reference line y = x.]

Page 25:

Conclusion

Timing channels exist in the scheduler; δ is an example.

δ can be approximated by a power-law distribution!

I/O-bound threads tend to leak more information through their scheduling footprint.

Though the simulation shows that incremental round-robin scheduling can effectively erase information entropy, timing channel mitigation is a difficult problem.


Page 26:

gjm97

Page 27:

PROOF OF STAKE IN THE ETHEREUM DECENTRALIZED STATE MACHINE

Galen Marchetti

Page 28:

Decentralized State Machine

• Ethereum
  ◦ second-generation cryptocurrency – coins called “ether”
  ◦ Bitcoin Blockchain + Ethereum Virtual Machine

• Stakeholders in network purchase state transitions
  ◦ Mining fee includes cost per opcode in EVM
  ◦ Miners provide Proof-of-Work to register transitions in blockchain

Page 29:

Establishing Consensus

• Proof-Of-Work
  ◦ Consensus group is all CPU power in existence
  ◦ Miners solve crypto-puzzles
  ◦ Employed by Bitcoin, Namecoin, Ethereum
  ◦ Subject to 51% outsider attack

• Proof-Of-Stake
  ◦ Consensus group is all crypto-coins in the network
  ◦ Miners provide evidence of coin possession

Page 30:

Mining Procedure

• Select parent block and “uncles” in blockchain
• Generate nonce and broadcast block header
• Nodes receiving empty header deterministically select N pseudo-random stakeholders (see the sketch below)
• Each stakeholder signs block header and broadcasts to network
• Last stakeholder adds state transitions, signs total block with its own signature, broadcasts to network.
• Mining profit evenly distributed among stakeholders and original node
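As a rough illustration of the “deterministically select N pseudo-random stakeholders” step, every node could seed a PRNG with the broadcast header so that all of them recover the same signers. This is a hypothetical sketch, not Ethereum code; stake-weighting and tie-breaking details are glossed over:

```python
import hashlib
import random

def select_stakeholders(block_header_bytes, stakeholders, n):
    """Deterministic pick of n signers: any node seeding a PRNG with the same
    header hash draws the same pseudo-random, stake-weighted sample."""
    seed = int.from_bytes(hashlib.sha256(block_header_bytes).digest(), "big")
    rng = random.Random(seed)
    addresses = [addr for addr, _ in stakeholders]
    weights = [coins for _, coins in stakeholders]   # weight by coin balance
    return rng.choices(addresses, weights=weights, k=n)

signers = select_stakeholders(b"header||nonce",
                              [("0xA", 50), ("0xB", 10), ("0xC", 40)], n=3)
```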

Page 31:

Evaluation

• Mining now requires several broadcast steps
• Use Amazon’s EC2 with geographically separate nodes
• Measure time for pure Ethereum cluster to propagate state transitions in blockchain
• Measure time for Proof of Stake Ethereum cluster to propagate state transitions

Page 32:

ica23

Page 33:

Multicast Channelization for RDMA

Isaac Ackerman

Page 34:

Channelization
● Routing multicast traffic is difficult
● Share resource for highest performance

RDMA
● Verbs interface
● Pre-allocate buffers
● Sender needs buffer descriptor

Page 35:

Model

● Cost for each send/receive
● Cost for client to hold buffers open
● Cost to coordinate senders
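Those three bullets suggest a simple additive cost model for scoring a channel assignment. A sketch with made-up parameter names (per-message, per-open-buffer, and per-sender coordination costs are assumptions, not values from the project):

```python
def assignment_cost(channels, c_msg=1.0, c_buffer=0.1, c_coord=0.5):
    """Score one channelization: channels is a list of (senders, receivers, msg_rate).

    Receivers on a channel pay to keep its buffers open and to receive every
    message on it, wanted or not; senders sharing a channel pay coordination."""
    total = 0.0
    for senders, receivers, msg_rate in channels:
        total += c_msg * msg_rate * len(receivers)    # every receiver sees every send
        total += c_buffer * len(receivers)            # buffers held open per receiver
        total += c_coord * max(0, len(senders) - 1)   # coordinating co-located senders
    return total

# Two channels: one busy shared channel, one light dedicated channel.
print(assignment_cost([({"s1", "s2"}, {"r1", "r2", "r3"}, 100.0),
                       ({"s3"}, {"r4"}, 5.0)]))
```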

Page 36:

Existing Solutions

● Clustering
● MCMD

Doesn't consider memory consumption

[Figure: relative cost to unlimited channels vs. number of channels (0-60), comparing Naive, MCMD, NYSE simple, and IBM simple.]

Page 37:

Fixes

● Consider different MTU
  – Pseudopolynomial
  – Still slow
● Using channels incurs memory cost
  – Cautiously introduce new channels

Page 38:

Results

[Figure: Channelization Extensions – relative cost vs. number of channels (0-50) for NYSE Simple, NYSE MTU, NYSE Passive, IBM Simple, IBM MTU, and IBM Passive.]

Page 39:

Adaptivity

[Figure: Adaptability of Channelization Solutions – cost vs. solution staleness (ms, 0-10000) for Passive 10/20 Channels and Simple 10/20 Channels.]

Page 40:

Future Work

● Making use of unused channels
● Incorporating UDP for low rate flows
● Reliability, Congestion Control

Page 41:

km676

Page 42:

Kai Mast

Page 43:

The Brave New World of NoSQL

● Key-Value Store – Is this Big Data?
● Document Store – The solution?
● Eventual Consistency – Who wants this?

Page 44:

HyperDex

● Hyperspace Hashing
● Chain-Replication
● Fast & Reliable
● Imperative API
● But... Strict Datatypes

Page 45:
Page 46:

ktc34

Page 47:

Transaction Rollback in Bitcoin

• Forking makes rollback unavoidable, but can we minimize the loss of valid transactions?

Source: Bitcoin Developer Guide

Page 48:

Motivation

• Extended Forks

– August 2010 Overflow bug (>50 blocks)

– March 2013 Fork (>20 blocks)

• Partitioned Networks

• Record double spends

Page 49:

Merge Protocol

• Create a new block combining the hash of both previous headers

• Add a second Merkle tree containing invalidated transactions

– Any input used twice

– Any output of an invalid transaction used as input
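A toy version of those two invalidation rules, treating a transaction as a txid plus a list of inputs, where an input is a (producing_txid, output_index) pair. This is purely illustrative, not the proposed protocol code:

```python
def invalidated_transactions(chain_a, chain_b):
    """Return txids a merge block would list in its second Merkle tree:
    any tx whose input outpoint is spent twice across the two forks,
    plus anything downstream of an invalidated tx."""
    txs = {tx["txid"]: tx for tx in chain_a + chain_b}
    spenders = {}                                   # outpoint -> txids spending it
    for tx in txs.values():
        for outpoint in tx["inputs"]:
            spenders.setdefault(outpoint, set()).add(tx["txid"])

    invalid = {txid for s in spenders.values() if len(s) > 1 for txid in s}
    changed = True
    while changed:                                  # cascade: children of invalid txs die too
        changed = False
        for tx in txs.values():
            if tx["txid"] in invalid:
                continue
            if any(outpoint[0] in invalid for outpoint in tx["inputs"]):
                invalid.add(tx["txid"])
                changed = True
    return invalid
```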

Page 50:

Practicality (or: Why this is a terrible idea)

• Rewards miners who deliberately fork the blockchain

• Cascading invalidations

• Useful for preserving transactions when the community deliberately forks the chain

– Usually means something else bad has happened

• Useful for detecting double spends

Page 51:

ml2255

Page 52:


Topology Prediction for Distributed Systems

Moontae Lee

Department of Computer Science Cornell University

December 4th, 2014


Page 53:

•  People use various cloud services
•  Amazon / VMWare / Rackspace
•  Essential for big-data mining and learning

Introduction


Page 54:

•  People use various cloud services
•  Amazon / VMWare / Rackspace
•  Essential for big-data mining and learning

without knowing how computer nodes are interconnected!

Introduction


??

Page 55:

•  What if we can predict underlying topology?

Motivation


Page 56:

•  What if we can predict underlying topology?

•  For computer system (e.g., rack-awareness for Map Reduce)

Motivation


Page 57:

•  What if we can predict underlying topology?

•  For computer system (e.g., rack-awareness for Map Reduce)
•  For machine learning (e.g., dual-decomposition)

Motivation


Page 58:

How?

•  Let’s combine ML techniques with computer systems!
   latency info → [Black Box] → topology
•  Assumptions
   •  Topology structure is a tree (even simpler than a DAG)
   •  Ping can provide useful pairwise latencies between nodes
•  Hypothesis
   •  Approximately knowing the topology is beneficial!

Page 59:

Method

•  Unsupervised hierarchical agglomerative clustering
   → Merge the closest two nodes every time!

      a    b    c    d    e    f
a     0    7    8    9    8   11
b     7    0    1    4    4    6
c     8    1    0    4    4    5
d     9    5    5    0    1    3
e     8    5    5    1    0    3
f    11    6    5    3    3    0
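A compact single-linkage version of “merge the closest two nodes every time”, run on the pairwise latency matrix above. Distance and linkage are exactly the design decisions listed two slides later, so treat this as one possible configuration rather than the project’s actual code:

```python
def agglomerative(dist, names):
    """Greedy bottom-up clustering: repeatedly merge the closest pair of
    clusters, using single linkage (minimum pairwise latency)."""
    clusters = [frozenset([n]) for n in names]
    tree = list(clusters)
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(dist[a][b] for a in clusters[ij[0]] for b in clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        tree.append(merged)                  # record merge order (a dendrogram)
    return tree

names = ["a", "b", "c", "d", "e", "f"]
dist = {"a": {"a": 0, "b": 7, "c": 8, "d": 9, "e": 8, "f": 11},
        "b": {"a": 7, "b": 0, "c": 1, "d": 4, "e": 4, "f": 6},
        "c": {"a": 8, "b": 1, "c": 0, "d": 4, "e": 4, "f": 5},
        "d": {"a": 9, "b": 5, "c": 5, "d": 0, "e": 1, "f": 3},
        "e": {"a": 8, "b": 5, "c": 5, "d": 1, "e": 0, "f": 3},
        "f": {"a": 11, "b": 6, "c": 5, "d": 3, "e": 3, "f": 0}}
print(agglomerative(dist, names)[len(names):])   # merges: {b,c}, {d,e}, ...
```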

Page 60:

Sample Results (1/2)


Page 61:

Sample Results (2/2)


Page 62:

Design Decisions

•  How to evaluate distance? (Euclidean vs other)
•  What is the linkage type? (single vs complete)
•  How to determine cutoff points? (most crucial)
•  How to measure the closeness of two trees?
   •  Average hops to the lowest common ancestor
•  What other baselines?
   •  K-means clustering / DP-means clustering
   •  Greedy partitioning

Page 63:

Evaluation

•  Intrinsically (within simulator setting)
   •  Compute the similarity with the ground-truth trees
•  Extrinsically (within real applications)
   •  Short-lived (e.g., Map Reduce)
      •  Underlying topology does not change drastically while running
      •  Better performance by configuring with the initial prediction
   •  Long-lived (e.g., streaming from sensors to monitor the power grid)
      •  Topology could change drastically when failures occur
      •  Repeat prediction and configuration periodically
      •  Stable performance even if the topology changes frequently

Page 64:

nja39

Page 65:

Noah Apthorpe

Page 66:

Commodity Ethernet
◦ Spanning tree topologies
◦ No link redundancy

[Diagram: spanning-tree network connecting hosts A and B]


Page 68:

IronStack spreads packet flows over disjoint paths
◦ Improved bandwidth
◦ Stronger security
◦ Increased robustness
◦ Combinations of the three

[Diagram: multiple disjoint paths between hosts A and B]

Page 69:

IronStack controllers must learn and monitor network topology to determine disjoint paths

One controller per OpenFlow switch

No centralized authority

Must adapt to switch joins and failures

Learned topology must reflect actual physical links
◦ No hidden non-IronStack bridges

Page 70:

Protocol reminiscent of IP link-state routing

Each controller broadcasts adjacent links and port statuses to all other controllers
◦ Provides enough information to reconstruct network topology
◦ Edmonds-Karp maxflow algorithm for disjoint path detection

A “heartbeat” of broadcasts allows failure detection

Uses OpenFlow controller packet handling to differentiate bridged links from individual wires

Additional details to ensure logical update ordering and graph convergence
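For the disjoint-path step, max-flow on a unit-capacity graph gives the number of edge-disjoint paths between two switches. A small sketch of that idea (Edmonds-Karp style BFS augmentation; not IronStack's implementation, and the topology is a made-up example):

```python
from collections import deque, defaultdict

def edge_disjoint_paths(adj, src, dst):
    """Unit-capacity max flow = number of edge-disjoint paths.
    adj maps each switch to the set of its neighbors (undirected links)."""
    cap = defaultdict(int)
    for u, vs in adj.items():
        for v in vs:
            cap[(u, v)] += 1                     # one unit per direction of each link
    flows = 0
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:       # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return flows
        v = dst
        while parent[v] is not None:             # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flows += 1

topo = {"A": {"s1", "s2"}, "s1": {"A", "B"}, "s2": {"A", "B"}, "B": {"s1", "s2"}}
print(edge_disjoint_paths(topo, "A", "B"))       # -> 2
```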

Page 71:

Traffic at equilibrium

Page 72:

Traffic and time to topology graph convergence

Page 73:

Node failure and partition response rates

Page 74:

Questions?

Page 75:

pj97

Page 76:

Soroush Alamdari

Pooya Jalaly

Page 77:
Page 78:

• Distributed schedulers

Page 79:

• Distributed schedulers

• E.g. 10,000 16-core machines, 100ms average processing times

• A million decisions per second

Page 80:

• Distributed schedulers

• E.g. 10,000 16-core machines, 100ms average processing times

• A million decisions per second

• No time to waste

• Assign the next job to a random machine.

Page 81:

• Distributed schedulers

• E.g. 10,000 16-core machines, 100ms average processing times

• A million decisions per second

• No time to waste

• Assign the next job to a random machine.

• Two choice method

• Choose two random machines

• Assign the job to the machine with smaller load.

Page 82:

• Distributed schedulers

• E.g. 10,000 16-core machines, 100ms average processing times

• A million decisions per second

• No time to waste

• Assign the next job to a random machine.

• Two choice method

• Choose two random machines

• Assign the job to the machine with smaller load.

• Two choice method works exponentially better than random assignment.
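The exponential gap is easy to see in a toy simulation: with n jobs on n machines, the most-loaded machine under purely random placement holds about log n / log log n jobs, while two choices brings that down to roughly log log n. A quick sketch (parameters are arbitrary):

```python
import random

def max_load(n_machines, n_jobs, choices):
    """Throw jobs at machines; each job probes `choices` random machines
    and joins the least loaded one."""
    load = [0] * n_machines
    for _ in range(n_jobs):
        probes = [random.randrange(n_machines) for _ in range(choices)]
        target = min(probes, key=lambda m: load[m])
        load[target] += 1
    return max(load)

random.seed(0)
n = 10_000
print("random placement:", max_load(n, n, choices=1))
print("two choices     :", max_load(n, n, choices=2))
```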

Page 83:
Page 84:

• Partitioning the machines among the schedulers

Page 85:

• Partitioning the machines among the schedulers

• Reduces expected maximum latency

• Assuming known rates of incoming tasks

Page 86:

• Partitioning the machines among the schedulers

• Reduces expected maximum latency

• Assuming known rates of incoming tasks

• Allows for locality respecting assignment

• Smaller communication time, faster decision making.

Page 87:

• Partitioning the machines among the schedulers

• Reduces expected maximum latency

• Assuming known rates of incoming tasks

• Allows for locality respecting assignment

• Smaller communication time, faster decision making.

• Irregular patterns of incoming jobs

• Soft partitioning

Page 88:

• Partitioning the machines among the schedulers

• Reduces expected maximum latency

• Assuming known rates of incoming tasks

• Allows for locality respecting assignment

• Smaller communication time, faster decision making.

• Irregular patterns of incoming jobs

• Soft partitioning

• Modified two choice model

• Probe a machine from within, one from outside

Page 89:
Page 90:

• Simulated timeline

Page 91:

• Simulated timeline

• Burst of tasks

[Diagram: two-state burst model – transitions between No Burst and Burst with probabilities p and 1−p]

Page 92:

• Simulated timeline

• Burst of tasks

• Metric response times

[Diagram: two-state burst model (p, 1−p) feeding schedulers S1, S2 that dispatch to machines M1, M2]


Page 96:

pk467

Page 97:

Software-Defined Routing for Inter-Datacenter Wide Area Networks

Praveen Kumar

Page 98:

Problems

1. Inter-DC WANs are critical and highly expensive
2. Poor efficiency - average utilization over time of busy links is only 30-50%
3. Poor sharing - little support for flexible resource sharing

MPLS Example: flow arrival order A, B, C; each link can carry at most one flow

* Make smarter routing decisions - considering the link capacities and flow demands
Source: Achieving High Utilization with Software-Driven WAN, SIGCOMM 2013

Page 99:

Merlin: Software-Defined Routing

● Merlin Controller
  ○ MCF solver
  ○ RRT generation
● Merlin Virtual Switch (MVS) - a modular software switch
  ○ Merlin
    ■ Path: ordered list of pathlets (VLANs)
    ■ Randomized source routing
    ■ Push stack of VLANs
  ○ Flow tracking
  ○ Network function modules - pluggable
  ○ Compose complex network functions from primitives

Page 100:

Some results

  No VLAN Stack         941 Mbps
  Open vSwitch          0 (N/A)
  CPqD ofsoftswitch13   98 Mbps
  MVS                   925 Mbps

SWAN topology *
● Source: Achieving High Utilization with Software-Driven WAN, SIGCOMM 2013

Page 101:

rmo26

Page 102:

AMNESIA-FREEDOM AND EPHEMERAL DATA CS6410 December 4, 2014


Page 103:

Ephemeral Data

• “Overcoming CAP” describes using soft-state replication to keep application state in the first tier of the cloud.

• Beyond potential performance advantages, this architecture may be the basis for “ephemerality”, wherein data is intended to disappear.

• “Subpoena-freedom”
• No need to wipe disks, just restart your instances.


Page 104:

Cost and Architecture

• “Overcoming CAP” does not address questions of cost.

• Using reliable storage to preserve state has significant cost consequences.

• First goal of this project is to produce a model of how cost varies with cloud architecture choices.

• Key cost drivers: compute hours, data movement, storage.


Page 105:

Performance Numbers

• “Overcoming CAP” claims but does not demonstrate superior performance with the amnesia-free approach.

• Second goal of this project is to compare performance in live systems.

• A cost-determined amnesia-free architecture compared against architectures that rely on reliable storage.


Page 106:

sh954

Page 107:

Soroush Alamdari

Pooya Jalaly

Page 108:
Page 109:

• Distributed schedulers

Page 110:

• Distributed schedulers

• E.g. 10,000 16-core machines, 100ms average processing times

• A million decisions per second

Page 111:

• Distributed schedulers

• E.g. 10,000 16-core machines, 100ms average processing times

• A million decisions per second

• No time to waste

• Assign the next job to a random machine.

Page 112:

• Distributed schedulers

• E.g. 10,000 16-core machines, 100ms average processing times

• A million decisions per second

• No time to waste

• Assign the next job to a random machine.

• Two choice method

• Choose two random machines

• Assign the job to the machine with smaller load.

Page 113:

• Distributed schedulers

• E.g. 10,000 16-core machines, 100ms average processing times

• A million decisions per second

• No time to waste

• Assign the next job to a random machine.

• Two choice method

• Choose two random machines

• Assign the job to the machine with smaller load.

• Two choice method works exponentially better than random assignment.

Page 114:
Page 115:

• Partitioning the machines among the schedulers

Page 116:

• Partitioning the machines among the schedulers

• Reduces expected maximum latency

• Assuming known rates of incoming tasks

Page 117:

• Partitioning the machines among the schedulers

• Reduces expected maximum latency

• Assuming known rates of incoming tasks

• Allows for locality respecting assignment

• Smaller communication time, faster decision making.

Page 118:

• Partitioning the machines among the schedulers

• Reduces expected maximum latency

• Assuming known rates of incoming tasks

• Allows for locality respecting assignment

• Smaller communication time, faster decision making.

• Irregular patterns of incoming jobs

• Soft partitioning

Page 119:

• Partitioning the machines among the schedulers

• Reduces expected maximum latency

• Assuming known rates of incoming tasks

• Allows for locality respecting assignment

• Smaller communication time, faster decision making.

• Irregular patterns of incoming jobs

• Soft partitioning

• Modified two choice model

• Probe a machine from within, one from outside

Page 120:
Page 121:

• Simulated timeline

Page 122:

• Simulated timeline

• Burst of tasks

[Diagram: two-state burst model – transitions between No Burst and Burst with probabilities p and 1−p]

Page 123:

• Simulated timeline

• Burst of tasks

• Metric response times

[Diagram: two-state burst model (p, 1−p) feeding schedulers S1, S2 that dispatch to machines M1, M2]


Page 127:

vdk23

Page 128:

FPGA Packet Processor for IronStack

Vasily Kuksenkov

Page 129:

Problem

● Power grid operators use an intricate feedback system for stability

● Run using microwave relays and power cable signal multiplexing

● Data network issues

○ Vulnerable to attacks

○ Vulnerable to disruptions

○ Low capacity links

● Solution: switch to simple Ethernet

Page 130:

Problem
● Ethernet employs a loop-free topology

○ Hard to use link redundancies

○ Failure recovery takes too long

● Solution: IronStack SDN

○ Uses redundant network paths to improve

■ Bandwidth/Latency

■ Failure Recovery

■ Security

Page 131:

Problem
● Packet Processing

○ Cannot be done on the switch

○ Cannot be done at line rate (1-10Gbps) on the controller

● Solution: NetFPGA as a middle-man

○ Controller sets up routing rules and signals NetFPGA

○ Programmed once, continues to work

○ Scalable, efficient, cost-effective

Page 132:

Implementation/Analysis
● Improvements in

○ Bandwidth (RAIL 0)

○ Latency (RAIL 1)

○ Tradeoffs (RAIL 6)

● Future

○ Security

○ Automatic tuning

Page 133:

Questions?

Page 134:

vs442

Page 135:

CS 6410 Project Presentation Dec 4, 2014

Studying the effect of traffic pacing on TCP throughput

Vishal Shrivastav

Dec 4, 2014


Page 136:


Motivation

Burstiness: clustering of packets on the wire
Pacing: making the inter-packet gaps uniform

TCP traffic tends to be inherently bursty

Other potential benefits of pacing

Better short-term fairness among flows of similar RTTs

May allow much larger initial congestion window to be used safely



Page 140:


Previous works

Focused on implementing pacing at the transport layer

Some major limitations of that approach

Less precision - not very fine-grained control of the flow

NIC features like TCP segment offload lead to batching and short-term packet bursts


Page 141:


Key Insight

Implement pacing at the PHY layer

Problem: commodity NICs do not provide software access to PHY layer

Solution: SoNIC [NSDI 2013]



Page 146:


Implementation Challenges

Online Algorithm - No batching, one packet at a time

Very small packet processing time - simple algorithm, extremely fast implementation

Where to place pacing middleboxes in the network

Given a maximum of k pacing middleboxes, where should we place them in the network to achieve optimal throughput?
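The “one packet at a time” constraint essentially reduces the pacer to computing a release timestamp per packet from a target rate. A minimal sketch of that bookkeeping; SoNIC would enforce the gap by controlling idle symbols in the PHY bitstream, which is not modeled here:

```python
def make_pacer(rate_bytes_per_s):
    """Return an online function mapping (arrival_time, packet_len) to the
    time at which the packet should be put on the wire."""
    next_free = 0.0
    def release_time(arrival_s, length_bytes):
        nonlocal next_free
        send_at = max(arrival_s, next_free)         # never send early, never reorder
        next_free = send_at + length_bytes / rate_bytes_per_s
        return send_at
    return release_time

pace = make_pacer(rate_bytes_per_s=1.25e9)          # ~10 Gbps
for t, size in [(0.0, 1500), (0.0, 1500), (0.0, 1500)]:
    print(round(pace(t, size) * 1e6, 2), "us")      # -> 0.0, 1.2, 2.4 us
```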



Page 150:


Network topology for experiments


Page 151:


Testing the behavior of pacing algorithm


Page 152:


Experimental results

n : a value within [0,1], parameter for number of pkt bursts in a flow.

p : a value within [0,1], parameter for the geometric dist. used to generate the number of packets within a pkt burst.
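One reading of those two parameters is that each potential burst slot of a flow fires with probability n, and a firing burst contains a geometrically distributed number of packets with parameter p. A sketch under that assumption (it may not match the exact generator used in the experiments):

```python
import random

def bursty_flow(slots, n, p):
    """Yield burst sizes for one flow: a slot produces a burst with
    probability n, and burst length is geometric with parameter p."""
    for _ in range(slots):
        if random.random() < n:
            size = 1
            while random.random() > p:      # geometric: mean 1/p packets per burst
                size += 1
            yield size
        else:
            yield 0

random.seed(1)
print(list(bursty_flow(slots=10, n=0.3, p=0.5)))
```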


Page 153:


Experimental results

n : a value within [0,1], parameter for number of pkt bursts in a flow.

p : a value within [0,1], parameter for the geometric dist. used to generate the number of packets within a pkt burst.


Page 154: