New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks
Brian Cho, Indranil Gupta
University of Illinois at Urbana-Champaign




Motivation: Ad-hoc Data Processing

• Data-intensive research on OpenCirrus
  – Federated cloud: diverse geographic locations
  – Data scale of TBs
• Limited wide-area bandwidth is a big bottleneck: it can take days or weeks to transfer over the Internet [Garfinkel 07]
• Success story: Washington Post
  – Hillary Clinton White House schedule
    • Released as 17,481 pages of non-searchable PDF images
    • Convert to searchable text and deliver to newsroom within the same news cycle
  – Done within 26 hours with Amazon AWS
    • Pay for bandwidth and computer usage


Bulk Transfer Options

• Internet Transfer
  – Grid: [GridFTP]
  – PlanetLab: [CoBlitz 06]
• Disk Shipping Transfer
  – [Jim Gray 03]
  – [PostManet 04]
  – [DOT 06]
  – Amazon AWS Import/Export
• Pandora (People and networks moving data around)
  – First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges
  – Produce optimal transfer plans that obey time deadlines and minimize dollar cost
  – Better than internet-only and shipping-only strategies


Option 1: Internet Transfer

[Diagram: Data Source (Illinois) and Data Source (CMU) send data over the Internet to the Computation Provider (Amazon). Labels: 5-20 Mbps; 1 TB: 5-20 days; $0.10 per GB; No Cost.]
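The "1 TB: 5-20 days" figure follows from the quoted 5-20 Mbps link rates. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a fully utilized link; the function name `transfer_days` is illustrative and not from the talk:

```python
def transfer_days(size_tb: float, rate_mbps: float) -> float:
    """Days needed to push size_tb terabytes through a rate_mbps link."""
    bits = size_tb * 1e12 * 8            # terabytes -> bits (decimal units assumed)
    seconds = bits / (rate_mbps * 1e6)   # Mbps -> bits per second
    return seconds / 86_400              # seconds -> days


for rate in (5, 20):
    print(f"1 TB at {rate} Mbps: {transfer_days(1, rate):.1f} days")
```

Under these assumptions the two endpoints come out at roughly 18.5 and 4.6 days, which is consistent with the rounded 5-20 day range on the slide.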


Option 2: Disk Shipping Transfer

[Diagram: Data Source (Illinois) and Data Source (CMU) ship disks to the Computation Provider (Amazon). Labels:]

• Disk interface: 40 MB/s
• Fees: $0.02 per GB, $80 per disk
• Shipping rates on the three links:
  – Overnight: $60 per disk, Two-Day: $30 per disk, Ground: $10 per disk
  – Overnight: $50 per disk, Two-Day: $25 per disk, Ground: $5 per disk
  – Overnight: $40 per disk, Two-Day: $15 per disk, Ground: $5 per disk


Cooperative Transfer Solutions

• Good solutions
  – Meet deadlines
  – Minimize dollar cost
• Complexity
  – Global scale
  – Many strategies
  – Collaboration helps
• How to find the best solution?

[Figure: Open Cirrus sites]


Example: Minimize Dollar Cost

[Diagram: Data Source A and Data Source B (0.8 TB and 1.2 TB) deliver data to the Cloud Service Provider. Edge labels: 15 days, no cost; Ground: $5, 5 days; 14 hours.]

• Loading: $40
• Handling: $80
• Total cost: $125
• Total time: 20 days
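The totals in this example can be reproduced from the per-unit rates quoted on the disk-shipping slide. A minimal sketch, assuming decimal units (1 TB = 1000 GB), that the combined 2 TB travels on a single disk, and that the $0.02 per GB and $80 per disk figures are the loading and handling fees; the constant and function names are illustrative:

```python
LOADING_USD_PER_GB = 0.02     # per-GB fee from the disk-shipping slide
HANDLING_USD_PER_DISK = 80    # per-disk fee from the disk-shipping slide
GROUND_USD_PER_DISK = 5       # ground shipping rate used in this example
DISK_MB_PER_S = 40            # disk interface bandwidth


def loading_fee(size_tb: float) -> float:
    """Dollar cost to load size_tb terabytes at the per-GB rate."""
    return size_tb * 1000 * LOADING_USD_PER_GB


def loading_hours(size_tb: float) -> float:
    """Hours to read size_tb terabytes through the disk interface."""
    return size_tb * 1e6 / DISK_MB_PER_S / 3600


total_tb = 0.8 + 1.2
total_cost = loading_fee(total_tb) + HANDLING_USD_PER_DISK + GROUND_USD_PER_DISK
print(f"cost: ${total_cost:.0f}, loading time: {loading_hours(total_tb):.1f} h")
```

With these assumptions the cost comes to $125 and the loading time to just under 14 hours, matching the labels on the slide.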


Example: Meet Deadline (3 days) while Minimizing Dollar Cost

[Diagram: Data Source A and Data Source B (0.8 TB and 1.2 TB) deliver data to the Cloud Service Provider. Edge labels: Overnight: $40, 1 day; Overnight: $50, 1 day; 6 hours; 14 hours.]

• Loading: $40
• Handling: $80
• Total cost: $210
• Total time: 3 days
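The two worked examples illustrate the selection rule the talk is after: among plans that finish by the deadline, pick the cheapest. A minimal sketch of that rule applied to just these two plans, using the cost components shown on the slides; the `Plan` class and `cheapest_within` helper are hypothetical names, and a real planner would search far more than two candidates:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Plan:
    name: str
    cost_usd: float
    time_days: float


def cheapest_within(plans: Iterable[Plan], deadline_days: float) -> Optional[Plan]:
    """Return the lowest-cost plan that finishes by the deadline, or None."""
    feasible = [p for p in plans if p.time_days <= deadline_days]
    return min(feasible, key=lambda p: p.cost_usd) if feasible else None


plans = [
    # loading + handling + ground shipping
    Plan("minimize-cost example", 40 + 80 + 5, 20),
    # loading + handling + two overnight shipments
    Plan("3-day-deadline example", 40 + 80 + 40 + 50, 3),
]

print(cheapest_within(plans, deadline_days=3))
print(cheapest_within(plans, deadline_days=30))
```

Tightening the deadline from 30 days to 3 days switches the choice from the $125 plan to the $210 plan, which is the trade-off the two slides contrast.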


Outline

• Motivation
• Problem Formulation
  – Graph Model
  – Flow Over Time
• Solution: Pandora
• Experimental Results
• Conclusion


Graph Model: Internet Links

[Diagram: Site A and Site B, each with inet_out and inet_in vertices. Labels:]

• Incoming/Outgoing BW
• Internet link: Capacity (Mb/s), Cost ($/GB), Transit time (almost instantaneous)


Graph Model: Shipment Links

[Diagram: Site A and Site B, each with inet_out, inet_in, and ship_in vertices. Labels:]

• Incoming/Outgoing BW
• Internet link: Capacity (Mb/s), Cost ($/GB), Transit time (almost instantaneous)
• Shipment link: Capacity (almost infinite), Cost: Shipping and Handling ($/Disk), Transit time (hrs)
• Disk interface: BW, e.g., 40 MB/s; Cost: Loading ($/GB)
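One way to hold this graph model in code is an edge record carrying the attributes named on the slide: capacity, per-GB cost, per-disk fixed cost, and transit time. The sketch below is an assumption about how the pieces could be wired for a single A-to-B direction; the `Edge` class, the helper names, the exact edge placement, and the numeric values (a 10 Mbps link, a $50 overnight shipment taking 24 hours) are illustrative and not taken from the paper's formal definition:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    capacity_gb_per_hr: float   # float("inf") models "almost infinite"
    cost_per_gb: float = 0.0    # linear cost, e.g. bandwidth or loading fee
    fixed_cost: float = 0.0     # per-disk shipping and handling
    transit_hrs: int = 0        # 0 models "almost instantaneous"


def mbps_to_gb_per_hr(mbps: float) -> float:
    return mbps / 8 * 3600 / 1000


def mb_per_s_to_gb_per_hr(mb_per_s: float) -> float:
    return mb_per_s * 3600 / 1000


edges = [
    # Internet path: outgoing BW, the link itself, incoming BW
    Edge("A", "A.inet_out", mbps_to_gb_per_hr(10)),
    Edge("A.inet_out", "B.inet_in", mbps_to_gb_per_hr(10), cost_per_gb=0.10),
    Edge("B.inet_in", "B", mbps_to_gb_per_hr(10)),
    # Shipment path: the shipment itself, then the disk interface at the receiver
    Edge("A", "B.ship_in", float("inf"), fixed_cost=50, transit_hrs=24),
    Edge("B.ship_in", "B", mb_per_s_to_gb_per_hr(40), cost_per_gb=0.02),
]

for e in edges:
    print(e)
```

Keeping the fixed per-disk cost separate from the linear per-GB cost matters later, since only the fixed-cost edges need binary variables in the optimization.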


Data Transfer Over Time

• Goal: Meet time deadline T while minimizing dollar cost C
• Hard problem on graph with both Internet and Shipment links
  – NP-Hard
  – Formal problem and proof in paper
• Solution: Pandora computes optimal and approximate solutions


Solution: Pandora Overview

• Transform into static time-expanded network
  – Decomposition of shipping edges
• Solve min-cost flow on static network
  – Mixed Integer Program
  – Optimizations to reduce computation time


Time-expanded Network

• Intuitively, incorporate time into graph to create an extended graph representation
• Make T (= deadline) copies of each vertex
• Draw edges according to transit time
• Draw holdover edges
• [Ford Fulkerson 58]
• Disk shipment represented as time-expanded network

[Figure: time-expanded network along a time axis, with transit times τ = 1 and τ = 3 and T = 5]
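The construction steps above can be sketched in a few lines. This is a minimal illustration, assuming integer transit times, T time layers numbered 0 to T-1, and no capacities or costs; the function name `time_expand` and the tuple representation are hypothetical:

```python
def time_expand(vertices, edges, T):
    """Build the edge list of a time-expanded network.

    vertices: iterable of vertex names
    edges:    iterable of (u, v, transit) with integer transit >= 0
    T:        number of time layers (copies of each vertex)
    Returns a list of ((u, t), (v, t')) pairs.
    """
    expanded = []
    # Movement edges: leaving u at time t arrives at v at time t + transit.
    for u, v, tau in edges:
        for t in range(T - tau):
            expanded.append(((u, t), (v, t + tau)))
    # Holdover edges: data may wait at a vertex from one layer to the next.
    for v in vertices:
        for t in range(T - 1):
            expanded.append(((v, t), (v, t + 1)))
    return expanded


# Two parallel links with the transit times from the figure, T = 5.
net = time_expand(["A", "B"], [("A", "B", 1), ("A", "B", 3)], T=5)
print(len(net))
```

An edge whose transit time is at least T produces no copies, since nothing sent on it could arrive before the deadline.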


Decomposed Shipping Edges

• Decompose shipping edges to fixed-cost edges, each with:
  1. Transit time
  2. Fixed cost
  3. Capacity

[Figure: three fixed-cost edges: cost = $130, capacity = 2 TB; cost = $110, capacity = 2 TB; cost = $100, capacity = 2 TB]


Solution: Min-cost Flow Calculation using Mixed-Integer Program

• Fixed-cost edges make min-cost flow calculation NP-Hard
• Mixed-Integer Program (MIP)
  – Binary variable y_e defined on fixed-cost edges
• Goal: Minimize dollar cost
• Subject to
  – Capacity constraints (flow_e ≤ capacity_e ∙ y_e)
  – Conservation of flow
  – Demands of sources and sink
• Proof of NP-Hardness and formal MIP in paper
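For orientation, a generic fixed-charge min-cost flow program matching the bullets above can be written as follows. This is a sketch under assumed notation, not the paper's formal MIP: $E$ is the edge set of the time-expanded network, $E_F \subseteq E$ the fixed-cost edges, $f_e$ the flow, $c_e$ the per-unit cost, $k_e$ the fixed cost, $u_e$ the capacity, and $b_v$ the net demand at vertex $v$ (negative at sources, positive at the sink, zero elsewhere).

```latex
\begin{aligned}
\min_{f,\,y}\quad & \sum_{e \in E} c_e f_e \;+\; \sum_{e \in E_F} k_e y_e \\
\text{s.t.}\quad
  & 0 \le f_e \le u_e            && \forall e \in E \setminus E_F \\
  & 0 \le f_e \le u_e\, y_e      && \forall e \in E_F \\
  & \sum_{e \in \delta^{-}(v)} f_e \;-\; \sum_{e \in \delta^{+}(v)} f_e \;=\; b_v
                                  && \forall v \in V \\
  & y_e \in \{0, 1\}             && \forall e \in E_F
\end{aligned}
```

Here $\delta^{-}(v)$ and $\delta^{+}(v)$ denote the edges entering and leaving $v$. The constraint $f_e \le u_e y_e$ is what forces the fixed cost $k_e$ to be paid whenever any flow uses a fixed-cost edge.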


Optimizations: Overview

• Size of MIP grows linearly with deadline T
  – Worst-case running time grows exponentially with T
• Reduce size of the MIP
  – Reduce number of shipment edges
  – Δ-condensed time-expanded networks
• More optimizations in paper


Optimizations: Reduce number of shipment edges

• Can remove redundant shipment edges
• Example:
  – Overnight shipment sent anytime before 4pm will arrive at destination at 8am

[Figure: timeline labeled 7am, noon, 1pm, 2pm, 3pm, 4pm, 8am]
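One plausible reading of this optimization: when several departure times lead to the same arrival time at the same price, only the latest departure needs an edge, because holdover edges already let data wait at the sender. The sketch below implements that reading; it is an assumption about the rule rather than the paper's exact procedure, and the function name and hour encoding (hours since midnight of day one, so 8am the next day is hour 32) are illustrative:

```python
def prune_shipment_edges(edges):
    """Keep only the latest departure for each (arrival, cost) pair.

    edges: iterable of (depart_hour, arrive_hour, cost) tuples.
    Returns the surviving edges sorted by departure hour.
    """
    latest = {}
    for depart, arrive, cost in edges:
        key = (arrive, cost)
        if key not in latest or depart > latest[key][0]:
            latest[key] = (depart, arrive, cost)
    return sorted(latest.values())


# Hourly overnight departures from noon to 4pm, all arriving at 8am next day.
overnight = [(hour, 32, 50) for hour in range(12, 17)]
print(prune_shipment_edges(overnight))
```

In this example five candidate edges collapse to one, the 4pm departure, without changing which arrival times are reachable.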


Optimization: Δ-condensed Time-expanded Network

• Each batch of consecutive Δ time units condensed into one virtual time unit
• Solution has
  – Minimum cost
  – Deadline approximation depending on Δ
• More details in paper
• [Fleischer Skutella 07]

[Figure: condensed network with Δ = 2]
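The effect of condensing on network size can be sketched with simple index arithmetic. The helpers below are illustrative names; rounding transit times up to whole virtual units is an assumption made here (a common choice for condensed networks), and it is this rounding that makes the deadline only approximate:

```python
import math


def condensed_layers(T: int, delta: int) -> int:
    """Number of virtual time units needed to cover T original units."""
    return math.ceil(T / delta)


def virtual_unit(t: int, delta: int) -> int:
    """Virtual time unit that original time unit t falls into."""
    return t // delta


def condensed_transit(tau: int, delta: int) -> int:
    """Transit time in virtual units, rounded up (assumption)."""
    return math.ceil(tau / delta)


# A 96-hour deadline with delta = 2 halves the number of layers.
print(condensed_layers(96, 2))
print([virtual_unit(t, 2) for t in range(6)])
print(condensed_transit(3, 2))
```

Since the number of vertex copies, and hence the MIP size, scales with the number of layers, a larger Δ trades deadline precision for a smaller program.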


Experimental Setup

• Trace-driven
  – Wrote scripts to communicate with FedEx web services: queried package rates and destination time
  – Internet BW from PlanetLab measurements
• GNU Linear Programming Kit (GLPK)


Experimental Results: 8 sources, 0.25 TB per node, Heterogeneous BW

• Direct Internet
  – Cost: $200
  – Time: 280 hrs
  – Cannot take advantage of heterogeneous bandwidth
• Direct Overnight
  – Cost: $1,500
  – Time: 38 hrs
  – Cannot fill disks to capacity

[Figure: sources 1-8, each sending 0.25 TB to sink t; edge width proportional to BW]


Experimental Results: 8 sources, 0.25 TB per node, Heterogeneous BW

• Direct Internet
  – Cost: $200
  – Time: 280 hrs
  – Cannot take advantage of heterogeneous bandwidth
• Direct Overnight
  – Cost: $1,500
  – Time: 38 hrs
  – Cannot fill disks to capacity
• Pandora, Deadline = 96 hrs
  – Cost: $183
  – Time: < 96 hrs

[Figure: Pandora transfer plan among sources 1-8 and sink t; edge labels: 1.92 TB, 0.14 TB, 0.06 TB, 0.08 TB]


Experimental Results: Optimizations

• Reducing shipment edges decreases computation time
• Using Δ-condensed time-expanded networks decreases computation time
  – Deadlines met in our experiments

[Figure: computation-time plots; panels: 2 sources, 1 source]


Conclusion

• First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges
• Produce optimal transfer plans that obey time deadlines and minimize dollar cost
  – Better than internet-only and shipping-only strategies
• Reasonable computation time by using optimizations