![Page 1: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/1.jpg)
Seaweed: Scalable Delay Aware Querying
Austin Donnelly, Richard Mortier, Dushyanth Narayanan, Ant Rowstron
Microsoft Research, Cambridge
![Page 2: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/2.jpg)
Sep 14 2006 Seaweed: Scalable Delay Aware Querying 2
Motivation
• Large, highly distributed data sets
• Data stored on endsystems
• Endsystems often unavailable
• Centralization, replication do not scale
• Must query data in-situ
• How can we deal with unavailability?
![Page 3: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/3.jpg)
Delay aware querying
• In-situ
  • Push queries to endsystems
• Incremental results
  • As endsystems become available
• Progress estimation
  • Current and future completeness
• Scalability
• Fault-tolerance
![Page 4: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/4.jpg)
Applications
• Admin, diagnostics, resource mgmt
• Select-Project-Aggregate queries
• Small results
• Low to moderate query rates
• Different network scales
  • Data center (10,000+)
  • Enterprise (100,000+)
  • Internet (1,000,000+)
![Page 5: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/5.jpg)
Enterprise network management
• Endsystem-based monitoring
  • Endsystems log their own traffic
  • Flow and PacketHeader tables
• Queries by admins/operators
  • SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80
• Flow is horizontally partitioned
  • 300,000 hosts, 1 month
  • 765 TB total size
  • 2.4 Gbps update rate
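Because Flow is horizontally partitioned, an aggregate such as the SUM query above can be evaluated locally on each endsystem and the partial results combined. The following is only a minimal sketch of that idea, not Seaweed's implementation: the in-memory SQLite tables, the helper names `make_endsystem` and `local_result`, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

QUERY = "SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80"

def make_endsystem(rows):
    """Hypothetical endsystem: an in-memory Flow table holding only its own rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Flow (SrcPort INTEGER, Bytes INTEGER)")
    db.executemany("INSERT INTO Flow VALUES (?, ?)", rows)
    return db

def local_result(db):
    """Run the query in situ. SUM over zero matching rows is NULL, treated as 0."""
    (total,) = db.execute(QUERY).fetchone()
    return total or 0

# Three made-up partitions of the Flow table.
endsystems = [
    make_endsystem([(80, 1500), (443, 900)]),
    make_endsystem([(80, 4000), (80, 600)]),
    make_endsystem([(22, 70)]),
]

partials = [local_result(db) for db in endsystems]  # [1500, 4600, 0]
global_sum = sum(partials)                          # 6100
```

SUM is decomposable, so only one small number per endsystem has to cross the network rather than the rows themselves.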
![Page 6: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/6.jpg)
Roadmap
• Motivation
• Design
  • Overview
  • Delay awareness
  • Distributed query protocols
• Evaluation
• Conclusion
![Page 7: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/7.jpg)
Seaweed overview
• In-situ querying
• One-shot queries
• Incremental results
• Progress estimation
• Meta-data replication
• Exactly-once semantics
• Scalable, failure-resilient protocols
• Built on P2P overlay
![Page 8: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/8.jpg)
Why delay awareness?
• Endsystem unavailability
![Page 9: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/9.jpg)
What is delay awareness?
• User receives partial results
• Needs progress indicator
  • How much data is out there?
  • How much have I seen?
  • How long before I get to 99%?
• Delay/completeness tradeoff
  • Predicted by Seaweed
![Page 10: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/10.jpg)
Completeness
• % of relevant data rows seen so far
  • Relevant: matches query predicates
  • Query-specific
• Completeness predictor:
  • Currently available rows
  • Total rows
  • Expected rows/time
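The completeness definition and the "how long before I get to 99%?" question can be illustrated with a small sketch. The predictor representation used here (a time-sorted list of expected cumulative rows whose last point stands in for the total) and all the numbers are assumptions for illustration, not the format Seaweed uses.

```python
def completeness(rows_seen, total_relevant_rows):
    """Percentage of relevant rows (those matching the query predicates) seen so far."""
    if total_relevant_rows == 0:
        return 100.0
    return 100.0 * rows_seen / total_relevant_rows

def hours_until(predictor, target_percent):
    """First time offset at which the predicted completeness reaches the target.

    predictor: list of (hours_from_now, expected_cumulative_rows), sorted by time.
    Assumption: the last point is taken as the total number of relevant rows.
    """
    total = predictor[-1][1]
    for hours, rows in predictor:
        if completeness(rows, total) >= target_percent:
            return hours
    return None

# Made-up predictor: 76M rows available now, 82M expected eventually.
predictor = [
    (0, 76_000_000),
    (1, 78_000_000),
    (10, 80_000_000),
    (100, 81_500_000),
    (1000, 82_000_000),
]

now = completeness(76_000_000, 82_000_000)   # about 92.7
wait_for_99 = hours_until(predictor, 99)     # 100
```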
![Page 11: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/11.jpg)
Completeness predictor
![Page 12: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/12.jpg)
Completeness prediction
• Relevant rows
  • Column histograms
  • Standard row-count estimation
  • Replication → remote estimation
• Uptime
  • Availability models
• Replicated meta-data
  • Highly available
  • Orders of magnitude smaller than data
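As a rough illustration of row-count estimation from a column histogram, the sketch below estimates how many rows satisfy a predicate such as `Bytes > 20000`. The bucket layout, the uniform-within-bucket assumption, and the function name are illustrative assumptions; the slides do not specify the histogram type.

```python
def estimate_rows_greater_than(histogram, threshold):
    """Estimate the number of rows whose column value exceeds threshold.

    histogram: list of (low, high, count) buckets covering [low, high).
    Assumption: values are spread uniformly inside each bucket.
    """
    estimate = 0.0
    for low, high, count in histogram:
        if threshold <= low:
            # The whole bucket lies above the threshold.
            estimate += count
        elif threshold < high:
            # The threshold splits the bucket; count the upper fraction.
            estimate += count * (high - threshold) / (high - low)
    return estimate

# Made-up histogram of the Bytes column on one endsystem.
bytes_histogram = [
    (0, 10_000, 500),
    (10_000, 30_000, 200),
    (30_000, 100_000, 40),
]

estimated = estimate_rows_greater_than(bytes_histogram, 20_000)  # 140.0
```

A histogram like this is far smaller than the table it summarizes, which is what makes replicating it to other endsystems cheap.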
![Page 13: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/13.jpg)
Predictor generation
• Meta-data replicated periodically
• Query sent to all endsystems
  • Application-level multicast tree
  • Retransmit on failure
  • Aggregate predictors in-tree
• Exactly-once semantics
  • Available → local histogram, time=0
  • Unavailable → replica histogram, avail.
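One way to picture in-tree predictor aggregation is to treat each endsystem's predictor as "expected rows becoming available at a given delay" and to sum these up the tree. This is a simplified sketch under stated assumptions: a real availability model would presumably yield a distribution rather than the single expected return time used here, and the names `leaf_predictor`, `merge`, and `cumulative` are hypothetical.

```python
from collections import Counter

def leaf_predictor(estimated_rows, available, expected_hours_until_up=0):
    """Predictor for one endsystem: {delay_in_hours: expected_rows}.

    An available endsystem contributes its estimate at time 0; an unavailable
    one is represented (from replicated meta-data) at its expected return time.
    """
    delay = 0 if available else expected_hours_until_up
    return Counter({delay: estimated_rows})

def merge(predictors):
    """Aggregate child predictors at an interior tree vertex by summing per delay."""
    total = Counter()
    for predictor in predictors:
        total.update(predictor)  # Counter.update adds counts
    return total

def cumulative(predictor):
    """Turn a merged predictor into a rows-versus-time curve."""
    running, curve = 0, []
    for delay in sorted(predictor):
        running += predictor[delay]
        curve.append((delay, running))
    return curve

# Four made-up endsystems; B and D are currently unavailable.
a = leaf_predictor(120, available=True)
b = leaf_predictor(80, available=False, expected_hours_until_up=8)
c = leaf_predictor(50, available=True)
d = leaf_predictor(30, available=False, expected_hours_until_up=8)

root = merge([merge([a, b]), merge([c, d])])
curve = cumulative(root)  # [(0, 170), (8, 280)]
```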
![Page 14: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/14.jpg)
Predictor generation
[Figure: column histogram (Frequency against Thickness) with selection σ1; predictor plots of Rows (millions) against Time (hours, log scale from 1 to 10000) for endsystems A, B, C and D, aggregated in-tree as A+B, C+D and A+B+C+D.]
![Page 15: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/15.jpg)
Query execution
• Persistent query state
  • New endsystems get active query list
• Incremental convergecast of results
  • Deterministic child → parent mapping
  • Each vertex is a replicated set
  • Parent remembers child result versions
• Exactly-once semantics
• In-network aggregation
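The role of remembered child result versions in avoiding double-counting can be sketched as follows. This is an illustrative toy, not Seaweed's protocol: the `Vertex` class, integer version numbers, and the use of SUM as the aggregate are assumptions, and replication of each vertex is left out.

```python
class Vertex:
    """Interior tree vertex that remembers the latest result from each child."""

    def __init__(self):
        self.child_results = {}  # child_id -> (version, value)

    def receive(self, child_id, version, value):
        """Record a child's result; return True only if it changed our state.

        Duplicates and stale retransmissions (version not newer than the one
        already held) are ignored, so no child is counted twice.
        """
        known = self.child_results.get(child_id)
        if known is not None and version <= known[0]:
            return False
        self.child_results[child_id] = (version, value)
        return True

    def aggregate(self):
        """In-network aggregation: combine the latest result of every child."""
        return sum(value for _, value in self.child_results.values())

parent = Vertex()
parent.receive("c1", 1, 10)
parent.receive("c2", 1, 20)
duplicate = parent.receive("c1", 1, 10)  # retransmission, ignored
updated = parent.receive("c2", 2, 25)    # newer version replaces the old value
total = parent.aggregate()               # 35
```

Replacing a child's old value rather than adding the new one is what lets results be refined incrementally as more endsystems become available.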
![Page 16: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/16.jpg)
Roadmap
• Motivation
• Design
• Evaluation
• Conclusion
![Page 17: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/17.jpg)
Evaluation
• Packet-level simulation
• Farsite availability traces
  • 51663 hosts, ~4 weeks
• Flow tables from packet traces
  • 456 hosts, ~4 weeks
  • Assigned randomly to simulation hosts
• Two queries
  • SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80
  • SELECT COUNT(*) FROM Flow WHERE Bytes > 20000
![Page 18: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/18.jpg)
Predictor accuracy
![Page 19: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/19.jpg)
Prediction accuracy (2)
![Page 20: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/20.jpg)
Overheads
[Figure: Tx bandwidth (bytes/s/endsystem, log scale from 0.0001 to 1000) against Time (hours, 0 to 1000). Series: Seaweed maintenance O(1), MSPastry O(log N), Seaweed query O(log N).]
![Page 21: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/21.jpg)
Scalability
![Page 22: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/22.jpg)
Roadmap
• Motivation
• Design
• Evaluation
• Conclusion
![Page 23: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/23.jpg)
Related work
• P2P querying
  • PIER, Mercury, …
  • Move data across network
• Continuous/streaming queries
  • Astrolabe, SDIMS, Borealis, …
  • Ignore availability
![Page 24: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/24.jpg)
Future work
• Selective centralization
  • “Distributed materialized views”
  • Need bandwidth/availability estimation
  • Large views can melt network
• Beyond histograms
  • Wavelets approximate results?
• Real-life experience, measurements
  • Deployment within Microsoft
![Page 25: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/25.jpg)
Conclusion
• Querying highly distributed data
  • Challenges are unavailability, scale
• Delay awareness
  • Predict delay/availability tradeoff
  • Exactly-once semantics
• Seaweed: scalable delay aware querying
  • Meta-data replication
  • Fault-tolerant protocols
![Page 26: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/26.jpg)
Questions?
![Page 27: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/27.jpg)
Consistency (membership)
• “Exactly-once” semantics
  • No double-counting
  • Every endsystem’s results counted
    • If available at any point in query lifetime
  • “Precise single-site validity”
• Estimate always generated
  • For all endsystems, available or not
  • Endsystem computes own estimate
    • If available through estimation phase
![Page 28: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/28.jpg)
Consistency (time)
• Avoid tight synchronization
  • Clock-skewed snapshots
• Loosely synchronized clocks
  • With good NTP, milliseconds
• Currently left to application layer
  • Timestamped, append-only tuples
  • Explicit predicates on timestamp
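With timestamped, append-only tuples, an application can bound a query to a time window by putting the predicate on the timestamp column itself. The sketch below assumes a hypothetical `Ts` column (seconds) on the Flow table and made-up rows; the slides do not give the actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Ts is an assumed timestamp column; rows are only ever appended.
db.execute("CREATE TABLE Flow (Ts INTEGER, SrcPort INTEGER, Bytes INTEGER)")
db.executemany(
    "INSERT INTO Flow VALUES (?, ?, ?)",
    [(100, 80, 500), (160, 80, 700), (220, 80, 900)],
)

# Explicit half-open window [start, end) on the local clock of each endsystem.
WINDOW_QUERY = (
    "SELECT SUM(Bytes) FROM Flow "
    "WHERE SrcPort=80 AND Ts >= ? AND Ts < ?"
)
(window_sum,) = db.execute(WINDOW_QUERY, (100, 200)).fetchone()  # 1200
```

Since each endsystem evaluates the window against its own clock, the combined answer is a snapshot that is only as sharp as the clock skew between hosts.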
![Page 29: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/29.jpg)
Result aggregation
• Deterministic mapping to parent
• Each parent is a replicated set
• Parents remember child results
[Figure: aggregation tree. Results R1 and R2 combine to R1+R2, which combines with R3 to give R1+R2+R3; when R3 is updated to R3’, the replicated parents hold R1+R2,R3’ and the aggregate becomes R1+R2+R3’.]
![Page 30: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/30.jpg)
Query dissemination in Pastry
[Figure: Pastry identifier ring from 000 to FFF showing node identifiers 836, 0FA, DA0, 37B and E9A, the key hash(query), and identifier prefixes ???, 3??, 8?? and E??.]
![Page 31: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/31.jpg)
Replication in Pastry
Topology-independent node identifiers
Each node maintains a virtual neighbor set (vset)
[Figure: Pastry identifier ring from 000 to FFF showing node identifiers 8E2, 8F0, 8F6, 90E and 910.]
![Page 32: Seaweed: Scalable Delay Aware Querying](https://reader035.fdocuments.us/reader035/viewer/2022062408/5681303f550346895d95dfe0/html5/thumbnails/32.jpg)
Result routing in Pastry
[Figure: Pastry identifier ring showing node identifiers 836, 036 and 0F6 and the key 0FA = hash(query).]