Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems
Andrey Brito (1), Christof Fetzer (1), Pascal Felber (2)
(1) Technische Universität Dresden, Germany
(2) Université de Neuchâtel, Switzerland
Department of Computer Science, Institute for Systems Architecture, Systems Engineering Group
ICDCS'09, June 23rd, 2009
ICDCS'09, 23.06.09 Minimizing latency in fault-tolerant DSMS Slide 2 of 55
Goal
Minimize the cost of logging/checkpointing in event stream processing systems
Contribution: use of a speculation framework based on transactional memory to overlap logging and processing
Motivation (1)
• Event stream applications
  – Directed acyclic graph of operators
  – Some operators don't keep state
    • Trivially parallelizable
  – Some do keep state
    • Not trivially parallelizable
  – Sometimes they are order sensitive
    • Need to process events sequentially, maybe even waiting for the order to be restored
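The three kinds of operators listed above can be illustrated with a minimal Python sketch. The names `stateless_filter`, `RunningSum`, and `ReorderBuffer`, and the event layout (a dict with `seq` and `value`), are hypothetical and chosen only for illustration; they are not taken from the authors' system.

```python
def stateless_filter(event):
    """Stateless operator: the result depends only on the event itself,
    so any number of copies can run in parallel."""
    return event["value"] >= 0


class RunningSum:
    """Stateful operator: every event reads and updates shared state,
    so naive parallel execution would race on `total`."""

    def __init__(self):
        self.total = 0

    def process(self, event):
        self.total += event["value"]
        return self.total


class ReorderBuffer:
    """Order-sensitive front end: releases events strictly by sequence
    number and holds back early arrivals until the gap is filled.
    Assumes sequence numbers are unique and contiguous."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq
        self.pending = {}  # seq -> event that arrived too early

    def push(self, event):
        self.pending[event["seq"]] = event
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released
```

In this sketch an event that arrives early simply waits in `pending`, which is exactly the latency that the speculation described later tries to avoid.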
Application example
[Figure: operator graph with PublisherA and PublisherB, filters (Filter n), Processor1 and Processor2 (each with STATE), and an Output Adapter; events with A and B labels (A0, B0, ...) are in flight.]
Application example
[Figure: the same operator graph, with two callouts: "Events based on non-deterministic decision" and "Events are out!"]
Application example
Restore checkpoint.
Application example
Ask upstream node to replay missing ones.
Application example
Processing some events again.
Application example
Incomplete log of non-deterministic decisions → no repeatability
Events reflect different decisions.
What are you talking about?
Motivation (2)
• Fault-tolerant event stream applications
  – Precise recovery
  – Even if order does not matter, repeatability does
  – Non-determinism
    • Input order from different streams
    • Non-determinism in processing (multi-threading, time, random numbers)
  – Log or checkpoint before each output
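Why repeatability requires a log can be shown with a small sketch. It assumes a hypothetical operator, `SamplingOperator`, that forwards each event based on a random draw; recording every draw lets a recovering replica make the same decisions again. This is an illustration of the idea, not the authors' implementation.

```python
import random


class SamplingOperator:
    """Forwards roughly half of its input, chosen at random.
    The random draw is the non-deterministic decision."""

    def __init__(self, decision_log=None):
        # In replay mode, decisions come from the log instead of the RNG.
        self.replay = list(decision_log) if decision_log is not None else None
        self.log = []

    def process(self, event):
        if self.replay is not None:
            keep = self.replay.pop(0)
        else:
            keep = random.random() < 0.5
        # In a real system this entry must be stable (on disk)
        # before the corresponding output is released.
        self.log.append(keep)
        return event if keep else None


events = list(range(20))
original = SamplingOperator()
first_run = [original.process(e) for e in events]

# After a crash, replaying the same input against the logged decisions
# reproduces the outputs that were already sent downstream.
recovered = SamplingOperator(decision_log=original.log)
second_run = [recovered.process(e) for e in events]
```

Without the log, `second_run` would generally differ from `first_run`, and downstream nodes would see events that reflect different decisions.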
Logging is expensive
My solution
• Speculate...
• … to parallelize stateful components
• … to not have to wait for events
• … to not have to wait for logging
Outline
• How the speculation works
• Logging algorithm
• Experiments
• Final remarks
How the speculation works
• Base: TinySTM
  – Some extra features added
  – But same basic rule: “it appears to be atomic”
• Goal: track accesses to shared memory
  – Instrumentation
    • Reads and writes are intercepted
    • Hold back writes, validate reads until all dependencies satisfied
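The "hold back writes, validate reads" rule can be sketched in a few lines of Python. This is a single-threaded toy model with hypothetical names (`VersionedMemory`, `Transaction`); it does not reproduce TinySTM's API or its actual algorithm, only the general idea of buffering writes and checking at commit time that everything read is still current.

```python
class VersionedMemory:
    """Shared memory in which every location carries a version number."""

    def __init__(self):
        self.values = {}
        self.versions = {}


class Transaction:
    def __init__(self, memory):
        self.memory = memory
        self.read_set = {}   # key -> version observed at first read
        self.write_set = {}  # key -> buffered value, held back until commit

    def read(self, key):
        if key in self.write_set:  # a transaction sees its own writes
            return self.write_set[key]
        self.read_set.setdefault(key, self.memory.versions.get(key, 0))
        return self.memory.values.get(key, 0)

    def write(self, key, value):
        self.write_set[key] = value  # not visible to other transactions yet

    def commit(self):
        # Validate: every location read must still have the version we saw.
        for key, seen in self.read_set.items():
            if self.memory.versions.get(key, 0) != seen:
                return False  # conflict: the caller must re-execute
        # Publish the held-back writes and bump their versions.
        for key, value in self.write_set.items():
            self.memory.values[key] = value
            self.memory.versions[key] = self.memory.versions.get(key, 0) + 1
        return True
```

Two transactions that both increment the same counter illustrate the effect: the first commit succeeds, the second fails validation and has to be retried, so the result appears atomic.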
Speculative execution: parallelization
[Figure: Processor 1 and Processor 2 working on numbered events in parallel; NEXT = 9.]
Speculative execution: parallelization
[Figure: the same two processors later in the animation; NEXT = 10.]
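A minimal sketch of the ordering rule these frames appear to show, under the assumption that NEXT is the timestamp of the next event allowed to commit and that timestamps are contiguous. The class name `OrderedCommitter` is hypothetical; conflict detection and re-execution are left out.

```python
class OrderedCommitter:
    """Lets speculative tasks finish in any order but commits them
    strictly by timestamp."""

    def __init__(self, next_ts):
        self.next_ts = next_ts  # plays the role of NEXT on the slides
        self.finished = {}      # timestamp -> result waiting to commit
        self.committed = []     # (timestamp, result) in commit order

    def finish(self, ts, result):
        """Record a finished speculative task and commit whatever is now due."""
        self.finished[ts] = result
        while self.next_ts in self.finished:
            self.committed.append((self.next_ts, self.finished.pop(self.next_ts)))
            self.next_ts += 1


committer = OrderedCommitter(next_ts=9)
committer.finish(11, "c")  # finished early, must wait
committer.finish(10, "b")  # still waiting for event 9
committer.finish(9, "a")   # unblocks 9, 10 and 11 in order
```

Events 10 and 11 are processed ahead of time, so once event 9 commits they can follow immediately instead of starting from scratch.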
Logging algorithm
• Operator enqueues all events & decisions
• N+1 threads for N disks
  – One groups requests into buffers
  – The others write their buffers to disk
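The N+1 thread layout can be sketched with Python's standard `queue` and `threading` modules. Everything here is an assumption made for illustration: the function name `run_logger`, the fixed `batch_size`, the use of in-memory lists as stand-ins for disks, and `None` as a shutdown marker (so requests themselves must not be `None`). Acknowledgements back to the operator are omitted.

```python
import queue
import threading


def run_logger(requests, num_disks, batch_size):
    """One grouper thread batches log requests; N writer threads store them."""
    incoming = queue.Queue()  # requests enqueued by the operator
    buffers = queue.Queue()   # full buffers handed to the writers
    disks = [[] for _ in range(num_disks)]  # stand-ins for real disks

    def grouper():
        batch = []
        while True:
            item = incoming.get()
            if item is None:  # shutdown marker
                break
            batch.append(item)
            if len(batch) == batch_size:
                buffers.put(batch)
                batch = []
        if batch:
            buffers.put(batch)  # flush the partial buffer
        for _ in range(num_disks):
            buffers.put(None)   # one shutdown marker per writer

    def writer(disk):
        while True:
            batch = buffers.get()
            if batch is None:
                break
            disk.append(batch)  # a real writer would write and sync here

    threads = [threading.Thread(target=grouper)]
    threads += [threading.Thread(target=writer, args=(d,)) for d in disks]
    for t in threads:
        t.start()
    for request in requests:
        incoming.put(request)
    incoming.put(None)
    for t in threads:
        t.join()
    return disks
```

Calling `run_logger(list(range(10)), num_disks=2, batch_size=4)` distributes three buffers over the two simulated disks; which disk receives which buffer depends on thread scheduling.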
[Animation: an Operator and an event E over several frames; labels appearing in sequence: NDDs, "E is here waiting.", update(E).]
Logging algorithm
[Figure: Processor1 with its STATE, Filter 1 to Filter n, and a Checkpoint/Logging component; events A5, A6 and B5 to B8 are in flight.]
[Animation: the same figure over several frames, with the numbers 1 to 7 appearing one after another.]
Speculative processing + Logging
• From the original node's viewpoint
  – Emit outputs as speculative
  – When logging requests are acknowledged, emit final
• The next downstream node
  – If speculative event modifies some state, keep track
    • Outputs that consider that part of the state are speculative
    • Speculative status is contagious
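How the speculative status spreads downstream can be sketched as follows. The `Event` layout, the per-key running sum, and the `confirm` call (standing in for the upstream "final" notification once logging is acknowledged) are all hypothetical choices for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    key: str
    value: int
    speculative: bool = False


class DownstreamSum:
    """Keeps one running sum per key and tracks which sums currently
    depend on speculative input."""

    def __init__(self):
        self.sums = {}
        self.pending = {}  # key -> speculative inputs not yet confirmed

    def process(self, event):
        self.sums[event.key] = self.sums.get(event.key, 0) + event.value
        if event.speculative:
            self.pending[event.key] = self.pending.get(event.key, 0) + 1
        # Contagion: an output is speculative whenever the part of the
        # state it was computed from still depends on speculative input.
        tainted = self.pending.get(event.key, 0) > 0
        return Event(event.key, self.sums[event.key], tainted)

    def confirm(self, key):
        """Upstream reports that one speculative input for `key` is final."""
        if self.pending.get(key, 0) > 0:
            self.pending[key] -= 1
```

Only outputs that touch the affected part of the state become speculative; events for other keys stay final, which keeps the contagion limited.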
Speculation + Logging
[Figure: the same Processor1 and Checkpoint/Logging figure with the numbers 1 to 7, plus an additional 3'.]
Experiments
• Parallelization: benefits & STM's overheads
• Optimism control
• Overlapping processing and logging
Speculation costs & speed-ups
Hardware: Sun T1000
Speculation creation-commit-disposal overheads.
Few shared-memory accesses.
Amdahl's law influence.
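As a reminder of what the Amdahl's law remark refers to, the standard formula can be written as a small helper. The function name and the sample fractions are illustrative only and are not measurements from the talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speed-up when only `parallel_fraction` of the work
    can be spread over `workers` processors (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# If half of the work is inherently sequential, two processors give at
# most a speed-up of 1 / (0.5 + 0.25).
half_parallel = amdahl_speedup(0.5, 2)
```

The sequential part bounds the gain: with 90% parallel work the speed-up stays below 10 no matter how many processors are added.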
Controlling optimism
[Figure: Processor 1 and Processor 2; NEXT = 9.]
Controlling optimism
[Figure: Processor 1 to Processor 8; NEXT = 9.]
Controlling optimism
Hardware: Sun T1000
State size varies between 1 and 20.
Size=1: concurrent executions will conflict.
Size=20: considerable parallelism.
Motivation for distributed speculation: logging costs
2 components do logging.
Non-speculative: only stable events are sent.
Speculative: send events before logging is finished.
Accumulated gains
X axis: number of components logging.
Even in a SAN/WAN, the shapes would look similar.
For deterministic components: add “fixed” latency.
Final remarks - Parallelization
• Parallelization through speculation
  – Easier, fewer bugs
    • Programmer does not need to fight with locks
    • Keeps sequential semantics
  – Waste of resources reduced with optimism control
• Overhead can be much lower with hardware support for TM (e.g., ASF)
Final remarks - Logging
• Overlap logging with processing
  – Independent of available parallelism
  – Distributed speculation possible due to fewer aborts
  – But do not let speculative results get out of the system
• In combination with speculative parallelization may even reduce logging
Thank you!
http://streammine.inf.tu-dresden.de
http://wwwse.inf.tu-dresden.de