Noria - USENIX...Frontend!4 Stories Votes 90% reads 10% writes JOIN COUNT FILTER Query. Backend...
Transcript of Noria - USENIX...Frontend!4 Stories Votes 90% reads 10% writes JOIN COUNT FILTER Query. Backend...
Dynamic, partially-stateful data-flow forhigh-performance Web applications
Jon Gjengset
Jonathan Behrens Lara Timbó Araújo Martin Ek
Eddie Kohler M. Frans Kaashoek Robert Morris
Malte Schwarzkopf
Noria
!2
Frontend!2
🌎 Frontend!2
Backend
🌎 Frontend!2
Backend
Frontend
!3
Backend
Frontend
!3
Stories Votes
Backend
Frontend
!4
Stories Votes
JOIN
COUNT
FILTERQuery
Backend
Frontend
!4
Stories Votes
JOIN
COUNT
FILTERQuery
Backend
Frontend
!4
Stories Votes
90% reads10% writes
JOIN
COUNT
FILTERQuery
Backend
Frontend
!4
Stories Votes
Slow reads, repeated work!
☹90% reads
10% writesJOIN
COUNT
FILTERQuery
Frontend
!5
Stories Votes
Precomputed results
22
JOIN
COUNT
FILTERQuery
Frontend
!5
Stories Votes
Precomputed results
22
READ
JOIN
COUNT
FILTERQuery
Frontend
!5
Stories Votes
Precomputed results
22
READ
JOIN
COUNT
FILTERQuery
Store in base table?— manual, slow.
Frontend
!5
Stories Votes
Precomputed results
22
READ
JOIN
COUNT
FILTERQuery
Store in base table?— manual, slow.
memcached?— complex [Facebook NSDI’13].
Frontend
!5
Stories Votes
JOIN
COUNT
FILTER
22
Streamingdata-flow?
Store in base table?— manual, slow.
memcached?— complex [Facebook NSDI’13].
Frontend
!6
Stories Votes
JOIN
COUNT
FILTER
22
Streamingdata-flow?
INSERT
Frontend
!6
Stories Votes
JOIN
COUNT
FILTER
22
Streamingdata-flow?
3
Frontend
!6
Stories Votes
JOIN
COUNT
FILTER
Materializedview
22
Streamingdata-flow?
3
Frontend
!6
Stories Votes
JOIN
COUNT
FILTER
Materializedview
22
Fast reads. Efficient writes. Parallelizes well.
Streamingdata-flow?
3
Frontend
!7
Stories VotesChallenges
JOIN
COUNT
FILTER
23
13
2
Frontend
!7
Stories Votes
‣Change queries? Restart!
ChallengesState-of-the-artdata-flow systems:
JOIN
COUNT
FILTER
23
13
2
Frontend
!7
Stories Votes
‣Change queries? Restart!
ChallengesState-of-the-artdata-flow systems:
JOIN
COUNT
FILTERSUM 4
2#
$
42
#
$
23
2
31
Frontend
!7
Stories Votes
‣Change queries? Restart!‣Memory footprint? Grows!
ChallengesState-of-the-artdata-flow systems:
JOIN
COUNT
FILTERSUM 4
2#
$
42
#
$
23
2
31
Frontend
!8
Stories VotesNoria
Frontend
!8
Stories Votes
JOIN
COUNT
FILTER
32
Noria
3
21
Frontend
!8
Stories Votes
JOIN
COUNT
FILTER
32
‣Change queries? Live.
Noria
3
21
Frontend
!8
Stories Votes
JOIN
COUNT
FILTER
32
‣Change queries? Live.
Noria
42
#
$
42
#
$
SUM
3
21
Frontend
!8
Stories Votes
JOIN
COUNT
FILTER
32
‣Change queries? Live.‣Memory footprint? Bounded.
Noria
42
#
$
42
#
$
SUM
3
21
Frontend
!8
Stories Votes
JOIN
COUNT
FILTER
3
‣Change queries? Live.‣Memory footprint? Bounded.
Noria
42
#
$
42
#
$
SUM
3
Frontend
!8
Stories Votes
JOIN
COUNT
FILTER
3
‣Change queries? Live.‣Memory footprint? Bounded.‣No global coordination.
Noria
42
#
$
42
#
$
SUM
3
New model: Partially-stateful data-flow
!9
!10
Stories Votes
JOIN
COUNT
FILTER
3
13
2
2
Partially-stateful data-flow
Data-flow state is partial: entries for some keys are absent ( ).
Frontend
!10
Stories Votes
JOIN
COUNT
FILTER
3
13
2
Partially-stateful data-flow
Data-flow state is partial: entries for some keys are absent ( ).
Frontend
!10
Stories Votes
JOIN
COUNT
FILTER
3
3
2
Partially-stateful data-flow
Data-flow state is partial: entries for some keys are absent ( ).
Frontend
!10
Stories Votes
JOIN
COUNT
FILTER
3
3
2
Partially-stateful data-flow
Data-flow state is partial: entries for some keys are absent ( ).
Lower memory footprint.
Frontend
!10
Stories Votes
JOIN
COUNT
FILTER
3
3
2
Partially-stateful data-flow
Data-flow state is partial: entries for some keys are absent ( ).
Lower memory footprint.No need to update absent entries.
Frontend
!10
Stories Votes
JOIN
COUNT
FILTER
3
3
2
Partially-stateful data-flow
Data-flow state is partial: entries for some keys are absent ( ).
Lower memory footprint.No need to update absent entries.Enables live data-flow changes.
Frontend
!11
Stories Votes
JOIN
COUNT
FILTER
3
3
2
Partially-stateful data-flow: upqueries
READ
Frontend
!11
Stories Votes
JOIN
COUNT
FILTER
3
3
2
Partially-stateful data-flow: upqueries
??? Need to fill absent entry!READ
Frontend
!11
Stories Votes
JOIN
COUNT
FILTER
3
3
2
Partially-stateful data-flow: upqueries
??? Need to fill absent entry!READ
Solution: upquery through data-flow.• Compute missing entry from
upstream state
Frontend
Frontend
!12
Stories Votes
JOIN
COUNT
FILTER
3
3
2
Partially-stateful data-flow: upqueries
Solution: upquery through data-flow.• Compute missing entry from
upstream state • Response fills missing entry
READ
Frontend
!12
Stories Votes
JOIN
COUNT
FILTER
3
3
2
Partially-stateful data-flow: upqueries
Solution: upquery through data-flow.• Compute missing entry from
upstream state • Response fills missing entry
2
READ
!13
Start new views and operator state empty, fill via upqueries.
Partial state enables live data-flow changes
Stories Votes
JOIN
COUNT
FILTER
32
3
21
Frontend
!13
#
$
SUM
#
$
Start new views and operator state empty, fill via upqueries.
Partial state enables live data-flow changes
Stories Votes
JOIN
COUNT
FILTER
32
3
21
Frontend
!13
#
$
SUM
#
$
Start new views and operator state empty, fill via upqueries.
Partial state enables live data-flow changes
Stories Votes
JOIN
COUNT
FILTER
32
3
21
Frontend
READ #
!13
#
$
SUM
#
$
Start new views and operator state empty, fill via upqueries.
Partial state enables live data-flow changes
Stories Votes
JOIN
COUNT
FILTER
32
3
21
Frontend
READ #
!13
#
$
SUM
#
$
Start new views and operator state empty, fill via upqueries.
4
4
Partial state enables live data-flow changes
Stories Votes
JOIN
COUNT
FILTER
32
3
21
Frontend
READ #
!14
#
$
SUM
#
$
4
4
Stories Votes
JOIN
COUNT
FILTER
32
3
21
Frontend
High performance requires concurrency
!14
#
$
SUM
#
$
4
4
Stories Votes
JOIN
COUNT
FILTER
32
3
21
Frontend
High performance requires concurrency
Process operators concurrently.Read from views concurrently.Process shards concurrently.
Without global coordination!
!14
#
$
SUM
#
$
4
4
Stories Votes
JOIN
COUNT
FILTER
32
3
21
Frontend
High performance requires concurrency
Process operators concurrently.Read from views concurrently.Process shards concurrently.
Without global coordination!
!14
#
$
SUM
#
$
4
4
Stories Votes
JOIN
COUNT
FILTER
32
3
21
Frontend
High performance requires concurrency
Process operators concurrently.Read from views concurrently.Process shards concurrently.
Without global coordination!
Challenges implementing partially-stateful data-flow
!15
Challenges implementing partially-stateful data-flow
!15
1. Concurrent upqueries and forward processing — races!
Must maintain correctness under concurrency!
Challenges implementing partially-stateful data-flow
!15
1. Concurrent upqueries and forward processing — races!
Must maintain correctness under concurrency!
Correctness under concurrency
!16
Goal: upquery restores state as if present all along.
Correctness under concurrency
!16
Goal: upquery restores state as if present all along.
COUNT2
Correctness under concurrency
!16
Goal: upquery restores state as if present all along.
12
COUNT2
Correctness under concurrency
!16
Goal: upquery restores state as if present all along.
12
COUNT2
Correctness under concurrency
!16
Goal: upquery restores state as if present all along.
12
COUNT2
Correctness under concurrency
!16
Goal: upquery restores state as if present all along.
12
Upquery response is a snapshot of state
COUNT2
2
Correctness under concurrency
!16
Goal: upquery restores state as if present all along.
12
Upquery response is a snapshot of state
COUNT2
2
includes 12
does not include33
Correctness under concurrency
!16
Goal: upquery restores state as if present all along.
12
Upquery response is a snapshot of state
COUNT2
2
includes 12
does not include
Solution: Maintain order of upquery response and surroundingupdates, despite lack of global coordination.
33
Upquery responses in total order with updates
!17
Goal: upquery restores state as if present all along.
Upquery responses in total order with updates
!17
Goal: upquery restores state as if present all along.
223
3
1
resulting staterespects total order
Upquery responses in total order with updates
!17
Goal: upquery restores state as if present all along.
223
3
1
resulting staterespects total order
232
2
1
resulting stateviolates total order
Upquery responses in total order with updates
!17
Goal: upquery restores state as if present all along.
223
3
1
resulting staterespects total order
232
2
1
resulting stateviolates total order
More complex cases: merged upquery responses, evictions (Paper).
Challenges implementing partially-stateful data-flow
!18
1. Concurrent upqueries and forward processing — races!
2. Update processing may require absent state
Must maintain correctness under concurrency!
Challenges implementing partially-stateful data-flow
!18
1. Concurrent upqueries and forward processing — races!
2. Update processing may require absent state
COUNT
3
2
…
absent
Must maintain correctness under concurrency!
Challenges implementing partially-stateful data-flow
!18
1. Concurrent upqueries and forward processing — races!
2. Update processing may require absent state
COUNT
3
2
…
absent
Must maintain correctness under concurrency!
Drop updates that touch absent state, future upquery repeats them.
Challenges implementing partially-stateful data-flow
!18
1. Concurrent upqueries and forward processing — races!
2. Update processing may require absent state
COUNT
3
2
…
absent
Must maintain correctness under concurrency!
Drop updates that touch absent state, future upquery repeats them.
(see Paper)
!19
Noria implementation
!19
Noria implementation
!19
Noria implementation
MySQL adapter
!19
Noria implementation
Data-flow graph
MySQL adapter
Transform
!19
Noria implementation
Data-flow graph
MySQL adapter
Transform
!19
Noria implementation
Data-flow graph
MySQL adapter
• 45k lines of Rust + 15k libraries• RocksDB for base table storage• ZooKeeper for leader election
Transform
1. Can Noria improve a real web application’s performance?
2. How does Noria compare to alternative approaches?
3. Can Noria change queries without downtime?
!20
Evaluation
1. Can Noria improve a real web application’s performance?
2. How does Noria compare to alternative approaches?
3. Can Noria change queries without downtime?
!20
Evaluation
Amazon EC2 c5.4xlarge instance (16 vCPUs)Open-loop clients, measuring latency & throughput
Setup
1. Can Noria improve a real web application’s performance?
2. How does Noria compare to alternative approaches?
3. Can Noria change queries without downtime?
!20
Evaluation
Amazon EC2 c5.4xlarge instance (16 vCPUs)Open-loop clients, measuring latency & throughput
Setup
multi-machine experimentscomparison with differential dataflow } see Paper
!21
Case study: Lobsters (http://lobste.rs)
‣Ruby-on-Rails applicationwith MySQL backend
!21
Case study: Lobsters (http://lobste.rs)
‣Ruby-on-Rails applicationwith MySQL backend‣Hand-optimized by
developers to pre-compute aggregations
!21
Case study: Lobsters (http://lobste.rs)
‣Ruby-on-Rails applicationwith MySQL backend‣Hand-optimized by
developers to pre-compute aggregations‣Noria data-flow with
235 operators, 35 views
!21
Case study: Lobsters (http://lobste.rs)
‣Ruby-on-Rails applicationwith MySQL backend‣Hand-optimized by
developers to pre-compute aggregations‣Noria data-flow with
235 operators, 35 views‣Emulate production load
!22
Can Noria improve Lobsters’ performance?
!22
Can Noria improve Lobsters’ performance?Be
tter
Better
!22
Can Noria improve Lobsters’ performance?Be
tter
Better
!22
Can Noria improve Lobsters’ performance?Be
tter
Better
!22
Noria with natural queries supports 5x MySQL’s throughput.
Can Noria improve Lobsters’ performance?Be
tter
Better
!23
How does Noria compare to alternatives?Be
tter
Better
!23
How does Noria compare to alternatives?
‣Zipf-distributed story ID,95% reads, 5% writes ‣No TX, all in-memory
Bette
r
Better
!23
How does Noria compare to alternatives?
‣Zipf-distributed story ID,95% reads, 5% writes ‣No TX, all in-memory
Bette
r
Better
!24
How does Noria compare to alternatives?Be
tter
Better
‣Zipf-distributed story ID,95% reads, 5% writes ‣No TX, all in-memory
!24
How does Noria compare to alternatives?Be
tter
Better
‣Zipf-distributed story ID,95% reads, 5% writes ‣No TX, all in-memory
!24
How does Noria compare to alternatives?
Noria outperforms an in-memory key-value store and simplifies its interface.
Bette
r
Better
‣Zipf-distributed story ID,95% reads, 5% writes ‣No TX, all in-memory
32
!25
Can Noria change queries without downtime?
JOIN
COUNT
FILTER
StoriesVotes
32
!25
Can Noria change queries without downtime?
JOIN
COUNT
FILTER
StoriesVotes
3 ⭐
1.5 ⭐
AVG
JOIN
COUNT
Ratings⭐⭐⭐
⭐
⭐⭐
!26
Can Noria change queries without downtime?Be
tter
!26
Can Noria change queries without downtime?
new table & query added
Bette
r
!26
Can Noria change queries without downtime?
new table & query added
Bette
r
!26
Can Noria change queries without downtime?
new table & query added
Bette
r
‣Zipf-distributed story ID, 95% reads; 2M existing votes at transition
!27
Can Noria change queries without downtime?Be
tter
‣Zipf-distributed story ID, 95% reads; 2M existing votes at transition‣Old view reads are live throughout
!27
Can Noria change queries without downtime?
instantaneous transition,no downtime for writes
Bette
r
‣Zipf-distributed story ID, 95% reads; 2M existing votes at transition‣Old view reads are live throughout
!27
Can Noria change queries without downtime?
instantaneous transition,no downtime for writes
80% of reads from new view proceedwithout upquery after 1 second
Bette
r
‣Zipf-distributed story ID, 95% reads; 2M existing votes at transition‣Old view reads are live throughout
!27
Can Noria change queries without downtime?
instantaneous transition,no downtime for writes
80% of reads from new view proceedwithout upquery after 1 second
Noria achieves downtime-free query change with partial state.
Bette
r
‣Zipf-distributed story ID, 95% reads; 2M existing votes at transition‣Old view reads are live throughout
• New partially-stateful data-flow model.
• Noria: new web application backend based on data-flow.
• Partial state saves space and allows live change.
• Supports high throughput on one or more machines.
• Open source, try it out!
!28
Noria — Summary
• New partially-stateful data-flow model.
• Noria: new web application backend based on data-flow.
• Partial state saves space and allows live change.
• Supports high throughput on one or more machines.
• Open source, try it out!
!28
https://pdos.csail.mit.edu/noria
Noria — Summary
(see our demo at poster #37 today!)