IndexedRDD: Efficient Fine-Grained Updates for RDDs (Ankur Dave, UC Berkeley)
Uploaded by spark-summit. Category: Data & Analytics
![Page 1: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/1.jpg)
UC BERKELEY
IndexedRDD
Ankur Dave, UC Berkeley AMPLab
Efficient Fine-Grained Updates for RDDs
![Page 2: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/2.jpg)
Immutability in Spark
Query: Sequence of data-parallel bulk transformations on RDDs
![Page 3: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/3.jpg)
Immutability in Spark
[Figure: timeline of map tasks 1-4 and reduce tasks 1-4; x-axis: Time, y-axis: Worker ID]
Transformations are executed as independent tasks
![Page 4: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/4.jpg)
Immutability in Spark
[Figure: timeline of map tasks 1-4 and reduce tasks 1-4, plus "map task 2 (retry)"; x-axis: Time, y-axis: Worker ID]
Automatic mid-query fault tolerance
![Page 5: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/5.jpg)
Immutability in Spark
[Figure: timeline of map tasks 1-4 and reduce tasks 1-4, with map task 2 marked "killed" and a "map task 2 (replica)" added; x-axis: Time, y-axis: Worker ID]
Automatic straggler mitigation
Task replayability: Repeated task executions must give the same result
Enforced by input immutability and task determinism
![Page 6: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/6.jpg)
The Need for Efficient Updates
Efficient Updates to Immutable Data
Ankur Dave, Joseph Gonzalez, Ion Stoica

Motivation
• Spark’s core abstraction is bulk transformations of immutable datasets (RDDs)
• For batch analytics, immutability enables automatic mid-query fault tolerance and straggler mitigation
• But applications like streaming aggregation depend on efficient fine-grained updates
• Naive solutions involving mutable state sacrifice the advantages of Spark
• Instead, we propose to enable efficient updates without sacrificing immutable semantics

Naïve Solutions
1. Local mutable hash table
• Partial task failures corrupt local state
2. Local or external database with atomic updates and snapshots (LevelDB, Cassandra)
• Severely limits program generality
3. Copy entire data structure on update
• Highly wasteful for small updates

| username | # followers |
| --- | --- |
| @taylorswift13 | 48,182,689 |
| @ankurdave | 206 |
| @mejoeyg | 81 |

@taylorswift13: +1 follower
@taylorswift13: +1 follower

Purely Functional Data Structures
• Enable efficient updates without modifying the existing version in any way
• Immutable semantics: Updates return a new copy of the data that internally shares structure with existing copy (similar to copy-on-write snapshot)
• Nodes from old versions are GC’d
• Different data structures optimize different operations (joins, range scans)

Challenges
1. Garbage collection
• Tunable branching factor, reference counting
2. Efficient checkpointing
• Incremental checkpoints using data structure diff
3. Aging data out to disk

Evaluation
[Figure: Read-modify-write scaling, 10M existing elements. x-axis: Read/write size (records), log scale from 1 to 1e+07; y-axis: Total time (ms), log scale from 0.01 to 100000. Series: hash table with cloning, CoW B-tree, LevelDB with snapshots, patricia trie, red-black tree, hash table (mutable baseline)]

Related Work
1. Multiversion Concurrency Control
2. Copy-on-write B-trees
3. Stratified Doubling Array

Streaming Aggregation
Incremental Algorithms: Old Ranks + Graph Update = New Ranks
![Page 7: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/7.jpg)
Existing Solutions
Direct mutation:

    def f(row: Row) = { row.x += 1; row.y += 1 }

[Figure: timeline of a map task followed by "map task (retry)"; row state shown as x = 0, y = 0, then x = 1, y = 0, then x = 2, y = 1; x-axis: Time]
Problem: Task failures corrupt state
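
The corruption on this slide can be reproduced outside Spark. The sketch below is a hypothetical illustration, not Spark code: `Row`, `mutate`, `runWithRetry`, and the `ImmutableRow` variant are names invented here, and the try/catch stands in for a scheduler retrying a failed task.

```scala
object MutationRetryDemo {
  // Mutable row, as in the slide's f(row).
  final class Row(var x: Int, var y: Int)

  // In-place update; optionally fails between the two writes.
  def mutate(row: Row, failMidway: Boolean): Unit = {
    row.x += 1
    if (failMidway) throw new RuntimeException("simulated task failure")
    row.y += 1
  }

  // First attempt fails halfway, then the task is retried on the same row.
  def runWithRetry(row: Row): Row = {
    try mutate(row, failMidway = true)
    catch { case _: RuntimeException => mutate(row, failMidway = false) }
    row
  }

  // Immutable variant: a failed attempt leaves its input untouched.
  final case class ImmutableRow(x: Int, y: Int)

  def update(row: ImmutableRow, failMidway: Boolean): ImmutableRow = {
    val newX = row.x + 1
    if (failMidway) throw new RuntimeException("simulated task failure")
    ImmutableRow(newX, row.y + 1)
  }

  def updateWithRetry(row: ImmutableRow): ImmutableRow =
    try update(row, failMidway = true)
    catch { case _: RuntimeException => update(row, failMidway = false) }
}
```

With the mutable row, the retry starts from the half-updated state and ends at x = 2, y = 1, matching the slide. With the immutable row, the retry starts from the original input and ends at x = 1, y = 1.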
![Page 8: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/8.jpg)
Existing Solutions
Atomic batched database updates (pseudocode):

    def f(row: Row) = { BEGIN; row.x += 1; row.y += 1; COMMIT }

[Figure: timeline with a Pages task and an Ads task; dataset labels: Clicks, Ad Views, Top Ads, Popular Pages; x-axis: Time]
Problem: Long-lived snapshots are uncommon in databases
![Page 9: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/9.jpg)
Existing Solutions
Full copy:

    def f(row: Row): Row = {
      val newRow = row.clone()
      newRow.x += 1
      newRow.y += 1
      newRow
    }

Problem: Very inefficient for small updates
![Page 10: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/10.jpg)
Tradeoff
Immutability (for fault tolerance, straggler mitigation, dataset reusability, parallel recovery, …)
vs.
Fine-grained updates (for streaming aggregation, incremental algorithms, …)
Can we have both?
![Page 11: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/11.jpg)
Persistent Data Structures
Support efficient updates without modifying the existing version, using structural sharing (fine-grained copy-on-write)
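
Structural sharing is easiest to see on a small tree. The following is a minimal sketch, assuming an unbalanced binary search tree with `Int` keys; `Tree`, `Node`, `insert`, `get`, and `left` are illustrative names, not part of IndexedRDD. An update copies only the nodes on the path from the root to the changed key and reuses every other subtree.

```scala
object PathCopyDemo {
  sealed trait Tree
  case object Leaf extends Tree
  final case class Node(left: Tree, key: Int, value: String, right: Tree) extends Tree

  // Returns a new version; only nodes on the root-to-key path are copied.
  def insert(t: Tree, k: Int, v: String): Tree = t match {
    case Leaf => Node(Leaf, k, v, Leaf)
    case n @ Node(l, key, _, r) =>
      if (k < key) n.copy(left = insert(l, k, v))
      else if (k > key) n.copy(right = insert(r, k, v))
      else n.copy(value = v)
  }

  def get(t: Tree, k: Int): Option[String] = t match {
    case Leaf => None
    case Node(l, key, value, r) =>
      if (k < key) get(l, k) else if (k > key) get(r, k) else Some(value)
  }

  // Helper used to check which subtrees two versions share.
  def left(t: Tree): Tree = t match {
    case Node(l, _, _, _) => l
    case Leaf => Leaf
  }

  // Version 1 holds three keys; version 2 overwrites key 70.
  val v1: Tree = List(50 -> "a", 30 -> "b", 70 -> "c").foldLeft(Leaf: Tree) {
    case (t, (k, v)) => insert(t, k, v)
  }
  val v2: Tree = insert(v1, 70, "c2")
}
```

Both versions stay readable after the update: `get(v1, 70)` still returns the old value, and `left(v1) eq left(v2)` holds because the untouched left subtree is the same object in both versions.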
![Page 12: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/12.jpg)
IndexedRDD
RDD-based distributed key-value store supporting immutable semantics and efficient updates
[Diagram: components labeled IndexedRDD (800 LOC), PART (1100 LOC), and Spark]
![Page 13: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/13.jpg)
IndexedRDD
[Diagram: a cached RDD[T] drawn as four Array[T] blocks, next to an IndexedRDD[K,V] drawn as four PART[K,V] blocks]
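
The layout on this slide can be modeled on one machine. This is a rough sketch under stated assumptions: `Store` is a hypothetical stand-in for an IndexedRDD, a plain immutable `Map` stands in for each PART, and keys are assigned to partitions by hash code. None of these names come from the library.

```scala
object PartitionedStoreDemo {
  // Hypothetical model: one immutable map per partition.
  final case class Store(partitions: Vector[Map[String, Int]]) {
    // Assumed hash partitioning; floorMod keeps the index non-negative.
    private def partitionOf(k: String): Int =
      Math.floorMod(k.hashCode, partitions.length)

    def get(k: String): Option[Int] = partitions(partitionOf(k)).get(k)

    // Returns a new Store; only the partition that owns `k` is replaced.
    def put(k: String, v: Int): Store = {
      val i = partitionOf(k)
      Store(partitions.updated(i, partitions(i).updated(k, v)))
    }
  }

  val empty: Store = Store(Vector.fill(4)(Map.empty[String, Int]))
}
```

A `put` yields a new `Store` whose other three partitions are the same objects as before, so the old version remains valid and most of the data is shared between versions.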
![Page 14: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/14.jpg)
IndexedRDD API

    class IndexedRDD[K: KeySerializer, V] extends RDD[(K, V)] {
      // Fine-Grained Ops ----------------------
      def get(k: K): Option[V]
      def multiget(ks: Array[K]): Map[K, V]
      def put(k: K, v: V): IndexedRDD[K, V]
      def multiput(kvs: Map[K, V], merge: (K, V, V) => V): IndexedRDD[K, V]
      def delete(ks: Array[K]): IndexedRDD[K, V]

      // Accelerated RDD Ops -------------------
      def filter(pred: (K, V) => Boolean): IndexedRDD[K, V]

      // Specialized Joins ---------------------
      def fullOuterJoin[V2, W](other: RDD[(K, V2)])(f): IndexedRDD[K, W]
    }

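
To make the semantics of the fine-grained operations concrete, here is a single-machine model of them. It is a sketch, not the real library: `MiniIndexed` is a hypothetical class backed by an immutable `Map`, and the argument order of `merge` (key, old value, new value) is an assumption.

```scala
object MiniIndexedDemo {
  // Hypothetical in-memory model of the fine-grained ops; not the real IndexedRDD.
  final case class MiniIndexed[K, V](data: Map[K, V]) {
    def get(k: K): Option[V] = data.get(k)

    // Keys that are absent are simply left out of the result.
    def multiget(ks: Array[K]): Map[K, V] =
      ks.iterator.collect { case k if data.contains(k) => k -> data(k) }.toMap

    // Every write returns a new version; `this` is never modified.
    def put(k: K, v: V): MiniIndexed[K, V] = MiniIndexed(data.updated(k, v))

    // Assumed semantics: merge(key, oldValue, newValue) runs only for existing keys.
    def multiput(kvs: Map[K, V], merge: (K, V, V) => V): MiniIndexed[K, V] =
      MiniIndexed(kvs.foldLeft(data) { case (acc, (k, v)) =>
        acc.updated(k, acc.get(k).fold(v)(old => merge(k, old, v)))
      })

    def delete(ks: Array[K]): MiniIndexed[K, V] =
      MiniIndexed(ks.foldLeft(data)(_ - _))
  }
}
```

For example, starting from `MiniIndexed(Map("x" -> 1))`, a `multiput` of `Map("x" -> 10, "y" -> 5)` with a summing merge gives a version where `x` is 11 and `y` is 5, while the original version still has no `y`.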
![Page 15: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/15.jpg)
Persistent Adaptive Radix Trees (PART)
![Page 16: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/16.jpg)
Adaptive Radix Tree
Recent work in main-memory indexing for databases
256-ary radix tree (trie) with node compression
V. Leis, A. Kemper, and T. Neumann. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE 2013.
![Page 17: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/17.jpg)
Why a Radix Tree?
1. Sorted order traversals (unlike hash tables)
2. Better asymptotic performance than binary search trees for long keys (O(k) vs O(k log n))
3. Very efficient union and intersection operations
4. Predictable performance: no rehashing or rebalancing
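
The bounds in point 2 follow from a short count, sketched here with $k$ as the key length in bytes and $n$ as the number of keys. A balanced binary search tree makes about $\log_2 n$ key comparisons per lookup, and each comparison may inspect up to $k$ bytes. A 256-ary radix tree consumes one key byte per level, so a lookup visits at most $k$ nodes.

$$
T_{\text{BST}} = O(\log n) \cdot O(k) = O(k \log n), \qquad T_{\text{radix}} = O(k)
$$

As a worked example with the sizes used in the streaming benchmark ($k = 26$, $n = 10^9$): $\log_2 10^9 \approx 30$, so the worst case is roughly $30 \times 26 = 780$ byte comparisons for the search tree against at most $26$ node visits for the radix tree. These are worst-case counts; real comparisons often stop at the first differing byte.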
![Page 18: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/18.jpg)
Persistent Adaptive Radix Tree (PART)
Adds persistence to the adaptive radix tree using path copying (shadowing) and reference counting
1100 lines of Java
[Figure: old and new versions of a tree]
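
Path copying on a radix tree can be sketched in a few lines. The version below is a simplification and not PART itself: children are held in a sorted immutable map keyed by byte value rather than in ART's compressed node layouts, there is no reference counting, and all names (`Node`, `insert`, `get`, `values`) are illustrative.

```scala
import scala.collection.immutable.SortedMap

object PersistentTrieDemo {
  // One node per key byte; `value` is set where a key ends.
  final case class Node(value: Option[Int], children: SortedMap[Int, Node])

  val empty: Node = Node(None, SortedMap.empty[Int, Node])

  // Keys are walked as unsigned bytes (0 to 255), giving a 256-ary trie.
  private def bytes(key: String): List[Int] =
    key.getBytes("UTF-8").toList.map(_ & 0xff)

  // Copies only the nodes along the key's path; all other subtrees are reused.
  def insert(root: Node, key: String, v: Int): Node = {
    def go(n: Node, ks: List[Int]): Node = ks match {
      case Nil => n.copy(value = Some(v))
      case b :: rest =>
        val child = n.children.getOrElse(b, empty)
        n.copy(children = n.children.updated(b, go(child, rest)))
    }
    go(root, bytes(key))
  }

  // Lookup follows one child per key byte.
  def get(root: Node, key: String): Option[Int] =
    bytes(key)
      .foldLeft(Option(root))((n, b) => n.flatMap(_.children.get(b)))
      .flatMap(_.value)

  // Pre-order walk over sorted children yields values in key order.
  def values(n: Node): List[Int] =
    n.value.toList ++ n.children.values.toList.flatMap(values)
}
```

Inserting into an existing version returns a new root. Subtrees off the updated path, such as the child for an unrelated first byte, are the same objects in the old and new versions.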
![Page 19: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/19.jpg)
Batched Updates
Updates can be performed in place if:
1. Update applies to nodes referenced only by one version, and
2. That version will never be referenced in the future
[Figure: diagram of tree versions v1, v2, and v3 at times t1 and t2]
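
One way to check these two conditions cheaply is to tag each node with the batch that created it. This is an assumption for illustration and not necessarily how PART does it: `Batch`, `Node`, `insert`, and `insertAll` are hypothetical names, and the tree is a plain binary search tree rather than a radix tree. Nodes created by the current batch belong only to a version that has not been published yet, so they can be mutated in place; all other nodes are path-copied.

```scala
object BatchedUpdateDemo {
  // Token identifying one batch; nodes created by a batch carry its token.
  final class Batch

  // Mutable node; null marks an absent child to keep the sketch short.
  final class Node(val key: Int, var value: Int, var left: Node, var right: Node, val owner: Batch)

  def insert(n: Node, k: Int, v: Int, batch: Batch): Node =
    if (n == null) new Node(k, v, null, null, batch)
    else {
      // Reuse the node if this batch created it, otherwise copy it first.
      val m = if (n.owner eq batch) n else new Node(n.key, n.value, n.left, n.right, batch)
      if (k < m.key) m.left = insert(m.left, k, v, batch)
      else if (k > m.key) m.right = insert(m.right, k, v, batch)
      else m.value = v
      m
    }

  // Applies a whole batch; only the final root is exposed as a new version.
  def insertAll(root: Node, kvs: Seq[(Int, Int)]): Node = {
    val batch = new Batch
    kvs.foldLeft(root) { case (r, (k, v)) => insert(r, k, v, batch) }
  }

  def get(n: Node, k: Int): Option[Int] =
    if (n == null) None
    else if (k < n.key) get(n.left, k)
    else if (k > n.key) get(n.right, k)
    else Some(n.value)
}
```

Within one call to `insertAll`, a node on a shared path is copied at most once and then updated in place by later inserts in the same batch. Earlier versions are never touched.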
![Page 20: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/20.jpg)
Microbenchmarks
[Figure: Insert throughput (single threaded), in M inserts/s (axis from 0.0 to 1.6), for PART (batch size 1K), PART (batch size 10M), hash table, red-black tree, and B-tree]
![Page 21: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/21.jpg)
Microbenchmarks
[Figure: Lookup throughput, in M lookups/s (axis from 0 to 2.5)]
[Figure: Scan throughput, in M elements scanned/s (axis from 0 to 25)]
[Figure: Memory usage, in GB (axis from 0 to 1.2)]
![Page 22: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/22.jpg)
Incremental Checkpointing
Tree partitioning: minimize the number of changed files by segregating frequently changed nodes from infrequently changed nodes.
[Figure: tree with regions labeled "frequently changing" and "infrequently changing"]
![Page 23: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/23.jpg)
Incremental Checkpointing
![Page 24: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/24.jpg)
Real app: Streaming aggregation
Counting occurrences of 26-character string ids
Synthetic data generated on the slaves (8 r3.2xlarge)
Load with 1 billion keys, stream of 100 million keys
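
The shape of this workload can be sketched on one machine. The sketch assumes a plain immutable `Map` in place of an IndexedRDD and tiny batches in place of the 100-million-key stream; `applyBatch` and `versions` are illustrative names.

```scala
object StreamingCountDemo {
  // Applies one batch of ids to a version of the counts and returns the next version.
  def applyBatch(counts: Map[String, Long], batch: Seq[String]): Map[String, Long] =
    batch.foldLeft(counts)((acc, id) => acc.updated(id, acc.getOrElse(id, 0L) + 1L))

  // Keeps every intermediate version; each one stays valid and queryable.
  def versions(initial: Map[String, Long], batches: Seq[Seq[String]]): Seq[Map[String, Long]] =
    batches.scanLeft(initial)(applyBatch)
}
```

Because each batch produces a new version instead of mutating the old one, a failed or duplicated batch can be re-run against the same input version and gives the same result.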
![Page 25: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/25.jpg)
Limitations
1. GC pauses
Future work: Off-heap storage with reference counting
2. Scan performance
Future work: Layout-aware allocator
![Page 26: IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)](https://reader031.fdocuments.us/reader031/viewer/2022030310/58f9a98d760da3da068b6fda/html5/thumbnails/26.jpg)
Thanks!
IndexedRDD: https://github.com/amplab/spark-indexedrdd
Use it in your project:

    resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
    libraryDependencies += "amplab" % "spark-indexedrdd" % "0.1"