Monoids and Sketches and CRDTs, oh my!
Kevin Scaldeferri
OSB 2016
How Do I Math with Big Data?
How?
Monoids and Sketches and CRDTs, oh my!
Wikipedia: A monoid is an algebraic structure with a single associative binary operation and an identity element.
http://bit.ly/1Wlrigv / CC0
It’s just a thing you can “add”
interface Monoid[T] {
    // (x + y) + z = x + (y + z)
    T add(T x, T y);

    // 0 + x = x = x + 0
    T unit();
}
One data type can have multiple monoids!
Operation Unit
Sum 0
Product 1
Max -∞
Min +∞
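The table above can be sketched in Python. This is a minimal illustration, not the speaker's code: the `Monoid` class, its `fold` helper, and the sample `data` are assumptions made for the example. It shows four monoids over the same numeric type, and that folding two partitions separately and then adding the partial results matches folding everything at once.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """Hypothetical helper: an associative `add` plus its identity `unit`."""
    add: Callable[[T, T], T]
    unit: T

    def fold(self, xs: Iterable[T]) -> T:
        # Start from the identity so that an empty input yields `unit`.
        return reduce(self.add, xs, self.unit)


# Four different monoids over the same numeric type.
SUM = Monoid(lambda x, y: x + y, 0)
PRODUCT = Monoid(lambda x, y: x * y, 1)
MAX = Monoid(max, float("-inf"))
MIN = Monoid(min, float("inf"))

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Pretend the data lives on two machines: fold each part, then merge.
left, right = data[:3], data[3:]
for m in (SUM, PRODUCT, MAX, MIN):
    assert m.add(m.fold(left), m.fold(right)) == m.fold(data)
```

Because `add` is associative, the split point does not matter, which is what lets the work be spread across machines.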
Live Demo!
More Monoids
Count
Boolean And
Boolean Or
Lists & String Concatenation
Set Union
Function Composition
Tuple Monoids
Monoid[U] & Monoid[V] ➜ Monoid[(U,V)]
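A minimal sketch of the tuple construction, assuming each monoid is represented as a plain `(add, unit)` pair. The helper name `tuple_monoid` and the sample `readings` are made up for illustration.

```python
from functools import reduce


def tuple_monoid(mu, mv):
    """Build a monoid on pairs (u, v) from a monoid on U and a monoid on V.

    Each monoid is an (add, unit) pair; this representation is an
    assumption of the example.
    """
    add_u, unit_u = mu
    add_v, unit_v = mv

    def add(a, b):
        # Combine component-wise: each side keeps its own operation.
        return (add_u(a[0], b[0]), add_v(a[1], b[1]))

    return add, (unit_u, unit_v)


SUM = (lambda x, y: x + y, 0)
MAX = (max, float("-inf"))

add, unit = tuple_monoid(SUM, MAX)

readings = [4, 9, 2]
# Lift each reading into a pair, then fold once to get sum and max together.
total_and_max = reduce(add, [(x, x) for x in readings], unit)
```

One pass over the data now yields both aggregates, and the pair is itself mergeable across machines.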
Derived Monoids
Count & Sum ➜ Average
Count & Sum & SumOfSquares ➜ StdDev
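One way to read the two lines above: keep a (count, sum, sum of squares) triple, which is a tuple monoid of three sums, and derive the statistics only at the end. The sketch below assumes the population standard deviation and uses the simple sum-of-squares formula, which can lose precision on large or tightly clustered values. All function names are illustrative.

```python
import math
from functools import reduce

UNIT = (0, 0.0, 0.0)  # (count, sum, sum of squares)


def lift(x):
    """Turn one observation into a mergeable summary."""
    return (1, x, x * x)


def add(a, b):
    """Component-wise sum: the monoid operation on summaries."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def average(summary):
    count, total, _ = summary
    return total / count


def stddev(summary):
    """Population standard deviation from the summary (naive formula)."""
    count, total, squares = summary
    mean = total / count
    return math.sqrt(max(squares / count - mean * mean, 0.0))


# Each machine summarizes its own data; only the small summaries are merged.
machine_1 = reduce(add, map(lift, [2.0, 4.0, 4.0, 4.0]), UNIT)
machine_2 = reduce(add, map(lift, [5.0, 5.0, 7.0, 9.0]), UNIT)
merged = add(machine_1, machine_2)
```

Averages themselves do not add, but the summaries they are derived from do.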
Sets don’t scale
Dan Morgan / http://bit.ly/1UiFhGs / CC BY 2.0
Sketches = Monoids + Physics
Counting by Flipping Coins
HHTTTHHHHHTHTTHHTHTTT
TTTHTTTTTTHT
Unique Count by Hashing
0111101001 1110101100 0010010010 0100100011 1000111000 0100011011 1100100110 1111011011 0011100001 1001011100
1110100101 1001110101 1010111001 1011110111 0000101001 0100101001 0100110000 0011110100 1011011010 0010011011
Set Cardinality (uniqueCount) ≈ HyperLogLog
Aldo Schumann / http://bit.ly/1Yqzvme / public domain
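The coin-flip and hashing slides can be tied together with a small HyperLogLog sketch. This is a simplified illustration, not a production implementation and not the speaker's code: the class layout, the SHA-1 based 64-bit hash, and the default of 2^12 registers are assumptions, and it applies only the small-range (linear counting) correction. Merging is an element-wise max of registers, which is what makes the sketch a monoid.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog; assumes p >= 7 so the alpha constant applies."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = x >> (64 - self.p)                     # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Position of the first 1 bit in `rest` (like flips until the first head).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Monoid add: element-wise max of registers."""
        assert self.p == other.p
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Two machines see overlapping users; the merged sketch estimates the union.
a, b = HyperLogLog(), HyperLogLog()
for i in range(20000):
    a.add(f"user-{i}")
for i in range(10000, 30000):
    b.add(f"user-{i}")
union = a.merge(b)
```

With 4096 registers the expected relative error is roughly 1.04 / sqrt(4096), a bit under 2%, while the sketch itself stays a fixed size no matter how many items are added.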
Set Membership
interface ExtensionalSet[T] {
    Iterator[T] iterator();
}

interface IntensionalSet[T] {
    boolean isMember(T t);
}
Intensional Sets ≈ Bloom Filters
HashSet
[Animation: A, B, and C are inserted into a HashSet ("Ohnoes!"). Lookup D? ➜ Nopes! Lookup E? ➜ Hmmm ➜ == ➜ Nope!]
BloomFilter
empty: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
add A: 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0
add B: 0 0 1 0 1 0 0 0 1 1 0 0 1 0 1 0
add C: 0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0
D? ➜ Nope!
A? ➜ Yes*
BloomFilter Monoid
  0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0
+ 0 1 1 0 0 0 0 1 0 1 0 0 0 0 0 1
= 0 1 1 0 1 0 1 1 1 1 0 0 1 0 1 1
Circling Back: BloomFilters are a scalable approximation to Sets
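A minimal Bloom filter sketch along the lines of the slides. The class, the SHA-256 based hashing, and the default sizes are assumptions for illustration. Membership answers are "definitely not" or "probably yes", and the monoid `add` is a bitwise OR of two filters built with the same parameters.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=1024, num_hashes=3, bits=0):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bits  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means probably present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        """Monoid add: bitwise OR. The unit is an empty filter."""
        assert (self.size, self.num_hashes) == (other.size, other.num_hashes)
        return BloomFilter(self.size, self.num_hashes, self.bits | other.bits)


machine_1, machine_2 = BloomFilter(), BloomFilter()
machine_1.add("A")
machine_1.add("B")
machine_2.add("C")
merged = machine_1.merge(machine_2)
```

The filter never stores the items themselves, so its size is fixed up front; the price is a tunable false-positive rate.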
CountMinSketch
empty:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
add A:
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
add B, C:
0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 2 0
0 0 1 0 2 0 0 0 0 0 0 0 0 0 0 0
add A again:
0 0 0 0 0 0 1 0 1 2 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 3 0
0 0 2 0 2 0 0 0 0 0 0 0 0 0 0 0
D? Min(2,1,0) = 0
A? Min(2,2,3) = 2
CountMinSketch: Frequency of Occurrence
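The count-min walkthrough can be sketched as follows. The class name, the SHA-256 based row hashes, and the 3 x 16 default shape (chosen to match the slides) are assumptions. Each row has its own hash; a query takes the minimum across rows, so collisions can only push an estimate up, never down. Merging adds the tables cell by cell.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width=16, depth=3):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, each from a differently salted hash.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # Every counter for the item is >= its true count; take the smallest.
        return min(self.rows[row][col] for row, col in self._columns(item))

    def merge(self, other):
        """Monoid add: cell-wise sum. The unit is an all-zero table."""
        assert (self.width, self.depth) == (other.width, other.depth)
        merged = CountMinSketch(self.width, self.depth)
        merged.rows = [[a + b for a, b in zip(r1, r2)]
                       for r1, r2 in zip(self.rows, other.rows)]
        return merged


sketch = CountMinSketch()
for item in ["A", "B", "C", "A"]:
    sketch.add(item)
```

Here `sketch.estimate("A")` is at least 2; it equals 2 unless other items collide with A in every row. Wider rows reduce that chance at the cost of memory.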
Funnels: % of users who do A, then B
Size(A ∪ B) ≈ HyperLogLog
Size(A ∩ B) / Size(A ∪ B) ≈ MinHash
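A minimal MinHash sketch for the ratio above. The function names, the SHA-1 based hash family, and the choice of 128 hash functions are assumptions for illustration. For each hash function the signature keeps the minimum hash over the set; the fraction of positions where two signatures agree estimates Size(A ∩ B) / Size(A ∪ B). Multiplying that ratio by a HyperLogLog estimate of Size(A ∪ B) gives an estimate of the intersection size for the funnel.

```python
import hashlib

NUM_HASHES = 128  # more hashes give a tighter estimate


def _hash(seed, item):
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(items):
    """Per hash function, the minimum hash over a non-empty collection."""
    return [min(_hash(seed, x) for x in items) for seed in range(NUM_HASHES)]


def merge(sig_a, sig_b):
    """Monoid add: element-wise min, which is the signature of the union."""
    return [min(a, b) for a, b in zip(sig_a, sig_b)]


def jaccard(sig_a, sig_b):
    """Estimate Size(A ∩ B) / Size(A ∪ B) as the fraction of matching minima."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Users who did A and users who did B: 200 in common, 400 in the union.
did_a = {f"user-{i}" for i in range(0, 300)}
did_b = {f"user-{i}" for i in range(100, 400)}
sig_a, sig_b = signature(did_a), signature(did_b)
similarity = jaccard(sig_a, sig_b)  # true ratio is 0.5
```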
pedrik / http://bit.ly/25WzP1H / CC BY 2.0
What About Streaming Data?
Streaming is Distributed-in-Time Computation
What About Mutable Data?
CRDTs: Conflict-Free Replicated Data Types
Available, Eventually Consistent Data Structures
How Can Two People Count?
Shared Counter
replica 1: 0 ➜ (+5) 5 ➜ (-4) 1 ➜ -2
replica 2: 0 ➜ 5 ➜ (-3) 2 ➜ -2
Op-based Counter
replica 1: 0 ➜ (+5) 5 ➜ (-4) 1 ➜ -2
replica 2: 0 ➜ 5 ➜ (-3) 2 ➜ -2
Op-based Counter, (+5) delivered twice
replica 1: 0 ➜ (+5) 5
replica 2: 0 ➜ 5 ➜ 10 Oops!
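The two op-based traces can be reproduced with a tiny sketch. The `OpCounter` class is hypothetical and models each replica as simply applying whatever operations it receives. Since addition commutes, delivery order does not matter, but a duplicated delivery is counted twice.

```python
class OpCounter:
    """Hypothetical op-based counter: replicas exchange deltas, not state."""

    def __init__(self):
        self.value = 0

    def apply(self, delta):
        self.value += delta


# Same operations, different delivery order: the replicas still agree.
replica_1, replica_2 = OpCounter(), OpCounter()
for op in (+5, -4, -3):
    replica_1.apply(op)
for op in (+5, -3, -4):
    replica_2.apply(op)
in_sync = replica_1.value == replica_2.value == -2

# A retried send delivers (+5) twice: the replica overcounts.
replica_3 = OpCounter()
for op in (+5, +5):
    replica_3.apply(op)
```

This is why an op-based design depends on exactly-once delivery.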
Naive Sets
[Diagram: two replicas start at {}. Both apply (+X) and reach {X}; one then applies (-X) and reaches {}. Oops!]
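Observed-Remove Sets
[Diagram: two replicas start at {}. One applies (+Xa) and reaches {Xa}; the other applies (+Xb) and reaches {Xb}, then {XaXb} once it has seen Xa. The first replica applies (-Xa) and reaches {}; the surviving state is {Xb}.]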
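A minimal state-based sketch of an observed-remove set, assuming unique tags built from a replica id plus a local counter, and assuming removed tags are kept forever as tombstones (a real implementation would need to garbage-collect them). The class and method names are illustrative. A remove only cancels the tags that replica has already observed, so a concurrent add of the same element survives.

```python
class ORSet:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counter = 0
        self.added = set()    # (element, tag) pairs ever added
        self.removed = set()  # (element, tag) pairs observed, then removed

    def add(self, element):
        # Each add gets a globally unique tag, e.g. Xa vs. Xb on the slide.
        self.counter += 1
        self.added.add((element, (self.replica_id, self.counter)))

    def remove(self, element):
        # Only tags this replica has seen so far are cancelled.
        self.removed |= {(e, t) for (e, t) in self.added if e == element}

    def merge(self, other):
        # Set union is associative, commutative, and idempotent.
        self.added |= other.added
        self.removed |= other.removed

    def elements(self):
        return {e for (e, _tag) in self.added - self.removed}


r1, r2 = ORSet("a"), ORSet("b")
r1.add("X")        # Xa
r2.add("X")        # Xb
r2.merge(r1)       # r2 now holds both Xa and Xb
r1.remove("X")     # r1 has only observed Xa, so only Xa is removed
r1.merge(r2)
r2.merge(r1)
```

After the final merges both replicas contain X, because the tag Xb was never observed by the remove.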
State-based Counter
replica a: 0 ➜ (+5) {a=5}=5 ➜ (+4) {a=9}=9 ➜ {a=9,b=3}=12
replica b: 0 ➜ {a=5}=5 ➜ (+3) {a=5,b=3}=8 ➜ {a=9,b=3}=12
State-based Counter
replica a: 0 ➜ (+5) {a=5}=5 ➜ (+4) {a=9}=9
replica b: 0 ➜ {a=9}=9 ➜ ???
Increment-only Counter
replica a: 0 ➜ (+5) {a=5}=5 ➜ (+4) {a=9}=9
replica b: 0 ➜ {a=9}=9 ➜ {a=9}=9
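The increment-only counter can be sketched as a map from replica id to that replica's running total, merged with a per-key max. The `GCounter` name follows common CRDT terminology and is not taken from the slides; the scenario below is an assumed illustration of stale and duplicated state arriving at replica b.

```python
class GCounter:
    """Increment-only, state-based counter."""

    def __init__(self, counts=None):
        # replica id -> total increments made by that replica
        self.counts = dict(counts or {})

    def increment(self, replica_id, amount=1):
        if amount < 0:
            raise ValueError("increment-only counter")
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def merge(self, other):
        """Per-key max: associative, commutative, and idempotent."""
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0))
                         for k in keys})

    def value(self):
        return sum(self.counts.values())


a = GCounter()
a.increment("a", 5)
first = GCounter(a.counts)    # snapshot {a=5}
a.increment("a", 4)
second = GCounter(a.counts)   # snapshot {a=9}

b = GCounter()
b.increment("b", 3)
# Snapshots arrive out of order, and one arrives twice.
for snapshot in (second, first, second):
    b = b.merge(snapshot)
```

Because each replica's total only grows, the larger value is always the newer one, so stale or repeated snapshots are harmless.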
PN Counter
replica a: 0 ➜ (+5) {a=+5}=5 ➜ (-4) {a=+5,-4}=1 ➜ (+3) {a=+8,-4}=4
replica b: 0 ➜ {a=+5,-4}=1 ➜ {a=+8,-4}=4
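A minimal PN counter sketch, assuming it is built as two increment-only maps: one for total increments (P) and one for total decrements (N), each merged with a per-key max. The class and method names are illustrative.

```python
import copy


class PNCounter:
    def __init__(self):
        self.p = {}  # replica id -> total increments by that replica
        self.n = {}  # replica id -> total decrements by that replica

    def update(self, replica_id, delta):
        # Both sides only ever grow; a decrement grows the N side.
        side = self.p if delta >= 0 else self.n
        side[replica_id] = side.get(replica_id, 0) + abs(delta)

    def merge(self, other):
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for key, total in theirs.items():
                mine[key] = max(mine.get(key, 0), total)

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())


a = PNCounter()
a.update("a", +5)          # {a=+5} = 5
a.update("a", -4)          # {a=+5,-4} = 1
stale = copy.deepcopy(a)   # an older snapshot still in flight
a.update("a", +3)          # {a=+8,-4} = 4

b = PNCounter()
# The newest state arrives first, then the stale one, then a duplicate.
for snapshot in (a, stale, a):
    b.merge(snapshot)
```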
Versioned State
replica a: 0 ➜ (+5) {a:1:5}=5 ➜ (-4) {a:2:1}=1 ➜ (+3) {a:3:4}=4
replica b: 0 ➜ {a:2:1}=1 ➜ {a:3:4}=4
Replace exactly-once, in-order delivery with an idempotent merge strategy
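One way to read the versioned-state slide: each replica publishes its value with a version number, and a merge keeps, per replica, the entry with the higher version. The dictionary layout `{replica_id: (version, value)}` and the function names are assumptions for this sketch. Applying the same updates in order, out of order, or more than once gives the same result.

```python
def merge(state_1, state_2):
    """Per replica id, keep the (version, value) entry with the higher version."""
    merged = dict(state_1)
    for replica_id, entry in state_2.items():
        if replica_id not in merged or entry[0] > merged[replica_id][0]:
            merged[replica_id] = entry
    return merged


def value(state):
    return sum(v for _version, v in state.values())


# The three states from the slide: {a:1:5}, {a:2:1}, {a:3:4}.
updates = [{"a": (1, 5)}, {"a": (2, 1)}, {"a": (3, 4)}]

in_order = {}
for update in updates:
    in_order = merge(in_order, update)

# Same updates, scrambled and with duplicates.
scrambled = {}
for update in (updates[2], updates[0], updates[1], updates[2], updates[0]):
    scrambled = merge(scrambled, update)
```

The network no longer has to guarantee ordering or exactly-once delivery, because the merge tolerates both failures.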
Summing Up
Monoids allow computations to be done across many machines and merged
Sketches allow approximate results when the exact answers are computationally infeasible
CRDTs give an approach for mutable distributed data
Thank you!
@kscaldef