Size-estimation framework with applications to transitive closure and reachability Presented by Maxim Kalaev Edith Cohen AT&T Bell Labs 1996


Size-estimation framework with applications to transitive closure and reachability

Presented by Maxim Kalaev

Edith Cohen

AT&T Bell Labs

1996


Agenda

• Intro & Motivation
• Algorithm sketch
• The estimation framework
• Estimating reachability
• Estimating neighborhood sizes


Introduction

o Descendant counting problem: “Given a directed graph G, compute for each node the number of nodes reachable from it, and the total size of the transitive closure.”


Introduction

S(v) - the set of nodes reachable from node v

Transitive closure size: $T = \sum_{v \in V} |S(v)|$

Example: |S(‘A’)| = 5, |S(‘B’)| = 3

T = |S(‘A’)| + |S(‘B’)| + … = 15

[Figure: example directed graph on nodes A, B, C, D, E]
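For reference, the exact quantities can be computed by one graph search per node, which is the cost the estimation framework is meant to avoid. The sketch below is only an illustration: the function name `descendant_counts` and the edge set (a chain A→C→B→D→E) are hypothetical, chosen so that the counts agree with the example values |S(‘A’)| = 5, |S(‘B’)| = 3 and T = 15; the edges of the slide's figure are not recoverable from the transcript.

```python
def descendant_counts(nodes, edges):
    """Exact |S(v)| for every node v, where S(v) includes v itself."""
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)

    counts = {}
    for start in nodes:
        seen = {start}
        stack = [start]          # iterative DFS from `start`
        while stack:
            cur = stack.pop()
            for nxt in succ[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        counts[start] = len(seen)
    return counts


# Hypothetical edges (assumption, not taken from the slide's figure).
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "C"), ("C", "B"), ("B", "D"), ("D", "E")]
counts = descendant_counts(nodes, edges)
closure_size = sum(counts.values())   # T = sum over v of |S(v)|
```

This baseline takes O(|V|·(|V| + |E|)) time in the worst case, since it runs a full search from every node.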


Motivation

• Applicable to DB-query size estimation
• Data mining
• Matrix multiplication optimizations
• Parallel DFS algorithm optimizations


Framework algorithm sketch

Least descendant mapping: given a graph G(V,E) with ranks on its nodes, compute a mapping from each node v in V to the least-ranked node in S(v).

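[Figure: example graph with node ranks A=4, B=1, C=5, D=2, E=3]

Example (least elements identified by their rank):

• LE(‘A’) = 1 (node B)

• LE(‘C’) = 2 (node D)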


Framework algorithm sketch

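The rank of the LE (least element) is highly correlated with the size of S(v): with random ranks, the larger S(v) is, the smaller its minimum rank tends to be.

The precision can be improved by running several iterations, each with a fresh random rank assignment and a recalculation of LE.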


The estimation framework

Let X be a set of elements x with non-negative weights w(x).

Let Y be a set of labels y, with a mapping $S: Y \to 2^X$ from labels y to subsets of X.

Our objective is to compute an estimate of:

$w(S(y)) = \sum_{x \in S(y)} w(x), \quad y \in Y$

assuming X, Y and the weights are given, but it is costly to calculate w(S(y)) for all y.


The estimation framework

Assume we have the following LE (LeastElement) oracle: given ranks R(x) on the elements of X, LE(y) returns the element le(y) with minimal rank in S(y) in O(1) time:

$R(le(y)) = \min_{x \in S(y)} R(x)$

The estimation algorithm will perform k iterations, where k is determined by the required precision.


The estimation framework

Iteration:

• Independently, for each x in X, select a random rank R(x) from the exponential distribution with parameter w(x). The distribution function is: $F_x(t) = 1 - e^{-w(x)t} \quad (t \ge 0)$

• Apply LE to the selected ranking and store the obtained min-rank for each y in Y.
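One iteration can be sketched in a few lines of Python. This is a minimal illustration, not the presented implementation: the names `sample_ranks` and `min_ranks`, the example weights and the example subsets are all hypothetical, weights are assumed strictly positive, and the LE oracle is replaced by a brute-force minimum over each subset.

```python
import random


def sample_ranks(weights, rng):
    """Draw an independent rank R(x) ~ Exp(w(x)) for every element x."""
    # random.expovariate(lambd) has mean 1/lambd, so heavier elements
    # tend to receive smaller ranks. Requires w(x) > 0.
    return {x: rng.expovariate(w) for x, w in weights.items()}


def min_ranks(subsets, ranks):
    """Brute-force stand-in for the LE oracle: R(le(y)) for every label y."""
    return {y: min(ranks[x] for x in s) for y, s in subsets.items()}


weights = {"a": 1.0, "b": 2.0, "c": 0.5}         # hypothetical w(x)
subsets = {"y1": {"a", "b"}, "y2": {"b", "c"}}  # hypothetical S(y)
rng = random.Random(0)
ranks = sample_ranks(weights, rng)
mins = min_ranks(subsets, ranks)                 # one stored sample per label
```

Repeating these two steps k times yields k min-rank samples per label, which is the input to the estimators.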


The estimation framework

Proposition: The distribution of the minimum rank R(le(y)) depends only on w(S(y)).

Proof: The minimum of k independent r.v.’s with exponential distributions with parameters $w_1, \dots, w_k$ has an exponential distribution with parameter $\sum_{j=1}^{k} w_j$.

Our objective now is to estimate the distribution parameter from the given samples.
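The fact used in the proof follows from a one-line computation, written out here for independent ranks $R_1, \dots, R_k$ with parameters $w_1, \dots, w_k$ (the symbols $R_j$ are introduced only for this derivation):

```latex
P\Big(\min_{1 \le j \le k} R_j > t\Big)
  = \prod_{j=1}^{k} P(R_j > t)
  = \prod_{j=1}^{k} e^{-w_j t}
  = e^{-\left(\sum_{j=1}^{k} w_j\right) t},
  \qquad t \ge 0
```

The right-hand side is the tail of an exponential distribution with parameter $\sum_{j=1}^{k} w_j$, so the minimum rank over S(y) carries information about w(S(y)) only.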


The estimation framework

The mean of an exponentially distributed r.v. with parameter λ is 1/λ.

We can use this fact to estimate λ from samples by 1/(sample mean).

Use this to estimate w(S(y)) from the minimal ranks obtained in k iterations:

$\tilde{w}(S(y)) = \frac{1}{\frac{1}{k}\sum_{i=1}^{k} R_i(le_i(y))} = \frac{k}{\sum_{i=1}^{k} R_i(le_i(y))}$


The estimation framework

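More estimators:

• Selecting the k(1-1/e)-th smallest of the k samples and taking its reciprocal (like using the median for a uniform distribution); the (1-1/e)-quantile of an exponential distribution with parameter λ is 1/λ.

• Using this non-intuitive average estimator: $\tilde{w}(S(y)) = \frac{k-1}{\sum_{i=1}^{k} R_i(le_i(y))}$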
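The three estimators (k divided by the sum of the samples, k-1 divided by the sum, and the reciprocal of the k(1-1/e)-th smallest sample) can be compared on synthetic data. The sketch below is illustrative only: the function names, the true weight 7.0, the sample count and the rounding of the order-statistic index with `ceil` are assumptions made for the example.

```python
import math
import random


def estimate_mean(samples):
    """k / sum: reciprocal of the sample mean."""
    return len(samples) / sum(samples)


def estimate_unbiased(samples):
    """(k - 1) / sum: unbiased, because the sum of k Exp(lam) samples is
    Gamma(k, lam) distributed and E[1 / sum] = lam / (k - 1)."""
    return (len(samples) - 1) / sum(samples)


def estimate_quantile(samples):
    """Reciprocal of the ceil(k(1 - 1/e))-th smallest sample."""
    k = len(samples)
    idx = math.ceil(k * (1 - 1 / math.e))   # 1-based order statistic
    return 1.0 / sorted(samples)[idx - 1]


rng = random.Random(1)
true_weight = 7.0    # hypothetical w(S(y))
k = 2000
samples = [rng.expovariate(true_weight) for _ in range(k)]
estimates = {f.__name__: f(samples)
             for f in (estimate_mean, estimate_unbiased, estimate_quantile)}
```

For large k the first two estimators are nearly identical; the quantile-based one is less sensitive to a few unusually large samples.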


The estimation framework

Complexity so far:

• Allowing a relative tolerated error ε, we need to store $\log(1/\epsilon)$ significant bits for the R’s.

• k assignment iterations will take O(k|X|) time + k*O(Oracle setup time).

Asymptotic accuracy bounds (the proof will come later), for all $y \in Y$:

$P\{\,|w(S(y)) - \tilde{w}(S(y))| \ge \epsilon\, w(S(y))\,\} = \exp(-\Omega(\epsilon^2 k))$

$E\left(\frac{|w(S(y)) - \tilde{w}(S(y))|}{w(S(y))}\right) = O(1/\sqrt{k})$


Estimating reachability

Objective: given a graph G(V,E), estimate for each v the number of its descendants, $|\tilde{S}(v)|$, and the size of the transitive closure: $\tilde{T}$ for $T = \sum_{v \in V} |S(v)|$.

All we need is to implement an oracle for calculating the LE mapping. The following algorithm takes an arbitrary ranking of the nodes in sorted order and does this in O(|E|) time:


Estimating reachability

LE subroutine()
  Reverse edges direction of the graph
  Iterate until V = {}
    Pop v with minimal rank from V
    Run DFS to find all nodes reachable from v (call this set of nodes U)
    For each node in U set LE = v
    V = V \ U
    E = E \ {edges incident to nodes in U}
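The LE subroutine can be sketched in Python as follows, together with a small estimation loop that plugs it into the framework with unit weights. This is a sketch under stated assumptions: the function names are hypothetical, removal of U from V and E is modeled by skipping nodes that already have an LE value, the (k-1)/sum estimator is used, and the example edges are hypothetical (chosen so that the least elements of A and C have ranks 1 and 2, as in the earlier example).

```python
import random


def least_descendants(nodes, edges, rank):
    """Map every node v to the least-ranked node in S(v) (v itself included)."""
    # Reverse the edges: a search from v in the reversed graph visits
    # exactly the nodes that can reach v in the original graph.
    preds = {v: [] for v in nodes}
    for u, v in edges:
        preds[v].append(u)

    le = {}
    for v in sorted(nodes, key=rank.__getitem__):   # increasing rank
        if v in le:
            continue             # already claimed by a lower-ranked node
        le[v] = v
        stack = [v]              # iterative DFS over unclaimed nodes only
        while stack:
            cur = stack.pop()
            for p in preds[cur]:
                if p not in le:
                    le[p] = v
                    stack.append(p)
    return le


def estimate_descendants(nodes, edges, k, rng):
    """Estimate |S(v)| for all v: unit weights, (k - 1) / sum estimator."""
    sums = {v: 0.0 for v in nodes}
    for _ in range(k):
        rank = {v: rng.expovariate(1.0) for v in nodes}   # w(x) = 1
        le = least_descendants(nodes, edges, rank)
        for v in nodes:
            sums[v] += rank[le[v]]
    return {v: (k - 1) / sums[v] for v in nodes}


# Hypothetical example graph (assumption, not the slide's figure).
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "E"), ("E", "D"), ("C", "D")]
rank = {"A": 4, "B": 1, "C": 5, "D": 2, "E": 3}
le = least_descendants(nodes, edges, rank)
sizes = estimate_descendants(nodes, edges, k=2000, rng=random.Random(0))
closure_size = sum(sizes.values())    # estimate of T
```

Each node is claimed once and each reversed edge is scanned once, so a single call costs O(|V| + |E|) after the nodes are sorted by rank.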


Estimating reachability

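Each estimation iteration takes O(|V|) + O(|E|) time, assuming we can sort the node ranks in expected linear time.

Accuracy bounds (from the estimator bounds), for all $v \in V$:

$P\{\,\big||S(v)| - |\tilde{S}(v)|\big| \ge \epsilon\,|S(v)|\,\} = \exp(-\Omega(\epsilon^2 k))$

$E\left(\frac{\big||S(v)| - |\tilde{S}(v)|\big|}{|S(v)|}\right) = O(1/\sqrt{k})$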


Estimating neighborhood sizes

Problem: given a graph G(V,E) with nonnegative edge lengths, estimate n(v,d), the number of nodes within distance at most d from node v.

Our algorithm will preprocess G in $O(|E| \log|V| + |V| \log^2|V|)$ time and after that will be able to answer (v,d) queries in $O(\log\log|V|)$ time.


Estimating neighborhood sizes

N(A,7)={A,B,C,D,E}, n(A,7)=5
N(A,3)={A,C,E}, n(A,3)=3
N(D,0)={D}, n(D,0)=1
N(C,∞)={C}, n(C,∞)=1

[Figure: example graph with node ranks A=4, B=1, C=5, D=2, E=3 and edge lengths 1, 2, 4, 3, 1, 1]


Estimating neighborhood sizes

After preprocessing G we will have, for each node v, a list of pairs ({d1,s1}, {d2,s2},…,{dη,sη}), where the d’s stand for distances and the s’s stand for estimated neighborhood sizes. The lists will be sorted by the d’s.

To obtain n(v,d) we look for the pair i such that $d_i \le d < d_{i+1}$ and return $s_i$.
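Finding the pair i with $d_i \le d < d_{i+1}$ is a binary search over the sorted distances. A minimal sketch, assuming the list is stored as two parallel Python lists; the function name `neighborhood_size` and the example values are hypothetical:

```python
from bisect import bisect_right


def neighborhood_size(distances, sizes, d):
    """Return s_i for the index i with d_i <= d < d_(i+1).

    `distances` must be sorted increasingly; `sizes` is parallel to it.
    """
    i = bisect_right(distances, d) - 1    # last index with d_i <= d
    if i < 0:
        raise ValueError("d is smaller than the first breakpoint")
    return sizes[i]


# Hypothetical list for one node: breakpoints d_i and estimated sizes s_i.
distances = [0, 1, 2, 4]
sizes = [1.0, 2.1, 2.9, 4.2]
answer = neighborhood_size(distances, sizes, 3)
```

Binary search over a list of length η takes O(log η) steps, so lists of expected length O(log|V|) give the O(loglog|V|) query time.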


Estimating neighborhood sizes

The algorithm will run k iterations. In each iteration it will create for each node in G a least-element list ({d1,v1}, {d2,v2},…,{dη,vη}) such that for any neighborhood (v,d) we can find its min-rank node using the list: for $d_i \le d < d_{i+1}$ the min-rank node is $v_i$.


Estimating neighborhood sizes

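Neighborhoods:
N(A,7)={A,B,C,D,E}
N(A,3)={A,C,E}
N(D,1)={C,D}
N(C,∞)={C}

LE-lists (pairs written here as {node, distance}):
A: ({A,0}{E,1}{D,2}{B,4})
B: ({B,0})
C: ({C,0})
D: ({D,0})
E: ({E,0}{D,3})

[Figure: example graph with node ranks A=4, B=1, C=5, D=2, E=3 and edge lengths 1, 2, 4, 3, 1, 1]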


Estimating neighborhood sizes - alg

sub Make_le_lists()
  Assume nodes $v_1..v_n$ are sorted by rank in increasing order
  Reverse edge direction of G
  For i=1..n: $d_i \leftarrow \infty$, $v_i$’s list $\leftarrow$ (empty list)
  For i=1..n (modified Dijkstra’s alg.) DO: (next slide)


Estimating neighborhood sizes - alg

I. Start with an empty heap, place $v_i$ on the heap with label 0

II. Iterate until the heap is empty:
  Pop node $v_k$ with minimal label d from the heap
  Add pair $(d, v_i)$ to $v_k$’s LE-list, set $d_k \leftarrow d$
  For each out-edge $e_j = (v_k, v_j)$ of $v_k$:
    If $v_j$ is in the heap: update its label to MIN(current label of $v_j$, $d + D(e_j)$)
    Else: if $d + D(e_j) < d_j$, place $v_j$ on the heap with label $d + D(e_j)$
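Steps I and II, wrapped in the loop over nodes in increasing rank order, can be sketched in Python as below. Several things here are assumptions made for the sketch: the function name `make_le_lists`; a binary heap (`heapq`) with lazy deletion of stale entries stands in for a heap with a label-update operation; a start node whose recorded distance is already 0 adds no pair; and the edge set is hypothetical, chosen so that the output agrees with the LE-lists A: A:0 E:1 D:2 B:4, B: B:0, C: C:0, D: D:0, E: E:0 D:3.

```python
import heapq


def make_le_lists(nodes, edges, rank):
    """edges: dict (u, v) -> nonnegative length.
    Returns node -> list of (distance, least-rank node), sorted by distance."""
    # Reverse the edges so that a search from v_i measures distances *to* v_i.
    rev = {v: [] for v in nodes}
    for (u, v), length in edges.items():
        rev[v].append((u, length))

    d = {v: float("inf") for v in nodes}   # distance to nearest lower-ranked node
    le_lists = {v: [] for v in nodes}
    for vi in sorted(nodes, key=rank.__getitem__):    # increasing rank
        heap = [(0, vi)]
        while heap:
            dist, vk = heapq.heappop(heap)
            if dist >= d[vk]:
                continue    # stale entry, or a lower-ranked node is as near
            d[vk] = dist
            le_lists[vk].append((dist, vi))
            for vj, length in rev[vk]:
                if dist + length < d[vj]:
                    heapq.heappush(heap, (dist + length, vj))
    # Pairs were appended in decreasing distance; return them sorted by d.
    return {v: pairs[::-1] for v, pairs in le_lists.items()}


# Hypothetical edges and lengths (assumption, not the slide's figure).
nodes = ["A", "B", "C", "D", "E"]
rank = {"A": 4, "B": 1, "C": 5, "D": 2, "E": 3}
edges = {("A", "E"): 1, ("A", "D"): 2, ("A", "B"): 4,
         ("E", "D"): 3, ("D", "C"): 1, ("E", "C"): 1}
le_lists = make_le_lists(nodes, edges, rank)
```

The pruning test `dist + length < d[vj]` is what keeps the lists short: a node is visited in iteration i only if $v_i$ is strictly nearer to it than every lower-ranked node.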


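Estimating neighborhood sizes - demo

[Figure: step-by-step run of Make_le_lists on the example graph (node ranks A=4, B=1, C=5, D=2, E=3; edge lengths 1, 2, 4, 3, 1, 1), showing heap labels per iteration]

Resulting LE-lists:
A: A:0 E:1 D:2 B:4
B: B:0
C: C:0
D: D:0
E: E:0 D:3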


Estimating neighborhood sizes - analysis

Correctness

Proposition 1:

• A node v is placed on the heap in iteration i if and only if $dist(v_i, v) < dist(v_j, v)$ for all $j < i$.

• If v is placed on the heap in iteration i, then the pair $\{dist(v_i, v), v_i\}$ is placed on v’s list and v’s value d is updated to $dist(v_i, v)$.

(Distances are taken in the edge-reversed graph on which the algorithm runs.)


Estimating neighborhood sizes - analysis

Complexity

Proposition 2: If the ranking is a random permutation, the expected size of each LE-list is O(log|V|).

The proof is based on Proposition 1 and a divide-and-conquer style analysis.


Estimating neighborhood sizes - analysis

(proof cont.) Assume the LE-list of node u contains x pairs. Consider the nodes sorted by their distance to node u: v1, v2, …. According to Proposition 1, u enters the heap in the iteration that starts from a node v iff all nodes with ranks lower than v’s are farther from u than v is. With random ranks, the lowest-ranked node is expected to be nearer to u than about half of the nodes; only the nodes nearer than it can still contribute a pair, and the same argument applies to them. Each pair therefore halves the set of remaining candidates in expectation, so the expected value of x is O(log|V|).
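The same bound can also be checked by a direct expectation calculation. The sketch assumes that all distances to u are distinct and writes m ≤ |V| for the number of nodes at finite distance from u (the symbol m is introduced here). With v1, v2, … sorted by distance to u, node $v_i$ contributes a pair to u’s list exactly when it has the lowest rank among $v_1, \dots, v_i$, which for a random permutation happens with probability 1/i:

```latex
E[x] = \sum_{i=1}^{m} P\big(v_i \text{ has the least rank among } v_1, \dots, v_i\big)
     = \sum_{i=1}^{m} \frac{1}{i}
     = H_m \le 1 + \ln m = O(\log|V|)
```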


Estimating neighborhood sizes - analysis

Complexity (cont.)

Running time: using Fibonacci heaps we have an O(log|V|) pop() operation and O(1) insert() or update(). Let $l_i$ be the number of iterations in which $v_i$ was placed on the heap (0<i≤|V|). It follows that the running time is:

$O\left(\sum_{i=1}^{|V|} l_i \,(\log|V| + \mathrm{outdeg}(v_i))\right)$

As $l_i$ is also the size of $v_i$’s LE-list, whose expected value is O(log|V|) by Proposition 2, we get an expected running time of:

$O(|V| \log^2|V| + |E| \log|V|)$


Estimating neighborhood sizes

k-iteration issues

What to do with the k LE-lists obtained per node?

• The naïve way, searching each of the k lists separately, gives O(k*loglog|V|) query time.

• It can be improved to O(log k + loglog|V|) by merging the lists and storing the sum of ranks at each breakpoint.

Total algorithm setup time is: $O(k(|V| \log^2|V| + |E| \log|V|))$



Summary

• General size-estimation framework
• Two applications: transitive closure size estimation and neighborhood size estimation


[Figure: example graph with node ranks A=4, B=1, C=5, D=2, E=3 and edge lengths 1, 2, 4, 3, 1, 1]

THE END!