Distinct Elements Problem

Ariel Rosenfeld


Page 1: Distinct Elements Problem

Ariel Rosenfeld

Page 2: Distinct Elements Problem

Input: a stream of m integers i1, i2, ..., im, each in {1, …, n}.

Output: the number of distinct elements in the stream.

Example: count the number of distinct IP addresses you encounter.

Page 3: Distinct Elements Problem

Solutions:

Bit vector of size n (mark a 1 when a value is encountered).

Keep all m integers and answer naively.
◦ Sort and count

O(min{n, m log m})
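The two exact approaches above can be sketched as follows. This is a minimal illustration, not code from the slides: the function names are hypothetical, and a `bytearray` (one byte per value) stands in for a true bit vector.

```python
def count_distinct_bitvector(stream, n):
    """Exact count using one flag per possible value (values assumed in 1..n)."""
    seen = bytearray(n + 1)  # index 0 unused; a byte per value stands in for a bit
    count = 0
    for i in stream:
        if not seen[i]:
            seen[i] = 1      # mark the value the first time it is encountered
            count += 1
    return count


def count_distinct_sorted(stream):
    """Exact count by storing the whole stream, sorting it, and counting runs."""
    items = sorted(stream)
    count = 0
    previous = None
    for x in items:
        if count == 0 or x != previous:
            count += 1       # a new run of equal values starts here
            previous = x
    return count
```

Both functions return the same answer; they differ only in whether memory grows with n or with m.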

Page 4: Distinct Elements Problem

A deterministic exact algorithm is impossible using o(n) bits.

A deterministic approximation algorithm for this problem providing a (1 ± 1/1000)-approximation using o(n) bits is impossible.

Page 5: Distinct Elements Problem

$\mathrm{Var}(X) = E(X^2) - E(X)^2$.

Pick a random hash function $h : [n] \to [0, 1]$.

Calculate $z = \min_{i \in \text{stream}} h(i)$.

Output $1/z - 1$.
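A minimal sketch of this estimator, under the idealized assumption that every value gets a truly uniform hash in [0, 1]. The name `idealized_fm` and the lazily filled dictionary are illustrative choices, not part of the slides; the dictionary uses memory proportional to the number of distinct values, so this shows the estimator only, not a small-space algorithm.

```python
import random


def idealized_fm(stream, seed=None):
    """Estimate the number of distinct elements as 1/z - 1, z = min hash value."""
    rng = random.Random(seed)
    h = {}    # idealized hash: each distinct value gets one Unif[0, 1) draw
    z = 1.0   # running minimum; stays 1.0 for an empty stream
    for i in stream:
        if i not in h:
            h[i] = rng.random()  # repeated values reuse the same hash value
        z = min(z, h[i])
    if z == 0.0:                 # random() may return exactly 0.0 (very unlikely)
        return float("inf")
    return 1.0 / z - 1.0
```

A single run has high variance, so the output for one seed can be far from the true count.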

Page 6: Distinct Elements Problem

Equal integers get the same hash value.

We will show that the output is a good approximation.

Page 7: Distinct Elements Problem

This is idealized for 2 reasons:

1. We don't have perfect precision.

2. We need at least n bits to remember the randomness associated with every i.

Let's ignore this for now…

Page 8: Distinct Elements Problem

S = {j1, …, jt} (the distinct elements in the stream, so t is the quantity we want to estimate)

h(j1), ..., h(jt) = X1, ..., Xt are independent random variables drawn from Unif[0, 1]

Z = min{Xi}

Page 9: Distinct Elements Problem

For $x \in [0, 1]$:

$F(x) = P(Z \le x) = 1 - P(Z > x) = 1 - (1 - x)^t$

$f(x) = F'(x) = t(1 - x)^{t-1}$

$E[Z] = \int_0^1 y f(y)\,dy = t \int_0^1 y (1 - y)^{t-1}\,dy$

Page 10: Distinct Elements Problem

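$E[Z] = \int_0^1 y f(y)\,dy = t \int_0^1 y (1 - y)^{t-1}\,dy$

Integrate by parts with $u = y$, $dv = t(1 - y)^{t-1}\,dy$, so that $du = dy$ and $v = -(1 - y)^t$:

$E[Z] = \left[-y(1 - y)^t\right]_0^1 + \int_0^1 (1 - y)^t\,dy = 0 + \left[-\frac{(1 - y)^{t+1}}{t+1}\right]_0^1 = \frac{1}{t+1}$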

Page 11: Distinct Elements Problem

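1. $E[Z] = \frac{1}{t+1}$.

2. $E[Z^2] = \frac{2}{(t+1)(t+2)}$ (HW).

We get a bounded variance: $\mathrm{Var}(Z) = E[Z^2] - E[Z]^2 = \frac{t}{(t+1)^2(t+2)} < \frac{1}{(t+1)^2}$.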
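The two moments can be sanity-checked numerically. The sketch below is a hypothetical helper, not part of the slides: it draws the minimum of t independent uniform values many times and returns the empirical first and second moments, which should land close to 1/(t+1) and 2/((t+1)(t+2)).

```python
import random


def empirical_moments(t, trials=20000, seed=0):
    """Monte Carlo estimates of E[Z] and E[Z^2] for Z = min of t Unif[0, 1) draws."""
    rng = random.Random(seed)
    first = 0.0
    second = 0.0
    for _ in range(trials):
        z = min(rng.random() for _ in range(t))
        first += z
        second += z * z
    return first / trials, second / trials


if __name__ == "__main__":
    t = 9
    m1, m2 = empirical_moments(t)
    print("E[Z]   empirical:", m1, "formula:", 1 / (t + 1))
    print("E[Z^2] empirical:", m2, "formula:", 2 / ((t + 1) * (t + 2)))
```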

Page 12: Distinct Elements Problem
Page 13: Distinct Elements Problem

As q increases, the approximation gets better.

Chebyshev:

$P\left(\left|z - \frac{1}{t+1}\right| \ge a\right) \le \frac{\mathrm{Var}(z)}{a^2}$
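A hedged sketch of how averaging tightens the estimate. It assumes that q denotes the number of independent copies of the estimator whose minima are averaged before inverting; the transcript does not spell this out, so treat that reading, the name `averaged_fm`, and the idealized per-copy hash dictionaries as assumptions.

```python
import random


def averaged_fm(stream, q, seed=None):
    """Average the minima of q independent idealized hashes, then output 1/mean - 1."""
    rng = random.Random(seed)
    hashes = [dict() for _ in range(q)]  # one idealized hash per copy (assumption)
    minima = [1.0] * q                   # running minimum for each copy
    for i in stream:
        for j in range(q):
            h = hashes[j]
            if i not in h:
                h[i] = rng.random()      # fresh Unif[0, 1) value for a new element
            if h[i] < minima[j]:
                minima[j] = h[i]
    z_bar = sum(minima) / q              # averaging divides the variance by q
    if z_bar == 0.0:
        return float("inf")
    return 1.0 / z_bar - 1.0
```

With t = 100 distinct values and q = 1000, the standard deviation of the averaged minimum is roughly 3% of 1/(t+1), so the estimate is typically within a few percent of 100.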

Page 14: Distinct Elements Problem

We want a function that doesn't need n bits or more to represent.

So we will use a k-wise independent family of hash functions H; each function in H can be represented using a small number of bits (log|H|).
◦ In lecture.

Page 15: Distinct Elements Problem

An example: set q > k a prime power, and define $H_{\text{poly},k}$ to be the set of all polynomials of degree ≤ (k − 1) in $\mathbb{F}_q[x]$.

$H_{\text{poly},k}$ is a k-wise independent family.

Size: $q^k$

Needs: k log q bits.
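A minimal sketch of the polynomial family, restricted to a prime modulus p (a special case of the prime power q above, chosen so that plain modular arithmetic suffices). The names `eval_poly` and `sample_poly_hash` are hypothetical.

```python
import random


def eval_poly(coeffs, x, p):
    """Evaluate coeffs[0] + coeffs[1]*x + ... modulo the prime p (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):  # start from the highest-degree coefficient
        acc = (acc * x + c) % p
    return acc


def sample_poly_hash(k, p, seed=None):
    """Draw a uniformly random polynomial of degree <= k-1 over F_p.

    The k coefficients are the whole description of the function,
    which is where the k log p bits come from.
    """
    rng = random.Random(seed)
    coeffs = [rng.randrange(p) for _ in range(k)]
    return (lambda x: eval_poly(coeffs, x, p)), coeffs


if __name__ == "__main__":
    h, coeffs = sample_poly_hash(k=3, p=101, seed=1)
    print(coeffs, [h(x) for x in range(5)])
```

To use such a function in place of the idealized hash into [0, 1], one option is to divide the output by p; this limits the precision to multiples of 1/p.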