Topics in TCS

Page 1: Topics in TCS

Topics in TCS

Majority and Misra-Gries

Raphael Clifford

Page 2: Topics in TCS

The majority problem

Given a sequence of integers a1, . . . , am, does there exist an integer that occurs more than m/2 times?

Originally considered for elections. Three candidates A, B and C. Did any of them get a majority?

Naive majority solution

Votes: AAACCBBCCCBCC. There were 13 votes in total.

We could sort the input, giving AAABBBCCCCCCC.

Then traverse in linear time to find whether any candidate occurs ≥ 7 times.

Linear space and O(m log m) time.
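The sort-and-scan idea can be sketched in Python as follows. This is only an illustration: the function name naive_majority is an assumption, and the input is assumed to fit in memory as a list or string.

```python
def naive_majority(votes):
    """Return the item occurring more than len(votes)/2 times, or None."""
    ordered = sorted(votes)  # O(m log m) time, O(m) extra space
    m = len(ordered)
    run_start = 0
    for i in range(1, m + 1):
        # A run of equal items ends at the end of the input or where the item changes.
        if i == m or ordered[i] != ordered[run_start]:
            if 2 * (i - run_start) > m:
                return ordered[run_start]
            run_start = i
    return None


print(naive_majority("AAACCBBCCCBCC"))
```

On the 13 votes above, the run of C has length 7, which exceeds 13/2, so C is returned.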

Page 8: Topics in TCS

Finding the majority

Solved in 1981 by Boyer and Moore when considering votes. Run Majority(j) for each item j in the input.

Majority(j)

    initialise item α
    initialise counter c = 0
    Repeat for each j
        if c == 0
            α = j
            c = 1
        elif j == α
            c = c + 1
        else
            c = c - 1
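A minimal Python sketch of the pseudocode above. The names majority_candidate, candidate and count are assumptions standing in for Majority, α and c.

```python
def majority_candidate(stream):
    """One pass over the stream; returns the final (candidate, count) pair."""
    candidate = None  # plays the role of α
    count = 0         # plays the role of c
    for item in stream:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate, count


print(majority_candidate("AAACCBBCCCBCC"))
```

Only the candidate and the counter are kept between items, so the stream never needs to be stored.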

Page 9: Topics in TCS

Majority algorithm

Consider the stream AAACCBBCCCBCC arriving from left to right. The state (α, c) after each item is processed:

    Step  Item  α  c
     1    A     A  1
     2    A     A  2
     3    A     A  3
     4    C     A  2
     5    C     A  1
     6    B     A  0
     7    B     B  1
     8    C     B  0
     9    C     C  1
    10    C     C  2
    11    B     C  1
    12    C     C  2
    13    C     C  3

The algorithm ends with α = C, and C is indeed the majority item.

Page 22: Topics in TCS

Majority algorithm - wrong answer

Consider the stream AAABBBC arriving from left to right. The state (α, c) after each item is processed:

    Step  Item  α  c
     1    A     A  1
     2    A     A  2
     3    A     A  3
     4    B     A  2
     5    B     A  1
     6    B     A  0
     7    C     C  1

The algorithm ends with α = C, yet C occurs only once among the 7 items. When no majority item exists, the reported α can be wrong.
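One common way to guard against such a wrong answer is to count the candidate in a second pass. The sketch below assumes the input can be read twice (for example, it is a list or string rather than a one-shot stream); the function name verified_majority is an assumption.

```python
def verified_majority(items):
    """Return the majority item of a re-readable sequence, or None if there is none."""
    candidate, count = None, 0
    for item in items:  # first pass: find the only possible majority item
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    # Second pass: count the candidate exactly and compare against m/2.
    occurrences = sum(1 for item in items if item == candidate)
    return candidate if 2 * occurrences > len(items) else None


print(verified_majority("AAABBBC"))
```

For AAABBBC the candidate C fails the check, so None is returned instead of a wrong answer.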

Page 29: Topics in TCS

Time and space of the majority algorithm

Running time: at most 2m comparisons, so O(m) overall.

Space usage: one item and one integer, so O(log n + log m) bits overall, where n is the number of possible item values (log n bits store one item, log m bits store the counter).

Page 31: Topics in TCS

Correctness of the majority algorithm

Claim: if there is a majority item, it is reported.

Let α∗ be the majority item. We show that α∗ is the final value of α.

Run through the input from left to right.

Let c′ = c if α = α∗ and c′ = −c otherwise.

Every occurrence of α∗ increases c′ by one.

Every occurrence of an item that is not α∗ either increases or decreases c′ by 1.

If α∗ is in the majority, there are more increases than decreases, so the final c′ is positive. Hence c′ = c > 0, which implies α = α∗.

Page 38: Topics in TCS

Correctness of the majority algorithm

If there is a majority item, it is reported.

Stream: A, A, A, C, C, B, B, C, C, C, B, C, C. At termination c = 3 and α = α∗ = C.

Run through the input from left to right, with c′ = c if α = α∗ and c′ = −c otherwise:

    Step  Item  α  c  c′
     1    A     A  1  −1
     2    A     A  2  −2
     3    A     A  3  −3
     4    C     A  2  −2
     5    C     A  1  −1
     6    B     A  0   0
     7    B     B  1  −1
     8    C     B  0   0
     9    C     C  1   1
    10    C     C  2   2
    11    B     C  1   1
    12    C     C  2   2
    13    C     C  3   3

Every occurrence of α∗ increases c′ by one.

Every occurrence of an item that is not α∗ either increases or decreases c′ by 1.

If α∗ is in the majority, there are more increases than decreases!
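The signed counter c′ from the argument above can be tracked alongside the algorithm. In this sketch the function name signed_counter_trace and the parameter target (standing for α∗) are assumptions introduced for illustration.

```python
def signed_counter_trace(stream, target):
    """Return the value of c' after each item, where c' = c if α == target else -c."""
    candidate, count = None, 0
    trace = []
    for item in stream:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
        trace.append(count if candidate == target else -count)
    return trace


print(signed_counter_trace("AAACCBBCCCBCC", "C"))
```

Each entry differs from the previous one by exactly 1, and every occurrence of the target moves the value up, which is the invariant the proof relies on.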

Page 51: Topics in TCS

Misra-Gries - A generalisation of Majority

Given k, which elements (if any) appear more than m/k times?

Misra-Gries(a1, a2, . . . , am)

    set A = ∅
    For each i
        if ai ∈ A
            f̂_ai = f̂_ai + 1
        else
            if |A| < k − 1
                add (ai, 1) to A
            else
                for (v, f̂_v) ∈ A
                    f̂_v = f̂_v − 1
                    if f̂_v = 0
                        remove (v, f̂_v) from A

• Returns at most k − 1 pairs (v, f̂_v).

• For every (v, f̂_v) ∈ A, where the true frequency is f_v,

$$ f_v - \frac{m}{k} \le \hat{f}_v \le f_v $$

• Every 1/k-heavy hitter is found.

• Some non-heavy hitters might be reported.

• A second pass may be needed.

f̂_ai is the estimate for the frequency of token ai.
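The pseudocode above can be sketched in Python as follows. The sketch stores the pairs in a dict, which is an assumption made for brevity rather than the data structure the slides analyse, and the name misra_gries is likewise illustrative.

```python
def misra_gries(stream, k):
    """Return at most k - 1 (item, estimate) pairs as a dict."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free slot: decrement every stored counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


print(misra_gries("ACABACBB", 3))
```

With k = 2 a single counter is kept and the procedure behaves like the majority algorithm.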

Page 58: Topics in TCS

Misra-Gries - Worked example

Given k, which elements (if any) appear more than m/k times?

Stream: ACABACBB, with m = 8 and k = 3. The stored estimates f̂ after each item:

    A → f̂_A = 1
    C → f̂_A = 1, f̂_C = 1
    A → f̂_A = 2, f̂_C = 1
    B → f̂_A = 1
    A → f̂_A = 2
    C → f̂_A = 2, f̂_C = 1
    B → f̂_A = 1
    B → f̂_A = 1, f̂_B = 1
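The trace above can be reproduced by recording the stored counters after every item. This is a sketch under the same assumptions as before: a dict holds the pairs, and the name misra_gries_trace is hypothetical.

```python
def misra_gries_trace(stream, k):
    """Return a list with a snapshot of the stored counters after each item."""
    counters = {}
    snapshots = []
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
        snapshots.append(dict(counters))  # copy, so later updates do not alter it
    return snapshots


for item, state in zip("ACABACBB", misra_gries_trace("ACABACBB", 3)):
    print(item, state)
```

In this stream A and B each occur 3 times, which is more than m/k = 8/3, and both are present in the final state.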

Page 67: Topics in TCS

Misra-Gries - Space/Time

Given k, which elements (if any) appear more than m/k times?

• Space: k − 1 pairs stored in total.

• O(k(log m + log n)) bits of space.

• Running time depends on the data structure used.

• Balanced binary search tree: O(log n) time per operation.

• We can only decrement (or remove) if we previously incremented. Therefore O(m) operations.

• O(m log n) time overall.
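The amortised argument (decrements never exceed increments) can be checked empirically by instrumenting the loop. The counting function below is a hypothetical helper written for this illustration, again using a dict for the stored pairs.

```python
def count_counter_updates(stream, k):
    """Return (increments, decrements): the number of unit updates to stored counters."""
    counters = {}
    increments = 0
    decrements = 0
    for item in stream:
        if item in counters:
            counters[item] += 1
            increments += 1
        elif len(counters) < k - 1:
            counters[item] = 1
            increments += 1
        else:
            for key in list(counters):
                counters[key] -= 1
                decrements += 1
                if counters[key] == 0:
                    del counters[key]
    return increments, decrements


print(count_counter_updates("ACABACBB", 3))
```

Each item causes at most one increment, and every decrement cancels an earlier increment, so the total number of counter updates is at most 2m.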

Page 74: Topics in TCS

Modified Misra-Gries - Analysis

Let’s look at a less efficient version for the analysis.

Modified-MG(a_1, a_2, ..., a_m)

    set A = empty multiset
    For each i
        if a_i ∈ A
            add a copy of a_i to A
        else
            if |supp(A)| < k − 1
                add a_i to A
            else
                add and then delete a_i
                delete one copy of each item in A

• |supp(A)| is the number of distinct tokens in A.

• Identical except for space usage.

• Items are deleted in groups of k.

• Each item can be deleted ≤ m/k times.
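A sketch of the multiset version, assuming a `collections.Counter` as the multiset (the multiplicity of v stands in for the copies of v). The name `modified_mg` and the `rounds` counter are illustrative additions; `rounds` counts the group deletions, each of which removes k tokens (the arriving one plus one copy of each of the k − 1 distinct items), so k · rounds ≤ m.

```python
from collections import Counter


def modified_mg(stream, k):
    """Multiset version of Misra-Gries.

    Returns (A, rounds): the final multiset and the number of group deletions.
    """
    A = Counter()
    rounds = 0
    for item in stream:
        if A[item] > 0:            # a_i is already in A: add another copy
            A[item] += 1
        elif len(A) < k - 1:       # fewer than k - 1 distinct tokens: add a_i
            A[item] = 1
        else:
            # "Add and then delete a_i" leaves A unchanged for a_i itself;
            # then delete one copy of each distinct item in A.
            rounds += 1
            for key in list(A):
                A[key] -= 1
                if A[key] == 0:
                    del A[key]
    return A, rounds


votes = "AAACCBBCCCBCC"
A, rounds = modified_mg(votes, 3)
print(dict(A), rounds)             # {'C': 4} 3
assert 3 * rounds <= len(votes)    # groups of k deleted tokens fit into m
```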

Page 79: Topics in TCS

Misra-Gries - Analysis

Statement for Misra-Gries
For every $(v, \hat{f}_v) \in A$, where the true frequency is $f_v$,

$f_v - \frac{m}{k} \le \hat{f}_v \le f_v$

• Items are deleted in groups of k.

• Each item can therefore be deleted ≤ m/k times.

• Let a(v) be the number of times v was seen without incrementing $\hat{f}_v$. Let b(v) be the number of times $\hat{f}_v$ was decremented.

• We now have that $\hat{f}_v = f_v - a(v) - b(v) = f_v - (a(v) + b(v))$.

• The number of decrements equals a(v) + b(v), therefore $a(v) + b(v) \le \frac{m}{k}$.

• Hence $f_v - \frac{m}{k} \le \hat{f}_v$.
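As a worked instance of the bound, take the votes AAACCBBCCCBCC from the majority example with k = 3, so m = 13 and m/k ≈ 4.33. Tracing the algorithm by hand (the choice of input and of k is only for illustration), the final set is A = {(C, 4)}, and the quantities in the argument above come out as:

```latex
% Votes: AAACCBBCCCBCC, k = 3, m = 13.
% The decrement steps occur at the three arrivals of B (positions 6, 7, 11).
\begin{align*}
  f_C &= 7, & a(C) &= 0, & b(C) &= 3, & \hat{f}_C &= 7 - (0 + 3) = 4, \\
  f_A &= 3, & a(A) &= 0, & b(A) &= 3, & \hat{f}_A &= 3 - (0 + 3) = 0, \\
  f_B &= 3, & a(B) &= 3, & b(B) &= 0, & \hat{f}_B &= 3 - (3 + 0) = 0.
\end{align*}
% In every case a(v) + b(v) = 3 \le m/k = 13/3, and for the stored item C:
\[
  f_C - \frac{m}{k} = 7 - \frac{13}{3} \approx 2.67 \;\le\; \hat{f}_C = 4 \;\le\; f_C = 7 .
\]
```

A and B end with estimate 0, which is why they are no longer stored in A.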

Page 86: Topics in TCS

Summary

Misra-Gries

The Misra-Gries algorithm uses O((1/ε)(log m + log n)) bits of space to find a set of size at most ⌈1/ε⌉ that contains every item of frequency at least εm.

1. Majority uses O(log m + log n) bits of space and runs in linear time.

2. If there is an item that occurs more than m/2 times, the Majority algorithm will output it.

3. With a second pass we can check whether there is a majority in the stream.

4. Misra-Gries uses O(k(log m + log n)) bits of space and, with a balanced binary search tree, runs in O(m log n) time.

5. It will output all tokens that occur more than m/k times but may output others as well.

6. With a second pass we can remove all the undesired tokens.
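The second pass in point 6 can be sketched as below. This is one possible way to do it, under the assumption that the stream can be read twice (here it is an in-memory sequence); the names `first_pass` and `heavy_hitters` are hypothetical. The second pass keeps one exact counter per candidate, so it still stores at most k − 1 counters.

```python
from collections import Counter


def first_pass(stream, k):
    """Misra-Gries pass; returns the set of at most k - 1 candidate tokens."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement all counters, dropping those that would reach zero.
            counters = {v: c - 1 for v, c in counters.items() if c > 1}
    return set(counters)


def heavy_hitters(stream, k):
    """Two passes: exact counts of the tokens occurring more than m/k times."""
    candidates = first_pass(stream, k)
    # Second pass: count only the candidates, exactly.
    exact = Counter(x for x in stream if x in candidates)
    m = len(stream)
    # c > m/k, written as c * k > m to stay in integer arithmetic.
    return {v: c for v, c in exact.items() if c * k > m}


votes = "AAACCBBCCCBCC"
print(heavy_hitters(votes, 4))  # {'C': 7}
```

With k = 4 the first pass returns A, B and C as candidates; the second pass removes A and B, which occur only 3 times each (not more than 13/4).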
