Beyond Set Disjointness : The Communication Complexity of Finding the Intersection

25
Beyond Set Disjointness: The Communication Complexity of Finding the Intersection Grigory Yaroslavtsev http://grigory.us Joint with Brody, Chakrabarti, Kondapally and Woodruff

description

Beyond Set Disjointness : The Communication Complexity of Finding the Intersection. Grigory Yaroslavtsev http://grigory.us. Joint with Brody, Chakrabarti , Kondapally and Woodruff. Communication Complexity [Yao’79]. Shared randomness. Bob: . Alice: . …. - PowerPoint PPT Presentation

Transcript of Beyond Set Disjointness : The Communication Complexity of Finding the Intersection

Page 1: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Beyond Set Disjointness: The Communication Complexity of Finding the

Intersection

Grigory Yaroslavtsevhttp://grigory.us

Joint with Brody, Chakrabarti, Kondapally and Woodruff

Page 2: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Communication Complexity [Yao’79]

Alice: Bob:

𝒇 (𝒙 ,𝒚 )=?

Shared randomness

…𝒇 (𝒙 ,𝒚 )

• = min. communication (error ) • min. -round communication (error )

Page 3: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Set Intersection

𝒙=𝑆 , 𝒚=𝑇 , 𝒇 (𝒙 , 𝒚 )=𝑆∩𝑇𝑺⊆ [𝑛 ] ,|𝑆|≤𝒌 𝑻 ⊆ [𝑛 ] ,|𝑇|≤𝒌 = ?

(-Intersection) = ?

Page 4: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

This talk

Let

• (-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.; PODC’14]• (-Intersection) = [Saglam-Tardos FOCS’13; Brody, Chakrabarti, Kondapally, Woodruff, Y.’13]

{

times

(-Intersection) = for

Page 5: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

-Disjointness• , iff • [Razborov’92; Hastad-Wigderson’96] • [Folklore + Dasgupta, Kumar, Sivakumar; Buhrman’12, Garcia-Soriano, Matsliah, De Wolf’12]

• [Saglam, Tardos’13]• [Braverman, Garg, Pankratov, Weinstein’13]

Page 6: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Applications

• : exact Jaccard index ( for -approximate use MinHash [Broder’98; Li-Konig’11; Path-Strokel-Woodruff’14])• Rarity, distinct elements, joins,…• Multi-party set intersection (later)• Contrast:

Page 7: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

1-round -protocol

𝒉 : [𝒏 ]→[𝒌3]

𝑺 𝑻

𝒉(𝑺) 𝒉(𝑻 )

[𝒏 ] [𝒏 ]

[𝒌3] [𝒌3]

Page 8: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Hashing

log 𝒌

=# of buckets

𝒉 : [𝒏 ]→[𝒌 / log𝒌]

Expected # of elements

Page 9: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Secondary Hashing

= # of hash functions

log 3𝒌 where

Page 10: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

2-Round -protocol

log 3𝒌

log 3𝒌

|h𝑖 (𝑺 )|=|h𝑖 (𝑻 )|=𝑂 ( log𝒌 log log𝒌 )

Total communication = = O()

Page 11: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Collisions

𝒌log𝒌

log 3𝒌Pr [𝑐𝑜𝑙𝑙𝑖𝑠𝑖𝑜𝑛 ]=𝑂(1log𝒌 )

Page 12: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Collisions

log 3𝒌

log 3𝒌

Page 13: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Collisions

• Second round: – For each bucket send -bit equality check (total -

communication)– Correct intersection computed in buckets where

– Expected # of items in incorrect buckets – Use 1-round protocol for incorrect buckets– Total communication

Page 14: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Main protocol

𝑂 (1)

=# of buckets

𝒉 : [𝒏 ]→[𝒌]

Expected # of elements

Page 15: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Verification tree -degree

…log 𝑟− 1𝒌

buckets = leaves of the verification tree

Page 16: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Verification bottom-up

𝑺𝟏❑ ,𝐓𝟏

❑ 𝑺𝟐❑ ,𝐓𝟐

𝑺𝟏❑∪𝑺𝟐 ,𝐓𝟏

❑∪𝑻 𝟐

𝑺𝟏❑∩𝐓𝟏

❑𝑺𝟐❑∩𝐓𝟐

(𝑺𝟏❑∪𝑺𝟐 )∩(𝐓 ¿¿𝟏❑∪𝑻 𝟐)¿

Page 17: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

EQ()

Verification bottom-up

𝑺𝟏❑∩𝐓𝟏

❑𝑺𝟐❑∩𝐓𝟐

(𝑺𝟏❑∪𝑺𝟐 )∩(𝐓 ¿¿𝟏❑∪𝑻 𝟐)¿

Correct Incorrect

Incorrect

𝑺𝟏❑∩𝐓𝟏

❑𝑺𝟐❑∩𝐓𝟐

(𝑺𝟏❑∪𝑺𝟐 )∩(𝐓 ¿¿𝟏❑∪𝑻 𝟐)¿

Correct Incorrect

Page 18: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Correct

EQ()

Verification bottom-up

𝑺𝟏❑∩𝐓𝟏

❑𝑺𝟐❑∩𝐓𝟐

(𝑺𝟏❑∪𝑺𝟐 )∩(𝐓 ¿¿𝟏❑∪𝑻 𝟐)¿

Correct Incorrect

Incorrect

𝑺𝟏❑∩𝐓𝟏

❑𝑺𝟐❑∩𝐓𝟐

(𝑺𝟏❑∪𝑺𝟐 )∩(𝐓 ¿¿𝟏❑∪𝑻 𝟐)¿

Correct Incorrect

Correct

Page 19: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Verification bottom-up

𝒑𝒓 −𝟐

…𝒑𝟏

𝑺𝟏𝟏 ,𝐓𝟏

𝟏 … 𝑺𝒊𝟏 ,𝐓 𝐢

𝟏𝑺𝟐𝟏 ,𝐓𝟐

𝟏 𝑺𝒌𝟏 ,𝐓𝒌

𝟏…

𝒑𝒓 −𝟏

Page 20: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Analysis of Stage

• = [node at stage computed correctly]• Set = – Run equality checks and basic intersection

protocols with success probability – Key lemma: [# of restarts per leaf – Cost of Equality = – Cost of Intersection in leafs =

• [protocol succeeds] =

Page 21: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Lower Bound

• (-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.’13]• iff , where • = solving independent instances of • reduces to -Intersection:– Given and – Construct sets with elements and

Page 22: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Communication Direct Sums

“Solving m copies of a communication problem requires m times more communication”:• For arbitrary [… Braverman, Rao 10; Barak

Braverman, Chen, Rao 11, ….]• In general, can’t go beyond

Page 23: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Information cost Communication complexity• [Bar Yossef, Jayram, Kumar,Sivakumar’01]

Disjointness

• Stronger direct sum for bounded-round complexity of Equality-type problems (a.k.a. “union bound is optimal”) [Molinaro, Woodruff, Y.’13]

Specialized Communication Direct Sums

Page 24: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Extensions

• Multi-party: players, , where

– Boost error probability to – Average per player (using coordinator):

in rounds–Worst-case per player (using a tournament) in rounds

Page 25: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Open Problems

• (-Intersection) = • Better protocols for the multi-party setting