Introduction to information complexity


June 30, 2013

Mark Braverman, Princeton University

Part I: Information theory

• Information theory, in its modern form, was introduced in the 1940s to study the problem of transmitting data over physical channels.

[Diagram: Alice sends data to Bob over a communication channel.]

Quantifying “information”

• Information is measured in bits.
• The basic notion is Shannon’s entropy.
• The entropy H(X) of a random variable X is the (typical) number of bits needed to remove the uncertainty of the variable.
• For a discrete variable: H(X) := Σₓ Pr[X = x] · log₂(1 / Pr[X = x]).

Shannon’s entropy

• Important examples and properties:
  – If X is a constant, then H(X) = 0.
  – If X is uniform on a finite set S of possible values, then H(X) = log₂ |S|.
  – If X is supported on at most s values, then H(X) ≤ log₂ s.
  – If Y is a random variable determined by X, then H(Y) ≤ H(X).
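
(A small Python sketch, not part of the original slides, that evaluates the definition of H(X) above and checks the listed properties on toy distributions.)

    # Shannon entropy of a discrete variable given as {value: probability}.
    from math import log2

    def entropy(dist):
        """H(X) = sum over x of Pr[X=x] * log2(1/Pr[X=x])."""
        return sum(p * log2(1.0 / p) for p in dist.values() if p > 0)

    print(entropy({"a": 1.0}))                  # a constant: 0.0 bits
    print(entropy({i: 1/8 for i in range(8)}))  # uniform on 8 values: 3.0 bits
    print(entropy({0: 0.5, 1: 0.25, 2: 0.25}))  # supported on 3 values: 1.5 <= log2(3)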

Conditional entropy

• For two (potentially correlated) variables X, Y, the conditional entropy of X given Y is the amount of uncertainty left in X given Y:
  H(X|Y) := E_{y∼Y} [H(X | Y = y)].
• One can show H(XY) = H(Y) + H(X|Y).
• This important fact is known as the chain rule.
• If X and Y are independent, then H(XY) = H(X) + H(Y).
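
(A Python sketch, not from the slides, that checks the chain rule H(XY) = H(Y) + H(X|Y) numerically on a small made-up joint distribution.)

    # Verify H(XY) = H(Y) + H(X|Y) for a toy joint distribution over (x, y).
    from math import log2
    from collections import defaultdict

    def H(dist):
        return sum(p * log2(1.0 / p) for p in dist.values() if p > 0)

    joint = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.4, ("b", 1): 0.1}

    p_y = defaultdict(float)
    for (x, y), p in joint.items():
        p_y[y] += p

    # H(X|Y) = E_{y ~ Y} H(X | Y = y)
    h_x_given_y = sum(
        p_y[y] * H({x: joint[(x, y)] / p_y[y] for (x, yy) in joint if yy == y})
        for y in p_y
    )
    print(H(joint), H(p_y) + h_x_given_y)  # the two sides agree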

Example

• X and Y are built from independent uniform bits B₁, …, B₅ (as labeled in the diagram on the next slide).
• The quantities H(X), H(Y), H(X|Y), and H(Y|X) can then be computed directly from the definitions.

Mutual information

[Venn-style diagram relating H(X), H(Y), H(X|Y), H(Y|X), and I(X;Y), with regions labeled by the bits B₁, B₁⊕B₂, B₂⊕B₃, B₄, B₅.]

Mutual information

• The mutual information is defined as I(X;Y) := H(X) − H(X|Y) = H(Y) − H(Y|X).
• “By how much does knowing Y reduce the entropy of X?”
• Always non-negative: I(X;Y) ≥ 0.
• Conditional mutual information: I(X;Y|Z) := H(X|Z) − H(X|YZ).
• Chain rule for mutual information: I(XY;Z) = I(X;Z) + I(Y;Z|X).
• Simple intuitive interpretation.
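
(A Python sketch, not from the slides: by the definition above and the chain rule, I(X;Y) = H(X) + H(Y) − H(XY), which is easy to evaluate for a small made-up joint distribution.)

    # Mutual information I(X;Y) = H(X) + H(Y) - H(XY) for a toy joint distribution.
    from math import log2
    from collections import defaultdict

    def H(dist):
        return sum(p * log2(1.0 / p) for p in dist.values() if p > 0)

    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p

    print(H(p_x) + H(p_y) - H(joint))  # about 0.278 bits, and always >= 0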

Information Theory

• The reason information theory is so important for communication is that information-theoretic quantities readily operationalize.
• Can attach operational meaning to Shannon’s entropy: “the cost of transmitting X”.
• Let C(X) be the (expected) cost of transmitting a sample of X.

Is H(X) = C(X)?

• Not quite.
• Let X be a uniformly random trit (three equally likely values): H(X) = log₂ 3 ≈ 1.585, while the code below gives C(X) = 5/3 ≈ 1.67.
• It is always the case that H(X) ≤ C(X).

  1 → 0
  2 → 10
  3 → 11

But C(X) and H(X) are close

• Huffman’s coding: C(X) ≤ H(X) + 1.
• This is a compression result: “an uninformative message turned into a short one”.
• Therefore: H(X) ≤ C(X) ≤ H(X) + 1.
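
(A Python sketch, not from the slides, that builds a Huffman code for the uniform trit from the previous slide and checks H(X) ≤ C(X) ≤ H(X) + 1.)

    # Expected codeword length of a Huffman code, compared with the entropy.
    import heapq
    from math import log2

    def huffman_cost(probs):
        """Expected codeword length of a Huffman code for the given probabilities."""
        # Heap items: (probability, unique id, list of (symbol, depth)).
        heap = [(p, i, [(i, 0)]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        uid = len(probs)
        while len(heap) > 1:
            p1, _, a = heapq.heappop(heap)
            p2, _, b = heapq.heappop(heap)
            heapq.heappush(heap, (p1 + p2, uid, [(s, d + 1) for s, d in a + b]))
            uid += 1
        depth = dict(heap[0][2])
        return sum(probs[s] * depth[s] for s in depth)

    probs = [1/3, 1/3, 1/3]                      # the uniform trit
    H_X = sum(p * log2(1 / p) for p in probs)    # ~1.585 bits
    C_X = huffman_cost(probs)                    # 5/3 ~ 1.667 bits
    print(H_X, C_X, H_X <= C_X <= H_X + 1)       # ... True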

Shannon’s noiseless coding

• The cost of communicating many copies of X scales as H(X).
• Shannon’s source coding theorem:
  – Let C(Xⁿ) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is lim_{n→∞} C(Xⁿ)/n = H(X).
• This equation gives H(X) operational meaning.
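
(A Python sketch, not from the slides, illustrating the amortization for the trit example: a block of n uniform trits has 3ⁿ equally likely values, so a fixed-length binary code for the block uses ⌈n·log₂ 3⌉ bits, and the per-copy cost approaches H(X) = log₂ 3.)

    # Amortized cost per trit when encoding blocks of n trits at once.
    from math import log2, ceil

    H = log2(3)                      # entropy of one uniform trit, ~1.585 bits
    for n in (1, 5, 20, 100):
        block_bits = ceil(n * H)     # bits needed to index 3**n equally likely blocks
        print(n, block_bits / n)     # 2.0, 1.6, 1.6, 1.59: approaches log2(3)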

H(X) operationalized

[Diagram: X₁, …, Xₙ, … are transmitted over a communication channel at a cost of H(X) bits per copy.]

H is nicer than C

• H is additive for independent variables: H(X₁ X₂ … Xₙ) = H(X₁) + … + H(Xₙ).
• Let X₁, …, Xₙ be independent trits: H(X₁ … Xₙ) = n · log₂ 3 ≈ 1.585 n, whereas C(X₁ … Xₙ) < n · C(X₁) for large n.
• Works well with concepts such as channel capacity.

Operationalizing other quantities

• Conditional entropy H(X|Y): transmitting X₁, …, Xₙ, … to a receiver who already holds Y₁, …, Yₙ, … costs H(X|Y) per copy (cf. Slepian-Wolf theorem).

[Diagram: Alice holds X₁, …, Xₙ, …; Bob holds Y₁, …, Yₙ, …; they communicate over a channel at H(X|Y) bits per copy.]

Operationalizing other quantities

• Mutual information I(X;Y): given X₁, …, Xₙ, …, sampling the correlated Y₁, …, Yₙ, … at the other end costs I(X;Y) per copy.

[Diagram: Alice holds X₁, …, Xₙ, …; I(X;Y) bits per copy suffice to sample Y₁, …, Yₙ, … on Bob’s side.]

Information theory and entropy

• Allows us to formalize intuitive notions.
• Operationalized in the context of one-way transmission and related problems.
• Has nice properties (additivity, chain rule, …).
• Next, we discuss extensions to more interesting communication scenarios.

Communication complexity

• Focus on the two-party randomized setting.

[Diagram: Alice (A) holds X, Bob (B) holds Y; using shared randomness R they implement a functionality F(X, Y).]

Communication complexity

• Goal: implement a functionality F(X, Y).
• A protocol computing F(X, Y): Alice and Bob, holding X and Y and sharing randomness R, exchange alternating messages m₁(X, R), m₂(Y, m₁, R), m₃(X, m₁, m₂, R), …
• Communication cost = number of bits exchanged.
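
(A Python sketch of this protocol model; the helper names and the toy protocol are illustrative, not from the slides. Each message is a function of the sender’s own input, the transcript so far, and the shared randomness R, and the cost is the total number of bits sent.)

    # Run an alternating protocol (Alice speaks first) and count the bits exchanged.
    def run_protocol(message_funcs, x, y, r):
        transcript = []
        for i, f in enumerate(message_funcs):
            own_input = x if i % 2 == 0 else y               # even rounds: Alice; odd: Bob
            transcript.append(f(own_input, transcript, r))   # each message is a bit string
        return transcript, sum(len(m) for m in transcript)   # cost = #bits exchanged

    # Toy protocol: Alice sends her 2-bit input; Bob replies whether it equals his.
    toy = [
        lambda x, t, r: x,
        lambda y, t, r: "1" if t[0] == y else "0",
    ]
    print(run_protocol(toy, "10", "10", r=""))  # (['10', '1'], 3)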

Communication complexity

• Numerous applications/potential applications.
• Considerably more difficult to obtain lower bounds than for transmission (still much easier than for other models of computation!).

Communication complexity

• (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε) is the least communication cost of a protocol computing F with error ≤ ε w.r.t. μ.
• (Randomized/worst-case) communication complexity: CC(F, ε) is the least communication cost of a protocol computing F with error ≤ ε on all inputs.
• Yao’s minimax: CC(F, ε) = max_μ CC(F, μ, ε).

Examples

• Equality: EQ(X, Y) := 1 if X = Y, and 0 otherwise.

Equality

• F is EQ(X, Y).
• μ is a distribution where w.p. ½ X = Y and w.p. ½ X, Y are independent random strings.
• Protocol: Alice sends MD5(X) [128 bits]; Bob replies with “X = Y?” [1 bit].
• Shows that ≈ 129 bits of communication suffice under μ.
• Error?
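
(A Python sketch of the protocol sketched on this slide, assuming the inputs are byte strings: Alice sends the 128-bit MD5 digest of X, Bob answers with the single bit “X = Y?”. It errs only if X ≠ Y but MD5(X) = MD5(Y), i.e. on a hash collision.)

    # Equality via hashing: 128 bits from Alice plus 1 bit back from Bob.
    import hashlib

    def alice_message(x: bytes) -> bytes:
        return hashlib.md5(x).digest()      # 16 bytes = 128 bits

    def bob_answer(y: bytes, alice_msg: bytes) -> int:
        return 1 if hashlib.md5(y).digest() == alice_msg else 0

    x, y = b"hello world", b"hello world"
    print(bob_answer(y, alice_message(x)))  # 1; total communication: 129 bits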

Examples

• In fact, using information complexity, much tighter bounds can be obtained.

Information complexity

• Information complexity :: communication complexity
  as
  Shannon’s entropy :: transmission cost.

Information complexity

• The smallest amount of information Alice and Bob need to exchange to solve F.
• How is information measured?
• Communication cost of a protocol?
  – Number of bits exchanged.
• Information cost of a protocol?
  – Amount of information revealed.

Basic definition 1: The information cost of a protocol

• Prior distribution: (X, Y) ∼ μ.
• Alice holds X, Bob holds Y; running the protocol π produces the transcript Π.
• IC(π, μ) := I(Π; Y | X) + I(Π; X | Y)
  = what Alice learns about Y + what Bob learns about X.
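
(A Python sketch, not from the slides, that evaluates IC(π, μ) = I(Π; Y|X) + I(Π; X|Y) exactly by enumeration for a toy deterministic protocol over a small made-up prior μ; the protocol here is simply “Alice sends her bit”, so the information cost works out to H(X|Y).)

    # Exact information cost of a deterministic protocol over a small prior.
    from math import log2
    from collections import defaultdict

    def H(dist):
        return sum(p * log2(1.0 / p) for p in dist.values() if p > 0)

    def cond_mutual_info(triples, a, b, c):
        """I(A; B | C) from a joint dict {(x, y, t): prob}; a, b, c are index positions."""
        by_c = defaultdict(lambda: defaultdict(float))
        for key, p in triples.items():
            by_c[key[c]][(key[a], key[b])] += p
        total = 0.0
        for joint_ab in by_c.values():
            p_c = sum(joint_ab.values())
            cond = {k: v / p_c for k, v in joint_ab.items()}
            p_a, p_b = defaultdict(float), defaultdict(float)
            for (av, bv), p in cond.items():
                p_a[av] += p
                p_b[bv] += p
            total += p_c * (H(p_a) + H(p_b) - H(cond))
        return total

    # Toy prior on bit pairs (X, Y) and a toy protocol: Alice just announces X.
    mu = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}
    triples = {(x, y, x): p for (x, y), p in mu.items()}   # transcript = X

    ic = cond_mutual_info(triples, 2, 1, 0) + cond_mutual_info(triples, 2, 0, 1)
    print(ic)   # I(T;Y|X) + I(T;X|Y) = 0 + H(X|Y) ~ 0.971 bits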

Example

• F is EQ(X, Y).
• μ is a distribution where w.p. ½ X = Y and w.p. ½ X, Y are independent random strings.
• Protocol: Alice sends MD5(X) [128 bits]; Bob replies with “X = Y?” [1 bit].
• IC(π, μ) ≈ 1 + 65 = 66 bits
  (what Alice learns about Y + what Bob learns about X).
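
(One way to read the arithmetic on this slide, stated informally: what Alice learns about Y is the single answer bit “X = Y?”, which under this prior is close to a fair coin, so about 1 bit. What Bob learns about X: given Y, the remaining uncertainty about X is roughly 1 + ½·128 = 65 bits, since with probability ½ he already holds X = Y and with probability ½ X is a fresh random 128-bit string, and the MD5 message essentially resolves all of it. Hence 1 + 65 = 66 bits.)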

Prior matters a lot for information cost!

• If μ is a singleton (supported on a single input pair (x, y)), then IC(π, μ) = 0 for every protocol: each player already knows the other’s input, so nothing new is revealed.

Example

• F is EQ(X, Y).
• μ is a distribution where X, Y are just uniformly random (and independent).
• Protocol: Alice sends MD5(X) [128 bits]; Bob replies with “X = Y?” [1 bit].
• IC(π, μ) ≈ 0 + 128 = 128 bits
  (what Alice learns about Y + what Bob learns about X).

Basic definition 2: Information complexity

• Communication complexity: CC(F, μ, ε) := min over protocols π computing F with error ≤ ε (w.r.t. μ) of the communication cost of π.
• Analogously: IC(F, μ, ε) := inf over protocols π computing F with error ≤ ε (w.r.t. μ) of IC(π, μ).
• (An infimum rather than a minimum is needed here.)

Prior-free information complexity

• Using minimax we can get rid of the prior.
• For communication, we had: CC(F, ε) = max_μ CC(F, μ, ε).
• For information: IC(F, ε) := max_μ IC(F, μ, ε).

Operationalizing IC: Information equals amortized communication

• Recall [Shannon]: lim_{n→∞} C(Xⁿ)/n = H(X).
• Turns out [B.-Rao’11]: lim_{n→∞} CC(Fⁿ, μⁿ, ε)/n = IC(F, μ, ε), for ε > 0. [Error allowed on each copy.]
• For ε = 0: lim_{n→∞} CC(Fⁿ, μⁿ, 0⁺)/n = IC(F, μ, 0).
• [lim_{n→∞} CC(Fⁿ, μⁿ, 0)/n is an interesting open problem.]

Entropy vs. Information Complexity

                    | Entropy                    | IC
  Additive?         | Yes                        | Yes
  Operationalized   | lim C(Xⁿ)/n = H(X)         | lim CC(Fⁿ, μⁿ, ε)/n = IC(F, μ, ε)
  Compression?      | Huffman: C(X) ≤ H(X) + 1   | ???!

Can interactive communication be compressed?

• Is it true that CC(F, μ, ε) = O(IC(F, μ, ε))?
• Less ambitiously: can CC(F, μ, O(ε)) be bounded by some reasonable function of IC(F, μ, ε)?
• (Almost) equivalently: given a protocol π with IC(π, μ) = I, can Alice and Bob simulate π using roughly I bits of communication?
• Not known in general…

Applications

• Information = amortized communication means that to understand the amortized communication cost of a problem, it is enough to understand its information complexity.

Example: the disjointness function

• S, T are subsets of {1, …, n}.
• Alice gets S, Bob gets T.
• Need to determine whether S ∩ T = ∅.
• In binary notation (characteristic vectors X, Y ∈ {0,1}ⁿ), need to compute ⋁ᵢ (Xᵢ ∧ Yᵢ).
• An operator on n copies of the 2-bit AND function.
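
(A short Python sketch, not from the slides, of the formula above: disjointness of the characteristic vectors is determined by the OR of the n coordinate-wise 2-bit ANDs.)

    # Disj(X, Y) = 1 iff no coordinate has X_i = Y_i = 1.
    def disj(x, y):
        return 0 if any(xi & yi for xi, yi in zip(x, y)) else 1

    print(disj([1, 0, 1, 0], [0, 1, 0, 1]))   # 1: the sets are disjoint
    print(disj([1, 0, 1, 0], [0, 0, 1, 1]))   # 0: both contain element 3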

Set intersection

• S, T are subsets of {1, …, n}.
• Alice gets S, Bob gets T.
• Want to compute S ∩ T.
• This is just n copies of the 2-bit AND.
• Understanding the information complexity of AND gives tight bounds on both problems!

Exact communication bounds [B.-Garg-Pankratov-Weinstein’13]

• CC(Disjₙ) ≤ n + 1 (trivial).
• CC(Disjₙ) = Ω(n) [Kalyanasundaram-Schnitger’87, Razborov’92].
New:
• CC(Disjₙ, ε → 0) ≈ 0.4827·n.

Small set disjointness

• S, T are subsets of {1, …, n}, with |S|, |T| ≤ k.
• Alice gets S, Bob gets T.
• Need to determine whether S ∩ T = ∅.
• Trivial: O(k log n).
• [Hastad-Wigderson’07]: O(k).
• [BGPW’13]: (2/ln 2)·k ± o(k) ≈ 2.885·k.

Open problem: Computability of IC

• Given the truth table of F, a prior μ, and ε, compute IC(F, μ, ε).
• Via IC(F, μ, ε) = lim_{n→∞} CC(Fⁿ, μⁿ, ε)/n one can compute a sequence of upper bounds.
• But the rate of convergence as a function of n is unknown.

Open problem: Computability of IC

• Can compute the r-round information complexity of F, IC_r(F, μ, ε).
• But the rate of convergence as a function of r is unknown.
• Conjecture: IC_r(F, μ, ε) = IC(F, μ, ε) + O_{F, μ, ε}(1/r²).
• This is the relationship for the two-bit AND.


Thank You!