1 IST, Lisbon, 13-12-2016 Applications of Information Theory in Science and in Engineering Mário A. T. Figueiredo Some Uses of Shannon Entropy (Mostly) Outside of Engineering and Computer Science


1 IST, Lisbon, 13-12-2016

Applications of Information Theory in Science and in Engineering

Mário A. T. Figueiredo


Some Uses of Shannon Entropy (Mostly) Outside of Engineering and Computer Science


Communications: The Big Picture

Let’s focus here...


What is entropy?

Source (discrete and memoryless): $X \in \{1, \dots, N\}$, $p_i = P(X = i)$

$H(X) = H(p_1, \dots, p_N) = \sum_{i=1}^{N} p_i \log_2 \frac{1}{p_i}$ (bits/symbol)

A measure of uncertainty, i.e., lack of information

Shannon’s axioms (1948):

•  Continuity with respect to $p_1, \dots, p_N$

•  Uncertainty grows with $N$, if $p_i = 1/N$

•  Grouping doesn’t change uncertainty

This entropy is the unique function satisfying these axioms.
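As a rough illustration of the definition and two of the axioms on this slide, the sketch below computes the entropy in bits and checks it numerically. The function name `entropy`, the example distribution (1/2, 1/4, 1/4), and the particular form of the grouping identity used in the comment are assumptions made for this example; the slide's own grouping formula is not reproduced in the transcript.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: sum of p * log2(1/p), skipping zero terms."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Uniform sources: the uncertainty grows with N (it equals log2 N).
uniform_entropies = [entropy([1.0 / n] * n) for n in (2, 4, 8)]

# Grouping, in one common textbook form (assumed here):
# H(p1, p2, p3) = H(p1, p2 + p3) + (p2 + p3) * H(p2/(p2+p3), p3/(p2+p3))
p = [0.5, 0.25, 0.25]          # hypothetical example distribution
q = p[1] + p[2]                # probability of the grouped event
direct = entropy(p)
grouped = entropy([p[0], q]) + q * entropy([p[1] / q, p[2] / q])

print(uniform_entropies, direct, grouped)
```

Both ways of computing the uncertainty of the three-symbol source should agree, which is what "grouping doesn't change uncertainty" asks for.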


What did Shannon use entropy for?

Source → Binary coder

…if the code is “20-questions” type (called instantaneous)

Symbol   Probability   Codeword
a        1/2           1
b        1/4           01
c        1/8           000
d        1/8           001

Optimal strategy/code: Huffman (1952)

[Figure: code tree for the symbols a, b, c, d; the bits/questions are answered in order: first, second, third.]

Optimal lengths: $\simeq \log_2 \frac{1}{p_i}$

Easy to extend to Markov sources
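The four-symbol example on this slide can be checked with a small Huffman construction. This is only a sketch: the function name `huffman_code` and the heap-based implementation are choices made for this example, and the 0/1 labels it assigns may differ from those on the slide, although the codeword lengths should match.

```python
import heapq
import math

def huffman_code(probs):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    # Heap entries: (probability, unique tie-breaker, {symbol: partial codeword}).
    heap = [(p, k, {s: ""}) for k, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing one bit to each side.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
lengths = {s: len(w) for s, w in code.items()}
avg_len = sum(p * lengths[s] for s, p in probs.items())
H = sum(p * math.log2(1 / p) for p in probs.values())

print(code, avg_len, H)
```

Because every probability here is a power of 1/2, each codeword length equals $\log_2 (1/p_i)$ exactly and the average length coincides with the entropy; for general distributions the average Huffman length only stays within one bit of it.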


Information and entropy in physics and other sciences

Entropy provides a measure of uncertainty/information

...information is a (the?) core concept in science!

At the very extreme, Wheeler (1989) claims:

"It from bit symbolises the idea (…) that what we call reality arises in the last analysis from the posing of yes-no questions (…); in short, that all things physical are information-theoretic in origin and this is a participatory universe."

“The universe is made of information; matter and energy are only incidental.”



The impact of Shannon’s IT work

Soon after Shannon, biologists started using IT:

S. Dancoff and H. Quastler (1953). "The Information Content and Error Rate of Living Things". Essays on the Use of Information Theory in Biology.

“How many bits have to go in there? And what is the informational content of that which produces these bits?”

The impact of information theory in the biological sciences is enormous!

...impossible to even scratch the surface in 10 minutes!

Sequence logos (Schneider & Stephens, 1990)

Information in evolution (Schneider 2000)


(2007)


(2014)

“For cells, we now know that (…) information is stored in a cell’s inherited genetic material, and is precisely the kind that Shannon described in his 1948 articles.”

“Information is the currency of life. One definition of information is the ability to make predictions with a likelihood better than chance. That’s what any living organism needs to be able to do, because if you can do that, you’re surviving at a higher rate.”


“Shannon’s entropy-based diversity is the standard for ecological communities. The exponentials of Shannon’s and the related “mutual information” excel in their ability to express diversity intuitively, and provide a generalised method of considering microscopic behaviour to make macroscopic predictions, under given conditions.”
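To make the "exponential of Shannon's entropy" concrete: it is often read as an effective number of equally common species. The sketch below assumes species abundance counts as input; the function name and the two example communities are hypothetical and not taken from the quoted paper.

```python
import math

def effective_number_of_species(counts):
    """exp(H), with H the Shannon entropy (in nats) of the relative abundances."""
    total = sum(counts)
    h_nats = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.exp(h_nats)

even = effective_number_of_species([10, 10, 10, 10])    # four equally common species
skewed = effective_number_of_species([97, 1, 1, 1])     # one dominant species

print(even, skewed)
```

A perfectly even community of four species gives an effective number of four, while the skewed one gives a value much closer to one, which is the intuitive reading the quote refers to.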


Other uses of Shannon’s entropy...


(2014)

“The application of entropy in finance can be regarded as the extension of the information entropy and the probability entropy. It can be an important tool in portfolio selection and asset pricing.”


“As an example of the utility of information theory to the analysis of animal communication systems, we applied a series of information theory statistics to a statistically categorized set of bottlenose dolphin whistle vocalizations.”

“Information theory measures (Shannon 1948; …) provide (…) tools for examining and comparing communication systems across species.”


Yet another application of the Shannon entropy...

“We introduce a new quantity to petrology, the Shannon entropy, as a tool for quantifying mixing as well as the rate of production of hybrid compositions in the mixing system.”


What if we can’t estimate probabilities?

Markov source → Lempel-Ziv coder

Asymptotically optimal, but adaptive/universal (Lempel & Ziv, 1977, 1978)

LZ coding exploits the redundancy of language (self-parsing):

LZ coding thus yields an entropy estimate, without estimating probabilities

LZ coding is also closely related to Kolmogorov complexity
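A minimal sketch of how self-parsing can give an entropy estimate without estimating any probabilities. It assumes the incremental (1978-style) parsing and the approximation $c \log_2 c / n$, where $c$ is the number of phrases and $n$ the sequence length; the function names are hypothetical, and the estimate converges slowly, so short sequences give only rough values.

```python
import math

def lz78_phrases(seq):
    """Split seq into phrases, each the shortest prefix of the remaining
    input that has not yet appeared as a phrase."""
    seen = set()
    phrases = []
    current = ""
    for symbol in seq:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # leftover tail that repeats an earlier phrase
        phrases.append(current)
    return phrases

def lz78_entropy_estimate(seq):
    """Rough entropy-rate estimate in bits/symbol: c * log2(c) / n."""
    if not seq:
        return 0.0
    c = len(lz78_phrases(seq))
    return c * math.log2(c) / len(seq)

print(lz78_phrases("1011010100010"))
print(lz78_entropy_estimate("a" * 1000))
```

Highly repetitive input is parsed into few, long phrases and therefore gets a low estimate, while input with little structure is parsed into many short phrases.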


One step further: comparing two sources/texts

Question: was this sequence produced by that source? E.g., was this text produced by that author? (authorship attribution)

Compress a sequence with respect to another (cross-parsing) (Ziv & Merhav, 1993)

ZM method (ZMM) coding thus yields a relative entropy estimate (dissimilarity)

Authorship attribution (Pereira-Coutinho & F, 2013)
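The cross-parsing idea can be sketched as follows: a sequence `z` is split into phrases, each being the longest prefix of the remaining input that occurs somewhere in a reference sequence `x`. The function name `cross_parse` and the toy strings are assumptions made for this example, and the sketch only counts phrases; it does not implement the full relative entropy estimator of the cited papers.

```python
def cross_parse(z, x):
    """Parse z with respect to x: each phrase is the longest prefix of the
    remaining part of z that occurs as a substring of x (or a single symbol
    if even that symbol never occurs in x)."""
    phrases = []
    i = 0
    while i < len(z):
        length = 1
        # Extend the phrase while the longer candidate still occurs in x.
        while i + length < len(z) and z[i:i + length + 1] in x:
            length += 1
        phrases.append(z[i:i + length])
        i += length
    return phrases

similar = cross_parse("abab", "ababab")     # z looks like x: few, long phrases
different = cross_parse("xyz", "abc")       # nothing in common: one phrase per symbol
mixed = cross_parse("abcabd", "abcd")

print(similar, different, mixed)
```

The fewer phrases needed to describe `z` in terms of `x`, the more alike the two sequences are, which is why the phrase count can serve as a dissimilarity measure between, say, a disputed text and an author's known writing.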


Summary:

•  The Shannon entropy, as a measure of information, has a huge impact, both inside and outside of its original field of application (communications)

•  However, beware the bandwagon...

(from L. Thims, 2012)