
Molecular Information Theory

Niru Chennagiri

Probability and Statistics

Fall 2004

Dr. Michael Partensky

Overview

- Why do we study molecular information theory?
- What are molecular machines?
- The power of the logarithm
- Components of a communication system
- The discrete noiseless system
- Channel capacity
- Molecular machine capacity

Motivation

- A needle-in-a-haystack situation.
- How will you go about looking for the needle?
- How much energy do you need to spend?
- How fast can you find the needle?
- Haystack = DNA, Needle = binding site, You = ribosome

What is a Molecular Machine?

One or more molecules or a molecular complex: not a macroscopic reaction.

- Performs a specific function.
- Is energized before the reaction.
- Dissipates energy during the reaction.
- Gains information.
- Is an isothermal engine.

Where is the candy?

Is it in the left four boxes? Is it in the bottom four boxes? Is it in the front four boxes?

You need the answers to three questions to find the candy.

Box labels: 000, 001, 010, 011, 100, 101, 110, 111

Need log2 8 = 3 bits of information

More candies…

Box labels: 00, 01, 10, 11, 00, 01, 10, 11. The candy is in both boxes labeled 01. Need only log2 8 - log2 2 = 2 bits of information.

In general,

m boxes with n candies need

log2 m - log2 n bits of information
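The box-counting rule can be checked with a short calculation (a minimal Python sketch; the function name is ours):

```python
import math

def bits_needed(m, n):
    """Information needed to locate n candies among m boxes: log2(m) - log2(n)."""
    return math.log2(m) - math.log2(n)

print(bits_needed(8, 1))  # one candy in 8 boxes -> 3.0 bits
print(bits_needed(8, 2))  # two candies in 8 boxes -> 2.0 bits
```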

Ribosomes

2600 binding sites among 4.7 million base pairs.

Need log2(4.7 million) - log2(2600) ≈ 10.8 bits of information.
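The same arithmetic reproduces the 10.8-bit figure:

```python
import math

m = 4.7e6   # base pairs searched (the E. coli genome)
n = 2600    # ribosome binding sites
print(round(math.log2(m) - math.log2(n), 1))  # -> 10.8 bits
```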

Communication System

Information Source

- Represented by a stochastic process.
- Mathematically, a Markov chain.
- We are interested in ergodic sources: every sequence is statistically the same as every other sequence.

How much information is produced?

A measure of uncertainty H should be:

- Continuous in the probabilities.
- A monotonically increasing function of the number of events.
- When a choice is broken down into two successive choices, the total H is the weighted sum of the individual values of H.

Enter Entropy

H = -K Σ_{i=1}^{n} p_i log(p_i)

[Plot: entropy of a two-event source against the probability p of one event, rising from 0 to a maximum of 1 bit at p = 1/2.]
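As a quick check on the definition, a short Python sketch (the helper function is ours, not from the slides) evaluates H for a few distributions:

```python
import math

def H(ps):
    """Shannon entropy in bits: H = -sum(p * log2(p)), with 0*log(0) taken as 0."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

print(H([1.0, 0.0]))  # a certain outcome -> 0.0 bits
print(H([0.5, 0.5]))  # two equally likely outcomes -> 1.0 bit (the maximum)
print(H([0.9, 0.1]))  # a biased choice -> about 0.47 bits
```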

Properties of Entropy

- H is zero if and only if all but one of the p_i are zero.
- H is never negative.
- H is maximum when all the events are equally probable.
- If x and y are two events, H(x, y) ≤ H(x) + H(y).
- Conditional entropy: H_x(y) = -Σ_{i,j} p(i, j) log p_i(j)
- H_x(y) ≤ H(y)
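The property H(x, y) ≤ H(x) + H(y) can be verified numerically on a made-up joint distribution (the numbers below are ours, chosen only for illustration):

```python
import math

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

# hypothetical joint distribution p(x, y) for two correlated binary events
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
Hxy = H(joint.values())   # joint entropy
Hx = H([0.5, 0.5])        # marginal entropy of x
Hy = H([0.5, 0.5])        # marginal entropy of y
print(Hxy, Hx + Hy)       # correlation makes H(x,y) strictly less than H(x) + H(y)
```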

Why is entropy important?

- Entropy is a measure of uncertainty.
- Entropy relation from thermodynamics: S = (k_B ln 2) H per molecule, or (R ln 2) H per mole.
- Also from thermodynamics: ΔS = q / T.
- The change in uncertainty on binding is ΔH = H_after - H_before.
- For every bit of information gained, the machine dissipates k_B T ln 2 joules.
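Plugging in numbers gives a feel for the scale (taking room temperature, T = 300 K, as our assumption):

```python
import math

kB = 1.380649e-23            # Boltzmann constant, J/K
T = 300                      # assumed room temperature, K
E_per_bit = kB * T * math.log(2)
print(E_per_bit)             # ~2.87e-21 joules dissipated per bit gained
```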

Ribosome binding sites

Information in sequence

Position   p                 H_Before   H_After   Change in H
1          A: 1/2, G: 1/2       2          1          1
2          U: 1                 2          0          2
3          G: 1                 2          0          2
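The table's entries follow directly from the entropy formula; a short sketch (the helper function is ours):

```python
import math

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

H_before = H([0.25, 0.25, 0.25, 0.25])   # four equally likely bases: 2 bits
print(H_before - H([0.5, 0.5]))          # position 1 (A or G): gain of 1.0 bit
print(H_before - H([1.0]))               # position 2 (always U): gain of 2.0 bits
```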

Information curve

H(l) = -Σ_{b ∈ {A,C,G,T}} f(b, l) log2 f(b, l)

The information gain at position l of the site is

R_sequence(l) = 2 - H(l)

A plot of this across the positions of the site gives the information curve. For E. coli ribosome binding sites, the total information is about 11 bits... the same as what the ribosome needs.
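The per-position information R_sequence(l) = 2 - H(l) is easy to compute from base frequencies; the frequencies below are hypothetical, chosen only for illustration:

```python
import math

def r_sequence(freqs):
    """Information at one site position: 2 - H(l), where H(l) is computed
    from the base frequencies f(b, l) observed at that position."""
    h = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return 2.0 - h

# hypothetical frequencies at one aligned position of a binding site
print(r_sequence({'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1}))  # ~0.64 bits
print(r_sequence({'U': 1.0}))                                # fully conserved: 2.0 bits
```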

Sequence Logo

Channel capacity

- A source transmits 0s and 1s at 1000 symbols/sec.
- 1 in 100 symbols has an error.
- What is the rate of transmission?
- We need to apply a correction: the uncertainty in x for a given value of y, which is the conditional entropy

H_y(x) = -(0.99 log2 0.99 + 0.01 log2 0.01) ≈ 0.081 bits/symbol

- The correction is therefore 81 bits/sec, giving a transmission rate of 1000 - 81 = 919 bits/sec.
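The slide's arithmetic can be reproduced directly (a minimal sketch, assuming base-2 logarithms):

```python
import math

symbol_rate = 1000   # symbols per second
p_err = 0.01         # 1 in 100 symbols is flipped

# equivocation: remaining uncertainty about the sent symbol given the received one
Hyx = -((1 - p_err) * math.log2(1 - p_err) + p_err * math.log2(p_err))
print(round(Hyx * symbol_rate))         # correction: ~81 bits/sec
print(round(symbol_rate * (1 - Hyx)))   # effective transmission rate: ~919 bits/sec
```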

Channel capacity contd.

C = max{H(x) - H_y(x)}

Shannon's theorem: as long as the rate of transmission is below C, the number of errors can be made as small as needed.

For a continuous source with white noise,

C = W log2(1 + P/N)

where W is the bandwidth and P/N is the signal-to-noise ratio.
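For instance, a sketch of this formula in use (the bandwidth and signal-to-noise values below are made up):

```python
import math

def capacity(W, snr):
    """Continuous-channel capacity C = W * log2(1 + P/N): W in Hz, snr = P/N."""
    return W * math.log2(1 + snr)

# e.g. a hypothetical 3 kHz channel with P/N = 1000 (30 dB)
print(capacity(3000, 1000))  # ~29,900 bits/sec
```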

Molecular Machine Capacity

- A lock-and-key mechanism.
- Each pin on the ribosome is a simple harmonic oscillator in a thermal bath.
- The velocities of the pins are represented by points in a 2-D velocity space.
- More pins -> more dimensions.
- The distribution of points is spherical.

Machine capacity

For large dimensions:

- All points lie in a thin spherical shell.
- The radius of the shell is the velocity, and hence proportional to the square root of the energy.
- Before binding: r_before = sqrt(P_y + N_y)
- After binding: r_after = sqrt(N_y)
- Number of choices = the number of 'after' spheres that fit inside the 'before' sphere = (volume of the before sphere) / (volume of the after sphere).
- Machine capacity = the logarithm of the number of choices:

C = d log2(1 + P/N)
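The machine-capacity formula mirrors the channel case, with the number of pins setting the dimensionality (the values below are hypothetical, for illustration):

```python
import math

def machine_capacity(d, P, N):
    """Schneider's machine capacity: C = d * log2(1 + P/N) bits per operation,
    where d is the number of independent pins (dimensions)."""
    return d * math.log2(1 + P / N)

print(machine_capacity(10, 3, 1))  # 10 pins at P/N = 3 -> 20.0 bits
```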

References

Claude E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July and October 1948.

Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication.

T. D. Schneider, "Sequence Logos, Machine/Channel Capacity, Maxwell's Demon, and Molecular Computers: a Review of the Theory of Molecular Machines," Nanotechnology, 5: 1-18, 1994.

T. D. Schneider, "Theory of Molecular Machines. I. Channel Capacity of Molecular Machines," J. Theor. Biol., 148: 83-123, 1991.

"How (and why) to find a needle in a haystack," The Economist, April 5th-11th 1997 (British edition pp. 105-107, American pp. 73-75, Asian pp. 79-81).

http://www.math.tamu.edu/~rahe/Math664/gene1.html

http://www.lecb.ncifcrf.gov/~toms/