Information Theory The Work of Claude Shannon (1916-2001) and others.
Information Theory
The Work of Claude Shannon (1916-2001)
and others
Introduction to Information Theory
Lectures taken from:
John R. Pierce: An Introduction to Information Theory: Symbols, Signals and Noise, N.Y. Dover Publishing, 1980 [second edition], 2 copies on reserve in ISAT Library
Read Chapters 2 - 5
Information Theory
Theories, physical or mathematical, are generalizations or laws that explain a broad range of phenomena.
Mathematical theories do not depend on the nature of objects. [e.g. Arithmetic: it applies to any objects.]
Mathematical theorists make assumptions and definitions, then draw out their implications in proofs and theorems, which may in turn call the assumptions and definitions into question.
Information Theory
Communication Theory [the term Shannon used] tells how many bits of information can be sent per second over perfect and imperfect communications channels, in terms of abstract descriptions of the properties of these channels
Information Theory
Communication theory tells how to measure the rate at which a message source generates information
Communication theory tells how to encode messages efficiently for transmission over particular channels and tells us when we can avoid errors
Information Theory
The origins of information theory lie in telegraphy and electrical communications. Thus it uses “discrete” mathematical theory [statistics] as well as “continuous” mathematical theory [wave equations and Fourier Analysis].
The term “Entropy” in information theory was adopted by analogy with the term used in statistical mechanics and physics.
Entropy in Information Theory
In physics, if a process is reversible, the entropy is constant. Energy can be converted from thermal to mechanical and back.
Irreversible processes result in an increase in entropy.
Thus, entropy is also a measure of “order”: increase in entropy = decrease of order.
Entropy
By analogy, if information is “disorderly”, there is less knowledge; disorder is equivalent to unpredictability [in physics, a lack of knowledge about the positions and velocities of particles].
Entropy
In which case does a message of a given length convey the most information?
A. I can only send one of 10 messages.
B. I can only send one of 1,000,000 messages.
In which state is there “more entropy”?
Entropy
Entropy      | Number of possible messages  | Amount of uncertainty | Amount of information in message
Low entropy  | One message out of 10        | Small uncertainty     | Small amount of information
High entropy | One message out of 1,000,000 | Large uncertainty     | Large amount of information
Entropy
Entropy = amount of information conveyed by a message from a source
“Information” in popular use means the amount of knowledge a message conveys: its “meaning”.
“Information” in communication theory refers to the amount of uncertainty in a system that a message will get rid of.
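The comparison between a source with 10 possible messages and one with 1,000,000 can be put into numbers. The sketch below assumes every message is equally likely and measures information in bits (base-2 logarithm); the function name is illustrative only.

```python
import math

def bits_of_information(num_possible_messages: int) -> float:
    """Bits needed to single out one of N equally likely messages."""
    return math.log2(num_possible_messages)

low = bits_of_information(10)          # low-entropy source
high = bits_of_information(1_000_000)  # high-entropy source

print(f"one message out of 10:        {low:.2f} bits")
print(f"one message out of 1,000,000: {high:.2f} bits")
```

Under these assumptions a message from the larger set removes six times as much uncertainty, because 1,000,000 = 10^6.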
Symbols and Signals
It makes a difference HOW you translate a message into electrical signals.
Morse/Vail instinctively knew that shorter codes for frequently used letters would speed up the transmission of messages
“Morse Code” could have been 15% faster had the letter frequencies been researched more carefully.
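The effect of giving frequent letters shorter codes can be illustrated with a toy alphabet. Everything in this sketch is hypothetical: the four letters, their probabilities and both code tables are invented for illustration and are not Morse's actual code or real English letter frequencies.

```python
# Hypothetical four-letter alphabet; probabilities are invented for illustration.
probabilities = {"E": 0.5, "T": 0.25, "A": 0.125, "Q": 0.125}

# Every letter gets the same length, regardless of how often it occurs.
fixed_code = {"E": "00", "T": "01", "A": "10", "Q": "11"}

# Frequent letters get short codes, rare letters get long ones.
variable_code = {"E": "0", "T": "10", "A": "110", "Q": "111"}

def average_length(code, probs):
    """Expected number of code symbols sent per letter."""
    return sum(probs[letter] * len(code[letter]) for letter in probs)

fixed_avg = average_length(fixed_code, probabilities)
variable_avg = average_length(variable_code, probabilities)
saving = 1 - variable_avg / fixed_avg

print(f"fixed-length code:    {fixed_avg} symbols per letter")
print(f"variable-length code: {variable_avg} symbols per letter")
print(f"saving: {saving:.1%}")
```

How much is saved depends entirely on how well the code lengths match the true letter frequencies, which is the point of the remark about research on letter frequencies.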
Symbols and Signals
[Telegraphy] Discrete mathematics [statistics] is used where current shifts represent on/off choices or combinations of on/off choices.
[Telephony] Continuous mathematics [sine functions and Fourier Analysis] is used where complex wave forms encode information in terms of changing frequencies and amplitudes.
Speed of Transmission: Line Speed
A given circuit has a limit to the speed at which successive current values can be sent before individual symbols [current changes] interfere with one another and cannot be distinguished at the receiving end [“inter-symbol interference”]. This limit is the “Line Speed”.
Different materials [coaxial cable, wire, optical fiber] have different line speeds, represented by K in the equations.
Transmission Speed
If more “symbols” can be used [different amplitudes or different frequencies], more than one message can be sent simultaneously, and thus transmission speed can be increased above line speed by effective coding. With m symbols: W = K log2(m)
If messages are composed of 2 “letters” and we send M messages simultaneously, then we need 2^M different current values to represent the combinations of M messages using two letters: W = K log2(2^M) = KM
Nyquist
Thus the Speed of Transmission, W, is proportional to the line speed [which is related to the number of successive current values per second you can send on the channel] AND to the number of different messages you can send simultaneously [which depends on how and what you code].
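The relation between W, K and the number of symbols can be checked numerically. This is a minimal sketch that takes K = 1 as an arbitrary unit of line speed; the function name is an assumption made for the example.

```python
import math

def transmission_speed(line_speed_k: float, num_symbols_m: int) -> float:
    """W = K * log2(m): speed for a line speed K and m distinguishable symbols."""
    return line_speed_k * math.log2(num_symbols_m)

K = 1.0  # arbitrary unit of line speed, chosen for illustration

for m in (2, 3, 4, 8):
    print(f"m = {m}: W = {transmission_speed(K, m):.2f} K")

# M simultaneous on/off messages need 2**M symbols, which gives W = K * M.
for M in (1, 2, 3):
    assert math.isclose(transmission_speed(K, 2 ** M), K * M)
```

Doubling the number of symbols adds only one more unit of K, so speed grows with the logarithm of the symbol count, not with the count itself.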
Symbols and Signals
Message #1 | Message #2 | Current value
ON         | ON         | +3
OFF        | ON         | +1
OFF        | OFF        | -1
ON         | OFF        | -3
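The table of current values amounts to a small lookup code: each pair of on/off states maps to one of four currents, and the receiver inverts the mapping. The sketch below uses the four values from the table; the function names and the sample messages are assumptions made for illustration.

```python
# (message 1 state, message 2 state) -> current value, as in the table.
CURRENT_FOR_PAIR = {
    ("ON", "ON"): +3,
    ("OFF", "ON"): +1,
    ("OFF", "OFF"): -1,
    ("ON", "OFF"): -3,
}
PAIR_FOR_CURRENT = {current: pair for pair, current in CURRENT_FOR_PAIR.items()}

def encode(message_1, message_2):
    """Combine two on/off messages into one sequence of current values."""
    return [CURRENT_FOR_PAIR[pair] for pair in zip(message_1, message_2)]

def decode(currents):
    """Recover both on/off messages from the received current values."""
    pairs = [PAIR_FOR_CURRENT[current] for current in currents]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# Sample messages, invented for the example.
msg1 = ["ON", "OFF", "OFF", "ON"]
msg2 = ["ON", "ON", "OFF", "OFF"]

line = encode(msg1, msg2)
print(line)
print(decode(line))
```

Each current value sent carries one symbol from each of the two messages, which is why four values double the speed of a single on/off line.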
Transmission Speed
Speed | Current values used | Formula            | Simultaneous on/off messages
W2    | on/off              | K log2(2) = K      | 1
W3    | +1, 0, -1           | K log2(3) ≈ 1.6K   | -
W4    | +3, -3, +1, -1      | K log2(4) = 2K     | 2
W8    | 8 values            | K log2(8) = 3K     | 3
Attenuation and noise interference may make certain values unusable for coding.
Telegraphy/Telephony/Digital
In Telephony, messages are composed of a continuously varying wave form, which is a direct translation of a pressure wave into an electromagnetic wave.
Telegraphy codes could be sent simultaneously with voice, if we used frequencies [not amplitudes] and selected ones that were not confused with voice frequencies.
Fourier Analysis enables us to “separate out” the frequencies at the other end.
Fourier Analysis
If transmission characteristics of a channel do not change with time, it is a linear circuit.
Linear circuits may have attenuation [amplitude changes] or delay [phase shifts], but they do not have period/frequency changes.
Fourier showed that any complex wave form [quantity varying with time] could be expressed as a sum of sine waves of different frequencies.
Thus, a signal containing a combination of frequencies [some representing codes of dots and dashes, and some representing the frequencies of voice] can be de-composed at the receiver and decoded. [draw picture]
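Separating a combined signal back into its component frequencies can be sketched with a plain discrete Fourier transform. This is a minimal illustration, not an efficient implementation: the two tone frequencies (5 and 12 cycles per second), their amplitudes, the sample rate and the detection threshold are all assumed values, and a real receiver would use an FFT.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 up to half the sample count."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        magnitudes.append(abs(total) / n)
    return magnitudes

def dominant_frequencies(samples, sample_rate, threshold=0.1):
    """Frequencies (cycles per second) whose magnitude exceeds the threshold."""
    n = len(samples)
    return [k * sample_rate / n
            for k, magnitude in enumerate(dft_magnitudes(samples))
            if magnitude > threshold]

sample_rate = 64  # samples per second (assumed); one second of signal
signal = [
    1.0 * math.sin(2 * math.pi * 5 * t / sample_rate)     # first tone
    + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)  # second tone
    for t in range(sample_rate)
]

print(dominant_frequencies(signal, sample_rate))
```

The combined wave looks nothing like either sine wave on its own, yet the transform reports exactly the two frequencies that were mixed in.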
Fourier Analysis
In digital communications, we “sample” the continuously varying wave, and code it into binary digits representing the value of the wave at time t and then send different frequencies to represent simultaneous messages of samples.
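Sampling a wave and coding each sample as binary digits can be sketched as follows. The 1-cycle-per-second tone, the rate of 8 samples per second and the 3-bit resolution are arbitrary choices for illustration, and the wave is assumed to stay within the range -1 to +1.

```python
import math

def sample_and_quantize(wave, sample_rate, duration, bits):
    """Sample wave(t) at fixed intervals and code each value as a binary word."""
    levels = 2 ** bits
    codes = []
    for i in range(int(sample_rate * duration)):
        t = i / sample_rate
        value = wave(t)  # assumed to lie in [-1, 1]
        # Map [-1, 1] onto the integer levels 0 .. levels - 1.
        level = min(levels - 1, int((value + 1) / 2 * levels))
        codes.append(format(level, f"0{bits}b"))
    return codes

def tone(t):
    """A 1-cycle-per-second sine wave (illustrative input)."""
    return math.sin(2 * math.pi * t)

codes = sample_and_quantize(tone, sample_rate=8, duration=1.0, bits=3)
bitstream = "".join(codes)
print(codes)
print(bitstream)
```

More bits per sample give a finer picture of the wave at the cost of a longer bit stream, and a higher sample rate does the same along the time axis.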
Digital Communications
001100101011100001101010100… This stream represents the values of a sound wave at intervals of 1/x seconds.
01011110000011110101011111010… This stream represents numerical data in a database.
00110010100010000110101010100… This stream represents coded letters.
Digital Communications
I represent the three messages simultaneously by a range of frequencies:
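Stream 1: 0 0 1…
Stream 2: 0 1 0…
Stream 3: 0 0 1…
Reading the three streams one column at a time: “000” = f1, “010” = f2, “101” = f3
How many frequencies do I need? 2^M [8 for M = 3 messages]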
Digital Communications
The resulting signal, containing three simultaneous messages, is a wave form changing continuously across these 8 frequencies f1, f2, f3, …
And with Fourier analysis I can tell at any time what the three different streams are doing.
And we know that the speed of transmission will vary with this “bandwidth”.
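Combining the three streams can be sketched by reading one bit from each stream at every time step, so that each 3-bit word selects one frequency. The function names are illustrative, and only the first three bits of each stream are used.

```python
def multiplex(streams):
    """Read the streams column by column: one M-bit word per time step."""
    return ["".join(bits) for bits in zip(*streams)]

def demultiplex(words):
    """Split the sequence of M-bit words back into the original streams."""
    return ["".join(word[i] for word in words) for i in range(len(words[0]))]

def frequencies_needed(num_streams):
    """Each of the 2**M possible words needs its own frequency."""
    return 2 ** num_streams

# First three bits of the sound, database and letter streams.
streams = ["001", "010", "001"]

words = multiplex(streams)
print(words)
print(frequencies_needed(len(streams)))
print(demultiplex(words))
```

The receiver only has to identify which frequency is present at each time step to recover one bit of every stream at once.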
Digital Communications
Input: messages coded by several frequencies.
Channel Distortion: signals of different frequencies undergo different amplitude and phase shifts during transmission.
Output: same frequencies, but with different phases and amplitudes, thus the wave has a different shape. Fourier analysis can tell you what frequencies were sent, and thus what the three messages were.
In “distortionless circuits” the shape of the input is the same as the shape of the output.
Hartley
Given a “random selection of symbols” from a given set of symbols [a message source]:
The “Information” in a message, H, is proportional to the bandwidth [allowable values] x “time of transmission”: H = n log s
n = number of symbols selected; s = number of different symbols possible (2^M in the previous example); log s = number of independent choices sent simultaneously [i.e. proportional to the speed of transmission]
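Hartley's measure can be evaluated directly. The sketch below assumes a base-2 logarithm so that H comes out in bits (the slide leaves the base open), and the message length of 10 symbols is an arbitrary example.

```python
import math

def hartley_information(n_symbols_selected: int, s_symbols_possible: int) -> float:
    """H = n * log2(s), in bits (base 2 is an assumption made here)."""
    return n_symbols_selected * math.log2(s_symbols_possible)

M = 3        # simultaneous on/off messages, as in the frequency example
s = 2 ** M   # 8 different symbols possible
n = 10       # number of symbols selected (arbitrary example length)

H = hartley_information(n, s)
print(f"H = {n} x log2({s}) = {H} bits")
```

With s = 2^M, log2(s) is simply M, so each symbol selected carries one independent on/off choice for each of the M messages.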
And now, time for something completely different!
Claude Shannon: encoding simultaneous messages from a known “ensemble” [i.e. bandwidth], so as to transmit them accurately and swiftly in the presence of noise.
Norbert Wiener: research on extracting signals of an “ensemble” from noise of a known type, to predict future values of the “ensemble” [tracking enemy planes].
Other names
Dennis Gabor’s theory of communication did not include noise
W.G. Tuller explored the theoretical limits on the rate of transmission of information
Mathematics of Information
Deterministic v. stochastic models.
How do we take advantage of the language [the probabilities that a message will contain certain things] to further compress and encode messages?
0-order approximation of language: all 26 letters have equal probabilities
1st-order approximation of English: we assign appropriate probabilities to letters
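The two approximations can be compared in bits per letter. This sketch uses Shannon's entropy formula, H = -sum(p log2 p), for the first-order case and estimates the letter probabilities from a short sample sentence. The sample is an assumption made for illustration and is far too small to represent real English statistics.

```python
import math
from collections import Counter

def zero_order_bits(alphabet_size: int = 26) -> float:
    """All letters equally likely: log2(alphabet size) bits per letter."""
    return math.log2(alphabet_size)

def first_order_bits(text: str) -> float:
    """Entropy per letter, with probabilities estimated from the text itself."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Illustrative sample only; real estimates need a large body of text.
sample = ("the quick brown fox jumps over the lazy dog "
          "and then rests in the shade of the tree")

print(f"0-order: {zero_order_bits():.2f} bits per letter")
print(f"1st-order (from sample): {first_order_bits(sample):.2f} bits per letter")
```

Because the letters are not equally likely, the first-order figure falls below the zero-order one, and that gap is the room available for more efficient encoding.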