©2003/04 Alessandro Bogliolo Background Information theory Probability theory Algorithms.


Scienze e Tecnologie dell'Informazione

ISTITUTO DIE



Background

Information theory Probability theory

Algorithms




Outline
1. Information theory
   1. Definitions
   2. Encodings
   3. Digitalization
2. Probability theory
   1. Random process
   2. Probability (marginal, joint, conditional), Bayes' theorem
   3. Probability distribution
   4. Sampling theory and estimation
3. Algorithms
   1. Definition
   2. Equivalence
   3. Complexity




Information

• Information: reduction of uncertainty
• The minimum uncertainty is given by two alternatives
• The elementary choice between 2 alternatives contains the minimum amount of information
• Bit: binary digit encoding the elementary choice between 2 alternatives (information unit)




Strings

• String: sequence of N characters (or digits) taken from a given finite alphabet of S symbols
• There are $S^N$ different strings of N characters taken from the same alphabet of S symbols
• A binary string is composed of bits, defined over the binary alphabet $B = \{0, 1\}$
• Byte: binary string of 8 bits
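The count of $S^N$ strings can be checked by brute-force enumeration. The following is a minimal sketch in Python; the helper name `all_strings` is hypothetical and only serves this illustration.

```python
from itertools import product


def all_strings(alphabet: str, n: int) -> list[str]:
    """Enumerate every string of n characters over the given alphabet."""
    return ["".join(chars) for chars in product(alphabet, repeat=n)]


# Binary alphabet, N = 3: there are 2**3 = 8 strings.
print(all_strings("01", 3))
# ['000', '001', '010', '011', '100', '101', '110', '111']

# A byte is a binary string of 8 bits: 2**8 = 256 configurations.
print(len(all_strings("01", 8)))  # 256
```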




Encodings
• Encoding: assignment of strings to the elements of a set according to a given rule
Properties:
– Irredundant: each element is assigned a unique string
– Constant length: all code words have the same length
– Exact: all elements are encoded and no two elements are associated with the same string

Example code words (from the slide figure):
0, 10, 110, 1110
00, 01, 10, 11
000, 001, 010, 011, 100, 101, 110




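Encoding finite sets
• The minimum number of digits of a constant-length exact encoding of a set of M elements is

$$N = \lceil \log_S M \rceil$$

so that

$$S^N \ge M$$

Maximum number of elements $M = S^N$ that can be encoded with N digits:

N     S=2     S=4
1     2       4
2     4       16
3     8       64
4     16      256
5     32      1024
6     64      4096
7     128     16384
8     256     65536
9     512     262144
10    1024    1048576

[Figure: number of digits N (0 to 12) versus set size M (0 to 1200), one curve for S=2 and one for S=4]

• Properties:

$$M_1 M_2 \le S^{N_1} S^{N_2} = S^{N_1 + N_2} \qquad N_1 + N_2 \ge \log_S M_1 + \log_S M_2 = \log_S (M_1 M_2)$$

$$M^k \le (S^N)^k = S^{kN} \qquad kN \ge k \log_S M = \log_S M^k$$

$$\log_{S_2} M = \frac{\log_{S_1} M}{\log_{S_1} S_2}$$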
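The minimum number of digits can be computed without floating-point logarithms by growing the capacity $S^N$ until it covers M. This is a sketch under that assumption; the function name `min_digits` is hypothetical.

```python
def min_digits(m: int, s: int) -> int:
    """Smallest N such that s**N >= m, i.e. the ceiling of log_s(m)."""
    if m < 1 or s < 2:
        raise ValueError("need m >= 1 elements and an alphabet of s >= 2 symbols")
    n = 0
    capacity = 1  # number of distinct strings of n symbols
    while capacity < m:
        capacity *= s
        n += 1
    return n


print(min_digits(1000, 2))  # 10, since 2**10 = 1024 >= 1000
print(min_digits(1000, 4))  # 5, since 4**5 = 1024 >= 1000
print(min_digits(1025, 2))  # 11
```

Integer arithmetic is used on purpose: rounding a floating-point logarithm can be off by one when M is an exact power of S.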




Encoding unlimited sets (limitation)

• An unlimited set contains infinitely many elements
  Example: integer numbers
• Infinite sets cannot be exactly encoded
• In order to be digitally encoded, the set must be restricted to a limited, finite subset
• In most cases this is done by encoding only the elements within given lower and upper bounds
  Example: integer numbers between 0 and 999
• The limited subset may be exactly encoded




Example: positional notation

• The base-b positional representation of an integer number of n digits has the form

$$c_{n-1} c_{n-2} \dots c_1 c_0$$

• The value of the number is

$$c_{n-1} b^{n-1} + \dots + c_2 b^2 + c_1 b^1 + c_0 b^0$$

• n digits encode all integer numbers from 0 to $b^n - 1$

Example: b=2, n=5
10011 = 1·16 + 1·2 + 1·1 = 19
11111 = 31 = $2^5 - 1$
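The value of a positional representation can be evaluated digit by digit. The sketch below uses Horner's rule, which is an implementation choice not stated on the slide, and a hypothetical function name `positional_value`; it is limited to bases up to 10 for simplicity.

```python
def positional_value(digits: str, base: int) -> int:
    """Value of a base-`base` digit string, most significant digit first."""
    if not 2 <= base <= 10:
        raise ValueError("this sketch supports bases 2 to 10 only")
    value = 0
    for ch in digits:
        c = int(ch)
        if c >= base:
            raise ValueError(f"digit {c} is not valid in base {base}")
        value = value * base + c  # shift previous digits, add the new one
    return value


print(positional_value("10011", 2))  # 19
print(positional_value("11111", 2))  # 31, the largest 5-digit value: 2**5 - 1
```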




Encoding continuous sets (discretization)

• A continuous set contains infinitely many elements
  Example: real numbers in a given interval, points on a plane
• In order to be digitally encoded, the set needs to be discretized: partitioned into a discrete number of subsets
• Codewords will be associated with subsets
• The resulting encoding is approximate




Example: gray levels

[Figure: gray scale divided into 16 intervals, labeled with the 4-bit codes 1111, 1110, 1101, 1100, 1011, 1010, 1001, 1000, 0111, 0110, 0101, 0100, 0011, 0010, 0001, 0000]

The encoding associates a unique code with an interval of gray levels

All gray levels within the interval are associated with the same code, thus losing information

The original gray level cannot be exactly reconstructed from the code

Decoding associates each code with a unique gray level (representative of a class)
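The loss of information can be seen in a small quantization sketch. It assumes gray levels normalized to the range 0 to 1, with code 0000 for the darkest interval and the interval midpoint as the representative; these conventions and the function names are assumptions, not taken from the slide.

```python
def encode_gray(level: float, n_bits: int = 4) -> str:
    """Map a gray level in [0, 1] to the code of the interval containing it."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("gray level must be within [0, 1]")
    n_codes = 2 ** n_bits
    index = min(int(level * n_codes), n_codes - 1)  # level 1.0 falls in the last interval
    return format(index, f"0{n_bits}b")


def decode_gray(code: str) -> float:
    """Return the representative (midpoint) of the interval named by the code."""
    n_codes = 2 ** len(code)
    return (int(code, 2) + 0.5) / n_codes


print(encode_gray(0.26), encode_gray(0.30))  # 0100 0100 (same interval, same code)
print(decode_gray("0100"))                   # 0.28125, neither of the two inputs
```

The decoding error is bounded by half the interval width, here 1/32.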




Example: 2D images

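[Figure: image of $n_x \times n_y$ pixels on the x and y axes; each pixel takes one of $n_{lev}$ gray levels]

$$\text{size} = n_x \cdot n_y \cdot \log_2 n_{lev}$$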
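As a worked example of the image size formula, the sketch below computes the size in bits of a gray-level image. The resolution and number of levels are illustrative values, and the function names are hypothetical; the number of bits per pixel is rounded up to an integer when the number of levels is not a power of two.

```python
def bits_per_sample(n_levels: int) -> int:
    """Ceiling of log2(n_levels), computed with integer arithmetic."""
    if n_levels < 1:
        raise ValueError("need at least one level")
    return (n_levels - 1).bit_length()


def image_size_bits(n_x: int, n_y: int, n_lev: int) -> int:
    """size = n_x * n_y * log2(n_lev), in bits."""
    return n_x * n_y * bits_per_sample(n_lev)


size = image_size_bits(640, 480, 256)  # 8 bits per pixel
print(size)       # 2457600 bits
print(size // 8)  # 307200 bytes
```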




Analog and digital signals
• Signal: time-varying physical quantity
– Analog: continuous-time, continuous-value
– Digital: discrete-time, discrete-value
• The digital encoding of a continuous signal entails:
– Sampling (i.e., time discretization)
– Quantization (i.e., value discretization)

$$\text{size} = s_{rate} \cdot T \cdot s_{size}$$

where $s_{rate}$ is the sampling rate, $T$ is the duration, and $s_{size}$ is the sample size.




Example: time series

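[Figure: sampled and quantized signal, value versus time]

$$\text{size} = s_{rate} \cdot T \cdot s_{size} = s_{rate} \cdot T \cdot \log_2 n_{lev}$$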




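Example: video

$$\text{size} = s_{rate} \cdot T \cdot s_{size} = s_{rate} \cdot T \cdot n_x \cdot n_y \cdot \log_2 n_{col}$$

$s_{rate}$ = frame rate

$n_{col}$ = number of colors

$n_x n_y$ = frame size

[Figure: sequence of $n_x \times n_y$ color frames along the time axis]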
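The size formulas for time series and video can be evaluated the same way. The sketch below uses illustrative parameters (an 8 kHz, 256-level signal and a 25 frames-per-second, 640 by 480, 24-bit color video) that are assumptions chosen for the example; the function names are hypothetical.

```python
def bits_for(n_values: int) -> int:
    """Ceiling of log2(n_values): bits needed to tell n_values apart."""
    return (n_values - 1).bit_length()


def signal_size_bits(s_rate: int, duration: int, n_lev: int) -> int:
    """size = s_rate * T * log2(n_lev) for a sampled, quantized time series."""
    return s_rate * duration * bits_for(n_lev)


def video_size_bits(s_rate: int, duration: int, n_x: int, n_y: int, n_col: int) -> int:
    """size = s_rate * T * n_x * n_y * log2(n_col) for uncompressed video."""
    frame_bits = n_x * n_y * bits_for(n_col)  # s_size: one sample is one frame
    return s_rate * duration * frame_bits


print(signal_size_bits(8000, 2, 256))             # 128000 bits
print(video_size_bits(25, 10, 640, 480, 2 ** 24)) # 1843200000 bits
```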




Redundancy
• Redundant encoding: encoding that makes use of more than the minimum number of digits required by an exact encoding

$$N > \lceil \log_S M \rceil$$

• Motivations for redundancy:
– Providing more expressive/natural encoding/decoding rules
– Reliability (error detection)
  Ex: parity encoding
– Noise immunity / fault tolerance (error correction)
  Ex: triplication




Redundancy: examples

• Parity encoding:
– A parity bit is used to guarantee that all codewords have an even number of 1's
– Single errors are detected by means of a parity check

Example: the irredundant codeword 0010 becomes 00101 once the parity bit is appended. The parity check gives 0 on 00101 (no error) and 1 on the corrupted word 01101 (error).

• Triple redundancy:
– Each character is repeated 3 times
– Single errors are corrected by means of majority voting

Example: 0010 is encoded as 000 000 111 000. If a single error turns it into 000 000 111 010, the voting result is still 0 0 1 0.
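Both schemes fit in a few lines of Python. This is a minimal sketch operating on bit strings, with hypothetical function names; it reproduces the codeword 0010 used in the example.

```python
def add_parity(bits: str) -> str:
    """Append a parity bit so that the codeword has an even number of 1's."""
    return bits + str(bits.count("1") % 2)


def parity_check(codeword: str) -> int:
    """Return 0 if the number of 1's is even, 1 if an error is detected."""
    return codeword.count("1") % 2


def triplicate(bits: str) -> str:
    """Repeat each bit 3 times."""
    return "".join(b * 3 for b in bits)


def majority_vote(codeword: str) -> str:
    """Decode a triplicated codeword, correcting single errors per group."""
    groups = [codeword[i:i + 3] for i in range(0, len(codeword), 3)]
    return "".join("1" if g.count("1") >= 2 else "0" for g in groups)


print(add_parity("0010"))             # 00101
print(parity_check("01101"))          # 1 (error detected)
print(triplicate("0010"))             # 000000111000
print(majority_vote("000000111010"))  # 0010 (error corrected)
```

Parity only detects an odd number of flipped bits; triplication corrects one flipped bit per group of three at the cost of tripling the length.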




Compression
• Lossy compression
– Compression achieved at the cost of reducing the accuracy of the representation
– The original representation cannot be restored
• Lossless compression
– Compression achieved by either removing redundancy or leveraging content-specific opportunities
– The original representation can be restored
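The difference between the two kinds of compression can be illustrated in Python. The sketch below uses the standard-library `zlib` module as one example of a lossless compressor and a simple rounding step as a stand-in for lossy compression; neither technique is named on the slide, and `quantize` is a hypothetical helper.

```python
import zlib

# Lossless: highly redundant data shrinks and is restored exactly.
original = b"0010" * 1000
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)


def quantize(samples: list[int], step: int) -> list[int]:
    """Lossy: round each sample down to a multiple of step."""
    return [s - s % step for s in samples]


# Lossy: distinct inputs collapse to the same value and cannot be told apart.
print(quantize([17, 18, 35], 16))  # [16, 16, 32]
```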




Random process

• A random process:
– Can be repeated infinitely many times
– May provide mutually exclusive results
– Provides an unpredictable result at each trial
• Elementary event (e): each possible result of a single trial
• Event space (X): the set of all elementary events
• Event (E): any set of elementary events (any subset of the event space)

[Figure: event space X containing an event E and an elementary event e]




Probability

• Relative frequency of E over N trials: ratio of the number of occurrences of E ($n_E$) to the number of trials N:

$$f_N(E) = \frac{n_E}{N}$$

• Probability of E (empirical definition):

$$p(E) = \lim_{N \to \infty} f_N(E)$$

• Probability of E (axiomatic definition):
  p(E) = 0 if E is empty
  p(E) = 1 if E = X

$$p(E_1 \cup E_2) = p(E_1) + p(E_2) \quad \text{if } E_1 \cap E_2 = \emptyset$$
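The empirical definition can be explored by simulation. The sketch below assumes a fair six-sided die as the random process and uses a fixed seed for repeatability; both are choices made for this illustration, and `relative_frequency` is a hypothetical name.

```python
import random


def relative_frequency(event: set[int], n_trials: int, seed: int = 0) -> float:
    """Relative frequency f_N(E) of an event over n_trials rolls of a fair die."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return hits / n_trials


even = {2, 4, 6}  # event E: the roll is even, p(E) = 1/2
for n in (10, 1000, 100000):
    # The relative frequency tends to settle near 0.5 as N grows.
    print(n, relative_frequency(even, n))
```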




Probability (properties)

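1. If all elementary events have the same probability, the probability of an event is given by its "relative size":

   p(E) = card(E)/card(X)

2. $p(\bar{E}) = 1 - p(E)$

3. If E1 is a subset of E2: $p(E_1) \le p(E_2)$

4. $p(E_1 \cap \bar{E}_2) = p(E_1) - p(E_1 \cap E_2)$

5. $p(E_1 \cup E_2) = p(E_1) + p(E_2) - p(E_1 \cap E_2)$

[Figure: Venn diagram of two events E1 and E2 within the event space X]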
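These properties can be checked exactly on a small event space. The sketch below assumes a fair six-sided die, so that p(E) = card(E)/card(X), and uses exact fractions; the event names are arbitrary choices for the example.

```python
from fractions import Fraction

X = frozenset(range(1, 7))  # event space of a fair die


def p(event) -> Fraction:
    """Probability as relative size: card(E) / card(X)."""
    return Fraction(len(event & X), len(X))


E1 = frozenset({2, 4})     # subset of E2
E2 = frozenset({2, 4, 6})  # even rolls
E3 = frozenset({1, 2, 3})  # low rolls

print(p(X - E2) == 1 - p(E2))                       # complement
print(p(E1) <= p(E2))                               # subset
print(p(E3 - E2) == p(E3) - p(E3 & E2))             # difference
print(p(E3 | E2) == p(E3) + p(E2) - p(E3 & E2))     # union
```

All four lines print `True`.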




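Conditional probability
• Joint probability of two events, outcomes of two random experiments

$$p(E_1 \cap E_2) = \lim_{N \to \infty} \frac{n_{E_1 E_2}}{N}$$

• Marginal probability

$$p(E_1) = p(E_1 \cap E_2) + p(E_1 \cap \bar{E}_2)$$

• Conditional probability

$$p(E_1 | E_2) = \frac{p(E_1 \cap E_2)}{p(E_2)}$$

• Decomposition

$$p(E_1) = p(E_2)\, p(E_1 | E_2) + p(\bar{E}_2)\, p(E_1 | \bar{E}_2)$$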




Independent events

• The joint probability of two independent events is equal to the product of their marginal probabilities

$$p(E_1 \cap E_2) = p(E_1)\, p(E_2)$$

• E1 and E2 are independent events if and only if

$$p(E_1 | E_2) = p(E_1)$$
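Conditional probability and independence can be verified with exact arithmetic on a small example. The sketch assumes a fair six-sided die and events chosen for illustration; the helper names are hypothetical.

```python
from fractions import Fraction

X = frozenset(range(1, 7))  # event space of a fair die


def p(event) -> Fraction:
    """Probability of an event as card(E) / card(X)."""
    return Fraction(len(event & X), len(X))


def p_given(e1, e2) -> Fraction:
    """Conditional probability p(E1 | E2) = p(E1 and E2) / p(E2)."""
    return p(e1 & e2) / p(e2)


def independent(e1, e2) -> bool:
    """True when the joint probability equals the product of the marginals."""
    return p(e1 & e2) == p(e1) * p(e2)


even = frozenset({2, 4, 6})
up_to_four = frozenset({1, 2, 3, 4})
low = frozenset({1, 2, 3})

print(independent(even, up_to_four), p_given(even, up_to_four))  # True 1/2
print(independent(even, low), p_given(even, low))                # False 1/3

# Decomposition of p(even) over low and its complement.
not_low = X - low
print(p(low) * p_given(even, low) + p(not_low) * p_given(even, not_low))  # 1/2
```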




Bayes Theorem

• Bayes' theorem: given two events E1 and E2, the conditional probability of E1 given E2 can be expressed as:

$$p(E_1 | E_2) = \frac{p(E_2 | E_1)\, p(E_1)}{p(E_2)}$$

• The theorem provides the basis for statistical diagnosis: evaluating the probability of a possible cause (E1) given an observed effect (E2)
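A diagnosis example makes the theorem concrete. The numbers below (a cause with prior probability 1%, an effect observed in 95% of cases when the cause is present and in 5% of cases when it is absent) are hypothetical, as is the function name `bayes`; the probability of the effect is obtained from the two conditional cases.

```python
from fractions import Fraction


def bayes(p_e1: Fraction, p_e2_given_e1: Fraction, p_e2_given_not_e1: Fraction) -> Fraction:
    """p(E1 | E2) from the prior p(E1) and the two conditional probabilities of E2."""
    # Total probability of the effect E2 over the cases "E1" and "not E1".
    p_e2 = p_e1 * p_e2_given_e1 + (1 - p_e1) * p_e2_given_not_e1
    return p_e2_given_e1 * p_e1 / p_e2


posterior = bayes(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
print(posterior)         # 19/118
print(float(posterior))  # about 0.161
```

Even with a strong link between cause and effect, the posterior stays low because the cause is rare.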




Random variable

• Random variable x: variable representing the outcome of a random experiment

• Probability distribution function of x:

$$F_x(\xi) = p(x \le \xi)$$

• Probability density function of x:

$$f_x(a) = \left. \frac{dF_x(\xi)}{d\xi} \right|_{\xi = a}$$




Sampling and estimation
• Parent population of a random variable x: ideal set of the outcomes of all possible trials of a random process
• Sample of x: set of the outcomes of N trials of the random process
• Sample parameters can be used as estimators of parameters of the parent population
• Example:

Expected value of x:

$$E[x] = \sum_{e_i \in X} e_i \, p(e_i)$$

Sample average:

$$\bar{x} = \frac{1}{N} \sum_{j=1}^{N} x_j$$
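The sample average can be compared with the expected value it estimates. The sketch assumes a fair six-sided die as the parent population and a fixed random seed; the function names are hypothetical.

```python
import random
from fractions import Fraction


def expected_value(distribution: dict[int, Fraction]) -> Fraction:
    """E[x]: sum of each elementary event times its probability."""
    return sum(e * prob for e, prob in distribution.items())


def sample_average(samples: list[int]) -> float:
    """Average of the N outcomes in the sample."""
    return sum(samples) / len(samples)


die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2

rng = random.Random(0)
samples = [rng.randint(1, 6) for _ in range(10000)]
print(sample_average(samples))  # close to 3.5
```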




Confidence of an estimator
• The quality of the estimator P' of a given parameter P can be expressed in terms of:
– Confidence interval: limiting distance d between the estimator and the actual parameter
– Confidence level: probability c of finding the actual parameter within the confidence interval

$$c = \Pr(|P - P'| < d)$$

• The smaller the confidence interval d and the higher the confidence level c, the better
• The quality of an estimator grows with the number of samples
• For a fixed confidence level c, the size of the confidence interval d decreases as the inverse of the square root of N




Variance, covariance, correlation

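• Variance:

$$\sigma^2 = \frac{1}{N} \sum_{j=1}^{N} (x_j - \bar{x})^2$$

• Standard deviation:

$$\sigma = \sqrt{\sigma^2}$$

• Covariance:

$$Cov(x, y) = \frac{1}{N} \sum_{j=1}^{N} (x_j - \bar{x})(y_j - \bar{y})$$

• Correlation:

$$Corr(x, y) = \frac{Cov(x, y)}{\sigma_x \sigma_y}$$

• The confidence interval of an estimator is proportional to

$$\frac{\sigma}{\sqrt{N}}$$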
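These sample statistics translate directly into code. The sketch below uses the 1/N normalization and plain Python lists; the data values and function names are chosen only for illustration.

```python
from math import sqrt


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def variance(xs: list[float]) -> float:
    """Average squared deviation from the mean (1/N normalization)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def std_dev(xs: list[float]) -> float:
    return sqrt(variance(xs))


def covariance(xs: list[float], ys: list[float]) -> float:
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs: list[float], ys: list[float]) -> float:
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # y = 2x, so the correlation is 1 up to rounding
print(variance(x))                # 1.25
print(covariance(x, y))           # 2.5
print(std_dev(x) / sqrt(len(x)))  # sigma / sqrt(N), the scale of the confidence interval
```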




Algorithm
• Definition: finite description of a finite sequence of unambiguous instructions that can be executed in finite time to solve a problem or provide a result
• Key properties:
– Finite description
– Unambiguity
– Finite execution
• Algorithms take input data and provide output data
• Domain: set of all allowed configurations of the input data




Complexity
• Complexity: measure of the number of elementary steps required by the algorithm to solve a problem
• The number of execution steps usually depends on the configuration of the input data (i.e., on the instance of the problem)
• The complexity of an algorithm is usually expressed as a function of the size of its input data, retaining the type of behavior while neglecting additive and multiplicative constants
• Example: $O(n)$, $O(n^2)$, $O(2^n)$
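The three growth rates can be made tangible by counting elementary steps. The sketch below counts loop iterations for three illustrative tasks (scanning a list, visiting all ordered pairs, enumerating all subsets); the tasks and function names are assumptions chosen for the example.

```python
from itertools import product


def linear_steps(items: list) -> int:
    """Visit each item once: n steps, O(n)."""
    steps = 0
    for _ in items:
        steps += 1
    return steps


def quadratic_steps(items: list) -> int:
    """Visit every ordered pair of items: n * n steps, O(n^2)."""
    steps = 0
    for _ in items:
        for _ in items:
            steps += 1
    return steps


def exponential_steps(items: list) -> int:
    """Visit every subset (one in/out choice per item): 2**n steps, O(2^n)."""
    steps = 0
    for _ in product((False, True), repeat=len(items)):
        steps += 1
    return steps


data = list(range(10))
print(linear_steps(data), quadratic_steps(data), exponential_steps(data))
# 10 100 1024
```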




Equivalence
• Two algorithms are said to be equivalent if:
– they are defined on the same domain
– they provide the same result at all points of the domain
• In general, there are many equivalent algorithms that solve the same problem, possibly with different complexity
• Complexity is a property of an algorithm; it is not an inherent property of the problem
• The complexity of the most efficient known algorithm that solves a given problem is commonly considered to be the complexity of the problem
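Two equivalent algorithms with different complexity can be sketched as follows. The problem (summing the integers from 1 to n) and the function names are chosen for illustration; both functions are defined on the domain of non-negative integers.

```python
def sum_loop(n: int) -> int:
    """Add the integers 1..n one at a time: O(n) steps."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n: int) -> int:
    """Closed form n(n+1)/2: a constant number of steps."""
    return n * (n + 1) // 2


# Same domain, same result at every tested point of the domain.
print(all(sum_loop(n) == sum_formula(n) for n in range(200)))  # True
print(sum_loop(100), sum_formula(100))                         # 5050 5050
```

A finite test like this one does not prove equivalence over the whole domain; here it follows from the well-known identity for the sum of the first n integers.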
