
K. Desch – Statistical methods of data analysis SS10

2. Probability 2.3 Joint p.d.f.´s of several random variables

Example: an experiment yields several simultaneous measurements (e.g. temperature and pressure).

Joint p.d.f. (here only for 2 variables):

f(x,y) dx dy = probability that x ∈ [x, x+dx] and y ∈ [y, y+dy]

Normalization:

∫∫ f(x,y) dx dy = 1

Individual probability distributions (“marginal p.d.f.s”) for x and y:

f_x(x) = ∫ f(x,y) dy,   f_y(y) = ∫ f(x,y) dx

This yields the probability density for x (or y) independent of y (or x).

x and y are statistically independent if

f(x,y) = f_x(x) · f_y(y) for all x, y

equivalently, f(x|y) = f_x(x) for any y (and vice versa).
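As a small numerical sketch of these definitions (the factorizing density f(x,y) = 4xy on the unit square is our own toy example, not from the slides): its marginals are f_x(x) = 2x and f_y(y) = 2y, and the joint density integrates to 1.

```python
# Midpoint-rule check of marginal p.d.f. and normalization for the
# toy density f(x,y) = 4xy on [0,1]^2 (independent: f_x(x) = 2x, f_y(y) = 2y).
N = 400
h = 1.0 / N

def f(x, y):
    return 4.0 * x * y

def marginal_x(x):
    # f_x(x) = integral of f(x,y) over y
    return sum(f(x, (j + 0.5) * h) for j in range(N)) * h

print(marginal_x(0.3))   # f_x(0.3) = 2 * 0.3 = 0.6
total = sum(marginal_x((i + 0.5) * h) for i in range(N)) * h
print(total)             # normalization: double integral = 1
```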


Conditional p.d.f.s:

h(y|x) dy is the probability for the event to lie in the interval [y, y+dy] when it is known to lie in the interval [x, x+dx]:

h(y|x) = f(x,y) / f_x(x),   g(x|y) = f(x,y) / f_y(y)


Example: measurement of the length of a bar and its temperature

x = deviation from 800 mm, y = temperature in °C

a) 2-dimensional histogram (“scatter plot”)

b) Marginal distribution of y (“y-projection”)

c) Marginal distribution of x (“x-projection”)

d) Two conditional distributions of x (see the bands in (a))

The width in d) is smaller than in c) ⇒ x and y are “correlated”.


Expectation value (analogous to the 1-dim. case):

E[a(x⃗)] = ∫ a(x⃗) f(x⃗) dx₁ … dxₙ

Variance (analogous to the 1-dim. case):

V[a(x⃗)] = ∫ (a(x⃗) − μ_a)² f(x⃗) dx₁ … dxₙ

Covariance

for 2 variables x, y with joint p.d.f. f(x,y):

Important when there is more than one variable: a measure for the correlation of the variables:

cov[x,y] = V_xy := E[(x − μ_x)(y − μ_y)]
         = E[xy] − μ_x μ_y
         = ∫∫ xy f(x,y) dx dy − μ_x μ_y

If x and y are statistically independent (f(x,y) = f_x(x) f_y(y)), then cov[x,y] = 0 (but not vice versa!)
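A quick Monte Carlo sketch of this statement (the particular variables are our own toy choices, not from the slides): for independent Gaussians the sample covariance is compatible with zero, while for y = x + noise it equals V[x].

```python
import random
random.seed(1)

# Sample covariance for independent vs. correlated toy variables.
n = 200_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys_indep = [random.gauss(0, 1) for _ in range(n)]          # independent of xs
ys_corr = [x + 0.5 * random.gauss(0, 1) for x in xs]       # correlated with xs

def cov(a, b):
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

print(cov(xs, ys_indep))   # compatible with 0
print(cov(xs, ys_corr))    # close to cov[x, x + 0.5e] = V[x] = 1
```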


Positive correlation: a positive (negative) deviation of x from its mean μ_x increases the probability that y has a positive (negative) deviation from its mean μ_y.

For the sum of random variables x+y: V[x+y] = V[x] + V[y] + 2 cov[x,y] (proof: linearity of E[·]).

For n random variables x_i, i = 1, …, n:

cov[x_i, x_j] = V_ij

is the covariance matrix (a symmetric matrix).

Diagonal elements: cov[x_i, x_i] = V[x_i] = σ_i²

For uncorrelated variables the covariance matrix is diagonal.

Normalized quantity:

ρ_ij := cov[x_i, x_j] / (σ_i σ_j)

is the correlation coefficient.
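The normalization can be sketched numerically (the linear toy model y = 3x + noise is our own example): ρ always lies in [−1, 1], and the closer y is to a linear function of x, the closer |ρ| is to 1.

```python
import random
random.seed(2)

# Correlation coefficient rho = cov[x,y] / (sigma_x * sigma_y)
# for the toy model y = 3x + e, with sigma_x = 2 and unit-variance noise e.
n = 100_000
xs = [random.gauss(0, 2) for _ in range(n)]
ys = [3 * x + random.gauss(0, 1) for x in xs]

def cov(a, b):
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

rho = cov(xs, ys) / (cov(xs, xs) ** 0.5 * cov(ys, ys) ** 0.5)
print(rho)   # strong positive correlation, close to +1
```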


Examples for correlation coefficients

(Axis units play no role !)


one more example:

[Barlow]


another example:


2. Probability 2.4 Transformation of variables

Measured quantity: x (distributed according to pdf f(x))

Derived quantity: y = a(x). What is the p.d.f. g(y) of y?

Define g(y) by requiring the same probability for y ∈ [y, y+dy] as for x ∈ [x, x+dx]:

g(y) dy = f(x) dx   ⇒   g(y) = f(x(y)) |dx/dy|
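The rule g(y) = f(x(y)) |dx/dy| can be checked with a short Monte Carlo sketch (a toy example of our own, not from the slides): for x uniform on (0,1) and y = x², it predicts g(y) = 1/(2√y), so the probability of y ∈ [0.25, 0.36] is √0.36 − √0.25 = 0.1.

```python
import random
random.seed(3)

# y = x^2 for x uniform on (0,1); predicted density g(y) = 1/(2*sqrt(y)).
n = 500_000
ys = [random.random() ** 2 for _ in range(n)]

# Integral of g over [0.25, 0.36] is sqrt(0.36) - sqrt(0.25) = 0.1.
frac = sum(0.25 <= y < 0.36 for y in ys) / n
print(frac)   # close to 0.1
```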


More tedious when x → y is not a one-to-one relation, e.g. y(x) = x²:

There are two branches, x > 0 and x < 0; for g(y), sum the probabilities of both branches. With x = ±√y and |dx/dy| = 1/(2√y):

g(y) = [f(√y) + f(−√y)] / (2√y)
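A numerical sketch of the two-branch case (our own check, not from the slides): for x a standard Gaussian and y = x², the formula gives the χ² density with one degree of freedom, so P(y < 1) = P(|x| < 1) = erf(1/√2) ≈ 0.6827.

```python
import math
import random
random.seed(4)

# y = x^2 with x ~ N(0,1): both branches contribute, giving
# g(y) = [f(sqrt(y)) + f(-sqrt(y))] / (2*sqrt(y))  (chi^2, 1 d.o.f.).
n = 400_000
ys = [random.gauss(0, 1) ** 2 for _ in range(n)]
frac = sum(y < 1.0 for y in ys) / n

print(frac)                          # MC estimate of P(y < 1)
print(math.erf(1 / math.sqrt(2)))    # analytic P(|x| < 1) ~ 0.6827
```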


Functions of several variables: y⃗ = a⃗(x⃗). Transformation through the Jacobian matrix:

g(y⃗) = f(x⃗(y⃗)) |det J|

J = ( ∂x₁/∂y₁ … ∂x₁/∂yₙ
      ⋮             ⋮
      ∂xₙ/∂y₁ … ∂xₙ/∂yₙ )


Example: Gaussian momentum distribution. Momentum in x and y:

f(x) ∝ e^(−x²/2),  f(y) ∝ e^(−y²/2)  ⇒  f(x,y) ∝ e^(−(x²+y²)/2)

Polar coordinates: x = r cos φ, y = r sin φ, r² := x² + y²

J = ( ∂x/∂r  ∂x/∂φ )   ( cos φ  −r sin φ )
    ( ∂y/∂r  ∂y/∂φ ) = ( sin φ   r cos φ )

det J = r  ⇒  g(r,φ) = f(x(r,φ), y(r,φ)) · det J ∝ r e^(−r²/2)

In 3 dimensions → Maxwell distribution.
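This can be verified by sampling (our own check, assuming unit-variance Gaussians): r = √(x²+y²) should follow g(r) = r·e^(−r²/2), so P(r < 1) = 1 − e^(−1/2) ≈ 0.3935.

```python
import math
import random
random.seed(5)

# r for two independent standard Gaussians follows the density r*exp(-r^2/2).
n = 300_000
frac = sum(
    math.hypot(random.gauss(0, 1), random.gauss(0, 1)) < 1.0
    for _ in range(n)
) / n

print(frac)                    # MC estimate of P(r < 1)
print(1 - math.exp(-0.5))      # analytic value ~ 0.3935
```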


2. Probability 2.5 Error propagation

Often one is not interested in the complete transformation of the p.d.f., but only in the transformation of its variance (= squared error): measured error of x → derived error of y.

When σ_x is small compared to the curvature of y(x) → linear approximation:

y(x⃗) ≈ y(μ⃗) + Σᵢ (∂y/∂xᵢ)|_(x⃗=μ⃗) (xᵢ − μᵢ)

E[y(x⃗)] ≈ y(μ⃗)


Variance:

V[y(x⃗)] = E[(y(x⃗) − y(μ⃗))²]
        = E[( Σᵢ (∂y/∂xᵢ)|_(x⃗=μ⃗) (xᵢ − μᵢ) )²]
        = Σᵢ Σⱼ (∂y/∂xᵢ)(∂y/∂xⱼ)|_(x⃗=μ⃗) E[(xᵢ − μᵢ)(xⱼ − μⱼ)]

⇒  σ_y² = Σᵢ Σⱼ (∂y/∂xᵢ)(∂y/∂xⱼ) V_ij
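For a linear function the formula is exact, which makes it easy to check numerically (the coefficients and covariance below are our own toy choices): for y = 2x₁ + 3x₂ with V[x₁] = V[x₂] = 1 and cov[x₁,x₂] = 0.5, the formula gives σ_y² = 4 + 9 + 2·2·3·0.5 = 19.

```python
import math
import random
random.seed(6)

# Toy check of sigma_y^2 = sum_ij (dy/dx_i)(dy/dx_j) V_ij for y = 2*x1 + 3*x2.
n = 300_000
x1 = [random.gauss(0, 1) for _ in range(n)]
# Construct x2 with V[x2] = 0.25 + 0.75 = 1 and cov[x1,x2] = 0.5:
x2 = [0.5 * a + math.sqrt(0.75) * random.gauss(0, 1) for a in x1]
ys = [2 * a + 3 * b for a, b in zip(x1, x2)]

m = sum(ys) / n
var_y = sum((y - m) ** 2 for y in ys) / n
print(var_y)   # close to 19
```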


For several derived variables y_k:

cov[y_k, y_l] = Σᵢ Σⱼ (∂y_k/∂xᵢ)(∂y_l/∂xⱼ) cov[xᵢ, xⱼ]

→ general formula for error propagation (in linear approximation)

Special cases:

a) uncorrelated xᵢ:

σ_y² = V[y(x⃗)] = Σᵢ (∂y/∂xᵢ)²|_(x⃗=μ⃗) σ_xᵢ²

and

cov[y_k, y_l] = Σᵢ (∂y_k/∂xᵢ)(∂y_l/∂xᵢ)|_(x⃗=μ⃗) σ_xᵢ²

Even if the xᵢ are uncorrelated, the y_k are in general correlated.


b) Sum y = x₁ + x₂ →

σ_y² = σ_x₁² + σ_x₂²

c) Product y = x₁x₂ →

(σ_y/μ_y)² = (σ_x₁/μ_x₁)² + (σ_x₂/μ_x₂)²

(x₁ and x₂ are uncorrelated!)
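Both special cases are easy to verify by sampling (the means and widths below are our own toy numbers): absolute errors add in quadrature for a sum, relative errors add in quadrature for a product.

```python
import random
random.seed(7)

# Toy check: x1 ~ N(10, 0.3), x2 ~ N(20, 0.4), uncorrelated.
# Sum:     sigma_y^2 = 0.09 + 0.16 = 0.25
# Product: (sigma_y/mu_y)^2 ~ (0.3/10)^2 + (0.4/20)^2 = 0.0013
n = 400_000
x1 = [random.gauss(10, 0.3) for _ in range(n)]
x2 = [random.gauss(20, 0.4) for _ in range(n)]

def var(a):
    m = sum(a) / len(a)
    return sum((u - m) ** 2 for u in a) / len(a)

s = [a + b for a, b in zip(x1, x2)]
p = [a * b for a, b in zip(x1, x2)]

print(var(s))                          # close to 0.25
rel2 = var(p) / (sum(p) / n) ** 2
print(rel2)                            # close to 0.0013
```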


2. Probability 2.6 Convolution

Convolution :

Typical case: a quantity is the sum of two random variables, w = x + y; w is then also a random variable.

Example: x: Breit-Wigner Resonance

y: Exp. Resolution (Gauss)

What is the p.d.f. of w when f_x(x) and f_y(y) are known?

f_w(w) = ∫∫ f_x(x) f_y(y) δ(w − x − y) dx dy
       = ∫ f_x(x) f_y(w − x) dx
       = ∫ f_y(y) f_x(w − y) dy
       =: (f_x ⊗ f_y)(w)
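A simple sampled illustration of the convolution (our own toy case, not the Breit-Wigner example): for two uniform(0,1) variables, f_w is the triangular density on (0,2), so P(w ≤ 1) = 1/2 and P(w ≤ 0.5) = 0.125.

```python
import random
random.seed(8)

# w = x + y for x, y ~ uniform(0,1): the convolution of two box densities
# is the triangular density on (0, 2).
n = 400_000
ws = [random.random() + random.random() for _ in range(n)]

frac_half = sum(w <= 1.0 for w in ws) / n
frac_quarter = sum(w <= 0.5 for w in ws) / n
print(frac_half)      # close to 0.5
print(frac_quarter)   # close to 0.125
```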

3. Distributions

Important probability distributions

- Binomial distribution

- Poisson distribution

- Gaussian distribution

- Cauchy (Breit-Wigner) distribution

- Chi-squared distribution

- Landau distribution

- Uniform distribution

Central limit theorem

3. Distributions 3.1 Binomial distribution

Binomial distribution appears when one has exactly two possible trial outcomes (success-failure, head-tail, even-odd, …)

event “success”: A, with probability p = P(A)

event “failure”: Ā, with probability q = 1 − p = P(Ā)

Example: (ideal) coins

Probability for “head” (A) = p = 0.5, q=0.5

Probability for n=4 trials to get k-time “head” (A) ?

k=0: P = (1−p)⁴ = 1/16
k=1: P = p(1−p)³ times the number of combinations (HTTT, THTT, TTHT, TTTH) = 4·1/16 = 1/4
k=2: P = p²(1−p)² times (HHTT, HTTH, TTHH, HTHT, THTH, THHT) = 6·1/16 = 3/8
k=3: P = p³(1−p) times (HHHT, HHTH, HTHH, THHH) = 4·1/16 = 1/4
k=4: P = p⁴ = 1/16

P(0)+P(1)+P(2)+P(3)+P(4) = 1/16 + 1/4 + 3/8 + 1/4 + 1/16 = 1 ✓
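The coin example can be reproduced in a few lines (a sketch using the standard-library binomial coefficient):

```python
from math import comb

# Probabilities for k heads in n = 4 fair-coin tosses.
n, p = 4, 0.5
probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

print(probs)        # [1/16, 1/4, 3/8, 1/4, 1/16]
print(sum(probs))   # normalization: 1
```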


Number of combinations for k successes in n trials, the binomial coefficient:

(n choose k) = n! / (k!(n−k)!)

Binomial distribution:

f(k; n, p) = (n choose k) pᵏ (1−p)ⁿ⁻ᵏ

- Discrete probability distribution
- Random variable: k
- Depends on 2 parameters: n (number of trials) and p (probability of success)
- The order in which the k successes appear plays no role
- The n trials must be independent

3. Distributions 3.1 Binomial distribution (properties)

Normalization:

Σ_{k=0}^{n} f(k; n, p) = Σ_{k=0}^{n} (n choose k) pᵏ (1−p)ⁿ⁻ᵏ = (p + (1−p))ⁿ = 1

Expectation value (mean value):

E[k] = Σ_{k=0}^{n} k f(k; n, p) = np

Proof:

Σ_{k=0}^{n} k · n!/(k!(n−k)!) pᵏ (1−p)ⁿ⁻ᵏ
  = np Σ_{k=1}^{n} (n−1)!/((k−1)!(n−k)!) pᵏ⁻¹ (1−p)ⁿ⁻ᵏ
  = np Σ_{k′=0}^{n′} n′!/(k′!(n′−k′)!) pᵏ′ (1−p)ⁿ′⁻ᵏ′   (with n′ = n−1, k′ = k−1)
  = np


Variance:

σ_k² = V[k] = Σ_k (k − μ)² f(k; n, p) = np(1 − p)

Proof:

E[k(k−1)] = Σ_{k=0}^{n} k(k−1) · n!/(k!(n−k)!) pᵏ (1−p)ⁿ⁻ᵏ
          = n(n−1)p² Σ_{k=2}^{n} (n−2)!/((k−2)!(n−k)!) pᵏ⁻² (1−p)ⁿ⁻ᵏ
          = n(n−1)p² Σ_{k′=0}^{n′} n′!/(k′!(n′−k′)!) pᵏ′ (1−p)ⁿ′⁻ᵏ′   (with n′ = n−2, k′ = k−2)
          = n(n−1)p²

Then:

V[k] = E[k²] − E[k]² = E[k(k−1)] + E[k] − E[k]²
     = n(n−1)p² + np − n²p² = np(1 − p)
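Both results, E[k] = np and V[k] = np(1−p), can be checked directly from the p.m.f. (the parameters n = 12, p = 0.3 are arbitrary choices for illustration):

```python
from math import comb

# Mean and variance of the binomial distribution computed from the p.m.f.
n, p = 12, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * f for k, f in enumerate(pmf))
var = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))
print(mean, var)   # np = 3.6 and np(1-p) = 2.52
```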


HERA-B experiment muon spectrometer

12 chambers; the efficiency of one chamber is ε = 95%. Trigger condition: 11 out of 12 chambers hit.

εTOTAL = P(11; 12, 0.95) + P(12; 12, 0.95) = 88.2%

When the chambers reach only ε = 90%, then εTOTAL = 65.9%.

When one chamber fails (11 out of 11 required): εTOTAL = P(11; 11, 0.95) = 0.95¹¹ = 56.9%

Random coincidences (noise): ε_BG = 10% → 20% (twice the noise) gives εTOTAL_BG = 1·10⁻⁹ → 2·10⁻⁷, i.e. ~200× more background.
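The quoted efficiencies follow directly from the binomial distribution (a sketch reproducing the slide's numbers; the helper names are our own):

```python
from math import comb

# Trigger condition "at least 11 of 12 chambers hit".
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def at_least_11_of_12(p):
    return binom_pmf(11, 12, p) + binom_pmf(12, 12, p)

print(at_least_11_of_12(0.95))   # ~ 0.882 for eps = 95%
print(at_least_11_of_12(0.90))   # ~ 0.659 for eps = 90%
print(0.95 ** 11)                # one chamber dead: 11 of 11 needed, ~ 0.569
print(at_least_11_of_12(0.20))   # random coincidences at 20%: ~ 2e-7
```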



Example: the number of error bars containing the true value within the 1σ interval (p = 0.68)