
Mainz, June 11, 2015

Statistics, Data Analysis, and Simulation – SS 2015

08.128.730 Statistik, Datenanalyse und Simulation

Dr. Michael O. Distler <[email protected]>

http://wwwa1.kph.uni-mainz.de/users/distler/




Statistical hypothesis testing

So far: statistical analysis of a data sample in order to extract unknown parameters.

Now we have prior assumptions about the values of those parameters ⇒ a hypothesis.

We need to check such hypotheses: the procedure is called a statistical test.

Caveat: a test can never prove a hypothesis to be true. However, one can reject a hypothesis on the basis of observations.

The degree of statistical compatibility is quantified using confidence limits.




The testing process

There is an initial research hypothesis of which the truth is unknown.

The first step is to state the relevant null and alternative hypotheses. This is important as mis-stating the hypotheses will muddy the rest of the process.

The second step is to consider the statistical assumptions being made about the sample in doing the test; for example, assumptions about the statistical independence or about the form of the distributions of the observations. This is equally important as invalid assumptions will mean that the results of the test are invalid.

Decide which test is appropriate, and state the relevant test statistic T.

Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result. For example, the test statistic might follow a Student’s t distribution or a normal distribution.




The testing process

Select a significance level (α), a probability threshold below which the null hypothesis will be rejected. Common values are 5% and 1%.

The distribution of the test statistic under the null hypothesis partitions the possible values of T into those for which the null hypothesis is rejected – the so-called critical region – and those for which it is not. The probability of the critical region is α.

Compute from the observations the observed value t_obs of the test statistic T.

Decide to either reject the null hypothesis in favor of the alternative or not reject it. The decision rule is to reject the null hypothesis H0 if the observed value t_obs is in the critical region, and to accept or “fail to reject” the hypothesis otherwise.

http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
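The steps above can be sketched for one concrete case. The following is a minimal illustration, assuming a two-sided z-test with known standard deviation; the data values, `mu0`, and `sigma` are hypothetical and not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical measurements (illustration only).
data = [5.3, 4.8, 5.9, 5.1, 5.6, 5.4, 4.9, 5.7]
mu0 = 5.0      # H0: the population mean equals mu0
sigma = 0.5    # assumed known standard deviation of a single measurement
alpha = 0.05   # significance level

# Under H0 (and the assumptions above) T follows a standard normal distribution.
t_obs = (mean(data) - mu0) / (sigma / sqrt(len(data)))

# Two-sided critical region: |T| > z_crit, with total probability alpha.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject = abs(t_obs) > z_crit
print(f"t_obs = {t_obs:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

With these numbers the observed value lies just outside the critical region, so H0 is not rejected at the 5% level.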




The testing process

An alternative process is commonly used:

1 Compute from the observations the observed value t_obs of the test statistic T.

2 Calculate the p-value. This is the probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed.

3 Reject the null hypothesis, in favor of the alternative hypothesis, if and only if the p-value is less than the significance level (the selected probability threshold).
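The p-value variant can be sketched in the same way. This is an illustration only, again assuming a two-sided z-test with known standard deviation; the function name `two_sided_p_value` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_p_value(t_obs):
    """Probability under H0 of a standard normal statistic at least as extreme as t_obs."""
    return 2 * (1 - NormalDist().cdf(abs(t_obs)))

# Hypothetical measurements and hypothesis (illustration only).
data = [5.3, 4.8, 5.9, 5.1, 5.6, 5.4, 4.9, 5.7]
mu0, sigma, alpha = 5.0, 0.5, 0.05

# Step 1: observed value of the test statistic.
t_obs = (mean(data) - mu0) / (sigma / sqrt(len(data)))
# Step 2: p-value.
p_value = two_sided_p_value(t_obs)
# Step 3: decision.
reject = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Both procedures lead to the same decision, since the p-value is below α exactly when the observed value falls into the critical region.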




clairvoyance example




Chi-square distribution

If $x_1, x_2, \ldots, x_n$ are independent random variables distributed according to the standard Gaussian distribution with mean 0 and variance 1, then the sum

$$u = \chi^2 = \sum_{i=1}^{n} x_i^2$$

is distributed according to a $\chi^2$ distribution $f_n(u) = f_n(\chi^2)$, where $n$ is called the number of degrees of freedom.

$$f_n(u) = \frac{\frac{1}{2}\left(\frac{u}{2}\right)^{n/2-1} e^{-u/2}}{\Gamma(n/2)}$$

The $\chi^2$ distribution has its maximum at $u = n - 2$ (for $n \ge 2$). The mean is $n$ and the variance is $2n$.
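The stated maximum, mean, and variance can be checked numerically. The sketch below evaluates the density formula directly and integrates it with a simple midpoint rule; the integration range and step count are arbitrary choices made for illustration.

```python
from math import exp, gamma

def chi2_pdf(u, n):
    """Chi-square density f_n(u) for n degrees of freedom (u > 0)."""
    return 0.5 * (u / 2) ** (n / 2 - 1) * exp(-u / 2) / gamma(n / 2)

def moments(n, upper=200.0, steps=200_000):
    """Midpoint-rule estimates of normalization, mean and variance of f_n."""
    h = upper / steps
    norm = mean = second = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        w = chi2_pdf(u, n) * h
        norm += w
        mean += u * w
        second += u * u * w
    return norm, mean, second - mean ** 2

n = 5
norm, mean, var = moments(n)

# Locate the maximum of the density on a grid with spacing 0.01.
grid = [k * 0.01 for k in range(1, 2000)]
mode = max(grid, key=lambda u: chi2_pdf(u, n))

print(f"n = {n}: norm = {norm:.4f}, mean = {mean:.3f}, variance = {var:.3f}, mode = {mode:.2f}")
```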




Chi-square distribution

[Figure: chi-square probability density functions pdf(n, x) for n = 2, ..., 9 degrees of freedom, plotted for 0 ≤ x ≤ 10.]




Chi-square cumulative distribution function

The probability for $\chi^2_n$ to take on a value in the interval $[0, x]$.

[Figure: chi-square cumulative distribution functions cdf(n, x) for n = 2, ..., 9 degrees of freedom, plotted for 0 ≤ x ≤ 10.]




Chi-square distribution with 5 d.o.f.

[Figure: chi-square probability density for 5 degrees of freedom, plotted for 0 ≤ x ≤ 14, with the 95% c.l. interval [0.831 ... 12.83] marked.]
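The quoted interval can be reproduced from the cumulative distribution function. The sketch below assumes the interval is the central one, with 2.5% probability in each tail. It evaluates the cdf through the standard power series of the regularized lower incomplete gamma function and inverts it by bisection; the function names are illustrative choices.

```python
from math import exp, lgamma, log

def chi2_cdf(x, n):
    """P(chi2_n <= x), via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = n / 2, x / 2
    term = total = 1.0
    k = 0
    # Sum z^k / ((a+1)(a+2)...(a+k)) until the terms become negligible.
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * exp(-z + a * log(z) - lgamma(a + 1))

def chi2_quantile(p, n, lo=0.0, hi=200.0):
    """Invert the cdf by bisection; assumes the quantile lies in [lo, hi]."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lower = chi2_quantile(0.025, 5)
upper = chi2_quantile(0.975, 5)
print(f"central 95% interval for 5 d.o.f.: [{lower:.3f}, {upper:.2f}]")
```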




Student’s t-test

A t-test is any statistical hypothesis test in which the test statistic follows a Student’s t distribution if the null hypothesis is true.

A common example is the one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
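A one-sample location test can be sketched as follows. The statistic used is the usual t = (x̄ − µ0) / (s / √n) with n − 1 degrees of freedom; the data and µ0 are hypothetical, and the critical value 2.776 is the tabulated 97.5% quantile of the t distribution for 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return the t statistic and the degrees of freedom for H0: mean == mu0."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))  # stdev uses the n-1 denominator
    return t, n - 1

# Hypothetical sample (illustration only).
data = [5.3, 4.8, 5.9, 5.1, 5.6]
t_obs, dof = one_sample_t(data, mu0=5.0)

T_CRIT_4DOF = 2.776  # tabulated two-sided 5% critical value for 4 d.o.f.
reject = abs(t_obs) > T_CRIT_4DOF
print(f"t_obs = {t_obs:.3f} with {dof} d.o.f., reject H0: {reject}")
```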




t distribution

The t distribution arises in tests of the statistical compatibility of a sample mean $\bar{x}$ with a given mean $\mu$, or of the statistical compatibility of two sample means.

The probability density of the t distribution is given by

$$f_n(t) = \frac{1}{\sqrt{n\pi}} \, \frac{\Gamma((n+1)/2)}{\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}$$




t distribution

Figure: Student's t distributions f(t) (left) compared with the standardized Gaussian distribution (dashed), and the integrated Student's t distributions $\int_{-\infty}^{t} f(x)\,dx$ (right).
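The comparison with the Gaussian can also be made numerically. The sketch below evaluates the t density for a few values of n next to the standard Gaussian density; the chosen values of n and t are arbitrary.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(t, n):
    """Student's t density for n degrees of freedom."""
    return (gamma((n + 1) / 2) / (sqrt(n * pi) * gamma(n / 2))
            * (1 + t * t / n) ** (-(n + 1) / 2))

def gauss_pdf(t):
    """Standard Gaussian density (mean 0, variance 1)."""
    return exp(-t * t / 2) / sqrt(2 * pi)

for n in (1, 2, 5, 20, 100):
    print(f"n = {n:3d}: f(0) = {t_pdf(0, n):.4f}, f(3) = {t_pdf(3, n):.5f}")
print(f"Gaussian: f(0) = {gauss_pdf(0):.4f}, f(3) = {gauss_pdf(3):.5f}")
```

For small n the t density is lower at the center and has heavier tails; with growing n it approaches the Gaussian.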




t distribution

Quantiles of the t distribution, $P = \int_{-\infty}^{t} f_n(x)\,dx$.
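Quantiles of this kind can be computed by integrating the density and inverting the result. The sketch below uses Simpson's rule and bisection; it assumes P ≥ 0.5 and that the quantile lies in [0, 50], which does not hold for very small n combined with P very close to 1.

```python
from math import gamma, pi, sqrt

def t_pdf(t, n):
    """Student's t density for n degrees of freedom."""
    return (gamma((n + 1) / 2) / (sqrt(n * pi) * gamma(n / 2))
            * (1 + t * t / n) ** (-(n + 1) / 2))

def t_cdf(t, n, steps=2000):
    """P = integral of f_n from -infinity to t, using symmetry and Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    s = t_pdf(0, n) + t_pdf(a, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, n)
    area = s * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(p, n, lo=0.0, hi=50.0):
    """Bisection for the t value with cdf equal to p (assumes 0.5 <= p and quantile < hi)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q4 = t_quantile(0.975, 4)
q10 = t_quantile(0.975, 10)
print(f"P = 0.975: t = {q4:.3f} (n = 4), t = {q10:.3f} (n = 10)")
```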




F distribution

Given are $n_1$ sample values of a random variable $x$ and $n_2$ sample values of the same random variable. Let $s_1^2$ and $s_2^2$ be the best estimates of the variance from the two data sets. The random variable

$$F = \frac{s_1^2}{s_2^2}$$

then follows an F distribution with $(n_1, n_2)$ degrees of freedom. By convention, F is always greater than one, i.e. the larger variance estimate is placed in the numerator.

The probability density of F is given by

$$f(F) = \left(\frac{n_1}{n_2}\right)^{n_1/2} \frac{\Gamma((n_1+n_2)/2)}{\Gamma(n_1/2)\,\Gamma(n_2/2)} \, F^{(n_1-2)/2} \left(1 + \frac{n_1}{n_2} F\right)^{-(n_1+n_2)/2}$$
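The variance ratio and the density can be written down directly. The sketch below is an illustration with hypothetical samples; `f_ratio` applies the convention F ≥ 1 by ordering the two variance estimates, and `f_pdf` is a transcription of the density formula.

```python
from math import gamma
from statistics import variance

def f_ratio(sample1, sample2):
    """Ratio of the two sample-variance estimates, ordered so that F >= 1."""
    s1, s2 = variance(sample1), variance(sample2)
    return max(s1, s2) / min(s1, s2)

def f_pdf(F, n1, n2):
    """Density of F for (n1, n2) degrees of freedom."""
    return ((n1 / n2) ** (n1 / 2) * gamma((n1 + n2) / 2)
            / (gamma(n1 / 2) * gamma(n2 / 2))
            * F ** ((n1 - 2) / 2) * (1 + n1 / n2 * F) ** (-(n1 + n2) / 2))

# Hypothetical samples (illustration only).
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 4.0, 6.0, 8.0, 10.0]
F = f_ratio(sample_a, sample_b)
print(f"F = {F:.2f}")
```

For n1 = n2 = 2 the density reduces to 1 / (1 + F)^2, which gives a quick check of the transcription.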




Quantiles of the F distribution, confidence level = 0.68




5.3 Kolmogorov-Smirnov test

This test is sensitive to differences in the global shape or in trends of distributions. Let the theoretical probability density $f(x)$ and its distribution function $F(x) = \int_{-\infty}^{x} f(x')\,dx'$ be given. The $x_i$ are sorted by size and the cumulative quantity is formed:

$$F_n(x) = \frac{\text{number of } x_i \text{ values} \le x}{n}$$

The test statistic is

$$t = \sqrt{n} \cdot \max |F_n(x) - F(x)|$$




Kolmogorov-Smirnov test

The probability P of obtaining a value ≤ $t_0$ for the test statistic t is

$$P = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} \, e^{-2 k^2 t_0^2}$$

Values for practical use:

P     1%     5%     50%    68%    95%    99%    99.9%
t0    0.44   0.50   0.83   0.96   1.36   1.62   1.95
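The series is easy to evaluate numerically. The sketch below truncates the sum after a fixed number of terms, which is an arbitrary choice that works well for the t0 values listed; the tabulated t0 values are rounded, so the computed probabilities need not match the table exactly.

```python
from math import exp

def ks_probability(t0, terms=100):
    """P(t <= t0) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t0^2), truncated."""
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * exp(-2 * k * k * t0 * t0)
    return 1 - 2 * total

for t0 in (0.44, 0.50, 0.83, 0.96, 1.36, 1.62, 1.95):
    print(f"t0 = {t0:.2f}  P = {ks_probability(t0):.3f}")
```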




Kolmogorov-Smirnov test

Example: The data 7, −1, 8, 5, 6 are claimed to be drawn from a normal distribution with µ = 5 and σ = 2. The test statistic is $t = \sqrt{5} \cdot 0.3 = 0.67$.

[Figure: distribution function F(x) versus the random variable x, for −2 ≤ x ≤ 12.]
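The value of the test statistic in this example can be reproduced in a few lines. The sketch below is one possible implementation; since F_n is a step function, it compares the theoretical distribution function with F_n on both sides of each step. The function name `ks_statistic` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def ks_statistic(data, cdf):
    """t = sqrt(n) * max |F_n(x) - F(x)| for a sample and a theoretical cdf."""
    xs = sorted(data)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_n jumps from i/n to (i+1)/n at x; check both sides of the step.
        d_max = max(d_max, abs(f - i / n), abs((i + 1) / n - f))
    return sqrt(n) * d_max

data = [7, -1, 8, 5, 6]
t = ks_statistic(data, NormalDist(mu=5, sigma=2).cdf)
print(f"t = {t:.2f}")  # t = 0.67
```

The largest deviation, 0.3, occurs just below x = 5, where F_n = 0.2 and F = 0.5. Since 0.67 is well below t0 = 1.36 (P = 95%), the hypothesis is not rejected at the 5% level.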



