
Mainz, June 11, 2015

Statistics, Data Analysis, and Simulation – SS 2015

08.128.730 Statistik, Datenanalyse und Simulation

Dr. Michael O. Distler <[email protected]>

Statistical hypothesis testing

So far: statistical analysis of a data sample in order to extract unknown parameters.

Now we have prior assumptions about the value of those parameters ⇒ a hypothesis.

We need to check those hypotheses: the procedure is called a statistical test.

Caveat: a test can never prove a hypothesis to be true. However, one can reject a hypothesis because of observations.

The degree of statistical compatibility will be quantified using confidence limits.


The testing process

There is an initial research hypothesis of which the truth is unknown.

The first step is to state the relevant null and alternative hypotheses. This is important as mis-stating the hypotheses will muddy the rest of the process.

The second step is to consider the statistical assumptions being made about the sample in doing the test; for example, assumptions about the statistical independence or about the form of the distributions of the observations. This is equally important as invalid assumptions will mean that the results of the test are invalid.

Decide which test is appropriate, and state the relevant test statistic $T$.

Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result. For example, the test statistic might follow a Student's t distribution or a normal distribution.


Select a significance level (α), a probability threshold below which the null hypothesis will be rejected. Common values are 5% and 1%.

The distribution of the test statistic under the null hypothesis partitions the possible values of $T$ into those for which the null hypothesis is rejected (the so-called critical region) and those for which it is not. The probability of the critical region is α.

Compute from the observations the observed value $t_{\mathrm{obs}}$ of the test statistic $T$.

Decide to either reject the null hypothesis in favor of the alternative or not reject it. The decision rule is to reject the null hypothesis $H_0$ if the observed value $t_{\mathrm{obs}}$ is in the critical region, and to accept or "fail to reject" the hypothesis otherwise.

http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
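The steps listed above can be sketched for one simple case. The example below is an illustration only and is not taken from the lecture: it assumes Gaussian data with known standard deviation, so that the test statistic is a standard normal variable under the null hypothesis, and it uses a two-sided critical region. The function name `z_test_two_sided` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_two_sided(sample, mu0, sigma, alpha=0.05):
    """Two-sided test of H0: mean == mu0 for Gaussian data with known sigma.

    Returns (t_obs, t_crit, reject), where the critical region is |T| > t_crit.
    """
    n = len(sample)
    sample_mean = sum(sample) / n
    # Test statistic: standard normal under H0 (assumption: sigma is known).
    t_obs = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # The critical region has total probability alpha, split over both tails.
    t_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return t_obs, t_crit, abs(t_obs) > t_crit

# Hypothetical measurements, for illustration only.
sample = [5.1, 4.9, 5.3, 5.2, 5.0]
t_obs, t_crit, reject = z_test_two_sided(sample, mu0=5.0, sigma=0.2)
print(f"t_obs = {t_obs:.3f}, critical value = {t_crit:.3f}, reject H0: {reject}")
```

With these made-up numbers the observed value lies outside the critical region, so the null hypothesis is not rejected at the 5% level.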


The testing process

An alternative process is commonly used:

1 Compute from the observations the observed value $t_{\mathrm{obs}}$ of the test statistic $T$.

2 Calculate the p-value. This is the probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed.

3 Reject the null hypothesis, in favor of the alternative hypothesis, if and only if the p-value is less than the significance level (the selected probability) threshold.
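The three steps above can be illustrated with a counting experiment. This is a minimal sketch under assumptions of my own, not an example from the lecture: a hypothetical guessing experiment with `n` independent trials, success probability `p0` under the null hypothesis, and "at least as extreme" taken to mean "at least as many successes" (a one-sided test). The helper name `binomial_p_value` is made up.

```python
import math

def binomial_p_value(k_obs, n, p0):
    """One-sided p-value P(K >= k_obs) for K ~ Binomial(n, p0)."""
    return sum(
        math.comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(k_obs, n + 1)
    )

# Hypothetical outcome: 9 correct guesses in 10 trials, pure chance is p0 = 0.5.
p_value = binomial_p_value(k_obs=9, n=10, p0=0.5)
print(f"p-value = {p_value:.4f}")  # 11/1024, prints 0.0107

for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha}: {decision}")
```

The same data lead to rejection at the 5% level but not at the 1% level, which shows why the significance level should be fixed before looking at the data.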


clairvoyance example


Chi-square distribution

If $x_1, x_2, \ldots, x_n$ are independent random variables distributed according to the standard Gaussian distribution with mean 0 and variance 1, then the sum

$$u = \chi^2 = \sum_{i=1}^{n} x_i^2$$

is distributed according to a $\chi^2$ distribution $f_n(u) = f_n(\chi^2)$, where $n$ is called the number of degrees of freedom.

$$f_n(u) = \frac{1}{2} \, \frac{(u/2)^{n/2-1} \, e^{-u/2}}{\Gamma(n/2)}$$

The $\chi^2$ distribution has a maximum at $(n-2)$ for $n \ge 2$. The mean is found to be $n$ and the variance is $2n$.
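The stated properties of the density can be checked numerically. The sketch below codes $f_n(u)$ as written above using only the Python standard library and estimates the normalisation, mean, variance, and position of the maximum with a simple midpoint rule and a grid search. The function names, the integration range, and the step counts are arbitrary choices made for this illustration.

```python
import math

def chi2_pdf(u, n):
    """Chi-square density f_n(u) for u > 0; returns 0 for u <= 0."""
    if u <= 0:
        return 0.0
    return 0.5 * (u / 2) ** (n / 2 - 1) * math.exp(-u / 2) / math.gamma(n / 2)

def chi2_moments(n, upper=200.0, steps=200_000):
    """Midpoint-rule estimates of (normalisation, mean, variance) on [0, upper]."""
    h = upper / steps
    norm = first = second = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        w = chi2_pdf(u, n) * h
        norm += w
        first += u * w
        second += u * u * w
    return norm, first, second - first**2

def chi2_mode(n, upper=50.0, steps=5000):
    """Grid point in (0, upper] at which the density is largest."""
    grid = [(i + 1) * upper / steps for i in range(steps)]
    return max(grid, key=lambda u: chi2_pdf(u, n))

n = 5
norm, mean, var = chi2_moments(n)
print(f"n = {n}: norm = {norm:.4f}, mean = {mean:.4f}, variance = {var:.4f}")
print(f"maximum near u = {chi2_mode(n):.2f} (expected n - 2 = {n - 2})")
```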


Chi-square distribution

[Figure: probability density functions pdf(n, x) of the $\chi^2$ distribution for n = 2, 3, ..., 9 degrees of freedom, plotted for x from 0 to 10.]


Chi-square cumulative distribution function

The probability for $\chi^2_n$ to take on a value in the interval $[0, x]$.

[Figure: cumulative distribution functions cdf(n, x) of the $\chi^2$ distribution for n = 2, 3, ..., 9 degrees of freedom, plotted for x from 0 to 10.]


Chi-square distribution with 5 d.o.f.

[Figure: probability density of the $\chi^2$ distribution with 5 degrees of freedom, plotted for x from 0 to 14, with the 95% c.l. interval [0.831 ... 12.83] marked.]
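The quoted interval can be reproduced by integrating the density numerically and inverting the cumulative distribution with bisection. This is a rough standard-library sketch, assuming that the interval is the central one with 2.5% probability in each tail; the function names, step counts, and search bounds are illustrative choices.

```python
import math

def chi2_pdf(u, n):
    """Chi-square density f_n(u) for u > 0; returns 0 for u <= 0."""
    if u <= 0:
        return 0.0
    return 0.5 * (u / 2) ** (n / 2 - 1) * math.exp(-u / 2) / math.gamma(n / 2)

def chi2_cdf(x, n, steps=4000):
    """Probability for chi-square with n d.o.f. to lie in [0, x] (midpoint rule)."""
    if x <= 0:
        return 0.0
    h = x / steps
    return sum(chi2_pdf((i + 0.5) * h, n) for i in range(steps)) * h

def chi2_quantile(p, n, lo=0.0, hi=200.0, iterations=50):
    """Value x with chi2_cdf(x, n) == p, found by bisection on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 5
lower = chi2_quantile(0.025, n)
upper = chi2_quantile(0.975, n)
print(f"central 95% interval for n = {n}: [{lower:.3f}, {upper:.2f}]")
```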


Student’s t-test

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is true.

A common example is the one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.


t distribution

The t distribution arises in tests of the statistical compatibility of a sample mean $\bar{x}$ with a given mean $\mu$, or of the statistical compatibility of two sample means.

The probability density of the t distribution is given by

$$f_n(t) = \frac{1}{\sqrt{n\pi}} \, \frac{\Gamma((n+1)/2)}{\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}$$
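A one-sample t-test can be sketched with this density. The example assumes the standard textbook statistic $t = (\bar{x} - \mu)/(s/\sqrt{N})$ with $N - 1$ degrees of freedom for a sample of size $N$, which is not spelled out on the slide, and obtains the two-sided p-value by integrating $f_n(t)$ with a midpoint rule. The function names and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev

def t_pdf(t, n):
    """Student's t density f_n(t) with n degrees of freedom."""
    norm = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return norm * (1 + t * t / n) ** (-(n + 1) / 2)

def t_cdf(t, n, steps=20_000):
    """Integral of f_n from -infinity to t, using the symmetry about t = 0."""
    h = abs(t) / steps
    area = sum(t_pdf((i + 0.5) * h, n) for i in range(steps)) * h
    return 0.5 + area if t >= 0 else 0.5 - area

def one_sample_t_test(sample, mu0):
    """Return (t_obs, two-sided p-value) for H0: population mean == mu0."""
    size = len(sample)
    # stdev() is the sample standard deviation (divides by size - 1).
    t_obs = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(size))
    p_value = 2 * (1 - t_cdf(abs(t_obs), size - 1))
    return t_obs, p_value

# Hypothetical measurements, for illustration only.
t_obs, p_value = one_sample_t_test([5.2, 4.8, 5.5, 5.1, 4.9], mu0=5.0)
print(f"t_obs = {t_obs:.3f}, p-value = {p_value:.2f}")
```

For one degree of freedom the density reduces to the Cauchy form, so `t_pdf(0, 1)` equals $1/\pi$, which is a quick sanity check of the normalisation.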


t distribution

[Figure: the Student's t distributions $f(t)$ (left) compared with the standardized Gaussian distribution (dashed), and the integrated Student's t distributions $\int_{-\infty}^{t} f(x)\,dx$ (right).]


t distribution

Quantiles of the t distribution, $P = \int_{-\infty}^{t} f_n(x)\,dx$.


F distribution

Given are $n_1$ sample values of a random variable $x$ and $n_2$ sample values of the same random variable. Let the best estimates of the variances from the two data sets be $s_1^2$ and $s_2^2$. The random number

$$F = \frac{s_1^2}{s_2^2}$$

then follows an F distribution with $(n_1, n_2)$ degrees of freedom. By convention, the ratio is formed such that $F$ is always greater than one.

The probability density of $F$ is given by

$$f(F) = \left(\frac{n_1}{n_2}\right)^{n_1/2} \frac{\Gamma((n_1+n_2)/2)}{\Gamma(n_1/2)\,\Gamma(n_2/2)} \, F^{(n_1-2)/2} \left(1 + \frac{n_1}{n_2} F\right)^{-(n_1+n_2)/2}$$
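The density above can be evaluated and integrated directly. The following sketch treats the degrees of freedom `n1` and `n2` purely as inputs (which values apply to a given pair of samples depends on how the variances were estimated), forms the ratio with the larger variance in the numerator, and integrates $f(F)$ with a midpoint rule. The function names are made up, and the value 3.33 in the usage lines is a commonly tabulated 95% quantile for (5, 10) degrees of freedom, used here only as a plausibility check.

```python
import math

def f_pdf(F, n1, n2):
    """F density with (n1, n2) degrees of freedom; returns 0 for F <= 0."""
    if F <= 0:
        return 0.0
    norm = math.gamma((n1 + n2) / 2) / (math.gamma(n1 / 2) * math.gamma(n2 / 2))
    return (
        (n1 / n2) ** (n1 / 2)
        * norm
        * F ** ((n1 - 2) / 2)
        * (1 + n1 / n2 * F) ** (-(n1 + n2) / 2)
    )

def f_cdf(F, n1, n2, steps=20_000):
    """Probability for the variance ratio to lie in [0, F] (midpoint rule)."""
    if F <= 0:
        return 0.0
    h = F / steps
    return sum(f_pdf((i + 0.5) * h, n1, n2) for i in range(steps)) * h

def variance_ratio(s1_sq, s2_sq):
    """Ratio of two variance estimates, ordered so that the result is >= 1."""
    return max(s1_sq, s2_sq) / min(s1_sq, s2_sq)

print(f"F = {variance_ratio(2.0, 4.0):.2f}")
print(f"P(F <= 3.33) for (5, 10) d.o.f. = {f_cdf(3.33, 5, 10):.3f}")
```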


Quantiles of the F distribution, confidence = 0.68

Quantiles of the F distribution, confidence = 0.90

Quantiles of the F distribution, confidence = 0.95

Quantiles of the F distribution, confidence = 0.99


5.3 Kolmogorov-Smirnov test

This test is sensitive to differences in the global shape or in trends of distributions. Let the theoretical probability density $f(x)$ and its distribution function $F(x) = \int_{-\infty}^{x} f(x')\,dx'$ be given. The $x_i$ are sorted by size and the cumulative quantity is formed:

$$F_n(x) = \frac{\text{number of } x_i \text{ values} \le x}{n}$$

The test statistic is

$$t = \sqrt{n} \cdot \max |F_n(x) - F(x)|$$


Kolmogorov-Smirnov test

The probability $P$ of obtaining a value $\le t_0$ for the test statistic $t$ is

$$P = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} \cdot e^{-2 k^2 t_0^2}$$

Values for practical use:

P       1%     5%     50%    68%    95%    99%    99.9%
$t_0$   0.44   0.50   0.83   0.96   1.36   1.62   1.95
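The series converges quickly for the tabulated range, so it is easy to evaluate directly. The sketch below truncates the sum after a fixed number of terms (100 is an arbitrary choice) and prints $P$ for the tabulated $t_0$ values; the function name `ks_probability` is made up. The printed numbers agree with the tabulated percentages to within rounding for most entries.

```python
import math

def ks_probability(t0, terms=100):
    """P(t <= t0) from the alternating series, truncated after `terms` terms."""
    total = sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * t0 * t0)
        for k in range(1, terms + 1)
    )
    return 1 - 2 * total

for t0 in (0.44, 0.50, 0.83, 0.96, 1.36, 1.62, 1.95):
    print(f"t0 = {t0:.2f}  P = {100 * ks_probability(t0):.1f}%")
```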


Kolmogorov-Smirnov test

Example: the data 7, −1, 8, 5, 6 are claimed to have been drawn from a normal distribution with $\mu = 5$ and $\sigma = 2$. The test statistic is $t = \sqrt{5} \cdot 0.3 = 0.67$.

[Figure: distribution function F(x) versus random variable x for the example data, with x from −2 to 12.]
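The numbers in this example can be reproduced in a few lines. The sketch below evaluates the hypothesised normal distribution function with `statistics.NormalDist` and compares it with the empirical step function just below and at each data point, since the largest deviation always occurs at one of those positions. The function name `ks_statistic` is chosen for this illustration.

```python
import math
from statistics import NormalDist

def ks_statistic(data, cdf):
    """Return (max |F_n - F|, sqrt(n) * max |F_n - F|) for a given model cdf."""
    xs = sorted(data)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical distribution jumps from i/n to (i + 1)/n at x,
        # so both sides of the step have to be compared with F(x).
        d_max = max(d_max, abs(f - i / n), abs((i + 1) / n - f))
    return d_max, math.sqrt(n) * d_max

data = [7, -1, 8, 5, 6]
model = NormalDist(mu=5, sigma=2)
d_max, t = ks_statistic(data, model.cdf)
print(f"max deviation = {d_max:.3f}, t = {t:.3f}")  # prints 0.300 and 0.671
```

The largest deviation occurs just below x = 5, where the empirical distribution is still 0.2 while F(5) = 0.5. With t ≈ 0.67, the value lies below the tabulated $t_0 = 0.83$ for P = 50%, so these data give no reason to reject the hypothesis.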
