Introduction to Statistical Hypothesis Testing


Page 1: Introduction to Statistical Hypothesis Testing

Probability and Statistics: Review - Part 1 References

Introduction to Statistical Hypothesis Testing

Arun K. Tangirala

Probability and Statistics: Review - Part 1

Arun K. Tangirala, IIT Madras Intro to Statistical Hypothesis Testing 1


Learning objectives

- Probability

- Random variable

- Probability distribution function

- Probability density function (p.d.f.)


Random phenomenon

Random experiment (phenomenon)

An experiment or a phenomenon that produces different outcomes, even though it is repeated in the same manner every time, is called a random experiment or a random phenomenon.

Examples: throw of a die, sensor reading, defects in a product

Prediction perspective:

Random phenomenon

Any phenomenon that cannot be predicted accurately, even given its infinite past, is said to be random. Alternatively, there exists no known mathematical function that can accurately describe the process.


Sample space / Population

Sample space / Population

It is the set of all possible outcomes of a random phenomenon. It is usually denoted by S.

- If the outcomes are discrete-valued, we have a discrete sample space (e.g., throw of a die, scores in a game).

- When the outcomes are continuous-valued, we have a continuous sample space (e.g., ambient temperature, gas pressure).

An event is a subset of the sample space.

- In a two-coin-toss experiment, an event is $E = \{HT, TH\}$, while $S = \{HH, HT, TH, TT\}$.

A collection of sets (events) $E_1, E_2, \cdots, E_n$ is said to be exhaustive if $E_1 \cup E_2 \cup \cdots \cup E_n = S$.


Probability Basics and Axioms

Whenever the sample space S consists of N equally likely outcomes, the probability of each outcome is 1/N.

For a discrete sample space, the probability of an event E, denoted as P(E), is the sum of the probabilities of the outcomes in E.

Axioms:

If S is the sample space and E is an event in any random experiment,

1. $P(S) = 1$ (one of the outcomes has to occur!)

2. $0 \le P(E) \le 1$ (probabilities are always non-negative values not exceeding unity)

3. For two mutually exclusive events $E_1$ and $E_2$, $P(E_1 \cup E_2) = P(E_1) + P(E_2)$.

4. If $E^*$ is the complement of an event $E$, $P(E^*) = 1 - P(E)$.
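As a small illustration of the rules above, the following R sketch enumerates the two-coin-toss sample space and checks the rules numerically. The helper `prob()` and the events `E1` and `E2` are hypothetical names chosen for this sketch, and a fair coin (equally likely outcomes) is assumed.

```r
# Sample space for two tosses of a fair coin; all four outcomes are equally likely.
S <- c("HH", "HT", "TH", "TT")
p <- setNames(rep(1 / length(S), length(S)), S)

# Probability of an event = sum of the probabilities of its outcomes.
prob <- function(event) sum(p[event])

E1 <- c("HT", "TH")   # exactly one head
E2 <- c("HH")         # two heads (mutually exclusive with E1)

p_S     <- prob(S)                # rule 1: P(S) = 1
p_union <- prob(union(E1, E2))    # rule 3, left-hand side
p_sum   <- prob(E1) + prob(E2)    # rule 3, right-hand side
p_compl <- prob(setdiff(S, E1))   # rule 4: complement of E1

print(c(P_S = p_S, P_union = p_union, P_sum = p_sum, P_complement = p_compl))
```

The union and the sum agree only because `E1` and `E2` share no outcome; for overlapping events the intersection has to be subtracted.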


Probability Basics and Axioms . . . contd.

Probabilities on sets:

1. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

2. $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$

Conditional probability:

The conditional probability of an event $B$ given an event $A$ such that $P(A) > 0$, denoted as $P(B|A)$, is

$$P(B|A) = \frac{P(A \cap B)}{P(A)} \qquad (1)$$


Example

The table below lists the classification of 940 wafers in a semiconductor manufacturing process.

Contamination   Center   Edge   Total
Low                514     68     582
High               112    246     358
Total              626    314     940

If H is the event corresponding to high contamination and C is the event that the wafer is in the center of the sputtering tool, determine $P(H \cup C)$.

Answer: $P(H \cup C) = P(H) + P(C) - P(H \cap C) = (358 + 626 - 112)/940 = 872/940$.
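The arithmetic of this example can be reproduced with a short R sketch. The counts are those of the table; the variable names are illustrative, and the conditional probability $P(H|C)$ is an extra quantity computed here only to show Eq. (1) in use.

```r
# Wafer counts from the table (rows: contamination level, columns: location).
wafers <- matrix(c(514,  68,
                   112, 246),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(contamination = c("Low", "High"),
                                 location = c("Center", "Edge")))
n_total <- sum(wafers)                       # 940 wafers in all

p_H  <- sum(wafers["High", ]) / n_total      # P(H) = 358/940
p_C  <- sum(wafers[, "Center"]) / n_total    # P(C) = 626/940
p_HC <- wafers["High", "Center"] / n_total   # P(H and C) = 112/940

p_H_or_C    <- p_H + p_C - p_HC              # addition rule for two events
p_H_given_C <- p_HC / p_C                    # conditional probability, Eq. (1)

print(c(P_H_or_C = p_H_or_C, P_H_given_C = p_H_given_C))
```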


Independent events

Independence

Two events A and B are independent if and only if $P(B|A) = P(B)$, i.e., $P(A \cap B) = P(A)P(B)$.

A day's production of 850 manufactured parts contains 50 parts that do not meet customer requirements. Suppose two parts are selected from the batch, but the first part is replaced before the second part is selected. What is the probability that the second part is defective (denoted as B) given that the first part is defective (denoted as A)?

Answer: $P(B|A) = 50/850$. Because the first part is replaced, the batch is unchanged for the second draw, so $P(B|A) = P(B)$ and the two events are independent.
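To see why the replacement matters, the sketch below contrasts the example with a hypothetical variant in which the first part is not put back. The without-replacement case is an assumption added for comparison and is not part of the original example.

```r
n_parts     <- 850   # parts in the batch (from the example)
n_defective <- 50    # parts that do not meet customer requirements

# Unconditional probability that the second part is defective.
p_B <- n_defective / n_parts

# With replacement: the batch is restored before the second draw.
p_B_given_A_replace <- n_defective / n_parts

# Without replacement (hypothetical variant): one defective part is already gone.
p_B_given_A_noreplace <- (n_defective - 1) / (n_parts - 1)

print(c(P_B = p_B,
        with_replacement = p_B_given_A_replace,
        without_replacement = p_B_given_A_noreplace))
```

With replacement, $P(B|A)$ equals $P(B)$, which is the definition of independence; without replacement the two values differ, so A and B would be dependent.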


Random Variable . . . contd.

Definition (Random Variable)

A random variable X is a mapping (or point function) from the sample space $\Omega$ to the real line such that to each element $\omega \in \Omega$ there corresponds a unique real number.

- Effectively, we replace our original (abstract) sample space by a new (concrete) sample space.

- E.g., the head and tail of a coin toss are mapped to 1 and 0, respectively.

- If the experiment itself yields some physical quantity that is real-valued, then no further mapping is required.

Randomness is not a characteristic of a process, but is rather a reflection of our (lack of) knowledge and understanding of that process.


Probability Distribution

The natural recourse to dealing with uncertainties is to list all possible outcomes and assign a chance to each of those outcomes.

Examples:

- Rainfall in a region: $S = \{0, 1\}$, $P = \{0.3, 0.7\}$

- Face value from the roll of a die: $S = \{1, 2, \cdots, 6\}$, $P(\omega) = 1/6 \;\; \forall \omega \in S$

The specification of the outcomes and the associated probabilities completely characterizes the random variable.


Probability Distribution Function

In general, any random variable X is characterized by what is known as the probability (or cumulative) distribution function $F(x)$:

$$F(x) = \Pr(X \le x)$$

- Probability distributions are either known upfront or determined through experiments.

- Out of 10000 coin tosses, find the fraction (relative frequency) of the tosses that show heads; this is (an estimate of) the probability of a head occurring in any toss.
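The coin-toss experiment can be mimicked in R. This is a minimal sketch: it assumes a fair coin (`p_head = 0.5`), encodes head as 1 and tail as 0, and fixes an arbitrary seed for reproducibility; none of these choices come from the slides.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
n_tosses <- 10000
p_head   <- 0.5             # assumed true probability (fair coin)

# Simulated tosses: 1 = head, 0 = tail.
tosses <- rbinom(n_tosses, size = 1, prob = p_head)

# Relative frequency of heads: an estimate of the probability of a head.
p_hat <- mean(tosses)

# Empirical distribution function: an estimate of F(x) = Pr(X <= x).
F_hat <- ecdf(tosses)

print(c(p_hat = p_hat, F_hat_at_0 = F_hat(0), F_hat_at_1 = F_hat(1)))
```

The estimate `p_hat` fluctuates around 0.5 from run to run, and `F_hat` is a step function with jumps at 0 and 1, as expected for a discrete random variable.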


Types of distributions

- Continuous distributions

  - $F(x)$ is continuous and differentiable for almost all x (e.g., Gaussian).
  - The random variable in this case is a continuous quantity (e.g., temperature, pressure, voltage).
  - For these distributions, a "density" function (like in mechanics) exists.

- Discrete (step-type) distributions

  - $F(x)$ is a simple step function with jumps at points (e.g., Binomial).
  - The random variable is then a purely discrete one.
  - No density function exists for this case. Instead, a probability mass function that determines $\Pr(X = x)$ is defined (e.g., count of the number of heads in coin tosses, face value on a die).
  - The probability of taking on a value between jump points is zero.

- Mixed distributions: $F(x)$ has both continuous portions and jumps.


Probability Density Function

For continuous probability distributions, it is most convenient to work with probability densities (again as in mechanics).

The density function $f(x)$ can be defined in two different ways:

1. The density function is such that the area under the curve gives the probability,

$$\Pr(a < X < b) = \int_a^b f(x)\,dx \qquad (2)$$

Thus, $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

2. The density function is the derivative (w.r.t. x) of the distribution function,

$$f(x) = \frac{dF(x)}{dx} \qquad (3)$$
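Both definitions can be checked numerically. The sketch below uses the standard Gaussian as an illustrative density (an assumption; any continuous distribution available in R would do) together with base R's `integrate()` and a central-difference derivative.

```r
# Illustrative choice (an assumption): the standard Gaussian.
f  <- function(x) dnorm(x)   # density f(x)
Fx <- function(x) pnorm(x)   # distribution function F(x)

a <- -1; b <- 1

# Eq. (2): area under f between a and b, compared with F(b) - F(a).
area   <- integrate(f, lower = a, upper = b)$value
by_cdf <- Fx(b) - Fx(a)

# Total area under the density should be 1.
total <- integrate(f, lower = -Inf, upper = Inf)$value

# Eq. (3): central-difference approximation of dF/dx at x0.
x0 <- 0.5; h <- 1e-4
dF_dx <- (Fx(x0 + h) - Fx(x0 - h)) / (2 * h)

print(c(area = area, by_cdf = by_cdf, total = total,
        dF_dx = dF_dx, f_x0 = f(x0)))
```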


Gaussian Density Function

One of the most frequently encountered and assumed distributions for a RV is the Gaussian (Normal) distribution with p.d.f.:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2}\right) \qquad (4)$$

Remarks:

- The density is completely characterized by the two parameters $\mu$ and $\sigma$.

- The density function is symmetric about $\mu$. All odd-order central moments, and all cumulants of order three and higher, are zero.

- The Central Limit Theorem motivates the wide usage of the Gaussian p.d.f.

[Figure: Gaussian density function, f(x) versus x. Shaded region: $\Pr(1 \le X \le 2)$.]
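The Gaussian density and the shaded-region probability can be evaluated in R. This sketch assumes the standard Gaussian ($\mu = 0$, $\sigma = 1$), which the slide does not state explicitly; `gauss_pdf()` is a hypothetical helper that writes out the density formula by hand for comparison with the built-in `dnorm()`.

```r
# Assumption: the plotted density is the standard Gaussian (mu = 0, sigma = 1).
mu <- 0; sigma <- 1

# Gaussian p.d.f. written out by hand.
gauss_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * (x - mu)^2 / sigma^2)
}

# Compare the hand-written density with dnorm() on a grid of points.
x <- seq(-4, 4, by = 0.5)
max_abs_diff <- max(abs(gauss_pdf(x, mu, sigma) - dnorm(x, mean = mu, sd = sigma)))

# Pr(1 <= X <= 2) from the distribution function.
p_shaded <- pnorm(2, mean = mu, sd = sigma) - pnorm(1, mean = mu, sd = sigma)

print(c(max_abs_diff = max_abs_diff, p_shaded = p_shaded))
```

Under the standard-Gaussian assumption, `p_shaded` is about 0.136.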


Uniform Distribution

A widely encountered distribution is the uniform distribution in the interval $[a, b]$:

$$f(x) = \frac{1}{b - a}, \quad a \le x \le b \qquad (5)$$

Remarks:

- The density is completely characterized by the two parameters a and b.

- It is the simplest of all the p.d.f.s and is usually the starting point in random number generators.

- Unlike the Gaussian distribution, the higher-order cumulants are not all zero (e.g., the fourth-order cumulant is non-zero).

[Figure: Uniform density function, f(x) versus x. Shaded region: $\Pr(1 \le X \le 2)$.]
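For the uniform density, probabilities are simply rectangle areas. The sketch below assumes the interval $[-4, 4]$; these endpoints are an illustrative choice, not values stated in the text.

```r
# Assumption: a uniform distribution on [-4, 4]; the endpoints are illustrative.
a <- -4; b <- 4

# Constant density 1 / (b - a) inside the interval.
height <- 1 / (b - a)
density_matches <- isTRUE(all.equal(dunif(0, min = a, max = b), height))

# Pr(1 <= X <= 2): interval length times height, compared with punif().
p_by_area <- (2 - 1) * height
p_by_cdf  <- punif(2, min = a, max = b) - punif(1, min = a, max = b)

# Uniform random numbers, a common building block of random number generators.
set.seed(42)                # arbitrary seed for reproducibility
u <- runif(5, min = a, max = b)

print(c(height = height, p_by_area = p_by_area, p_by_cdf = p_by_cdf))
```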


Chi-square density function

Another popularly encountered distribution (of non-negative RVs) is the $\chi^2$ p.d.f. with n degrees of freedom (d.o.f.):

$$f_n(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}, \quad x > 0 \qquad (6)$$

Remarks:

- Alternatively, when $X = \sum_{i=1}^{n} Z_i^2$, where the $Z_i$'s are independent standard Gaussian (zero-mean, unit-variance) RVs, X is said to possess a chi-square distribution with n d.o.f.

- It is one of the most widely used distributions in inferential statistics and hypothesis testing (of variances, spectral density, etc.).

- The mean and variance are n and 2n, respectively.

[Figure: Chi-square density function, f(x) versus x. Shaded region: $\Pr(6 \le X \le 8)$.]
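The sum-of-squares characterization and the stated mean and variance can be checked by simulation. In this sketch the degrees of freedom `n = 10`, the number of replications and the seed are all assumptions made for illustration; the text does not give the value of n used in the figure.

```r
set.seed(123)      # arbitrary seed for reproducibility
n    <- 10         # assumed degrees of freedom (not stated in the text)
reps <- 20000      # number of simulated chi-square values

# Each row holds n independent standard Gaussian draws; X is the row sum of squares.
Z <- matrix(rnorm(reps * n), nrow = reps, ncol = n)
X <- rowSums(Z^2)

sim_mean <- mean(X)   # theory: n
sim_var  <- var(X)    # theory: 2n

# Pr(6 <= X <= 8): exact value from pchisq() versus simulated relative frequency.
p_exact <- pchisq(8, df = n) - pchisq(6, df = n)
p_sim   <- mean(X >= 6 & X <= 8)

print(c(sim_mean = sim_mean, sim_var = sim_var, p_exact = p_exact, p_sim = p_sim))
```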


Binomial distribution function

Setup

A random experiment consists of n Bernoulli trials such that

1. The trials are independent.

2. Each trial has only two outcomes, typically labelled as “success” and “failure”.

3. The probability of success in each trial p remains constant.

Binomial random variable

Let the random variable X denote the number of successes x in these n trials. Then, X is said to be a binomial random variable with the probability mass function

$$P(X = x) = f(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \cdots, n; \quad 0 \le p \le 1 \qquad (7)$$
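The mass function can be evaluated directly from its formula and compared with R's built-in `dbinom()`. The values `n = 20` and `p = 0.5` are an illustrative choice of parameters; the variable names are hypothetical.

```r
n <- 20; p <- 0.5           # illustrative parameters
x <- 0:n                    # all possible numbers of successes

# Binomial mass function written out by hand, and the built-in equivalent.
pmf_manual  <- choose(n, x) * p^x * (1 - p)^(n - x)
pmf_builtin <- dbinom(x, size = n, prob = p)

total  <- sum(pmf_manual)          # a mass function must sum to 1
mean_x <- sum(x * pmf_manual)      # expected number of successes, n * p

print(c(total = total, mean_x = mean_x,
        max_abs_diff = max(abs(pmf_manual - pmf_builtin))))
```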


Binomial mass function

The binomial probability mass function is one of the most popular ones in probabilistic data analysis.

[Figure: Mass functions f(x) versus x for binomial RVs, in two panels: (n = 20, p = 0.5) and (n = 20, p = 0.2).]

- Used extensively in computing probabilities of proportions (of success / failure, good / defective, etc.).


Which distribution to choose?

The type of distribution for a random phenomenon depends on the nature of the process and the outcome (continuous- or discrete-valued).

Examples:

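- The average of a large number of independent random variables (with finite variance) approximately follows a Gaussian distribution, by the Central Limit Theorem.

- The sample variance, i.e., the variance computed from a set of random observations, follows a (scaled) $\chi^2$-distribution when the observations are independent and Gaussian.

- The number of successes (or failures) in a series of independent trials with constant success probability follows a Binomial distribution.

- The number of events in a unit interval of time (or space) follows a Poisson distribution when the events occur independently at a constant average rate.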
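The first two examples can be illustrated by simulation. This is only a sketch: the sample size `m = 30`, the number of repetitions, the uniform parent distribution, the value of `sigma` and the seed are assumptions chosen for illustration.

```r
set.seed(2024)     # arbitrary seed for reproducibility
reps <- 5000       # number of repeated experiments
m    <- 30         # observations per experiment (illustrative choice)

# Averages of Uniform(0, 1) observations (mean 1/2, variance 1/12):
# approximately Gaussian for large m.
means <- replicate(reps, mean(runif(m)))
z     <- (means - 0.5) / sqrt(1 / 12 / m)    # standardized averages
frac_within_1sd <- mean(abs(z) <= 1)         # Gaussian theory: about 0.683

# Sample variances of Gaussian observations:
# (m - 1) * S^2 / sigma^2 follows a chi-square distribution with m - 1 d.o.f.
sigma  <- 2
scaled <- replicate(reps, (m - 1) * var(rnorm(m, mean = 0, sd = sigma)) / sigma^2)
scaled_mean <- mean(scaled)                  # theory: m - 1
scaled_var  <- var(scaled)                   # theory: 2 * (m - 1)

print(c(frac_within_1sd = frac_within_1sd,
        scaled_mean = scaled_mean, scaled_var = scaled_var))
```

The scaled sample variance, not the raw sample variance, is the chi-square distributed quantity, which is why the division by `sigma^2` and the factor `m - 1` appear.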


Commands in R

Every distribution that R handles has four functions, for probability, quantile, density and random variable (value) generation. They share the same root name, prefixed by p, q, d and r, respectively.

A few relevant functions:

Commands                          Distribution
rnorm, pnorm, qnorm, dnorm        Gaussian
rt, pt, qt, dt                    Student's t
rchisq, pchisq, qchisq, dchisq    Chi-square
runif, punif, qunif, dunif        Uniform
rbinom, pbinom, qbinom, dbinom    Binomial
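The naming pattern can be seen by calling the four functions for one root name and then reusing the pattern with other roots. The numerical arguments below are arbitrary illustrative values.

```r
# d: density, p: probability (distribution function), q: quantile, r: random values.
d <- dnorm(0)        # Gaussian density at 0, i.e. 1 / sqrt(2 * pi)
p <- pnorm(1.96)     # Pr(X <= 1.96) for a standard Gaussian
q <- qnorm(p)        # quantile function inverts pnorm, giving back 1.96

set.seed(7)          # arbitrary seed for reproducibility
r <- rnorm(3)        # three random draws from the standard Gaussian

# The same pattern applies to other root names.
q_chisq <- qchisq(pchisq(5, df = 3), df = 3)          # recovers 5
p_binom <- pbinom(10, size = 20, prob = 0.5)          # Pr(X <= 10)
s_binom <- sum(dbinom(0:10, size = 20, prob = 0.5))   # same probability from the mass function

print(c(d = d, p = p, q = q, q_chisq = q_chisq, p_binom = p_binom))
```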


Sample usage

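# Draw 1000 samples from a Gaussian with mean 20 and standard deviation 5
x <- rnorm(1000, mean = 20, sd = 5)
# Histogram on the density scale, so that it is comparable with the p.d.f.
hist(x, probability = TRUE, col = 'grey')
# Grid of 200 points spanning the observed range
xseq <- seq(min(x), max(x), length.out = 200)
# Overlay the theoretical Gaussian density
lines(xseq, dnorm(xseq, mean = 20, sd = 5), col = 'blue', lwd = 2)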

[Figure: Histogram of x, with Density on the vertical axis and x on the horizontal axis.]
