Introduction to Statistical Hypothesis Testing

Probability and Statistics: Review - Part 1

Arun K. Tangirala, IIT Madras


Learning objectives

- Probability
- Random variable
- Probability distribution function
- Probability density function (p.d.f.)



Random phenomenon

Random experiment (phenomenon)

An experiment or a phenomenon that produces different outcomes, even though it is repeated in the same manner every time, is called a random experiment or a random phenomenon.

Examples: throw of a die, sensor reading, defects in a product

Prediction perspective:

Random phenomenon

Any phenomenon that cannot be predicted accurately, even given its infinite past, is said to be random. Alternatively, there exists no known mathematical function that can accurately describe the process.



Sample space / Population


It is the set of all possible outcomes of a random phenomenon. It is usually denoted by S.

- If the outcomes are discrete-valued, we have a discrete sample space (e.g., throw of a die, scores in a game).
- When the outcomes are continuous-valued, we have a continuous sample space (e.g., ambient temperature, gas pressure).

An event is a subset of the sample space.

- In a two-coin toss experiment, an event is $E = \{HT, TH\}$ while $S = \{HH, HT, TH, TT\}$.

A collection of sets (events) $E_1, E_2, \cdots, E_n$ is said to be exhaustive if $E_1 \cup E_2 \cup \cdots \cup E_n = S$.
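The two-coin example can be checked by enumeration. The following is a minimal R sketch; the second event `E2` (both tosses agree) is a hypothetical event introduced only to illustrate an exhaustive collection.

```r
# Sample space of a two-coin toss, built by enumerating both tosses
tosses <- expand.grid(first = c("H", "T"), second = c("H", "T"),
                      stringsAsFactors = FALSE)
S <- paste0(tosses$first, tosses$second)

E  <- c("HT", "TH")   # event from the text: exactly one head
E2 <- c("HH", "TT")   # hypothetical second event: both tosses agree

# Every event is a subset of the sample space
is_subset <- all(E %in% S)

# E and E2 are exhaustive if their union recovers S
is_exhaustive <- setequal(union(E, E2), S)
```

Here `E` on its own is not exhaustive, since its union with nothing else leaves out HH and TT.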








Probability Basics and Axioms

Whenever the sample space S consists of N equally likely outcomes, the probability of each outcome is 1/N.

For a discrete sample space, the probability of an event E, denoted as $P(E)$, is the sum of the probabilities of the outcomes in E.

Axioms:

If S is the sample space and E is an event in any random experiment,

1. $P(S) = 1$ (one of the outcomes has to occur!)

2. $0 \le P(E) \le 1$ (probabilities are always non-negative and never exceed unity)

3. For two mutually exclusive events $E_1$ and $E_2$, $P(E_1 \cup E_2) = P(E_1) + P(E_2)$.

4. If $E^*$ is the complement of an event E, $P(E^*) = 1 - P(E)$ (a consequence of 1 and 3).
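These rules can be verified numerically on a fair die. The sketch below assumes equally likely faces; the helper name `prob` and the events `E1`, `E2` are hypothetical choices made only for illustration.

```r
# Fair die: N = 6 equally likely outcomes, each with probability 1/N
S <- 1:6
p <- rep(1 / length(S), length(S))
names(p) <- S

# Hypothetical helper: P(E) is the sum of the outcome probabilities in E
prob <- function(E) sum(p[as.character(E)])

E1 <- c(2, 4, 6)   # even face
E2 <- c(1, 3)      # mutually exclusive with E1

p_S     <- prob(S)                                      # rule 1: equals 1
p_E1    <- prob(E1)                                     # rule 2: lies in [0, 1]
add_gap <- prob(union(E1, E2)) - (prob(E1) + prob(E2))  # rule 3: gap is ~0
cmp_gap <- prob(setdiff(S, E1)) - (1 - prob(E1))        # rule 4: gap is ~0
```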



Probability Basics and Axioms . . . contd.

Probabilities on sets:

1. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

2. $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$

Conditional probability:

The conditional probability of an event B given an event A such that $P(A) > 0$, denoted as $P(B|A)$, is

$$P(B|A) = \frac{P(A \cap B)}{P(A)} \tag{1}$$






Example

The table below lists the classification of 940 wafers in a semiconductor manufacturing process.

Contamination   Center   Edge   Total
Low                514     68     582
High               112    246     358
Total              626    314     940

If H is the event corresponding to high contamination and C is the event that the wafer is in the center of the sputtering tool, determine $P(H \cup C)$.

Answer: $P(H \cup C) = P(H) + P(C) - P(H \cap C) = (358 + 626 - 112)/940 = 872/940$.
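The same calculation can be reproduced from the table counts in R. This is a minimal sketch; the variable names are illustrative, and the conditional probability $P(H|C)$ is an additional quantity computed from the same table using the definition of conditional probability.

```r
# Wafer counts from the table: rows = contamination, columns = location
wafers <- matrix(c(514,  68,
                   112, 246),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(contamination = c("Low", "High"),
                                 location = c("Center", "Edge")))
n <- sum(wafers)                        # 940 wafers in total

p_H  <- sum(wafers["High", ]) / n       # P(H) = 358/940
p_C  <- sum(wafers[, "Center"]) / n     # P(C) = 626/940
p_HC <- wafers["High", "Center"] / n    # P(H and C) = 112/940

p_union     <- p_H + p_C - p_HC         # addition rule: 872/940
p_H_given_C <- p_HC / p_C               # conditional probability: 112/626
```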



Independent events

Independence

Two events A and B are independent if and only if $P(B|A) = P(B)$, i.e., $P(A \cap B) = P(A)P(B)$.

A day's production of 850 manufactured parts contains 50 parts that do not meet customer requirements. Suppose two parts are selected from the batch, but the first part is replaced before the second part is selected. What is the probability that the second part is defective (denoted as B) given that the first part is defective (denoted as A)?

Answer: $P(B|A) = 50/850$. Because the first part is replaced, the batch is unchanged at the second draw, so $P(B|A) = P(B)$ and the two events are independent.
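The role of replacement can be seen with a few lines of R. The sketch below is illustrative only; the without-replacement case is a contrast added here under the assumption that the first (defective) part is simply not returned to the batch.

```r
n_parts <- 850   # parts produced in the day
n_def   <- 50    # parts that do not meet requirements

p_A <- n_def / n_parts                   # first part defective

# With replacement: the second draw sees the same batch
p_AB_with        <- p_A * (n_def / n_parts)
p_B_given_A_with <- p_AB_with / p_A      # equals P(B) = 50/850

# Contrast (assumption, not in the text): no replacement, so one
# defective part is missing from the batch at the second draw
p_B_given_A_without <- (n_def - 1) / (n_parts - 1)
```

With replacement the conditional probability coincides with $P(B)$; without replacement it does not, so the draws are no longer independent.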



Random Variable . . . contd.

Definition (Random Variable)

A random variable X is a mapping (or point function) from the sample space $\Omega$ onto the real line such that to each element $\omega \in \Omega$ there corresponds a unique real number.

- Effectively we replace our original (abstract) sample space by a new (concrete) sample space.
- E.g., head and tail of a toss are mapped to 1 and 0, respectively.
- If the experiment itself yields some physical quantity that is real-valued, then no further mapping is required.

Randomness is not a characteristic of a process, but is rather a reflection of our (lack of) knowledge and understanding of that process.



Probability Distribution

The natural recourse to dealing with uncertainties is to list all possible outcomes and assign a chance to each of those outcomes.

Examples:

- Rainfall in a region: $S = \{0, 1\}$, $P = \{0.3, 0.7\}$
- Face value from the roll of a die: $S = \{1, 2, \cdots, 6\}$, $P(\omega) = 1/6 \;\; \forall \omega \in S$

The specification of the outcomes and the associated probabilities completely characterizes the random variable.



Probability Distribution Function

In general, any random variable X is characterized by what is known as the probability or cumulative distribution function $F(x)$:

$$F(x) = \Pr(X \le x)$$

- Probability distributions are either known upfront OR determined through experiments.
  - Out of 10000 coin tosses, find the fraction (relative frequency) of the total tosses that show heads; this is (an estimate of) the probability of a head occurring in any toss.
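The coin-toss experiment can be mimicked in simulation. The sketch below assumes a fair coin and an arbitrary seed; heads are mapped to 1 and tails to 0, and `ecdf` gives an empirical estimate of $F(x)$.

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
n_toss <- 10000

# Map head -> 1 and tail -> 0; a fair coin is assumed
tosses <- sample(c(1, 0), size = n_toss, replace = TRUE)

# Relative frequency of heads: an estimate of Pr(head)
p_head_hat <- mean(tosses)

# Empirical distribution function: F_hat(x) = fraction of tosses <= x
F_hat <- ecdf(tosses)
F_at_0 <- F_hat(0)           # estimate of Pr(X <= 0), i.e., Pr(tail)
F_at_1 <- F_hat(1)           # Pr(X <= 1), which must equal 1
```

The estimate varies from run to run; it settles closer to the true probability as the number of tosses grows.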



Types of distributions

- Continuous distributions
  - $F(x)$ is continuous and differentiable for almost all x (e.g., Gaussian).
  - The random variable in this case is a continuous quantity (e.g., temperature, pressure, voltage).
  - For these distributions, a "density" function (like in mechanics) exists.

- Discrete (step-type) distributions
  - $F(x)$ is a simple step function with jumps at points (e.g., Binomial).
  - The random variable is then a purely discrete one.
  - No density function exists for this case. Instead, a probability mass function that determines $\Pr(X = x)$ is defined (e.g., counts of number of heads in a coin toss, face value on a die).
  - The probability of taking on a value between jump points is zero.

- Mixed distributions: $F(x)$ has both continuous and step-type (discrete) parts.



Probability Density Function

For continuous probability distributions, it is most convenient to work with probability densities (again as in mechanics).

The density function $f(x)$ can be defined in two different ways:

1. The density function is such that the area under the curve gives the probability,

$$\Pr(a < X < b) = \int_a^b f(x)\,dx \tag{2}$$

Thus, $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

2. The density function is the derivative (w.r.t. x) of the distribution function,

$$f(x) = \frac{dF(x)}{dx} \tag{3}$$
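Both definitions can be checked numerically. The sketch below uses the standard Gaussian as an assumed test case (`dnorm` as $f$, `pnorm` as $F$); the interval, the point `x0` and the step `h` are arbitrary choices.

```r
# Definition 1: the area under the density gives the probability
area  <- integrate(dnorm, lower = -1, upper = 1)$value
exact <- pnorm(1) - pnorm(-1)          # F(1) - F(-1)

# The total area under the density is one
total <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Definition 2: the density is the derivative of F(x);
# approximated here by a central difference at x0
x0 <- 0.5
h  <- 1e-5
deriv_F <- (pnorm(x0 + h) - pnorm(x0 - h)) / (2 * h)
f_x0    <- dnorm(x0)
```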



Gaussian Density Function

One of the most frequently encountered and assumed distributions for a RV is the Gaussian (Normal) distribution with p.d.f.:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2}\right) \tag{4}$$

Remarks:

- Density is completely characterized by the two parameters $\mu$ and $\sigma$.
- The density function is symmetric about $\mu$. All odd central moments are zero, and all cumulants of order higher than two vanish.
- The Central Limit Theorem motivates wide usage of the Gaussian p.d.f.

[Figure: Gaussian density function, f(x) versus x; shaded region: $\Pr(1 \le X \le 2)$]
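The Gaussian p.d.f. and the shaded probability can be evaluated in R. The sketch below assumes the plotted density is the standard Gaussian ($\mu = 0$, $\sigma = 1$); `gauss_pdf` is a hypothetical helper that writes out the formula so it can be compared against R's built-in `dnorm`.

```r
# Hypothetical helper: the Gaussian p.d.f. written out term by term
gauss_pdf <- function(x, mu = 0, sigma = 1) {
  exp(-0.5 * (x - mu)^2 / sigma^2) / (sigma * sqrt(2 * pi))
}

# Agreement with the built-in density on a grid of points
x <- seq(-4, 4, by = 0.5)
max_diff <- max(abs(gauss_pdf(x) - dnorm(x)))

# Shaded region Pr(1 <= X <= 2) = F(2) - F(1), about 0.1359
p_shaded <- pnorm(2) - pnorm(1)
```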



Uniform Distribution

A widely encountered distribution is the uniform distribution in the interval $[a, b]$:

$$f(x) = \frac{1}{b - a}, \quad a \le x \le b \tag{5}$$

Remarks:

- Density is completely characterized by the two parameters a and b.
- Simplest of all the p.d.f.s and is usually the starting point in random number generators.
- Unlike the Gaussian distribution, the even cumulants of order higher than two are not zero.

[Figure: Uniform density function, f(x) versus x]
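A short R sketch of the uniform density follows. The interval $[-4, 4]$ is an assumption read off the x-axis range of the figure, and the evaluation points are arbitrary.

```r
a <- -4   # assumed lower limit
b <-  4   # assumed upper limit

# Inside [a, b] the density is the constant 1 / (b - a)
f_in  <- dunif(0, min = a, max = b)

# Outside [a, b] the density is zero
f_out <- dunif(5, min = a, max = b)

# Probability of an interval is its length divided by (b - a)
p_int <- punif(2, min = a, max = b) - punif(-1, min = a, max = b)

# Random numbers from the same distribution (arbitrary seed)
set.seed(2)
u <- runif(5, min = a, max = b)
```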




Chi-square density function

Another popularly encountered distribution (of non-negative RVs) is the $\chi^2$ p.d.f. with n degrees of freedom (d.o.f.):

$$f_n(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}, \quad x > 0 \tag{6}$$

Remarks:

- Alternatively, when $X = \sum_{i=1}^{n} Z_i^2$, where the $Z_i$'s are independent standard Gaussian (zero-mean, unit-variance) RVs, X is said to possess a Chi-square distribution with n d.o.f.
- One of the most widely used in inferential statistics and hypothesis testing (of variances, spectral density, etc.).
- The mean and variance are n and 2n, respectively.

[Figure: Chi-square density function, f(x) versus x]
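The sum-of-squares construction and the stated mean and variance can be checked by simulation. In this sketch the degrees of freedom `n`, the number of replications `n_rep`, the seed and the comparison point 4 are all arbitrary illustrative choices.

```r
set.seed(3)        # arbitrary seed
n     <- 5         # degrees of freedom (illustrative)
n_rep <- 20000     # number of simulated values of X

# Each row holds n independent standard Gaussian draws Z_1, ..., Z_n
Z <- matrix(rnorm(n * n_rep), nrow = n_rep, ncol = n)

# X = sum of squares of the Z_i, one value per row
X <- rowSums(Z^2)

mean_X <- mean(X)  # should be close to n
var_X  <- var(X)   # should be close to 2n

# Empirical Pr(X <= 4) against the chi-square distribution function
p_emp    <- mean(X <= 4)
p_theory <- pchisq(4, df = n)
```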




Binomial distribution function

Setup

A random experiment consists of n Bernoulli trials such that

1. The trials are independent.

2. Each trial has only two outcomes, typically labelled as “success” and “failure”.

3. The probability of success in each trial p remains constant.

Binomial random variable

Let the random variable X denote the number of successes x in these n trials. Then, X is said to be a binomial random variable with the probability mass function

$$P(X = x) = f(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \cdots, n; \;\; 0 \le p \le 1 \tag{7}$$
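The mass function in (7) can be evaluated directly and compared with R's `dbinom`. The values $n = 20$ and $p = 0.2$ are used here only as an illustration; the check that the mean equals $np$ relies on a standard property of the binomial distribution that is not stated in the slides.

```r
n <- 20     # number of trials (illustrative)
p <- 0.2    # probability of success (illustrative)
x <- 0:n    # all possible numbers of successes

# Equation (7) written out term by term
pmf_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

# Agreement with the built-in mass function
max_diff <- max(abs(pmf_formula - dbinom(x, size = n, prob = p)))

total  <- sum(pmf_formula)       # the probabilities sum to one
mean_x <- sum(x * pmf_formula)   # mean of X, equal to n * p
```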



Binomial mass function

The binomial probability mass function is one of the most popular ones in probabilistic data analysis.

[Figure: two panels showing the mass function f(x) versus x for a Binomial RV, with (n = 20, p = 0.5) and (n = 20, p = 0.2)]

- Used extensively in computing probability of proportions (of success / failure, good / defective, etc.).



Which distribution to choose?

The type of distribution for a random phenomenon depends on the nature of the process and the outcome (continuous- or discrete-valued).

Examples:

- The average of a large number of random variables (approximately) follows a Gaussian distribution.
- Sample variance, i.e., the variance computed from a set of random (Gaussian) observations, follows, after suitable scaling, a $\chi^2$-distribution.
- Number of successes (or failures) in a series of independent trials follows a Binomial distribution.
- Number of events in a unit interval of time (or space) follows a Poisson distribution.
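The first example can be illustrated by averaging uniform random numbers. In the sketch below, the choice of the Uniform(0, 1) distribution, the number of terms `n_avg`, the number of replications `n_rep` and the seed are all assumptions made for illustration; the variance 1/12 of a Uniform(0, 1) variable is a standard result.

```r
set.seed(4)        # arbitrary seed
n_avg <- 30        # number of uniform RVs in each average
n_rep <- 5000      # number of averages simulated

# Each row holds n_avg independent Uniform(0, 1) draws
U    <- matrix(runif(n_avg * n_rep), nrow = n_rep)
xbar <- rowMeans(U)

# Mean and standard deviation expected for the average
mu_theory <- 0.5
sd_theory <- sqrt(1 / (12 * n_avg))

# Fraction of averages within one standard deviation of the mean,
# compared with the Gaussian value Pr(-1 <= Z <= 1)
frac_1sd  <- mean(abs(xbar - mu_theory) <= sd_theory)
gauss_1sd <- pnorm(1) - pnorm(-1)
```

A histogram of `xbar` with the Gaussian density overlaid gives a visual version of the same comparison.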



Commands in R

Every distribution that R handles has four functions, for probability (cumulative distribution), quantile, density and random number generation. They share the same root name, prefixed by p, q, d and r respectively.

A few relevant functions:

Commands                          Distribution
rnorm, pnorm, qnorm, dnorm        Gaussian
rt, pt, qt, dt                    Student's t
rchisq, pchisq, qchisq, dchisq    Chi-square
runif, punif, qunif, dunif        Uniform
rbinom, pbinom, qbinom, dbinom    Binomial
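The four prefixes can be tried out on the Gaussian family. This is a minimal sketch with arbitrary argument values; the same pattern applies to the other root names in the table.

```r
# d: density f(x) at a point
d_val <- dnorm(0)               # height of the standard Gaussian at 0

# p: distribution function F(x) = Pr(X <= x)
p_val <- pnorm(1.5)

# q: quantile function, the inverse of p
q_val  <- qnorm(0.975)          # about 1.96
q_back <- qnorm(p_val)          # recovers 1.5

# r: random numbers from the distribution (arbitrary seed)
set.seed(5)
r_val <- rnorm(3, mean = 20, sd = 5)
```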



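Sample usage

# Draw 1000 samples from a Gaussian with mean 20 and standard deviation 5
x <- rnorm(1000, mean = 20, sd = 5)

# Histogram on the density scale, so that it is comparable with the p.d.f.
hist(x, probability = TRUE, col = 'grey')

# Overlay the theoretical density on a grid spanning the data
xseq <- seq(min(x), max(x), length.out = 200)

lines(xseq, dnorm(xseq, mean = 20, sd = 5), col = 'blue', lwd = 2)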

[Figure: Histogram of x on the density scale (x-axis: x, y-axis: Density)]




