Stat 155, Section 2, Last Time

• Big Rules of Probability:

– Not Rule (1 – P{opposite})

– Or Rule (glasses – football)

– And Rule (multiply conditional prob’s)

– Use in combination for real power

• Bayes Rule

– Turn around conditional probabilities

– Write hard ones in terms of easy ones

– Recall surprising disease testing result

Reading In Textbook

Approximate Reading for Today’s Material:

Pages 266-271, 311-323, 277-286

Approximate Reading for Next Class:

Pages 291-305, 334-351

Midterm I

Coming up: Tuesday, Feb. 27

Material: HW Assignments 1 – 6

Extra Office Hours:

Mon. Feb. 26, 8:30 – 12:00, 2:00 – 3:30

(Instead of Review Session)

Bring Along:

1 8.5” x 11” sheet of paper with formulas

Recall Pepsi Challenge

In class taste test:

• Removed bias with randomization

• Double blind approach

• Asked which was:

– Better

– Sweeter

– which


Results summarized in spreadsheet

Eyeball impressions:

a. Perhaps no consensus preference between Pepsi and Coke?

– Is 54% "significantly different" from 50%? (will develop methods to understand this)

– Result of "marketing research"???

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stat155CokePepsiResults2007.xls


b. Perhaps no consensus as to which is

sweeter?

• Very different from the past, when Pepsi was

noticeably sweeter

• This may have driven old Pepsi challenge

phenomenon

• Coke figured this out, and matched Pepsi in

sweetness


c. Most people believe they know

– Serious cola drinkers, because now flavor driven

– In past, was sweetness driven, and there were many advertising-caused misperceptions!

d. People tend to get it right or not??? (less clear)

– Overall 71% right. Seems like it, but again is that

significantly different from 50%?


e. Those who think they know tend to be right???

– People who thought they knew: right 71% of the

time

f. Those who don't think they know seem to be right as well. Wonder why?

– People who didn't: also right 70% of time? Why?

"Natural sampling variation"???

– Any difference between people who thought they

knew, and those who did not think so?


g. Coin toss was fair (or is 57% heads significantly different from 50%?)

How accurate are those ideas?

• Will build tools to assess this

• Called “hypothesis tests” and “P-values”

• Revisit this example later

Independence

(Need one more major concept at this level)

An event A does not depend on B, when:

Knowledge of B does not change

chances of A:

P{A | B} = P{A}

Independence

E.g. I Toss a Coin, and somebody on South

Pole does too.

P{H(me) | T(SP)} = P{H(me)} = ½.

(no way that can matter, i.e. independent)

Independence

E.g. I Toss a Coin twice:

P{H2 | H1} = ???

(toss number indicated with subscript)

• Is it < ½?

• What if have 5 Heads in a row?

(isn’t it more likely to get a Tail?)

(Wanna bet?!?)

Independence

E.g. I Toss a Coin twice, …

Rational approach:

Look at Sample Space: {H1H2, H1T2, T1H2, T1T2}

Model all as equally likely

Then:

P{H2 | H1} = P{H1 & H2} / P{H1} = (1/4) / (1/2) = 1/2 = P{H2}

So independence is good model for coin tosses
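
The sample-space argument above can be checked by brute force. This is a minimal sketch in Python, assuming the four two-toss outcomes are equally likely; the helper name `prob` is illustrative and not part of the course material.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: each outcome is (toss 1, toss 2),
# modeled as equally likely (an assumption, as on the slide).
sample_space = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under equal likelihood."""
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))

p_h1 = prob(lambda o: o[0] == "H")
p_h2 = prob(lambda o: o[1] == "H")
p_h1_and_h2 = prob(lambda o: o == ("H", "H"))

# Conditional probability via the And rule, rearranged
p_h2_given_h1 = p_h1_and_h2 / p_h1

print(p_h2_given_h1, p_h2)  # both are 1/2
```

Knowing the first toss was a Head leaves the chance of a Head on the second toss unchanged, which is what independence means here.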

New Ball & Urn Example

H → R R R R G G          T → R R G

Again toss coin, and draw ball:

P{R | H} = 2/3

P{R} = P{R | H} P{H} + P{R | T} P{T} = (2/3)(1/2) + (2/3)(1/2) = 2/3

Same, so R & H are independent events

Not true above, but works here, since proportions of R & G are same
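
The urn calculation can be reproduced with exact fractions. This is a sketch under two assumptions: the coin is fair, and the slide's layout means Heads leads to the urn R R R R G G while Tails leads to the urn R R G.

```python
from fractions import Fraction

# Urn contents for each coin outcome (this reading of the slide is an assumption)
urns = {"H": "RRRRGG", "T": "RRG"}
p_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # fair coin assumed

# Conditional probability of drawing red, given the coin outcome
p_red_given = {
    coin: Fraction(balls.count("R"), len(balls)) for coin, balls in urns.items()
}

# Overall probability of red: And rule within each branch, then add the branches
p_red = sum(p_red_given[coin] * p_coin[coin] for coin in urns)

print(p_red_given["H"], p_red)  # equal values mean R and H are independent
```

Changing either urn so that the red proportions differ makes the two printed values disagree, which is the dependent case seen earlier.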

Independence

Note, when A is independent of B:

P{A} = P{A | B} = P{A & B} / P{B}

so

P{A & B} = P{A} P{B}

And thus

P{B | A} = P{A & B} / P{A} = P{A} P{B} / P{A} = P{B}

i.e. B is independent of A

Independence

Note, when A is independent of B:

It follows that: B is independent of A

I.e. “independence” is symmetric in A and B

(as expected)

More formal treatments use symmetric version

as definition

(to avoid hassles with 0 probabilities)

Independence

HW:

4.31

Special Case of “And” Rule

For A and B independent:

P{A & B} = P{A | B} P{B} = P{B | A} P{A} = P{A} P{B}

i.e. When independent, just multiply probabilities…

Textbook: Call this another rule

Me: Only learn one, this is a special case

Independent “And” Rule

E.g. Toss a coin until the 1st Head appears,

find P{3 tosses}:

Model: tosses are independent

(saw this was reasonable last time, using

equally likely sample space ideas)

P{3 tosses} = P{T1 & T2 & H3}

When have 3: group with parentheses

P{3 tosses} = P{(T1 & T2) & H3}

= P{H3 | T1 & T2} P{T1 & T2}

(by indep:)

= P{H3} P{T2 | T1} P{T1}

= P{H3} P{T2} P{T1}

I.e. “just multiply”



P{3 tosses} = P{H3} P{T2} P{T1}

• Multiplication idea holds in general

• So from now on will just say:

“Since Independent, multiply probabilities”

• Similarly for Exclusive Or rule,

Will just “add probabilities”
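
As a numerical check of "since independent, multiply probabilities", the sketch below computes P{3 tosses} exactly and compares it with a simulation. It assumes a fair coin; the function name, the seed and the number of trials are arbitrary illustrative choices.

```python
from fractions import Fraction
import random

p_head = Fraction(1, 2)  # fair coin assumed
p_tail = 1 - p_head

# Exact: P{T1 & T2 & H3} = P{T1} P{T2} P{H3}, by independence
p_three_tosses = p_tail * p_tail * p_head

def tosses_until_first_head(rng):
    """Simulate tossing until the first Head; return the number of tosses."""
    count = 1
    while rng.random() >= 0.5:  # a draw of 0.5 or more counts as a Tail
        count += 1
    return count

rng = random.Random(155)  # fixed seed so runs are repeatable
n_trials = 100_000
hits = sum(tosses_until_first_head(rng) == 3 for _ in range(n_trials))
estimate = hits / n_trials

print(p_three_tosses, estimate)
```

The exact answer is 1/8, and the simulated proportion should land close to 0.125.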


HW:

4.29 (hint: Calculate

P{G1&G2&G3&G4&G5&G6&G7})

4.33

Overview of Special Cases

Careful: these can be tricky to keep separate

OR works like adding,

for mutually exclusive

AND works like multiplying,

for independent


Caution: special cases are different

Mutually exclusive ≠ independent

For A and B mutually exclusive:

P{A | B} = 0 ≠ P{A}   (when P{A} > 0)

Thus not independent


HW: C15 Suppose events A, B, C all have

probability 0.4, A & B are independent,

and A & C are mutually exclusive.

(a) Find P{A or B} (0.64)

(b) Find P{A or C} (0.8)

(c) Find P{A and B} (0.16)

(d) Find P{A and C} (0)
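
The two special cases can be placed side by side in a few lines of Python. This sketch only applies the rules already stated (multiply for independent events, zero overlap for mutually exclusive events, and the general Or rule that subtracts the overlap); the variable names are illustrative.

```python
p_a = p_b = p_c = 0.4

# A & B independent: the And rule reduces to multiplying
p_a_and_b = p_a * p_b

# A & C mutually exclusive: they cannot both happen
p_a_and_c = 0.0

# General Or rule: P{A or B} = P{A} + P{B} - P{A & B}
p_a_or_b = p_a + p_b - p_a_and_b
p_a_or_c = p_a + p_c - p_a_and_c

print(round(p_a_or_b, 2), round(p_a_or_c, 2), round(p_a_and_b, 2), p_a_and_c)
```

The rounded values agree with the answers quoted in parentheses for parts (a) to (d).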

Random Variables

Text, Section 4.3 (we are currently jumping)

Idea: take probability to next level

Needed for probability structure of political

polls, etc.

Random Variables

Definition:

A random variable, usually denoted as X,

is a quantity that

“takes on values at random”

Random Variables

Two main types

(that require different mathematical models)

• Discrete, i.e. counting

(so look only at “counting numbers”, 1,2,3,…)

• Continuous, i.e. measuring

(harder math, since need all fractions, etc.)

Random Variables

E.g: X = # for Candidate A in a randomly

selected political poll: discrete

(recall all that means)

Power of the random variable idea:

• Gives something to “get a hold of…”

• Similar in spirit to high school algebra…

High School Algebra

Recall Main Idea?

Rules for solving equations???

No, major breakthrough is:

• Give unknown(s) a name

• Find equation(s) with unknown

• Solve equation(s) to find unknown(s)

Random Variables

E.g: X = # that comes up, in die rolling:

Discrete

• But not very interesting

• Since can study by simple methods

• As done above

• Don’t really need random variable concept

Random Variables

E.g: Measurement error:

Let X = measurement:

Continuous

• How to model probabilities???

Random Variables

HW on discrete vs. continuous:

4.40 ((b) discrete, (c) continuous, (d)

could be either, but discrete is more

common)

And now for something completely different

My idea about “visualization” last time:

• 30% really liked it

• 70% less enthusiastic…

• Depends on mode of thinking

– “Visual thinkers” loved it

– But didn’t connect with others

• So hadn’t planned to continue that…


But here was another viewpoint:

Professor Marron,

Could you focus on something more intelligent in your "And now for something completely different" section once every two weeks, perhaps, instead of completely abolishing it? I really enjoyed your discussion of how to view three dimensions in 2-D today.


A fun example:

• Faces as data

• Each data point is a digital image

• Data from U. Carlos, III in Madrid

(hard to do here for confidentiality reasons)

Q: What distinguishes men from women?


Context: statistical problem of “classification”, i.e. “discrimination”

Basically “automatic disease diagnosis”:

• Have measurements on sick & healthy cases

• Given new person, make measurements

• Closest to sick or healthy populations?


Approach: Distance Weight Discrimination

(Marron & Todd)

Idea: find “best separating direction” in high dimensional data space

Here:

• Data are images

• Classes: Males & Females

• Given new image: classify male – female


Fun visualization:

• March through point clouds

• Along separating direction

• Captures “Femaleness” & “Maleness”

• Note relation to “training data”

Random Variables

A die rolling example

(where random variable concept is useful)

Win $9 if 5 or 6, Pay $4, if 1, 2 or 3, otherwise (4) break even

Notes:

• Don’t care about number that comes up

• Random Variable abstraction allows focusing on important points

• Are you keen to play? (will calculate…)

Random Variables

Die rolling example

Win $9 if 5 or 6, Pay $4, if 1, 2 or 3

Let X = “net winnings”

Note: X takes on values 9, -4 and 0

Probability Structure of X is summarized by:

P{X = 9} = 1/3 P{X = -4} = 1/2 P{X = 0} = 1/6

(should you want to play?, study later)

Random Variables

Die rolling example, for X = “net winnings”:

Win $9 if 5 or 6, Pay $4, if 1, 2 or 3

Probability Structure of X is summarized by:

P{X = 9} = 1/3 P{X = -4} = 1/2 P{X = 0} = 1/6

Convenient form: a table

Winning 9 -4 0

Prob. 1/3 1/2 1/6
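
The table can be derived mechanically from the payoff rule. This is a minimal sketch assuming a fair die and the rule as first stated (win $9 on 5 or 6, pay $4 on 1, 2 or 3, break even on 4); the function name `net_winnings` is illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def net_winnings(face):
    """Payoff rule: win $9 on 5 or 6, pay $4 on 1, 2 or 3, break even on 4."""
    if face in (5, 6):
        return 9
    if face in (1, 2, 3):
        return -4
    return 0

# Each face of a fair die has probability 1/6 (fair die assumed);
# faces with the same payoff are merged by adding their probabilities.
distribution = defaultdict(Fraction)
for face in range(1, 7):
    distribution[net_winnings(face)] += Fraction(1, 6)

for value, p in sorted(distribution.items(), reverse=True):
    print(value, p)
```

The random variable X keeps only the payoff, so six die faces collapse into three table entries.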

Summary of Prob. Structure

In general: for discrete X, summarize “distribution” (i.e. full prob. structure) by a table:

Values x1 x2 … xk

Prob. p1 p2 … pk

Where:

i. All $p_i$ are between 0 and 1

ii. $\sum_{i=1}^{k} p_i = 1$ (so get a prob. funct’n as above)


Summarize distribution, for discrete X, by a table:

Values x1 x2 … xk

Prob. p1 p2 … pk

Power of this idea:

• Get probs by summing table values

• Special case of disjoint OR rule


E.g. Die Rolling game above:

P{X = 9} = 1/3

P{X < 2} = P{X = 0} + P{X = -4} =1/6+1/2 = 2/3

P{X = 5} = 0 (not in table!)

Winning 9 -4 0

Prob. 1/3 1/2 1/6


E.g. Die Rolling game above:

Winning 9 -4 0

Prob. 1/3 1/2 1/6

P{X = 9 | X ≥ 0} = P{X = 9 & X ≥ 0} / P{X ≥ 0}

= P{X = 9} / (P{X = 9} + P{X = 0})

= (1/3) / (1/3 + 1/6) = (1/3) / (1/2) = 2/3
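
Summing table values, including for a conditional probability, can be sketched as follows. The table is copied from the die rolling game; the helper `prob` is illustrative, and the conditioning event X ≥ 0 is an assumed reading of the slide.

```python
from fractions import Fraction

# Distribution table of X = net winnings, copied from the die rolling game
table = {9: Fraction(1, 3), -4: Fraction(1, 2), 0: Fraction(1, 6)}

def prob(event):
    """P{event}: add the table probabilities of the values where the event holds."""
    return sum((p for x, p in table.items() if event(x)), Fraction(0))

p_less_than_2 = prob(lambda x: x < 2)
p_equals_5 = prob(lambda x: x == 5)  # 5 is not in the table, so nothing is added

# Conditional probability: P{X = 9 | X >= 0} = P{X = 9 & X >= 0} / P{X >= 0}
p_9_given_nonneg = prob(lambda x: x == 9 and x >= 0) / prob(lambda x: x >= 0)

print(p_less_than_2, p_equals_5, p_9_given_nonneg)
```

Each call to `prob` is an instance of the disjoint Or rule, since the table values are mutually exclusive outcomes.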


HW:

4.41 & (c) Find P{X = 3 | X >= 2} (3/7)

4.52 (0.144, …, 0.352)

Probability Histogram

Idea: Visualize probability distribution using a

bar graph

E.g. Die Rolling game above:

Winning 9 -4 0

Prob. 1/3 1/2 1/6

[Figure: “Toy example probability histogram”; horizontal axis: X values (-4 to 9); vertical axis: probability (0 to 0.6)]


Construction in Excel:

• Very similar to bar graphs (done before)

• Bar heights = probabilities

• Example: Class Example 18

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg18.xls


HW:

4.43

Random Variables

Now consider continuous random variables

Recall: for measurements (not counting)

Model for continuous random variables:

Calculate probabilities as areas,

under “probability density curve”, f(x)

Continuous Random Variables

Model probabilities for continuous random variables, as areas under “probability density curve”, f(x):

P{a < X < b} = Area(under f(x), between a and b)

= $\int_a^b f(x)\,dx$   (calculus notation)


Note:

Same idea as “idealized distributions” above

Recall discussion from:

Page 8, of Class Notes, Jan. 23

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155-07-01-23.ppt


e.g. Uniform Distribution

Idea: choose random number from [0,1]

Use constant density: f(x) = C

Models “equally likely”

To choose C, want:

1 = P{X in [0,1]} = Area under f(x) between 0 and 1 = C

So want C = 1.
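
The "probability = area" idea for the uniform density can be checked numerically. This sketch uses a simple midpoint rule; the function names, the number of slices and the interval (0.3, 0.8) are illustrative choices, not from the slides.

```python
def uniform_density(x, c=1.0):
    """Constant density f(x) = C on [0, 1], zero elsewhere."""
    return c if 0.0 <= x <= 1.0 else 0.0

def area_under(f, a, b, n=10_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

total_area = area_under(uniform_density, 0.0, 1.0)    # should be 1 when C = 1
p_interval = area_under(uniform_density, 0.3, 0.8)    # P{0.3 < X < 0.8}

print(round(total_area, 6), round(p_interval, 6))
```

With C = 1 the probability of any interval inside [0, 1] is just its length, here 0.8 - 0.3 = 0.5.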

Uniform Random Variable

HW:

4.54 (0.73, 0, 0.73, 0.2, 0.5)

4.56 (1, ½, 1/8)


e.g. Normal Distribution

Idea: Draw at random from a normal

population

f(x) is the normal curve (studied above)

Review some earlier concepts:

Normal Curve Mathematics

The “normal density curve” is:

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

usual “function” of x

π: circle constant = 3.14…

e: natural number = 2.7…

Normal Curve Mathematics

Main Ideas:

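• Basic shape is: $e^{-\frac{1}{2}x^2}$

• “Shifted to mu”: $e^{-\frac{1}{2}(x-\mu)^2}$

• “Scaled by sigma”: $e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

• Make Total Area = 1: divide by $\sqrt{2\pi}\,\sigma$

• $f(x) \to 0$ as $x \to \pm\infty$, but never $= 0$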
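
The pieces of the normal curve can be assembled in code and the "Total Area = 1" claim checked numerically. This is a sketch using the standard normal density formula and a midpoint rule; the function names and the example values mu = 1, sigma = 0.5 are illustrative assumptions.

```python
import math

def normal_density(x, mu=0.0, sigma=1.0):
    """Normal density: basic shape, shifted to mu, scaled by sigma, divided by sqrt(2*pi)*sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def area_under(f, a, b, n=20_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

mu, sigma = 1.0, 0.5  # example values only
# Ten standard deviations each side captures essentially all of the area
total_area = area_under(
    lambda x: normal_density(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma
)

print(round(total_area, 6))
```

The density is positive everywhere, but far from mu it is so small that a finite range already gives an area indistinguishable from 1.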

Computation of Normal Areas

EXCEL

Computation:

works in terms of

“lower areas”

E.g. for N(1,0.5):

Area < 1.3

Computation of Normal Probs

EXCEL Computation:

probs given by “lower

areas”

E.g. for X ~ N(1,0.5)

P{X < 1.3} = 0.73

Normal Random Variables

As above, compute probabilities as areas,

In EXCEL, use NORMDIST & NORMINV

E.g. above: X ~ N(1,0.5)

P{X < 1.3} =NORMDIST(1.3,1,0.5,TRUE)

= 0.73 (as in pic above)
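
For readers working outside EXCEL, the same lower-area computation can be sketched with Python's standard library. This assumes, as in NORMDIST's third argument, that the 0.5 in N(1,0.5) is the standard deviation; the variable names are illustrative.

```python
from statistics import NormalDist

# X ~ N(1, 0.5): mean 1, standard deviation 0.5 (assumed reading of the notation)
x_dist = NormalDist(mu=1.0, sigma=0.5)

p_below = x_dist.cdf(1.3)         # lower area, like NORMDIST(1.3, 1, 0.5, TRUE)
cutoff = x_dist.inv_cdf(p_below)  # inverse lookup, like NORMINV(p, 1, 0.5)

print(round(p_below, 2), round(cutoff, 2))
```

`cdf` goes from a cutoff to a lower area, and `inv_cdf` goes back from a lower area to the cutoff, mirroring the NORMDIST and NORMINV pair.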

Normal Random Variables

HW:

4.57, 4.58 (0.965, ~0)