Stat 155, Section 2, Last Time
• Big Rules of Probability:
– Not Rule (1 – P{opposite})
– Or Rule (glasses – football)
– And Rule (multiply conditional prob’s)
– Use in combination for real power
• Bayes Rule
– Turn around conditional probabilities
– Write hard ones in terms of easy ones
– Recall surprising disease testing result
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 266-271, 311-323, 277-286
Approximate Reading for Next Class:
Pages 291-305, 334-351
Midterm I
Coming up: Tuesday, Feb. 27
Material: HW Assignments 1 – 6
Extra Office Hours:
Mon. Feb. 26, 8:30 – 12:00, 2:00 – 3:30
(Instead of Review Session)
Bring Along:
1 8.5” x 11” sheet of paper with formulas
Recall Pepsi Challenge
In class taste test:
• Removed bias with randomization
• Double blind approach
• Asked which was:
– Better
– Sweeter
– which
Recall Pepsi Challenge
Results summarized in spreadsheet
Eyeball impressions:
a. Perhaps no consensus preference
between Pepsi and Coke?
– Is 54% "significantly different" from 50%? (will
develop methods to understand this)
– Result of "marketing research"???
Recall Pepsi Challenge
b. Perhaps no consensus as to which is
sweeter?
• Very different from the past, when Pepsi was
noticeably sweeter
• This may have driven old Pepsi challenge
phenomenon
• Coke figured this out, and matched Pepsi in
sweetness
Recall Pepsi Challenge
c. Most people believe they know
– Serious cola drinkers, because now flavor driven
– In past, was sweetness driven, and there were many
advertising-caused misperceptions!
d. People tend to get it right or not??? (less clear)
– Overall 71% right. Seems like it, but again is that
significantly different from 50%?
Recall Pepsi Challenge
e. Those who think they know tend to be right???
– People who thought they knew: right 71% of the
time
f. Those who don't think they know seem to be right as
well. Wonder why?
– People who didn't: also right 70% of time? Why?
"Natural sampling variation"???
– Any difference between people who thought they
knew, and those who did not think so?
Recall Pepsi Challenge
g. Coin toss was fair (or is 57% heads significantly
different from 50%?)
How accurate are those ideas?
• Will build tools to assess this
• Called “hypothesis tests” and “P-values”
• Revisit this example later
Independence
(Need one more major concept at this level)
An event A does not depend on B, when:
Knowledge of B does not change
chances of A:
P{A | B} = P{A}
Independence
E.g. I Toss a Coin, and somebody at the South
Pole does too.
P{H(me) | T(SP)} = P{H(me)} = ½.
(no way that can matter, i.e. independent)
Independence
E.g. I Toss a Coin twice:
(toss number indicated with subscript)
P{H2 | H1} = ???
• Is it < ½?
• What if have 5 Heads in a row?
(isn’t it more likely to get a Tail?)
(Wanna bet?!?)
Independence
E.g. I Toss a Coin twice, …
Rational approach:
Look at Sample Space:
H1H2   H1T2   T1H2   T1T2
Model all as equally likely
Then:
P{H2 | H1} = P{H1 & H2} / P{H1} = (1/4) / (1/2) = 1/2 = P{H2}
So independence is good model for coin tosses
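The sample space argument for two coin tosses can be checked by brute-force enumeration. The sketch below is only an illustration: the helper name `prob` and the tuple encoding of outcomes are assumptions made here, not part of the course material.

```python
from fractions import Fraction
from itertools import product

# Equally likely sample space for two tosses: (toss 1, toss 2)
sample_space = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event (a predicate on outcomes), all outcomes equally likely."""
    favorable = [w for w in sample_space if event(w)]
    return Fraction(len(favorable), len(sample_space))


p_h1 = prob(lambda w: w[0] == "H")            # Head on toss 1
p_h2 = prob(lambda w: w[1] == "H")            # Head on toss 2
p_h1_and_h2 = prob(lambda w: w == ("H", "H"))  # Heads on both tosses

# Conditional probability: P{H2 | H1} = P{H1 & H2} / P{H1}
p_h2_given_h1 = p_h1_and_h2 / p_h1

print(p_h2_given_h1, p_h2)  # the two values agree, so H2 is independent of H1
```

Exact fractions are used so that the comparison P{H2 | H1} = P{H2} is not blurred by rounding.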
New Ball & Urn Example
H: R R R R G G      T: R R G
Again toss coin, and draw ball:
P{R | H} = 2/3
P{R} = P{R | H} P{H} + P{R | T} P{T} = (2/3)(1/2) + (2/3)(1/2) = 2/3
Same, so R & H are independent events
Not true above, but works here, since proportions of
R & G are same
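A small sketch of the urn calculation follows. It assumes one reading of the slide: a Head leads to an urn with 4 red and 2 green balls, a Tail to an urn with 2 red and 1 green, and the coin is fair. The names `urns` and `p_red_given` are illustrative.

```python
from fractions import Fraction

# Assumed urn contents, read off the slide: Heads urn and Tails urn
urns = {
    "H": ["R"] * 4 + ["G"] * 2,
    "T": ["R"] * 2 + ["G"] * 1,
}
p_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # fair coin assumed


def p_red_given(side):
    """P{R | side}: proportion of red balls in the urn chosen by the coin."""
    balls = urns[side]
    return Fraction(balls.count("R"), len(balls))


# P{R} = P{R | H} P{H} + P{R | T} P{T}
p_red = sum(p_red_given(side) * p_coin[side] for side in urns)

print(p_red_given("H"), p_red)  # equal values mean R and H are independent
```

Changing either urn so that the red proportions differ makes the two printed values disagree, which is the dependent case.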
Independence
Note, when A is independent of B:
P{A} = P{A | B} = P{A & B} / P{B}
so
P{A & B} = P{A} P{B}
And thus
P{B | A} = P{A & B} / P{A} = P{A} P{B} / P{A} = P{B}
i.e. B is independent of A
Independence
Note, when A is independent of B:
It follows that: B is independent of A
I.e. “independence” is symmetric in A and B
(as expected)
More formal treatments use symmetric version
as definition
(to avoid hassles with 0 probabilities)
Independence
HW:
4.31
Special Case of “And” Rule
For A and B independent:
P{A & B} = P{A | B} P{B} = P{B | A} P{A}
= P{A} P{B}
i.e. When independent, just multiply probabilities…
Textbook: Call this another rule
Me: Only learn one, this is a special case
Independent “And” Rule
E.g. Toss a coin until the 1st Head appears,
find P{3 tosses}:
Model: tosses are independent
(saw this was reasonable last time, using
equally likely sample space ideas)
P{3 tosses} = P{T1 & T2 & H3}
When have 3: group with parentheses
Independent “And” Rule
E.g. Toss a coin until the 1st Head appears,
find P{3 tosses}
P{T1 & T2 & H3} = P{(T1 & T2) & H3}
= P{H3 | T1 & T2} P{T1 & T2}
(by indep:)
= P{H3} P{T2 | T1} P{T1}
= P{H3} P{T2} P{T1}
I.e. “just multiply”
Independent “And” Rule
E.g. Toss a coin until the 1st Head appears,
P{3 tosses} = P{H3} P{T2} P{T1}
• Multiplication idea holds in general
• So from now on will just say:
“Since Independent, multiply probabilities”
• Similarly for Exclusive Or rule,
Will just “add probabilities”
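As a rough check of "since independent, multiply probabilities", the sketch below computes P{3 tosses} exactly for a fair coin and compares it with a simulation. The seed, the number of trials and the function name `tosses_until_first_head` are arbitrary choices made for this illustration.

```python
import random
from fractions import Fraction

p_head = Fraction(1, 2)  # fair coin assumed
p_tail = 1 - p_head

# Independent "And" rule: P{T1 & T2 & H3} = P{T1} P{T2} P{H3}
p_three_tosses = p_tail * p_tail * p_head


def tosses_until_first_head(rng):
    """Simulate tossing until the first Head; return the number of tosses."""
    count = 1
    while rng.random() >= 0.5:  # values below 0.5 are treated as Heads
        count += 1
    return count


rng = random.Random(155)  # arbitrary seed, for repeatability only
n_trials = 100_000
hits = sum(tosses_until_first_head(rng) == 3 for _ in range(n_trials))
estimate = hits / n_trials

print(p_three_tosses, estimate)  # exact value and a simulated estimate near it
```

The simulated fraction will not match exactly; it should only land close to the exact product.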
Independent “And” Rule
HW:
4.29 (hint: Calculate
P{G1&G2&G3&G4&G5&G6&G7})
4.33
Overview of Special Cases
Careful: these can be tricky to keep separate
OR works like adding,
for mutually exclusive
AND works like multiplying,
for independent
Overview of Special Cases
Caution: special cases are different
Mutually exclusive ≠ independent
For A and B mutually exclusive:
P{A | B} = 0 ≠ P{A} (when P{A} > 0)
Thus not independent
Overview of Special Cases
HW: C15 Suppose events A, B, C all have
probability 0.4, A & B are independent,
and A & C are mutually exclusive.
(a) Find P{A or B} (0.64)
(b) Find P{A or C} (0.8)
(c) Find P{A and B} (0.16)
(d) Find P{A and C} (0)
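The two special cases in C15 can be checked side by side. This sketch uses only the stated facts (all three probabilities are 0.4, A and B independent, A and C mutually exclusive) together with the general Or rule; the variable names are illustrative.

```python
from fractions import Fraction

p_a = p_b = p_c = Fraction(2, 5)  # each event has probability 0.4

# AND: multiply for independent events, zero for mutually exclusive events
p_a_and_b = p_a * p_b      # A & B independent
p_a_and_c = Fraction(0)    # A & C mutually exclusive

# OR: general rule P{A or B} = P{A} + P{B} - P{A & B}
p_a_or_b = p_a + p_b - p_a_and_b
p_a_or_c = p_a + p_c - p_a_and_c  # reduces to plain adding

print(float(p_a_or_b), float(p_a_or_c), float(p_a_and_b), float(p_a_and_c))
```

Note that "just add" works for A and C, and "just multiply" works for A and B, but neither shortcut works for the other pair.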
Random Variables
Text, Section 4.3 (we are currently jumping)
Idea: take probability to next level
Needed for probability structure of political
polls, etc.
Random Variables
Definition:
A random variable, usually denoted as X,
is a quantity that
“takes on values at random”
Random Variables
Two main types
(that require different mathematical models)
• Discrete, i.e. counting
(so look only at “counting numbers”, 1,2,3,…)
• Continuous, i.e. measuring
(harder math, since need all fractions, etc.)
Random Variables
E.g: X = # for Candidate A in a randomly
selected political poll: discrete
(recall all that means)
Power of the random variable idea:
• Gives something to “get a hold of…”
• Similar in spirit to high school algebra…
High School Algebra
Recall Main Idea?
Rules for solving equations???
No, major breakthrough is:
• Give unknown(s) a name
• Find equation(s) with unknown
• Solve equation(s) to find unknown(s)
Random Variables
E.g: X = # that comes up, in die rolling:
Discrete
• But not very interesting
• Since can study by simple methods
• As done above
• Don’t really need random variable concept
Random Variables
E.g: Measurement error:
Let X = measurement:
Continuous
• How to model probabilities???
Random Variables
HW on discrete vs. continuous:
4.40 ((b) discrete, (c) continuous, (d)
could be either, but discrete is more
common)
And now for something completely different
My idea about “visualization” last time:
• 30% really liked it
• 70% less enthusiastic…
• Depends on mode of thinking
– “Visual thinkers” loved it
– But didn’t connect with others
• So hadn’t planned to continue that…
And now for something completely different
But here was another viewpoint:
Professor Marron,
Could you focus on something more intelligent in your "And now for something completely different" section once every two weeks, perhaps, instead of completely abolishing it? I really enjoyed your discussion of how to view three dimensions in 2-D today.
And now for something completely different
A fun example:
• Faces as data
• Each data point is a digital image
• Data from U. Carlos III in Madrid
(hard to do here for confidentiality reasons)
Q: What distinguishes men from women?
Context: statistical problem of “classification”, i.e. “discrimination”
Basically “automatic disease diagnosis”:
• Have measurements on sick & healthy cases
• Given new person, make measurements
• Closest to sick or healthy populations?
And now for something completely different
Approach: Distance Weight Discrimination
(Marron & Todd)
Idea: find “best separating direction” in high dimensional data space
Here:
• Data are images
• Classes: Males & Females
• Given new image: classify male - female
And now for something completely different
Fun visualization:
• March through point clouds
• Along separating direction
• Captures “Femaleness” & “Maleness”
• Note relation to “training data”
Random Variables
A die rolling example
(where random variable concept is useful)
Win $9 if 5 or 6, Pay $4, if 1, 2 or 3, otherwise (4) break even
Notes:
• Don’t care about number that comes up
• Random Variable abstraction allows focusing on important points
• Are you keen to play? (will calculate…)
Random Variables
Die rolling example
Win $9 if 5 or 6, Pay $4, if 1, 2 or 3
Let X = “net winnings”
Note: X takes on values 9, -4 and 0
Probability Structure of X is summarized by:
P{X = 9} = 1/3
P{X = -4} = 1/2
P{X = 0} = 1/6
(should you want to play?, study later)
Random Variables
Die rolling example, for X = “net winnings”:
Win $9 if 5 or 6, Pay $4, if 1, 2 or 3
Probability Structure of X is summarized by:
P{X = 9} = 1/3
P{X = -4} = 1/2
P{X = 0} = 1/6
Convenient form: a table
Winning   9     -4    0
Prob.     1/3   1/2   1/6
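The table for X = “net winnings” can be derived mechanically from the six equally likely die faces. The sketch below assumes the rule as first stated (win $9 on 5 or 6, pay $4 on 1, 2 or 3, break even on 4); the function name `net_winnings` is made up for this illustration.

```python
from collections import defaultdict
from fractions import Fraction


def net_winnings(face):
    """Map a die face to the net winnings X under the assumed game rule."""
    if face in (5, 6):
        return 9
    if face in (1, 2, 3):
        return -4
    return 0  # face 4: break even


# Each face has probability 1/6; add up the faces that give the same X
distribution = defaultdict(Fraction)
for face in range(1, 7):
    distribution[net_winnings(face)] += Fraction(1, 6)

for value, p in sorted(distribution.items()):
    print(value, p)
```

This is the sense in which the random variable lets one forget the face that came up and keep only the winnings.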
Summary of Prob. Structure
In general: for discrete X, summarize “distribution” (i.e. full prob. structure) by a table:
Values   x1   x2   …   xk
Prob.    p1   p2   …   pk
Where:
i. All pi are between 0 and 1
ii. p1 + p2 + … + pk = 1 (so get a prob. funct’n as above)
Summary of Prob. Structure
Summarize distribution, for discrete X,
by a table:
Values   x1   x2   …   xk
Prob.    p1   p2   …   pk
Power of this idea:
• Get probs by summing table values
• Special case of disjoint OR rule
Summary of Prob. Structure
E.g. Die Rolling game above:
Winning   9     -4    0
Prob.     1/3   1/2   1/6
P{X = 9} = 1/3
P{X < 2} = P{X = 0} + P{X = -4} = 1/6 + 1/2 = 2/3
P{X = 5} = 0 (not in table!)
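"Get probs by summing table values" translates directly into a few lines of code. In this sketch the table is stored as a dictionary and `prob` is a hypothetical helper that adds up the entries satisfying a condition.

```python
from fractions import Fraction

# Distribution table for the die rolling game: value -> probability
table = {9: Fraction(1, 3), -4: Fraction(1, 2), 0: Fraction(1, 6)}


def prob(event):
    """Sum the table probabilities of all values x for which event(x) is true."""
    return sum((p for x, p in table.items() if event(x)), Fraction(0))


# Requirements on a table: entries between 0 and 1, summing to 1
assert all(0 <= p <= 1 for p in table.values())
assert sum(table.values()) == 1

p_less_than_2 = prob(lambda x: x < 2)   # adds the entries for 0 and -4
p_equals_5 = prob(lambda x: x == 5)     # no such value in the table

print(p_less_than_2, p_equals_5)
```

Adding is legitimate here because the events {X = x} for different table values are mutually exclusive.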
Summary of Prob. Structure
E.g. Die Rolling game above:
Winning   9     -4    0
Prob.     1/3   1/2   1/6
P{X = 9 | X >= 0} = P{X = 9 & X >= 0} / P{X >= 0}
= P{X = 9} / (P{X = 9} + P{X = 0})
= (1/3) / (1/3 + 1/6) = (1/3) / (1/2) = 2/3
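Conditional probabilities for a discrete random variable follow from the same table. The sketch below is one way to write it; `cond_prob` is an illustrative helper, not a library function.

```python
from fractions import Fraction

# Distribution table for the die rolling game: value -> probability
table = {9: Fraction(1, 3), -4: Fraction(1, 2), 0: Fraction(1, 6)}


def prob(event):
    """Sum the table probabilities of all values x for which event(x) is true."""
    return sum((p for x, p in table.items() if event(x)), Fraction(0))


def cond_prob(event, given):
    """P{event | given} = P{event & given} / P{given}."""
    return prob(lambda x: event(x) and given(x)) / prob(given)


# P{X = 9 | X >= 0}
p_win_given_not_losing = cond_prob(lambda x: x == 9, lambda x: x >= 0)
print(p_win_given_not_losing)
```

The same helper applies to homework-style questions of the form P{X = a | X >= b}, given the relevant table.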
Summary of Prob. Structure
HW:
4.41 & (c) Find P{X = 3 | X >= 2} (3/7)
4.52 (0.144, …, 0.352)
Probability Histogram
Idea: Visualize probability distribution using a
bar graph
E.g. Die Rolling game above:
Winning   9     -4    0
Prob.     1/3   1/2   1/6
[Figure: Toy example probability histogram; horizontal axis: X values (-4 to 9); vertical axis: probability (0 to 0.6)]
Probability Histogram
Construction in Excel:
• Very similar to bar graphs (done before)
• Bar heights = probabilities
• Example: Class Example 18
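Outside Excel, the same picture can be roughed out as text. The sketch below is an assumption-laden stand-in for the class spreadsheet: it prints one row per integer X value, with a bar whose length is proportional to the probability. The function name `text_histogram` and the scale of 40 characters per unit of probability are arbitrary.

```python
from fractions import Fraction

# Distribution table for the die rolling game: value -> probability
table = {-4: Fraction(1, 2), 0: Fraction(1, 6), 9: Fraction(1, 3)}


def text_histogram(table, lo, hi, width=40):
    """Return one text row per integer in [lo, hi]; bar length = probability * width."""
    rows = []
    for x in range(lo, hi + 1):
        p = table.get(x, Fraction(0))  # values not in the table have probability 0
        bar = "#" * round(float(p) * width)
        rows.append(f"{x:>3} | {bar}")
    return rows


lines = text_histogram(table, -4, 9)
print("\n".join(lines))
```

Only the rows for -4, 0 and 9 get bars, which makes the gaps between possible values visible.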
Probability Histogram
HW:
4.43
Random Variables
Now consider continuous random variables
Recall: for measurements (not counting)
Model for continuous random variables:
Calculate probabilities as areas,
under “probability density curve”, f(x)
Continuous Random Variables
Model probabilities for continuous random
variables, as areas under “probability
density curve”, f(x):
P{a < X < b} = Area under f(x) between a and b
= ∫_a^b f(x) dx   (calculus notation)
Continuous Random Variables
Note:
Same idea as “idealized distributions” above
Recall discussion from:
Page 8, of Class Notes, Jan. 23
Continuous Random Variables
e.g. Uniform Distribution
Idea: choose random number from [0,1]
Use constant density: f(x) = C
Models “equally likely”
To choose C, want:
1 = P{X in [0,1]} = Area under f(x) over [0,1] = C
So want C = 1.
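The "probability = area under the density" idea can be tried numerically on the uniform density. This is a minimal sketch using a midpoint-rule approximation; the helper `area` and the interval [0.2, 0.7] are illustrative choices.

```python
def uniform_density(x):
    """Constant density f(x) = 1 on [0, 1], and 0 elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def area(f, a, b, n=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


total_area = area(uniform_density, 0.0, 1.0)  # should be close to 1
p_interval = area(uniform_density, 0.2, 0.7)  # close to the interval length

print(total_area, p_interval)
```

For the uniform density the area is just the length of the interval, so the numerical result can be checked by eye.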
Uniform Random Variable
HW:
4.54 (0.73, 0, 0.73, 0.2, 0.5)
4.56 (1, ½, 1/8)
Continuous Random Variables
e.g. Normal Distribution
Idea: Draw at random from a normal
population
f(x) is the normal curve (studied above)
Review some earlier concepts:
Normal Curve Mathematics
The “normal density curve” is:
f(x) = (1 / (σ √(2π))) · e^(−½ ((x − μ)/σ)²),
usual “function” of x
π = circle constant = 3.14…
e = natural number = 2.7…
Normal Curve Mathematics
Main Ideas:
• Basic shape is: e^(−½ x²)
• “Shifted to mu”: e^(−½ (x − μ)²)
• “Scaled by sigma”: e^(−½ ((x − μ)/σ)²)
• Make Total Area = 1: divide by σ √(2π)
• f(x) → 0 as x → ±∞, but never = 0
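The normal density formula can be written out directly and checked against the "total area = 1" requirement. In this sketch the function name `normal_density`, the integration range of mu ± 10 sigma and the midpoint rule are choices made for illustration; `statistics.NormalDist` from the Python standard library is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Normal density: shift by mu, scale by sigma, divide by sigma * sqrt(2 pi)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area(f, a, b, n=20_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


mu, sigma = 1.0, 0.5
total_area = area(lambda x: normal_density(x, mu, sigma),
                  mu - 10 * sigma, mu + 10 * sigma)

# Cross-check one density value against the standard library
library_value = NormalDist(mu, sigma).pdf(1.3)
own_value = normal_density(1.3, mu, sigma)

print(total_area, own_value, library_value)
```

The density is positive everywhere, so cutting the range at ± 10 sigma drops a tiny amount of area; it is far below the accuracy shown.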
Computation of Normal Areas
EXCEL Computation:
works in terms of “lower areas”
E.g. for N(1, 0.5):
Area < 1.3
Computation of Normal Probs
EXCEL Computation:
probs given by “lower
areas”
E.g. for X ~ N(1,0.5)
P{X < 1.3} = 0.73
Normal Random Variables
As above, compute probabilities as areas,
In EXCEL, use NORMDIST & NORMINV
E.g. above: X ~ N(1,0.5)
P{X < 1.3} = NORMDIST(1.3, 1, 0.5, TRUE)
= 0.73 (as in pic above)
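For readers working outside Excel, a comparable "lower area" calculation is available in the Python standard library. The sketch below treats 0.5 as the standard deviation, matching the NORMDIST call; `cdf` plays the role of NORMDIST with TRUE, and `inv_cdf` the role of NORMINV.

```python
from statistics import NormalDist

# X ~ N(1, 0.5): mean 1, standard deviation 0.5
x_dist = NormalDist(mu=1, sigma=0.5)

# Lower area: P{X < 1.3}
p_lower = x_dist.cdf(1.3)

# Inverse question: which cutoff has this lower area?
cutoff = x_dist.inv_cdf(p_lower)

print(round(p_lower, 2), round(cutoff, 2))
```

Upper areas and areas between two cutoffs come from the lower areas by the Not rule and by subtraction.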
Normal Random Variables
HW:
4.57, 4.58 (0.965, ~0)