2-1. 2-2 For exams (MD1, MD2, and Final): You may bring one 8.5” by 11” sheet of paper with...
2-1
2-2
•For exams (MD1, MD2, and Final):
You may bring one 8.5” by 11” sheet of paper with formulas and notes written or typed on both sides to each exam.
2-3
Types of Data
Quantitative data are measurements that are recorded on a naturally occurring numerical scale.
Qualitative data are measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories.
2-4
Data Presentation
• Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
• Quantitative Data: Dot Plot, Stem-&-Leaf Display, Frequency Distribution, Histogram
2-5
Example
1.72 2.5 2.16 2.13 1.06 2.24 2.31 2.03 1.09 1.40
2.57 2.64 1.26 2.05 1.19 2.13 1.27 1.51 2.41 1.95
Stem | Leaf (leaf unit = 0.01)
1.0 | 69
1.1 | 9
1.2 | 67
1.4 | 0
1.5 | 1
1.7 | 2
1.9 | 5
2.0 | 35
2.1 | 336
2.2 | 4
2.3 | 1
2.4 | 1
2.5 | 07
2.6 | 4
Stem | Leaf (leaf unit = 0.1)
1 | 001
1 | 22
1 | 45
1 | 7
1 | 9
2 | 00111
2 | 23
2 | 455
2 | 6
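As a rough sketch of how the first display (leaf unit = 0.01) can be built, the snippet below groups each value by its first two digits. The helper name `stem_and_leaf` is an illustrative choice, not something from the slides.

```python
from collections import defaultdict

# Data from the example slide.
data = [1.72, 2.5, 2.16, 2.13, 1.06, 2.24, 2.31, 2.03, 1.09, 1.40,
        2.57, 2.64, 1.26, 2.05, 1.19, 2.13, 1.27, 1.51, 2.41, 1.95]

def stem_and_leaf(values):
    """Return {stem: leaves} with leaf unit 0.01 (hypothetical helper)."""
    groups = defaultdict(list)
    for v in sorted(values):
        hundredths = round(v * 100)          # 1.72 -> 172
        stem, leaf = divmod(hundredths, 10)  # 172 -> stem 17, leaf 2
        groups[f"{stem / 10:.1f}"].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in groups.items()}

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", leaves)
```

Stems with no observations (1.3, 1.6, 1.8) are simply omitted, as in the slide.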
2-6
Example
1.72 2.5 2.16 2.13 1.06 2.24 2.31 2.03 1.09 1.40
2.57 2.64 1.26 2.05 1.19 2.13 1.27 1.51 2.41 1.95
2-7
Example
1.72 2.5 2.16 2.13 1.06 2.24 2.31 2.03 1.09 1.40
2.57 2.64 1.26 2.05 1.19 2.13 1.27 1.51 2.41 1.95
2-8
Two Characteristics
The central tendency of the set of measurements–that is, the tendency of the data to cluster, or center, about certain numerical values.
Central Tendency (Location)
Center
2-9
Two Characteristics
The variability of the set of measurements, that is, the spread of the data around the mean.
Variation (Dispersion)
[Figure: Sample A and Sample B, comparing the variation of Sample A with the variation of Sample B]
2-10
Mean
1. Most common measure of central tendency
2. Acts as ‘balance point’
3. Affected by extreme values (‘outliers’)
4. Denoted $\bar{x}$ (sample mean), where
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{x_1 + x_2 + \dots + x_n}{n}$
2-11
Median
1. Measure of central tendency
2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values
3. Position of median in sequence: Positioning Point = (n + 1)/2
4. Not affected by extreme values
2-12
Median Example Even-Sized Sample
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
Positioning Point = $\dfrac{n + 1}{2} = \dfrac{6 + 1}{2} = 3.5$
Median = $\dfrac{7.7 + 8.9}{2} = 8.30$
2-13
Mode Example
• No Mode. Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode. Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode. Raw Data: 21 28 28 41 43 43
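The three measures of central tendency can be checked against the slide data with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
import statistics

even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]   # median example data
one_mode = [6.3, 4.9, 8.9, 6.3, 4.9, 4.9]
two_modes = [21, 28, 28, 41, 43, 43]

mean = statistics.mean(even_sample)
median = statistics.median(even_sample)   # averages the 2 middle values when n is even
modes_one = statistics.multimode(one_mode)
modes_two = statistics.multimode(two_modes)

print(round(mean, 2), round(median, 2), modes_one, modes_two)
```

When every value occurs exactly once, `multimode` returns all of the values, which corresponds to the "No Mode" case on the slide.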
2-14
Shape
• Describes how data are distributed
• A data set is said to be skewed if one tail of the distribution has more extreme observations than the other tail.
• Left-Skewed: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right-Skewed: Mode < Median < Mean
2-15
Example
• Mean=45
• Median=68
• Mode=94
• Is this data-set skewed? If it is, which direction is the skewness?
2-16
Example
• Mean=45
• Median=68
• Mode=94
• Skewed to the left, since Mean < Median < Mode.
2-17
Sample Variance Formula
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$
n – 1 in denominator!
2-18
A shortcut formula for variance
2-19
Sample Standard Deviation Formula
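$s = \sqrt{s^2} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}}$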
2-20
Thinking Challenge 1
• Why do we need to take the square root of the variance to have a meaningful measure?
• Otherwise the measure would be in squared units rather than the units of the original data.
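A minimal sketch of the variance and standard deviation formulas, reusing the six values from the median example. The shortcut line uses the standard identity s² = (Σx² − (Σx)²/n)/(n − 1); the transcript does not show the slide's shortcut formula, so treating the two as the same is an assumption.

```python
import math
import statistics

data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]   # reused from the median example
n = len(data)
x_bar = sum(data) / n

# Definition: sum of squared deviations divided by n - 1.
var_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Shortcut (standard identity, assumed to match the slide's shortcut formula).
var_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

s = math.sqrt(var_definition)             # back in the original units
print(round(var_definition, 3), round(var_shortcut, 3), round(s, 3))
print(round(statistics.variance(data), 3))  # library check of the same quantity
```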
2-21
Interpreting Standard Deviation: Chebyshev’s Theorem
• Within (x̄ – s, x̄ + s): No useful information
• Within (x̄ – 2s, x̄ + 2s): At least 3/4 of the data
• Within (x̄ – 3s, x̄ + 3s): At least 8/9 of the data
2-22
Interpreting Standard Deviation: Empirical Rule
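• Approximately 68% of the measurements lie within 1 standard deviation of the mean
• Approximately 95% of the measurements lie within 2 standard deviations of the mean
• Approximately 99.7% of the measurements lie within 3 standard deviations of the mean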
2-23
Empirical Rule Example
• According to the Empirical Rule, approximately 68% of the data will lie in the interval (x̄ – s, x̄ + s), (15.5 – 3.34, 15.5 + 3.34) = (12.16, 18.84)
• Approximately 95% of the data will lie in the interval (x̄ – 2s, x̄ + 2s), (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
• Approximately 99.7% of the data will lie in the interval (x̄ – 3s, x̄ + 3s), (15.5 – 3∙3.34, 15.5 + 3∙3.34) = (5.48, 25.52)
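The interval arithmetic above, together with Chebyshev's lower bound 1 − 1/k² (which gives the 3/4 and 8/9 figures for k = 2 and k = 3), can be sketched as follows. The function names are illustrative.

```python
def interval(mean, s, k):
    """Interval (mean - k*s, mean + k*s), rounded to 2 decimals."""
    return (round(mean - k * s, 2), round(mean + k * s, 2))

def chebyshev_bound(k):
    """Chebyshev's lower bound on the fraction of data within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

x_bar, s = 15.5, 3.34    # values used in the Empirical Rule example
for k, approx in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    print(k, interval(x_bar, s, k), approx)
print(chebyshev_bound(2), chebyshev_bound(3))
```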
2-24
Numerical Measures of Relative Standing: Percentiles
• Describes the relative location of a measurement compared to the rest of the data
• Descriptive measures of the relationship of a measurement to the rest of the data are called measures of relative standing.
• The pth percentile is a number such that p% of the data falls below it and (100 – p)% falls above it
• Median = 50th percentile
2-25
Percentile Example
• You scored 560 on the GMAT exam. This score puts you in the 58th percentile.
• What percentage of test takers scored lower than you did?
• What percentage of test takers scored higher than you did?
2-26
Percentile Example
• What percentage of test takers scored lower than you did?
58% of test takers scored lower than 560.
• What percentage of test takers scored higher than you did?
(100 – 58)% = 42% of test takers scored higher than 560.
2-27
Quartiles
• Percentiles that partition a data set into four categories, each containing exactly 25 percent of the measurements, are called quartiles.
2-28
Example
1.06 1.09 1.19 1.26 1.27 1.4 1.51 1.72 1.95 2.03
2.05 2.13 2.13 2.16 2.24 2.31 2.41 2.5 2.57 2.64
Position for median = (20 + 1)/2 = 10.5
Median = (2.03 + 2.05)/2 = 2.04
Q1 = median of the first half, with position = 5.5 → Q1 = (1.27 + 1.4)/2 ≈ 1.3
Q3 = median of the second half, with position = 5.5 → Q3 = (2.24 + 2.31)/2 ≈ 2.3
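The median-of-halves method used in this example can be sketched as below. Statistical software often uses other quartile conventions, so results from a library may differ slightly; the helper name `median_sorted` is illustrative.

```python
data = sorted([1.06, 1.09, 1.19, 1.26, 1.27, 1.4, 1.51, 1.72, 1.95, 2.03,
               2.05, 2.13, 2.13, 2.16, 2.24, 2.31, 2.41, 2.5, 2.57, 2.64])

def median_sorted(values):
    """Median of an already sorted list (positioning point (n + 1) / 2)."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

half = len(data) // 2
q2 = median_sorted(data)
q1 = median_sorted(data[:half])    # median of the first half
q3 = median_sorted(data[-half:])   # median of the second half
print(round(q1, 3), round(q2, 3), round(q3, 3))
```

The unrounded quartiles are 1.335 and 2.275, which the slide reports as roughly 1.3 and 2.3.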
2-29
Numerical Measures of Relative Standing: z–Scores
• Describes the relative location of a measurement (x) compared to the rest of the data
• Measures the number of standard deviations away from the mean a data value is located
• Sample z–score: $z = \dfrac{x - \bar{x}}{s}$
• Population z–score: $z = \dfrac{x - \mu}{\sigma}$
2-30
• The value of z-score reflects the relative standing of the measurement.
• A large positive z-score implies that the measurement is larger than almost all other measurements.
• A large negative z-score implies that the measurement is smaller than almost all other measurements.
• A z-score at or near 0 means the measurement is located at or near the mean of the sample or population.
2-31
Interpretation of z–Scores
2-32
Box Plot
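• Box: drawn from Q1 to Q3, with Q2 (the median) marked inside
• Upper whisker: the most extreme observation smaller than the upper inner fence (Q3 + 1.5∙IQR = 2.3 + 1.5 = 3.8), which is 2.64
• Lower whisker: the most extreme observation larger than the lower inner fence (Q1 – 1.5∙IQR = 1.3 – 1.5 = –0.2), which is 1.06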
1.06 1.09 1.19 1.26 1.27 1.4 1.51 1.72 1.95 2.03
2.05 2.13 2.13 2.16 2.24 2.31 2.41 2.5 2.57 2.64
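A small sketch of the fence arithmetic, assuming the rounded quartiles Q1 ≈ 1.3 and Q3 ≈ 2.3 from the quartile example (so IQR ≈ 1.0). With those inputs the inner fences work out to about –0.2 and 3.8. Variable names are illustrative.

```python
data = [1.06, 1.09, 1.19, 1.26, 1.27, 1.4, 1.51, 1.72, 1.95, 2.03,
        2.05, 2.13, 2.13, 2.16, 2.24, 2.31, 2.41, 2.5, 2.57, 2.64]

q1, q3 = 1.3, 2.3                  # rounded quartiles (assumption noted above)
iqr = q3 - q1
lower_inner, upper_inner = q1 - 1.5 * iqr, q3 + 1.5 * iqr
lower_outer, upper_outer = q1 - 3 * iqr, q3 + 3 * iqr

# Whiskers end at the most extreme observations still inside the inner fences.
lower_whisker = min(x for x in data if x >= lower_inner)
upper_whisker = max(x for x in data if x <= upper_inner)
# Values outside the inner fences would be flagged as suspect outliers.
suspect = [x for x in data if not lower_inner <= x <= upper_inner]

print(round(lower_inner, 2), round(upper_inner, 2))
print(round(lower_outer, 2), round(upper_outer, 2))
print(lower_whisker, upper_whisker, suspect)
```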
2-33
Box Plot
3. A second pair of fences, the outer fences, are defined at a distance of 3(IQR) from the hinges. One symbol (*) represents measurements falling between the inner and outer fences, and another (0) represents measurements beyond the outer fences.
4. Symbols that represent the median and extreme data points vary depending on software used. You may use your own symbols if you are constructing a box plot by hand.
2-34
Outlier
An observation (or measurement) that is unusually large or small relative to the other values in a data set is called an outlier. Outliers typically are attributable to one of the following causes:
1. The measurement is observed, recorded, or entered into the computer incorrectly.
2. The measurement comes from a different population.
3. The measurement is correct but represents a rare (chance) event.
2-35
Key Ideas
Rules for Detecting Quantitative Outliers
Method     Suspect                                  Highly Suspect
Box plot   Values between inner and outer fences    Values beyond outer fences
z-score    2 < |z| < 3                              |z| > 3
2-36
Experiments & Sample Spaces
1. Experiment
• Process of observation that leads to a single outcome that cannot be predicted with certainty
2. Sample point
• Most basic outcome of an experiment
3. Sample space (S)
• Collection of all sample points
Sample Space Depends on Experimenter!
2-37
Visualizing Sample Space
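1. Listing for the experiment of tossing a coin once and noting the up face:
S = {Head, Tail}, where each element (e.g., Head) is a sample point
2. A pictorial method for presenting the sample space: Venn Diagram
[Venn diagram: sample points H and T inside the sample space S]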
2-38
Example
• Experiment: Tossing two coins and recording up faces:
• Is sample space as below?
S={HH, HT, TT}
2-39
Tree Diagram
1st coin: H or T
2nd coin: H or T, following each outcome of the 1st coin
Resulting sample points: HH, HT, TH, TT
2-40
Sample Space Examples
Experiment: Sample Space
• Toss a Coin, Note Face: {Head, Tail}
• Toss 2 Coins, Note Faces: {HH, HT, TH, TT}
• Select 1 Card, Note Kind: {2♥, 2♠, ..., A♦} (52)
• Select 1 Card, Note Color: {Red, Black}
• Play a Football Game: {Win, Lose, Tie}
• Inspect a Part, Note Quality: {Defective, Good}
• Observe Gender: {Male, Female}
2-41
Events
1. Specific collection of sample points
2. Simple Event
• Contains only one sample point
3. Compound Event
• Contains two or more sample points
2-42
What is Probability?
1. Numerical measure of the likelihood that event will occur
• P(Event)
• P(A)
• Prob(A)
2. Lies between 0 & 1
3. Sum of probabilities for all sample points in the sample space is 1
• 1: Certain
• .5
• 0: Impossible
2-43
Equally Likely Probability
P(Event) = X / T
• X = Number of outcomes in the event
• T = Total number of sample points in Sample Space
• Each of T sample points is equally likely
— P(sample point) = 1/T
© 1984-1994 T/Maker Co.
2-44
Thinking Challenge (sol.)
• Consider rolling two fair dice.
• Let event A = having the sum of the up faces be 6 or less. So,
• A = {(1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (3,1), (3,2), (3,3), (4,1), (4,2), (5,1)}, each with prob. 1/36
• P(A)=15/36=5/12
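The count of 15 favorable sample points can be confirmed by enumerating all 36 equally likely outcomes. This is a sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))          # 36 equally likely sample points
event_a = [pair for pair in outcomes if sum(pair) <= 6]  # sum of up faces is 6 or less
p_a = Fraction(len(event_a), len(outcomes))              # P(Event) = X / T

print(len(event_a), p_a)
```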
2-45
Combinations Rule
A sample of n elements is to be drawn from a set of N elements. Then the number of different samples possible is denoted by $\binom{N}{n}$ and is equal to
$\binom{N}{n} = \dfrac{N!}{n!\,(N - n)!}$
where the factorial symbol (!) means that
n! = n*(n-1)*…*3*2*1
For example, 5! = 5∙4∙3∙2∙1. 0! is defined to be 1.
2-46
Thinking Challenge
• The price of a European tour includes four stopovers to be selected from among 10 cities. In how many different ways can one plan such a tour if the order of the stopovers does not matter?
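Because order does not matter, the combinations rule applies with N = 10 and n = 4. A minimal sketch (the function name `combinations` is illustrative):

```python
import math

def combinations(N, n):
    """Number of samples of size n from N elements: N! / (n! (N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

tours = combinations(10, 4)
print(tours, tours == math.comb(10, 4))   # math.comb is the built-in equivalent
```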
2-47
Unions & Intersections
1. Union
• Outcomes in either events A or B or both
• ‘OR’ statement
• Denoted by symbol ∪ (i.e., A ∪ B)
2. Intersection
• Outcomes in both events A and B
• ‘AND’ statement
• Denoted by symbol ∩ (i.e., A ∩ B)
2-48
• The table displays the probabilities for each of the six outcomes when rolling a particular unfair die. Suppose that the die is rolled once. Let A be the event that the number rolled is less than 4, and let B be the event that the number rolled is odd. Find P(AB).
Outcome 1 2 3 4 5 6
Probability 0.1 0.1 0.1 0.2 0.2 0.3
A. 0.5 B. 0.2 C. 0.3 D. 0.7
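The set-operation symbol in the question did not survive in this transcript, so the sketch below computes both the union and the intersection of A and B for the unfair die. Exact fractions are used to avoid floating-point noise; the names are illustrative.

```python
from fractions import Fraction

p = {1: Fraction("0.1"), 2: Fraction("0.1"), 3: Fraction("0.1"),
     4: Fraction("0.2"), 5: Fraction("0.2"), 6: Fraction("0.3")}
A = {x for x in p if x < 4}        # number rolled is less than 4
B = {x for x in p if x % 2 == 1}   # number rolled is odd

def prob(event):
    """Sum the probabilities of the sample points in the event."""
    return sum(p[x] for x in event)

p_union = prob(A | B)
p_intersection = prob(A & B)
print(float(p_union), float(p_intersection))
```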
2-49
Event Probability Using Two–Way Table
Event   B1            B2            Total
A1      P(A1 ∩ B1)    P(A1 ∩ B2)    P(A1)
A2      P(A2 ∩ B1)    P(A2 ∩ B2)    P(A2)
Total   P(B1)         P(B2)         1
Joint probabilities are the interior cells; marginal (simple) probabilities are in the Total row and column.
2-50
Thinking Challenge
What’s the Probability?
1. P(A) =
2. P(D) =
3. P(C ∩ B) =
4. P(A ∪ D) =
5. P(B ∩ D) =
Event C D Total
A 4 2 6
B 1 3 4
Total 5 5 10
2-51
Solution*
The Probabilities Are:
1. P(A) = 6/10
2. P(D) = 5/10
3. P(C ∩ B) = 1/10
4. P(A ∪ D) = 9/10
5. P(B ∩ D) = 3/10
Event C D Total
A 4 2 6
B 1 3 4
Total 5 5 10
2-52
Complementary Events
Complement of Event A
• The event that A does not occur
• All events not in A
• Denote complement of A by $A^C$
[Venn diagram: event A inside sample space S; the region outside A is $A^C$]
2-53
Rule of Complements
The sum of the probabilities of complementary events equals 1:
$P(A) + P(A^C) = 1$
2-54
3.4
The Additive Rule and Mutually Exclusive Events
2-55
Mutually Exclusive Events Example
Experiment: Draw 1 Card. Note Kind & Suit.
Sample Space (S): 2♥, 2♠, ..., A♦ (52 cards)
Outcomes in Event Heart: 2♥, 3♥, 4♥, ..., A♥
Outcomes in Event Spade: 2♠, 3♠, 4♠, ..., A♠
Events ♥ and ♠ are Mutually Exclusive
2-56
Additive Rule
1. Used to get compound probabilities for union of events
2. P(A OR B) = P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
3. For mutually exclusive events: P(A OR B) = P(A ∪ B) = P(A) + P(B)
2-57
Thinking Challenge
• Let P(A) = 0.25 and P($B^C$) = 0.4. If P(A ∪ B) = 0.85, are the two events A and B mutually exclusive?
A. True B. False
2-58
Thinking Challenge
Using the additive rule, what is the probability?
1. P(A ∪ D) =
2. P(B ∪ C) =
Event C D Total
A 4 2 6
B 1 3 4
Total 5 5 10
2-59
Solution*
Using the additive rule, the probabilities are:
1. P(A ∪ D) = P(A) + P(D) – P(A ∩ D) = 6/10 + 5/10 – 2/10 = 9/10
2. P(B ∪ C) = P(B) + P(C) – P(B ∩ C) = 4/10 + 5/10 – 1/10 = 8/10
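As a sketch, the marginal, joint, and union probabilities for the two-way table of counts can be computed directly. The helper names (`marginal`, `joint`, `union`) are illustrative. Note that `Fraction` prints reduced forms, e.g. 3/5 for 6/10.

```python
from fractions import Fraction

# Counts from the two-way table (rows A, B; columns C, D).
counts = {("A", "C"): 4, ("A", "D"): 2, ("B", "C"): 1, ("B", "D"): 3}
total = sum(counts.values())

def marginal(label):
    """P(label): add every cell in that row or column."""
    return Fraction(sum(n for cell, n in counts.items() if label in cell), total)

def joint(row, col):
    """P(row and col): a single interior cell."""
    return Fraction(counts[(row, col)], total)

def union(row, col):
    """Additive rule: P(row or col) = P(row) + P(col) - P(row and col)."""
    return marginal(row) + marginal(col) - joint(row, col)

print(marginal("A"), marginal("D"), joint("B", "C"), union("A", "D"), union("B", "C"))
```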
2-60
Conditional Probability
1. Event probability given that another event occurred
2. Revise original sample space to account for new information
• Eliminates certain outcomes
3. P(A | B) = P(A and B) / P(B) = P(A ∩ B) / P(B)
2-61
Thinking Challenge
Using the table then the formula, what’s the probability?
1. P(A|D) =
2. P(C|B) =
Event C D Total
A 4 2 6
B 1 3 4
Total 5 5 10
2-62
Solution*
Using the formula, the probabilities are:
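$P(A \mid D) = \dfrac{P(A \cap D)}{P(D)} = \dfrac{2/10}{5/10} = \dfrac{2}{5}$
$P(C \mid B) = \dfrac{P(C \cap B)}{P(B)} = \dfrac{1/10}{4/10} = \dfrac{1}{4}$
P(D) = P(A ∩ D) + P(B ∩ D) = 2/10 + 3/10 = 5/10
P(B) = P(B ∩ D) + P(B ∩ C) = 3/10 + 1/10 = 4/10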
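The conditional probability formula can be applied to the same table of counts. In this sketch the cells are keyed by unordered pairs so that either event can be the condition; the helper names are illustrative.

```python
from fractions import Fraction

# Counts from the two-way table, keyed by the unordered pair of events.
counts = {frozenset("AC"): 4, frozenset("AD"): 2,
          frozenset("BC"): 1, frozenset("BD"): 3}
total = sum(counts.values())

def p_marginal(label):
    return Fraction(sum(n for cell, n in counts.items() if label in cell), total)

def p_joint(x, y):
    return Fraction(counts[frozenset((x, y))], total)

def conditional(target, given):
    """P(target | given) = P(target and given) / P(given)."""
    return p_joint(target, given) / p_marginal(given)

print(conditional("A", "D"), conditional("C", "B"))
```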
2-63
Multiplicative Rule
1. Used to get compound probabilities for intersection of events
2. P(A and B) = P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
3. The key words “both” and “and” in a statement imply an intersection of two events, so we multiply probabilities to obtain the probability of interest.
2-64
Thinking Challenge
• Suppose that 23% of adults smoke cigarettes. Given that a selected adult is a smoker, the probability that he/she has a lung condition before the age of 60 is 57%. What is the probability that a randomly selected person is a smoker and has a lung condition before the age of 60?
A. 0.57 B. 0.99 C. 0.77 D. 0.13
2-65
Multiplicative Rule Example
Experiment: Draw 1 Card. Note Kind & Color.
Type \ Color   Red   Black   Total
Ace            2     2       4
Non-Ace        24    24      48
Total          26    26      52
P(Ace ∩ Black) = P(Ace)∙P(Black | Ace) = (4/52)(2/4) = 2/52
2-66
Statistical Independence
1. Event occurrence does not affect probability of another event
• Toss 1 coin twice
2. Causality not implied
3. Tests for independence
• P(A | B) = P(A)
• P(B | A) = P(B)
• P(A ∩ B) = P(A) P(B)
2-67
Thinking Challenge
• Consider a regular deck of 52 cards with two black suits, i.e. ♠(13), ♣(13), and two red suits, i.e. ♥(13), ♦(13). Given that you have a red card, what is the probability that it is a queen? Also, are the events “getting a red card” and “getting a queen” independent?
A. 1/13, No (i.e. P(Queen) ≠ P(Queen|Red))
B. 1/13, Yes (i.e. P(Queen) = P(Queen|Red))
C. 2/13, Yes (i.e. P(Queen) ≠ P(Queen|Red))
D. 2/13, No (i.e. P(Queen) = P(Queen|Red))
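One way to check the independence tests on this deck is to enumerate the 52 cards and compare P(Queen | Red) with P(Queen). This is only a sketch; the rank labels and helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
suits = {"♠": "black", "♣": "black", "♥": "red", "♦": "red"}
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

def prob(event):
    """Equally likely outcomes: favorable cards / total cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_queen(card):
    return card[0] == "Q"

def is_red(card):
    return suits[card[1]] == "red"

p_queen = prob(is_queen)
p_red = prob(is_red)
p_both = prob(lambda card: is_queen(card) and is_red(card))
p_queen_given_red = p_both / p_red
independent = p_both == p_queen * p_red

print(p_queen, p_queen_given_red, independent)
```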
2-68
Bayes’s Rule
Given k mutually exclusive and exhaustive events B1, B2, . . . , Bk, such that P(B1) + P(B2) + … + P(Bk) = 1, and an observed event A, then
$P(B_i \mid A) = \dfrac{P(B_i \cap A)}{P(A)} = \dfrac{P(B_i)P(A \mid B_i)}{P(B_1)P(A \mid B_1) + P(B_2)P(A \mid B_2) + \dots + P(B_k)P(A \mid B_k)}$
•Bayes’s rule is useful for finding one conditional probability when other conditional probabilities are already known.
2-69
Bayes’s Rule Example
A company manufactures MP3 players at two factories. Factory I produces 60% of the MP3 players and Factory II produces 40%. Two percent of the MP3 players produced at Factory I are defective, while 1% of Factory II’s are defective. An MP3 player is selected at random and found to be defective. What is the probability it came from Factory I?
2-70
Bayes’s Rule Example
Tree diagram:
• Factory I (0.6): Defective 0.02, Good 0.98
• Factory II (0.4): Defective 0.01, Good 0.99
$P(I \mid D) = \dfrac{P(I)P(D \mid I)}{P(I)P(D \mid I) + P(II)P(D \mid II)} = \dfrac{(0.6)(0.02)}{(0.6)(0.02) + (0.4)(0.01)} = 0.75$
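The same calculation can be written as a small general-purpose function. This is a sketch of Bayes's rule for any number of mutually exclusive and exhaustive events; the name `bayes_posteriors` is illustrative.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(B_i | A) for each i, given P(B_i) and P(A | B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(B_i) P(A | B_i)
    p_a = sum(joint)                                        # total probability of A
    return [j / p_a for j in joint]

# Factory I and Factory II shares, and their defect rates.
post_i, post_ii = bayes_posteriors([0.6, 0.4], [0.02, 0.01])
print(round(post_i, 2), round(post_ii, 2))
```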
2-71
Random Variable
A random variable is a variable that assumes numerical values associated with the random outcomes of an experiment, where one (and only one) numerical value is assigned to each sample point.
2-72
Random Variable (cont.)
• There are two types of random variables:
– Discrete random variables can take one of a finite number of distinct outcomes.
• Example: Number of credit hours
– Continuous random variables can take any numeric value within a range of values.
• Example: Cost of books this term
2-73
Discrete Probability Distribution
The probability distribution of a discrete random variable is a graph, table, or formula that specifies the probability associated with each possible value the random variable can assume.
2-74
Requirements for the Probability Distribution of a Discrete Random Variable x
1. p(x) ≥ 0 for all values of x
2. Σ p(x) = 1
where the summation of p(x) is over all possible values of x.
2-75
a) It is not valid
b) It is valid
c) It is not valid
d) It is not valid
2-76
a) {HHH,HTT,THT,TTH,THH,HTH,HHT,TTT}
{0,1,2,3}
b) {1/8,3/8,3/8,1/8}
d) P(x=2 or x=3)= P(x=2)+P(x=3)=3/8+1/8=1/2
2-77
Summary Measures
1. Expected Value (Mean of probability distribution)
• Weighted average of all possible values
• $\mu = E(x) = \sum x\,p(x)$
2. Variance
• Weighted average of squared deviation about mean
• $\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = \sum x^2 p(x) - \mu^2$
3. Standard Deviation
• $\sigma = \sqrt{\sigma^2}$
2-78
Thinking challenge
• For the probability model given below, what is the value of P and E(X)?
X 2 3 4 5
P(x) 0.2 0.3 0.1 P
A.0.1, 3.0
B.0.2, 4.7
C. 0.3, 1.7
D. 0.4, 3.7
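A sketch of how the two requirements of a discrete distribution and the expected-value formula combine for this table: the missing probability follows from Σ p(x) = 1, and E(X) from Σ x p(x). Exact fractions are used to avoid rounding noise; the names are illustrative.

```python
from fractions import Fraction

known = {2: Fraction("0.2"), 3: Fraction("0.3"), 4: Fraction("0.1")}
p_missing = 1 - sum(known.values())          # probabilities must sum to 1
dist = {**known, 5: p_missing}

mean = sum(x * p for x, p in dist.items())                       # E(x) = sum of x p(x)
variance = sum(x ** 2 * p for x, p in dist.items()) - mean ** 2  # sum of x^2 p(x) - mu^2

print(float(p_missing), float(mean), float(variance))
```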
2-79
Binomial Probability
Characteristics of a Binomial Experiment
1. The experiment consists of n identical trials.
2. There are only two possible outcomes on each trial. We will denote one outcome by S (for success) and the other by F (for failure).
3. The probability of S remains the same from trial to trial. This probability is denoted by p, and the probability of F is denoted by q. Note that q = 1 – p.
4. The trials are independent.
5. The binomial random variable x is the number of S’s in n trials.
2-80
Binomial Probability Distribution
$p(x) = \dbinom{n}{x} p^x q^{n-x} = \dfrac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$
p(x) = Probability of x ‘Successes’
p = Probability of a ‘Success’ on a single trial
q = 1 – p
n = Number of trials
x = Number of ‘Successes’ in n trials (x = 0, 1, 2, ..., n)
n – x = Number of failures in n trials
2-81
Binomial Probability Distribution Example
Experiment: Toss 1 coin 5 times in a row. Note number of tails. What’s the probability of 3 tails?
$p(x) = \dfrac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$
$p(3) = \dfrac{5!}{3!\,(5-3)!}\, (.5)^3 (1 - .5)^{5-3} = .3125$
2-82
Binomial Distribution Characteristics
[Figure: bar charts of P(X) against X = 0, 1, ..., 5, one panel for n = 5, p = 0.1 and one for n = 5, p = 0.5]
• Mean: $\mu = E(x) = np$
• Standard Deviation: $\sigma = \sqrt{npq}$
2-83
Binomial Distribution Thinking Challenge
You’re a telemarketer selling service contracts for Macy’s. You’ve sold 20 in your last 100 calls (p = .20). If you call 12 people tonight, what’s the probability of
A. No sales?
B. Exactly 2 sales?
C. At most 2 sales?
D. At least 2 sales?
2-84
Binomial Distribution Solution*
n = 12, p = .20
E(X)=n*p=12*0.2=2.4
σ = (np(1 – p))^(1/2) = (12*0.2*0.8)^(1/2) ≈ 1.39
A. p(0) = .0687
B. p(2) = .2835
C. p(at most 2) = p(0) + p(1) + p(2)= .0687 + .2062 + .2835= .5584
D. p(at least 2) = p(2) + p(3)...+ p(12)= 1 – [p(0) + p(1)] = 1 – .0687 – .2062= .7251
2-85
By using TI-84:
• B. P(X = 2) = p(2) = P(X ≤ 2) – P(X ≤ 1)
• binomcdf(12,.20,2) - binomcdf(12,.20,1)
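For readers without a calculator, the telemarketer answers can be reproduced with a short sketch of the binomial formula. `binom_pmf` and `binom_cdf` are illustrative helpers written here; `binom_cdf` only plays the role of a cumulative function and is not the TI-84 implementation.

```python
import math

def binom_pmf(x, n, p):
    """p(x) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x), summing the pmf from 0 to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 12, 0.20
no_sales = binom_pmf(0, n, p)
exactly_two = binom_cdf(2, n, p) - binom_cdf(1, n, p)   # same as binom_pmf(2, n, p)
at_most_two = binom_cdf(2, n, p)
at_least_two = 1 - binom_cdf(1, n, p)
sigma = math.sqrt(n * p * (1 - p))

print(round(no_sales, 4), round(exactly_two, 4),
      round(at_most_two, 4), round(at_least_two, 4), round(sigma, 2))
```

Computed without intermediate rounding, "at most 2" comes out near .5583; the .5584 in the solution results from adding the three rounded terms.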