2-1. 2-2 For exams (MD1, MD2, and Final): You may bring one 8.5” by 11” sheet of paper with...

2-2

•For exams (MD1, MD2, and Final):

You may bring one 8.5” by 11” sheet of paper with formulas and notes written or typed on both sides to each exam.

2-3

Types of Data

Quantitative data are measurements that are recorded on a naturally occurring numerical scale.

Qualitative data are measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories.

2-4

Data Presentation

Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram

Quantitative Data: Dot Plot, Stem-&-Leaf Display, Frequency Distribution, Histogram

2-5

Example

1.72 2.5 2.16 2.13 1.06 2.24 2.31 2.03 1.09 1.40

2.57 2.64 1.26 2.05 1.19 2.13 1.27 1.51 2.41 1.95

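Stem-and-leaf display (leaf unit = 0.01):

Stem | Leaf
1.0 | 69
1.1 | 9
1.2 | 67
1.4 | 0
1.5 | 1
1.7 | 2
1.9 | 5
2.0 | 35
2.1 | 336
2.2 | 4
2.3 | 1
2.4 | 1
2.5 | 07
2.6 | 4

Stem-and-leaf display (leaf unit = 0.1):

Stem | Leaf
1 | 001
1 | 22
1 | 45
1 | 7
1 | 9
2 | 00111
2 | 23
2 | 455
2 | 6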
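As a rough illustration, the first display (leaf unit = 0.01) can be rebuilt in code. This is only a sketch: the function name `stem_and_leaf` is hypothetical, and it assumes every value has at most two decimal places.

```python
from collections import defaultdict

# The 20 measurements from the example slide.
data = [1.72, 2.5, 2.16, 2.13, 1.06, 2.24, 2.31, 2.03, 1.09, 1.40,
        2.57, 2.64, 1.26, 2.05, 1.19, 2.13, 1.27, 1.51, 2.41, 1.95]

def stem_and_leaf(values):
    """Group values by stem (ones and tenths digits); the leaf is the hundredths digit."""
    rows = defaultdict(list)
    for v in sorted(values):
        hundredths = round(v * 100)           # 1.72 -> 172 (round avoids float truncation)
        stem, leaf = divmod(hundredths, 10)   # 172 -> stem 17, leaf 2
        rows[stem].append(leaf)
    return dict(rows)

display = stem_and_leaf(data)
for stem, leaves in display.items():
    # Stems are stored as integers (17 means 1.7); print them with one decimal.
    print(f"{stem / 10:.1f} | {''.join(str(d) for d in leaves)}")
```

Stems with no observations (for example 1.3) are simply skipped, which matches the display on the slide.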


2-8

Two Characteristics

The central tendency of the set of measurements–that is, the tendency of the data to cluster, or center, about certain numerical values.

Central Tendency (Location)

Center

2-9

Two Characteristics

The variability of the set of measurements, that is, the spread of the data around the mean.

Variation (Dispersion)

[Figure: distributions of Sample A and Sample B, comparing the variation of Sample A with the variation of Sample B]

2-10

Mean

1. Most common measure of central tendency

2. Acts as ‘balance point’

3. Affected by extreme values (‘outliers’)

4. Denoted $\bar{x}$ (the sample mean), where

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

2-11

Median

1. Measure of central tendency

2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values

3. Position of median in sequence: Positioning Point = $\frac{n + 1}{2}$

4. Not affected by extreme values

2-12

Median Example Even-Sized Sample

• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7

• Position: 1 2 3 4 5 6

Positioning Point = $\frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$

Median = $\frac{7.7 + 8.9}{2} = 8.30$

2-13

Mode Example

• No Mode. Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

• One Mode. Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9

• More Than 1 Mode. Raw Data: 21 28 28 41 43 43
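The three measures of central tendency can be checked with Python's standard `statistics` module. This is a minimal sketch using the raw data from the median and mode slides; the variable names are illustrative only.

```python
import statistics

even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

sample_mean = statistics.mean(even_sample)      # sum of values divided by n
sample_median = statistics.median(even_sample)  # average of the 2 middle values (n is even)

# multimode returns every value tied for the highest count.
# When all values are distinct it returns all of them, i.e. there is no mode.
no_mode = statistics.multimode([10.3, 4.9, 8.9, 11.7, 6.3, 7.7])
one_mode = statistics.multimode([6.3, 4.9, 8.9, 6.3, 4.9, 4.9])
two_modes = statistics.multimode([21, 28, 28, 41, 43, 43])

print(sample_mean, sample_median, one_mode, two_modes)
```

For this even-sized sample the mean and the median happen to coincide.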

2-14

Shape

• Describes how data are distributed

• A data set is said to be skewed if one tail of the distribution has more extreme observations than the other tail.

Left-Skewed: Mean < Median < Mode

Symmetric: Mean = Median = Mode

Right-Skewed: Mode < Median < Mean

2-15

Example

• Mean=45

• Median=68

• Mode=94

• Is this data-set skewed? If it is, which direction is the skewness?

2-16

Example

• Mean=45

• Median=68

• Mode=94

• Skewed to the left.

2-17

Sample Variance Formula

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

n – 1 in denominator!

2-18

A shortcut formula for variance

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

2-19

Sample Standard Deviation Formula

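$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$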
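A short sketch of the sample variance and standard deviation follows. The data are borrowed from the median example, and the function names are hypothetical. The shortcut form used here, (Σx² – (Σx)²/n)/(n – 1), is the standard algebraic rearrangement and is assumed to be the one the shortcut slide refers to.

```python
import math

def sample_variance(xs):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Assumed shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def sample_std(xs):
    """Square root of the sample variance, in the same units as the data."""
    return math.sqrt(sample_variance(xs))

sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
print(sample_variance(sample), sample_variance_shortcut(sample), sample_std(sample))
```

Both variance functions should agree up to floating-point rounding, since the shortcut is only an algebraic rewrite of the definition.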

2-20

Thinking Challenge 1

• Why do we need to take the square root of the variance to have a meaningful measure?

• Otherwise the measure would be in squared units rather than the units of the data.

2-21

Interpreting Standard Deviation: Chebyshev’s Theorem

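• Within ($\bar{x}$ – s, $\bar{x}$ + s): No useful information

• Within ($\bar{x}$ – 2s, $\bar{x}$ + 2s): At least 3/4 of the data

• Within ($\bar{x}$ – 3s, $\bar{x}$ + 3s): At least 8/9 of the data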

2-22

Interpreting Standard Deviation: Empirical Rule

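• Within ($\bar{x}$ – s, $\bar{x}$ + s): Approximately 68% of the measurements

• Within ($\bar{x}$ – 2s, $\bar{x}$ + 2s): Approximately 95% of the measurements

• Within ($\bar{x}$ – 3s, $\bar{x}$ + 3s): Approximately 99.7% of the measurements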

2-23

Empirical Rule Example

• According to the Empirical Rule, approximately 68% of the data will lie in the interval ($\bar{x}$ – s, $\bar{x}$ + s), (15.5 – 3.34, 15.5 + 3.34) = (12.16, 18.84)

• Approximately 95% of the data will lie in the interval ($\bar{x}$ – 2s, $\bar{x}$ + 2s), (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)

• Approximately 99.7% of the data will lie in the interval ($\bar{x}$ – 3s, $\bar{x}$ + 3s), (15.5 – 3∙3.34, 15.5 + 3∙3.34) = (5.48, 25.52)
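The interval arithmetic in this example is easy to reproduce. The sketch below uses the slide's mean 15.5 and standard deviation 3.34. The helper names are hypothetical, and the Chebyshev bound 1 – 1/k² is the general form behind the 3/4 and 8/9 figures quoted earlier.

```python
def interval(mean, std, k):
    """Return (mean - k*std, mean + k*std), rounded to 2 decimals for display."""
    return (round(mean - k * std, 2), round(mean + k * std, 2))

def chebyshev_lower_bound(k):
    """Chebyshev: at least 1 - 1/k^2 of the data lie within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

x_bar, s = 15.5, 3.34
one_s = interval(x_bar, s, 1)     # about 68% under the Empirical Rule
two_s = interval(x_bar, s, 2)     # about 95%
three_s = interval(x_bar, s, 3)   # about 99.7%

print(one_s, two_s, three_s)
print(chebyshev_lower_bound(2), chebyshev_lower_bound(3))
```

Chebyshev's bounds hold for any data set, while the Empirical Rule percentages are approximations.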

2-24

Numerical Measures of Relative Standing: Percentiles

• Describes the relative location of a measurement compared to the rest of the data

• Descriptive measures of the relationship of a measurement to the rest of the data are called measures of relative standing.

• The pth percentile is a number such that p% of the data falls below it and (100 – p)% falls above it

• Median = 50th percentile

2-25

Percentile Example

• You scored 560 on the GMAT exam. This score puts you in the 58th percentile.

• What percentage of test takers scored lower than you did?

• What percentage of test takers scored higher than you did?

2-26

Percentile Example

• What percentage of test takers scored lower than you did?

58% of test takers scored lower than 560.

• What percentage of test takers scored higher than you did?

(100 – 58)% = 42% of test takers scored higher than 560.

2-27

Quartiles

• Percentiles that partition a data set into four categories, each containing exactly 25 percent of the measurements, are called quartiles.

2-28

Example

1.06 1.09 1.19 1.26 1.27 1.4 1.51 1.72 1.95 2.03

2.05 2.13 2.13 2.16 2.24 2.31 2.41 2.5 2.57 2.64

Position for median = (20 + 1)/2 = 21/2 = 10.5

Median = (2.03 + 2.05)/2 = 2.04

Q1 = median of the first half, with position 5.5 → Q1 = (1.27 + 1.4)/2 ≈ 1.3

Q3 = median of the second half, with position 5.5 → Q3 = (2.24 + 2.31)/2 ≈ 2.3
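The median-of-halves method used in this example can be sketched as follows. The function name is hypothetical, and the treatment of odd n (leaving the middle value out of both halves) is an assumption, since the slide only covers an even-sized sample. Statistical software may use other quartile conventions.

```python
import statistics

data = sorted([1.72, 2.5, 2.16, 2.13, 1.06, 2.24, 2.31, 2.03, 1.09, 1.40,
               2.57, 2.64, 1.26, 2.05, 1.19, 2.13, 1.27, 1.51, 2.41, 1.95])

def quartiles_by_halves(sorted_values):
    """Q1 and Q3 are the medians of the lower and upper halves of the ordered data."""
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]
    # Assumption: for odd n the middle value is excluded from both halves.
    upper = sorted_values[half:] if n % 2 == 0 else sorted_values[half + 1:]
    return (statistics.median(lower),
            statistics.median(sorted_values),
            statistics.median(upper))

q1, q2, q3 = quartiles_by_halves(data)
print(q1, q2, q3)
```

The unrounded quartiles are 1.335 and 2.275, which the slide rounds to 1.3 and 2.3.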

2-29

Numerical Measures of Relative Standing: z–Scores

• Describes the relative location of a measurement (x) compared to the rest of the data

• Measures the number of standard deviations away from the mean a data value is located

• Sample z–score: $z = \frac{x - \bar{x}}{s}$

• Population z–score: $z = \frac{x - \mu}{\sigma}$

2-30

• The value of z-score reflects the relative standing of the measurement.

• A large positive z-score implies that the measurement is larger than almost all other measurements.

• A z-score that is large in magnitude and negative indicates that the measurement is smaller than almost all other measurements.

• A z-score equal to or near 0 means the measurement is located at or near the mean of the sample or population.
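A small sketch of the sample z-score follows. It reuses the 20 measurements from the earlier example purely for illustration (the z-score slides do not use that data), and the function name is hypothetical.

```python
import statistics

def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std

data = [1.72, 2.5, 2.16, 2.13, 1.06, 2.24, 2.31, 2.03, 1.09, 1.40,
        2.57, 2.64, 1.26, 2.05, 1.19, 2.13, 1.27, 1.51, 2.41, 1.95]

x_bar = statistics.mean(data)
s = statistics.stdev(data)           # sample standard deviation (n - 1 in denominator)

z_max = z_score(max(data), x_bar, s)  # positive: above the mean
z_min = z_score(min(data), x_bar, s)  # negative: below the mean
print(x_bar, s, z_max, z_min)
```

A measurement equal to the mean has a z-score of exactly 0.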

2-31

Interpretation of z–Scores

2-32

Box Plot

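[Figure: box plot with Q1, Q2 (median), and Q3 marked]

• Upper whisker: the most extreme observation smaller than the upper inner fence (Q3 + 1.5·IQR = 2.3 + 1.5(1.0) = 3.8) is 2.64

• Lower whisker: the most extreme observation bigger than the lower inner fence (Q1 – 1.5·IQR = 1.3 – 1.5(1.0) = –0.2) is 1.06 ≈ 1.1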

1.06 1.09 1.19 1.26 1.27 1.4 1.51 1.72 1.95 2.03

2.05 2.13 2.13 2.16 2.24 2.31 2.41 2.5 2.57 2.64

2-33

Box Plot

3. A second pair of fences, the outer fences, are defined at a distance of 3(IQR) from the hinges. One symbol (*) represents measurements falling between the inner and outer fences, and another (0) represents measurements beyond the outer fences.

4. Symbols that represent the median and extreme data points vary depending on software used. You may use your own symbols if you are constructing a box plot by hand.
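The fence calculations can be sketched as below. The function name is hypothetical, and the sketch assumes the rounded quartiles Q1 ≈ 1.3 and Q3 ≈ 2.3 from the quartile example, so IQR = 1.0.

```python
def fences(q1, q3):
    """Return the IQR plus the inner (1.5*IQR) and outer (3*IQR) fences."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return iqr, inner, outer

data = sorted([1.72, 2.5, 2.16, 2.13, 1.06, 2.24, 2.31, 2.03, 1.09, 1.40,
               2.57, 2.64, 1.26, 2.05, 1.19, 2.13, 1.27, 1.51, 2.41, 1.95])

iqr, inner, outer = fences(1.3, 2.3)   # rounded quartiles, an assumption

# Whiskers extend to the most extreme observations inside the inner fences.
inside = [x for x in data if inner[0] <= x <= inner[1]]
whisker_low, whisker_high = min(inside), max(inside)

# Observations beyond the inner fences would be flagged as suspect outliers.
suspects = [x for x in data if x < inner[0] or x > inner[1]]
print(iqr, inner, outer, whisker_low, whisker_high, suspects)
```

For this data set every observation lies inside the inner fences, so no value is flagged.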

2-34

Outlier

An observation (or measurement) that is unusually large or small relative to the other values in a data set is called an outlier. Outliers typically are attributable to one of the following causes:

1. The measurement is observed, recorded, or entered into the computer incorrectly.

2. The measurement comes from a different population.

3. The measurement is correct but represents a rare (chance) event.

2-35

Key Ideas

Rules for Detecting Quantitative Outliers

Method      Suspect                                  Highly Suspect
Box plot    Values between inner and outer fences    Values beyond outer fences
z-score     2 < |z| < 3                              |z| > 3

2-36

Experiments & Sample Spaces

1. Experiment
• Process of observation that leads to a single outcome that cannot be predicted with certainty

2. Sample point
• Most basic outcome of an experiment

3. Sample space (S)
• Collection of all sample points

Sample Space Depends on Experimenter!

2-37

Visualizing Sample Space

1. Listing for the experiment of tossing a coin once and noting the up face

S = {Head, Tail}, where Head and Tail are the sample points

2. A pictorial method for presenting the sample space: Venn Diagram

[Figure: Venn diagram of S containing the sample points H and T]

2-38

Example

• Experiment: Tossing two coins and recording up faces:

• Is sample space as below?

S={HH, HT, TT}

2-39

Tree Diagram

[Figure: tree diagram. The 1st coin branches to H or T; from each branch the 2nd coin branches to H or T]

2-40

Sample Space Examples

Experiment                        Sample Space
Toss a Coin, Note Face            {Head, Tail}
Toss 2 Coins, Note Faces          {HH, HT, TH, TT}
Select 1 Card, Note Kind          {2♥, 2♠, ..., A♦} (52)
Select 1 Card, Note Color         {Red, Black}
Play a Football Game              {Win, Lose, Tie}
Inspect a Part, Note Quality      {Defective, Good}
Observe Gender                    {Male, Female}

2-41

Events

1. Specific collection of sample points

2. Simple Event

• Contains only one sample point

3. Compound Event

• Contains two or more sample points

2-42

What is Probability?

1. Numerical measure of the likelihood that an event will occur
• P(Event)
• P(A)
• Prob(A)

2. Lies between 0 & 1

3. Sum of probabilities for all sample points in the sample space is 1

[Figure: probability scale running from 0 (Impossible) through .5 to 1 (Certain)]

2-43

Equally Likely Probability

P(Event) = X / T

• X = Number of outcomes in the event

• T = Total number of sample points in Sample Space

• Each of T sample points is equally likely, so P(sample point) = 1/T

2-44

Thinking Challenge (sol.)

• Consider rolling two fair dice.

• Let event A = having the sum of the up faces be 6 or less. So,

• A = {(1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (3,1), (3,2), (3,3), (4,1), (4,2), (5,1)}, each with prob. 1/36

• P(A) = 15/36 = 5/12
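The equally likely rule P(Event) = X / T can be checked by listing the sample space directly. This is a minimal sketch with illustrative variable names.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely sample points for two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the sum of the up faces is 6 or less.
event_a = [roll for roll in sample_space if sum(roll) <= 6]

# Equally likely outcomes: P(A) = X / T.
p_a = Fraction(len(event_a), len(sample_space))
print(len(event_a), len(sample_space), p_a)
```

Using `Fraction` keeps the result exact and reduces 15/36 to lowest terms automatically.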

2-45

Combinations Rule

A sample of n elements is to be drawn from a set of N elements. Then, the number of different samples possible is denoted by $\binom{N}{n}$ and is equal to

$$\binom{N}{n} = \frac{N!}{n!\,(N - n)!}$$

where the factorial symbol (!) means that n! = n(n – 1)⋯3·2·1.

For example, 5! = 5·4·3·2·1 = 120. The quantity 0! is defined to be 1.

2-46

Thinking Challenge

• The price of a European tour includes four stopovers to be selected from among 10 cities. In how many different ways can one plan such a tour if the order of the stopovers does not matter?
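One way to work this challenge is to apply the combinations rule with N = 10 and n = 4. The sketch below is an illustration, not the slide's own solution; the function name is hypothetical and the result is cross-checked against Python's built-in `math.comb`.

```python
import math

def combinations_count(N, n):
    """Combinations rule: N! / (n! * (N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

# Choose 4 stopovers from 10 cities, order not important.
tours = combinations_count(10, 4)
print(tours, math.comb(10, 4))
```

Because order does not matter, each set of four cities is counted once.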

2-47

Unions & Intersections

1. Union
• Outcomes in either events A or B or both
• ‘OR’ statement
• Denoted by symbol ∪ (i.e., A ∪ B)

2. Intersection
• Outcomes in both events A and B
• ‘AND’ statement
• Denoted by symbol ∩ (i.e., A ∩ B)

2-48

• The table displays the probabilities for each of the six outcomes when rolling a particular unfair die. Suppose that the die is rolled once. Let A be the event that the number rolled is less than 4, and let B be the event that the number rolled is odd. Find P(AB).

Outcome       1     2     3     4     5     6
Probability   0.1   0.1   0.1   0.2   0.2   0.3

A. 0.5 B. 0.2 C. 0.3 D. 0.7

2-49

Event Probability Using Two–Way Table

Event   B1            B2            Total
A1      P(A1 ∩ B1)    P(A1 ∩ B2)    P(A1)
A2      P(A2 ∩ B1)    P(A2 ∩ B2)    P(A2)
Total   P(B1)         P(B2)         1

The interior cells hold joint probabilities; the Total row and column hold marginal (simple) probabilities.

2-50

Thinking Challenge

What’s the Probability?

1. P(A) =
2. P(D) =
3. P(C ∩ B) =
4. P(A ∪ D) =
5. P(B ∩ D) =

Event   C   D   Total
A       4   2   6
B       1   3   4
Total   5   5   10

2-51

Solution*

The Probabilities Are:

1. P(A) = 6/10
2. P(D) = 5/10
3. P(C ∩ B) = 1/10
4. P(A ∪ D) = 9/10
5. P(B ∩ D) = 3/10

Event   C   D   Total
A       4   2   6
B       1   3   4
Total   5   5   10
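The table lookups above can be mirrored in a few lines of code. This is a sketch with hypothetical helper names; it treats each of the 10 counts as equally likely, and computes unions as marginal plus marginal minus joint.

```python
from fractions import Fraction

# Cell counts from the two-way table: rows A, B and columns C, D.
counts = {("A", "C"): 4, ("A", "D"): 2, ("B", "C"): 1, ("B", "D"): 3}
total = sum(counts.values())

def p_joint(row, col):
    """Joint probability, e.g. P(B and D)."""
    return Fraction(counts[(row, col)], total)

def p_row(row):
    """Marginal probability of a row event, e.g. P(A)."""
    return sum(p_joint(r, c) for (r, c) in counts if r == row)

def p_col(col):
    """Marginal probability of a column event, e.g. P(D)."""
    return sum(p_joint(r, c) for (r, c) in counts if c == col)

def p_union(row, col):
    """P(row or col) = P(row) + P(col) - P(row and col)."""
    return p_row(row) + p_col(col) - p_joint(row, col)

print(p_row("A"), p_col("D"), p_joint("B", "C"), p_union("A", "D"), p_joint("B", "D"))
```

`Fraction` prints results in lowest terms, so 6/10 appears as 3/5.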

2-52

Complementary Events

Complement of Event A
• The event that A does not occur
• All sample points not in A
• Denote complement of A by $A^C$

[Figure: Venn diagram of S showing A and its complement $A^C$]

2-53

Rule of Complements

The sum of the probabilities of complementary events equals 1:

$P(A) + P(A^C) = 1$

[Figure: Venn diagram of S showing A and $A^C$]

2-54

3.4

The Additive Rule and Mutually Exclusive Events

2-55

Mutually Exclusive Events Example

Experiment: Draw 1 Card. Note Kind & Suit.

Sample Space: 2♥, 2♠, 2♣, ..., A♦

Event Heart: 2♥, 3♥, 4♥, ..., A♥

Event Spade: 2♠, 3♠, 4♠, ..., A♠

Events Heart and Spade are Mutually Exclusive

2-56

Additive Rule

1. Used to get compound probabilities for union of events

2. P(A OR B) = P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

3. For mutually exclusive events: P(A OR B) = P(A ∪ B) = P(A) + P(B)

2-57

Thinking Challenge

• Let P(A) = 0.25, P($B^C$) = 0.4, and P(A ∪ B) = 0.85. Are the two events A and B mutually exclusive?

A. True B. False

2-58

Thinking Challenge

Using the additive rule, what is the probability?

1. P(A ∪ D) =
2. P(B ∪ C) =

Event   C   D   Total
A       4   2   6
B       1   3   4
Total   5   5   10

2-59

Solution*

Using the additive rule, the probabilities are:

1. P(A ∪ D) = P(A) + P(D) – P(A ∩ D) = 6/10 + 5/10 – 2/10 = 9/10

2. P(B ∪ C) = P(B) + P(C) – P(B ∩ C) = 4/10 + 5/10 – 1/10 = 8/10

2-60

Conditional Probability

1. Event probability given that another event occurred

2. Revise original sample space to account for new information

• Eliminates certain outcomes

3. $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A \cap B)}{P(B)}$

2-61

Thinking Challenge

Using the table then the formula, what’s the probability?

1. P(A|D) =
2. P(C|B) =

Event   C   D   Total
A       4   2   6
B       1   3   4
Total   5   5   10

2-62

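Solution*

Using the formula, the probabilities are:

1. P(A | D) = P(A ∩ D) / P(D) = (2/10) / (5/10) = 2/5

2. P(C | B) = P(C ∩ B) / P(B) = (1/10) / (4/10) = 1/4

where

P(D) = P(A ∩ D) + P(B ∩ D) = 2/10 + 3/10 = 5/10

P(B) = P(B ∩ D) + P(B ∩ C) = 3/10 + 1/10 = 4/10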
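The conditional probability formula can be applied to the same table in code. The sketch below is illustrative only, with hypothetical helper names; it divides a joint probability by the marginal probability of the conditioning event.

```python
from fractions import Fraction

counts = {("A", "C"): 4, ("A", "D"): 2, ("B", "C"): 1, ("B", "D"): 3}
total = sum(counts.values())

def p_joint(row, col):
    """Joint probability of a row event and a column event."""
    return Fraction(counts[(row, col)], total)

def p_row(row):
    """Marginal probability of a row event, summed over the columns."""
    return sum(p_joint(row, c) for c in ("C", "D"))

def p_col(col):
    """Marginal probability of a column event, summed over the rows."""
    return sum(p_joint(r, col) for r in ("A", "B"))

# P(A | D) = P(A and D) / P(D): restrict attention to column D.
p_a_given_d = p_joint("A", "D") / p_col("D")

# P(C | B) = P(C and B) / P(B): restrict attention to row B.
p_c_given_b = p_joint("B", "C") / p_row("B")

print(p_a_given_d, p_c_given_b)
```

Conditioning on D shrinks the sample space to the 5 outcomes in column D, of which 2 are in A.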

2-63

Multiplicative Rule

1. Used to get compound probabilities for intersection of events

2. P(A and B) = P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)

3. The key words ‘both’ and ‘and’ in a statement imply an intersection of two events, so we multiply probabilities to obtain the probability of interest.

2-64

Thinking Challenge

• Suppose that 23% of adults smoke cigarettes. Given that a selected adult is a smoker, the probability that he/she has a lung condition before the age of 60 is 57%. What is the probability that a randomly selected adult is a smoker and has a lung condition before the age of 60?

A. 0.57 B. 0.99 C. 0.77 D. 0.13

2-65

Multiplicative Rule Example

Experiment: Draw 1 Card. Note Kind & Color.

Type      Red   Black   Total
Ace       2     2       4
Non-Ace   24    24      48
Total     26    26      52

P(Ace ∩ Black) = P(Ace)∙P(Black | Ace) = (4/52)(2/4) = 2/52

2-66

Statistical Independence

1. Event occurrence does not affect probability of another event

• Toss 1 coin twice

2. Causality not implied

3. Tests for independence
• P(A | B) = P(A)
• P(B | A) = P(B)
• P(A ∩ B) = P(A) P(B)

2-67

Thinking Challenge

• Consider a regular deck of 52 cards with two black suits, i.e. ♠(13) and ♣(13), and two red suits, i.e. ♥(13) and ♦(13). Given that you have a red card, what is the probability that it is a queen? Also, are the events getting a red card and getting a queen independent?

A. 1/13, No (i.e. P(Queen) ≠ P(Queen|Red))

B. 1/13, Yes (i.e. P(Queen) = P(Queen|Red))

C. 2/13, Yes (i.e. P(Queen) ≠ P(Queen|Red))

D. 2/13, No (i.e. P(Queen) = P(Queen|Red))
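The independence test P(A | B) = P(A) can be checked by enumerating a deck. This is a sketch under the usual assumption of a standard 52-card deck with every card equally likely; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}

# Each card is a (rank, suit) pair; iterating a dict yields its keys.
deck = list(product(ranks, suits))

red = [card for card in deck if suits[card[1]] == "red"]
queens = [card for card in deck if card[0] == "Q"]
red_queens = [card for card in red if card[0] == "Q"]

p_queen = Fraction(len(queens), len(deck))
p_queen_given_red = Fraction(len(red_queens), len(red))

# Independence test: P(Queen | Red) == P(Queen).
independent = p_queen_given_red == p_queen
print(p_queen, p_queen_given_red, independent)
```

Knowing the card is red does not change the chance that it is a queen, which is what independence means here.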

2-68

Bayes’s Rule

Given k mutually exclusive and exhaustive events B1, B2, . . . , Bk, such that P(B1) + P(B2) + … + P(Bk) = 1, and an observed event A, then

$$P(B_i \mid A) = \frac{P(B_i \cap A)}{P(A)} = \frac{P(B_i)\,P(A \mid B_i)}{P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + \cdots + P(B_k)\,P(A \mid B_k)}$$

•Bayes’s rule is useful for finding one conditional probability when other conditional probabilities are already known.

2-69

Bayes’s Rule Example

A company manufactures MP3 players at two factories. Factory I produces 60% of the MP3 players and Factory II produces 40%. Two percent of the MP3 players produced at Factory I are defective, while 1% of Factory II’s are defective. An MP3 player is selected at random and found to be defective. What is the probability it came from Factory I?

2-70

Bayes’s Rule Example

[Figure: tree diagram. Factory I (0.6) branches to Defective (0.02) and Good (0.98); Factory II (0.4) branches to Defective (0.01) and Good (0.99)]

$$P(I \mid D) = \frac{P(I)\,P(D \mid I)}{P(I)\,P(D \mid I) + P(II)\,P(D \mid II)} = \frac{0.6 \cdot 0.02}{0.6 \cdot 0.02 + 0.4 \cdot 0.01} = 0.75$$
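The same calculation can be written as a small function. This is a sketch: the function and dictionary names are hypothetical, and exact fractions are used so the result is not affected by floating-point rounding.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods, target):
    """P(B_target | A) = P(B_target) P(A | B_target) / sum_i P(B_i) P(A | B_i)."""
    evidence = sum(priors[b] * likelihoods[b] for b in priors)  # P(A), total probability
    return priors[target] * likelihoods[target] / evidence

# Share of production per factory, and defect rate within each factory.
priors = {"I": Fraction(60, 100), "II": Fraction(40, 100)}
p_defective = {"I": Fraction(2, 100), "II": Fraction(1, 100)}

p_factory_i = bayes_posterior(priors, p_defective, "I")
p_factory_ii = bayes_posterior(priors, p_defective, "II")
print(p_factory_i, p_factory_ii)
```

The two posteriors sum to 1 because the factories are mutually exclusive and exhaustive.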

2-71

Random Variable

A random variable is a variable that assumes numerical values associated with the random outcomes of an experiment, where one (and only one) numerical value is assigned to each sample point.

2-72

Random Variable (cont.)

• There are two types of random variables:

– Discrete random variables can take one of a finite (or countable) number of distinct outcomes.
• Example: Number of credit hours

– Continuous random variables can take any numeric value within a range of values.
• Example: Cost of books this term

2-73

Discrete Probability Distribution

The probability distribution of a discrete random variable is a graph, table, or formula that specifies the probability associated with each possible value the random variable can assume.

2-74

Requirements for the Probability Distribution of a Discrete Random Variable x

1. p(x) ≥ 0 for all values of x

2. $\sum p(x) = 1$

where the summation of p(x) is over all possible values of x.

2-75

a) It is not valid

b) It is valid

c) It is not valid

d) It is not valid

2-76

a) {HHH,HTT,THT,TTH,THH,HTH,HHT,TTT}

{0,1,2,3}

b) {1/8,3/8,3/8,1/8}

d) P(x=2 or x=3)= P(x=2)+P(x=3)=3/8+1/8=1/2

2-77

Summary Measures

1. Expected Value (Mean of probability distribution)
• Weighted average of all possible values
• $\mu = E(x) = \sum x\,p(x)$

2. Variance
• Weighted average of squared deviation about mean
• $\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2\,p(x) = \sum x^2\,p(x) - \mu^2$

3. Standard Deviation
• $\sigma = \sqrt{\sigma^2}$

2-78

Thinking challenge

• For the probability model given below, what is the value of P and E(X)?

X 2 3 4 5

P(x) 0.2 0.3 0.1 P

A.0.1, 3.0

B.0.2, 4.7

C. 0.3, 1.7

D. 0.4, 3.7
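One way to approach this challenge is to use the requirement that the probabilities sum to 1, then apply the expected value formula. The sketch below is an illustration with hypothetical variable names; it also computes the variance with the shortcut Σx²p(x) – µ² for completeness.

```python
from fractions import Fraction

# Known part of the probability model; P(X = 5) is the unknown P.
known = {2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(1, 10)}

# Probabilities must sum to 1, which fixes the missing value.
p_missing = 1 - sum(known.values())
dist = {**known, 5: p_missing}

# E(X) = sum of x * p(x); variance = sum of x^2 * p(x) - mu^2.
expected = sum(x * p for x, p in dist.items())
variance = sum(x ** 2 * p for x, p in dist.items()) - expected ** 2

print(p_missing, expected, variance)
```

The printed fractions can be converted with `float(...)` to compare against the decimal answer choices.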

2-79

Binomial Probability

Characteristics of a Binomial Experiment

1. The experiment consists of n identical trials.

2. There are only two possible outcomes on each trial. We will denote one outcome by S (for success) and the other by F (for failure).

3. The probability of S remains the same from trial to trial. This probability is denoted by p, and the probability of F is denoted by q. Note that q = 1 – p.

4. The trials are independent.

5. The binomial random variable x is the number of S’s in n trials.

2-80

Binomial Probability Distribution

$$p(x) = \binom{n}{x} p^x q^{n - x} = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$$

p(x) = Probability of x ‘Successes’

p = Probability of a ‘Success’ on a single trial

q = 1 – p

n = Number of trials

x = Number of ‘Successes’ in n trials (x = 0, 1, 2, ..., n)

n – x = Number of failures in n trials

2-81

Binomial Probability Distribution Example

Experiment: Toss 1 coin 5 times in a row. Note number of tails. What’s the probability of 3 tails?

$$p(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$$

$$p(3) = \frac{5!}{3!\,(5 - 3)!}\, (.5)^3 (1 - .5)^{5 - 3} = .3125$$
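The binomial formula translates directly into code. This is a minimal sketch with a hypothetical function name, using `math.comb` for the number of ways to place the successes among the trials.

```python
import math

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Toss 1 coin 5 times; probability of exactly 3 tails with p = 0.5.
p_three_tails = binomial_pmf(3, 5, 0.5)

# Sanity checks: probabilities sum to 1 and the mean equals n * p.
total = sum(binomial_pmf(x, 5, 0.5) for x in range(6))
mean = sum(x * binomial_pmf(x, 5, 0.5) for x in range(6))
print(p_three_tails, total, mean)
```

With p = 0.5 every sequence of 5 tosses has probability 1/32, and 10 of them contain exactly 3 tails.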

2-82

Binomial Distribution Characteristics

[Figure: two bar charts of P(X) against X = 0, 1, ..., 5, one for n = 5, p = 0.1 and one for n = 5, p = 0.5]

• Mean: $\mu = E(x) = np$

• Standard Deviation: $\sigma = \sqrt{npq}$

2-83

Binomial Distribution Thinking Challenge

You’re a telemarketer selling service contracts for Macy’s. You’ve sold 20 in your last 100 calls (p = .20). If you call 12 people tonight, what’s the probability of

A. No sales?

B. Exactly 2 sales?

C. At most 2 sales?

D. At least 2 sales?

2-84

Binomial Distribution Solution*

n = 12, p = .20

E(X)=n*p=12*0.2=2.4

$\sigma = \sqrt{np(1 - p)} = \sqrt{12 \cdot 0.2 \cdot 0.8} \approx 1.39$

A. p(0) = .0687

B. p(2) = .2835

C. p(at most 2) = p(0) + p(1) + p(2)= .0687 + .2062 + .2835= .5584

D. p(at least 2) = p(2) + p(3)...+ p(12)= 1 – [p(0) + p(1)] = 1 – .0687 – .2062= .7251

2-85

By using TI-84:

• B. P(X = 2) = p(2) = P(X ≤ 2) – P(X ≤ 1)

• binomcdf(12,.20,2) - binomcdf(12,.20,1)
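The cumulative approach can be mirrored without a calculator. In this sketch, `binomial_cdf` is a hypothetical helper that plays the role of the calculator's cumulative function for n = 12 and p = .20; the results should agree with the solution slide to about three decimals, with small differences arising from summing rounded terms.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): sum of the pmf from 0 through x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 12, 0.20

p_none = binomial_pmf(0, n, p)                               # A. no sales
p_exactly_2 = binomial_cdf(2, n, p) - binomial_cdf(1, n, p)  # B. exactly 2 sales
p_at_most_2 = binomial_cdf(2, n, p)                          # C. at most 2 sales
p_at_least_2 = 1 - binomial_cdf(1, n, p)                     # D. at least 2 sales

print(p_none, p_exactly_2, p_at_most_2, p_at_least_2)
```

Part D uses the rule of complements, since "at least 2" is the complement of "0 or 1".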
