wesley-day -
GM – 03 QUANTITATIVE TECHNIQUES FOR MANAGERS

Making Decisions
Data, Information, Knowledge
1. Data: specific observations of measured numbers.
2. Information: processed and summarized data yielding facts and ideas.
3. Knowledge: selected and organized information that provides understanding, recommendations, and the basis for decisions.
WHAT DOES STATISTICS ACHIEVE
Making Decisions
Descriptive and Inferential Statistics
Descriptive Statistics include graphical and numerical procedures that summarize and process data and are used to transform data into information.

BRANCHES OF STATISTICS
Inferential Statistics provide the bases for predictions, forecasts, and estimates that are used to transform information into knowledge.
The Journey to Making Decisions
Begin here: Identify the Problem
• Data to Information: Descriptive Statistics, Probability, Computers
• Information to Knowledge: Experience, Theory, Literature, Inferential Statistics, Computers
• Knowledge to Decision
Describing Data

Summarizing and Describing Data
• Tables and Graphs
• Numerical Measures
Frequency Distributions
A frequency distribution is a table used to organize data. The left column (called classes or groups) includes numerical intervals on a variable being studied. The right column is a list of the frequencies, or number of observations, for each class. Intervals are normally of equal size, must cover the range of the sample observations, and must be non-overlapping.
Example of a Frequency Distribution
A Frequency Distribution for the Shampoo Example

Weights (in mL)          Number of Bottles
220 to less than 225      1
225 to less than 230      4
230 to less than 235     29
235 to less than 240     34
240 to less than 245     26
245 to less than 250      6
Cumulative Frequency Distributions
A cumulative frequency distribution contains the number of observations whose values are less than the upper limit of each interval. It is constructed by adding the frequencies of all frequency distribution intervals up to and including the present interval.

Relative Cumulative Frequency Distributions
A relative cumulative frequency distribution converts all cumulative frequencies to cumulative percentages.
Example of a Cumulative Frequency Distribution
A Cumulative Frequency Distribution for the Shampoo Example

Weights (in mL)     Number of Bottles
less than 225         1
less than 230         5
less than 235        34
less than 240        68
less than 245        94
less than 250       100
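The cumulative table can be rebuilt from the class frequencies with a running total. The following is a minimal sketch using the shampoo frequencies from the slides; the variable names are illustrative only.

```python
from itertools import accumulate

# Class upper limits (mL) and frequencies from the shampoo example.
upper_limits = [225, 230, 235, 240, 245, 250]
frequencies = [1, 4, 29, 34, 26, 6]

# Cumulative frequency: running total up to and including each class.
cumulative = list(accumulate(frequencies))

# Relative cumulative frequency: cumulative count as a percentage of n.
n = sum(frequencies)
relative_cumulative = [100 * c / n for c in cumulative]

for limit, c, pct in zip(upper_limits, cumulative, relative_cumulative):
    print(f"less than {limit}: {c:3d} bottles ({pct:.0f}%)")
```

Because the sample has exactly 100 bottles, the cumulative counts and the cumulative percentages coincide here.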
Parameters and Statistics
A statistic is a descriptive measure computed from a sample of data. A parameter is a descriptive measure computed from an entire population of data.
Measures of Central Tendency - Arithmetic Mean -
The arithmetic mean of a set of data is the sum of the data values divided by the number of observations.

Sample Mean
If the data set is from a sample, then the sample mean, $\bar{X}$, is:
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

Population Mean
If the data set is from a population, then the population mean, $\mu$, is:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
Measures of Central Tendency - Median -
An ordered array is an arrangement of data in either ascending or descending order. Once the data are arranged in ascending order, the median is the value such that 50% of the observations are smaller and 50% of the observations are larger.

If the sample size n is an odd number, the median, Xm, is the middle observation. If the sample size n is an even number, the median, Xm, is the average of the two middle observations. The median will be located in the 0.50(n+1)th ordered position.
Measures of Central Tendency - Mode -
The mode, if one exists, is the most frequently occurring observation in the sample or population.
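The three measures of central tendency can be computed with Python's standard `statistics` module. This is a small sketch on a hypothetical sample (the data values are an assumption, not taken from the slides).

```python
import statistics

# Hypothetical sample; these values are not from the slides.
data = [7, 5, 9, 5, 8, 3]

ordered = sorted(data)             # ordered array: [3, 5, 5, 7, 8, 9]
n = len(ordered)

mean = sum(ordered) / n            # sum of the values divided by n
position = 0.50 * (n + 1)          # 3.5: average the 3rd and 4th values
median = statistics.median(ordered)
mode = statistics.mode(ordered)    # most frequently occurring value

print(mean, position, median, mode)
```

With an even sample size the 0.50(n+1)th position falls between two observations, which is why the median is the average of the two middle values.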
Shape of the Distribution
The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the mean. In a symmetric distribution the mean and median are equal.

A distribution is skewed if the observations are not symmetrically distributed above and below the mean. A positively skewed (or skewed to the right) distribution has a tail that extends to the right in the direction of positive values. A negatively skewed (or skewed to the left) distribution has a tail that extends to the left in the direction of negative values.
Shapes of the Distribution
[Figure: three frequency histograms over the values 1 to 9, showing a Symmetric Distribution, a Positively Skewed Distribution, and a Negatively Skewed Distribution.]
MEASURES OF DISPERSION

Measures of Variability - The Range -
The range in a set of data is the difference between the largest and smallest observations.

Measures of Variability - Sample Variance -
MOST IMPORTANT MEASURE OF DISPERSION
The sample variance, $s^2$, is the sum of the squared differences between each observation and the sample mean divided by the sample size minus 1.
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$

Measures of Variability - Short-cut Formulas for Sample Variance -
Short-cut formulas for the sample variance are:
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$ or $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n - 1}$
Measures of Variability - Population Variance -
DISTINGUISH BETWEEN POPULATION VARIANCE AND SAMPLE VARIANCE (IMPORTANT)
The population variance, $\sigma^2$, is the sum of the squared differences between each observation and the population mean divided by the population size, N.
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Measures of Variability - Sample Standard Deviation -
The sample standard deviation, s, is the positive square root of the variance, and is defined as:
$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$

Measures of Variability - Population Standard Deviation -
The population standard deviation, $\sigma$, is
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
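To make the difference between the n - 1 and N divisors concrete, the sketch below computes the sample variance by definition and by the short-cut formula, then the population variance, for a hypothetical data set (the values are an assumption for illustration). The standard-library functions `statistics.variance` and `statistics.pvariance` are used only as a cross-check.

```python
import math
import statistics

# Hypothetical data; these values are not from the slides.
data = [7, 5, 9, 5, 8, 3]
n = len(data)
x_bar = sum(data) / n

# Sample variance by definition: squared deviations divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Short-cut formula: (sum of x^2 - n * mean^2) / (n - 1).
s2_shortcut = (sum(x * x for x in data) - n * x_bar ** 2) / (n - 1)

# Population variance: same squared deviations divided by N.
sigma2 = sum((x - x_bar) ** 2 for x in data) / n

s = math.sqrt(s2_definition)      # sample standard deviation
sigma = math.sqrt(sigma2)         # population standard deviation

print(s2_definition, s2_shortcut, sigma2, s, sigma)
```

The sample variance is always at least as large as the population variance computed from the same numbers, because it divides the same sum by n - 1 instead of n.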
For a set of data with a bell-shaped histogram, the Empirical Rule is:
• approximately 68% of the observations are contained within a distance of one standard deviation around the mean: $\mu \pm 1\sigma$
• approximately 95% of the observations are contained within a distance of two standard deviations around the mean: $\mu \pm 2\sigma$
• almost all of the observations are contained within a distance of three standard deviations around the mean: $\mu \pm 3\sigma$
Coefficient of Variation
The Coefficient of Variation, CV, is a measure of relative dispersion that expresses the standard deviation as a percentage of the mean (provided the mean is positive).

The sample coefficient of variation is
$CV = \frac{s}{\bar{X}} \times 100$ if $\bar{X} > 0$

The population coefficient of variation is
$CV = \frac{\sigma}{\mu} \times 100$ if $\mu > 0$
Five-Number Summary
The Five-Number Summary refers to the five descriptive measures: minimum, first quartile, median, third quartile, and the maximum.
$X_{minimum} < Q_1 < \text{Median} < Q_3 < X_{maximum}$
Grouped Data Mean
For a population of N observations, the mean is
$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$

For a sample of n observations, the mean is
$\bar{X} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$

where the data set contains observation values $m_1, m_2, \ldots, m_K$ occurring with frequencies $f_1, f_2, \ldots, f_K$, respectively.
Grouped Data Variance
For a population of N observations, the variance is
$\sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N} = \frac{\sum_{i=1}^{K} f_i m_i^2}{N} - \mu^2$

For a sample of n observations, the variance is
$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{K} f_i m_i^2 - n\bar{X}^2}{n - 1}$

where the data set contains observation values $m_1, m_2, \ldots, m_K$ occurring with frequencies $f_1, f_2, \ldots, f_K$, respectively.
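As an illustration of the grouped-data formulas, the sketch below applies them to the shampoo frequency distribution, treating it as a sample. Using the class midpoints (222.5, 227.5, ...) as the values $m_i$ is an assumption made here for the example; the slides do not state which representative value to use for each class.

```python
import math

# Class midpoints (assumed representative values) and frequencies
# for the shampoo example, treated as a sample of n = 100 bottles.
midpoints = [222.5, 227.5, 232.5, 237.5, 242.5, 247.5]
frequencies = [1, 4, 29, 34, 26, 6]

n = sum(frequencies)

# Grouped mean: sum of f_i * m_i divided by n.
x_bar = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Grouped sample variance, definition form.
s2_definition = sum(
    f * (m - x_bar) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)

# Grouped sample variance, short-cut form.
s2_shortcut = (
    sum(f * m * m for f, m in zip(frequencies, midpoints)) - n * x_bar ** 2
) / (n - 1)

print(f"mean = {x_bar:.1f} mL, variance = {s2_definition:.4f}")
```

Both forms of the variance agree up to floating-point rounding, which is a quick way to check a hand calculation.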
1-8 Methods of Displaying Data
• Pie Charts: categories represented as percentages of total
• Bar Graphs: heights of rectangles represent group frequencies
• Frequency Polygons: height of line represents frequency
• Ogives: height of line represents cumulative frequency
• Time Plots: represent values over time
Pie Chart
[Figure 1-10: Twentysomethings split on job satisfaction. Pie chart with slices of 33.0%, 23.0%, 19.0%, 19.0%, and 6.0%. Categories: Happy with career; Enjoy job, but it is not on my career path; Job is OK, but it is not on my career path; Do not like my job, but it is on my career path; My job just pays the bills.]
Bar Chart
[Figure 1-11: SHIFTING GEARS. Bar chart of quarterly net income for General Motors (in billions), by quarter for 2003 and 2004; vertical axis from 0.0 to 1.5.]
Frequency Polygon and Ogive
[Figure: two panels plotted against Sales (0 to 50). Relative Frequency Polygon: relative frequency, 0.0 to 0.3. Ogive (cumulative frequency or relative frequency graph): cumulative relative frequency, 0.0 to 1.0.]
Time Plot
[Figure: Monthly Steel Production. Millions of tons (axis from 5.5 to 8.5) plotted against month.]
Example 1-8: Stem-and-Leaf Display
Figure 1-17: Task Performance Times

1 | 122355567
2 | 0111222346777899
3 | 012457
4 | 11257
5 | 0236
6 | 02
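Box Plot
Elements of a Box Plot
• Box: spans Q1 to Q3 (the Interquartile Range, IQR), with the Median marked inside
• Inner fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)
• Outer fences: Q1 - 3(IQR) and Q3 + 3(IQR)
• Whiskers: extend to the smallest data point not below the inner fence and the largest data point not exceeding the inner fence
• Suspected outlier: a point between an inner fence and an outer fence
• Outlier: a point beyond an outer fence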
Probability

• Using Statistics
• Basic Definitions: Events, Sample Space, and Probabilities
• Basic Rules for Probability
• Conditional Probability
• Independence of Events
• Combinatorial Concepts
• The Law of Total Probability and Bayes’ Theorem
• Joint Probability Table
• Using the Computer
2-1 Probability is:
• A quantitative measure of uncertainty
• A measure of the strength of belief in the occurrence of an uncertain event
• A measure of the degree of chance or likelihood of occurrence of an uncertain event
• Measured by a number between 0 and 1 (or between 0% and 100%)
Types of Probability
Objective or Classical Probability
• based on equally-likely events
• based on long-run relative frequency of events
• not based on personal beliefs
• is the same for all observers (objective)
• examples: toss a coin, throw a die, pick a card

Types of Probability (Continued)
Subjective Probability
• based on personal beliefs, experiences, prejudices, intuition (personal judgment)
• different for all observers (subjective)
• examples: Super Bowl, elections, new product introduction, snowfall
2-2 Basic Definitions
• Set: a collection of elements or objects of interest
• Empty set (denoted by $\emptyset$): a set containing no elements
• Universal set (denoted by S): a set containing all possible elements
• Complement (Not): the complement of A, denoted $\bar{A}$, is a set containing all elements of S not in A
Complement of a Set
[Figure: Venn diagram illustrating the complement of an event. The set A sits inside the universal set S; the region of S outside A is $\bar{A}$.]
Basic Definitions (Continued)
• Intersection (And), $A \cap B$: a set containing all elements in both A and B
• Union (Or), $A \cup B$: a set containing all elements in A or B or both
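Sets: A Intersecting with B
[Figure: Venn diagram of overlapping sets A and B inside S, with the overlap $A \cap B$ shaded.]

Sets: A Union B
[Figure: Venn diagram of overlapping sets A and B inside S, with the combined region $A \cup B$ shaded.]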
Basic Definitions (Continued)
• Mutually exclusive or disjoint sets
  – sets having no elements in common, having no intersection, whose intersection is the empty set
• Partition
  – a collection of mutually exclusive sets which together include all possible elements, whose union is the universal set
Mutually Exclusive or Disjoint Sets
[Figure: Venn diagram of non-overlapping sets A and B inside S. The sets have nothing in common.]
Experiment
• Process that leads to one of several possible outcomes*, e.g.:
  – Coin toss: Heads, Tails
  – Throw die: 1, 2, 3, 4, 5, 6
  – Pick a card: AH, KH, QH, ...
  – Introduce a new product
• Each trial of an experiment has a single observed outcome.
• The precise outcome of a random experiment is unknown before a trial.

* Also called a basic outcome, elementary event, or simple event
Events: Definition
• Sample Space or Event Set
  – Set of all possible outcomes (universal set) for a given experiment
  – E.g.: Roll a regular six-sided die: S = {1,2,3,4,5,6}
• Event
  – Collection of outcomes having a common characteristic
  – E.g.: Even number: A = {2,4,6}
  – Event A occurs if an outcome in the set A occurs
• Probability of an event
  – Sum of the probabilities of the outcomes of which it consists
  – P(A) = P(2) + P(4) + P(6)
Equally-likely Probabilities (Hypothetical or Ideal Experiments)
• For example: Throw a die
  – Six possible outcomes {1,2,3,4,5,6}
  – If each is equally likely, the probability of each is 1/6 = 0.1667 = 16.67%
  – Probability of each equally-likely outcome is 1 divided by the number of possible outcomes: $P(e) = \frac{1}{n(S)}$
• Event A (even number)
  – P(A) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 1/2
  – $P(A) = \sum_{e \in A} P(e) = \frac{n(A)}{n(S)} = \frac{3}{6} = \frac{1}{2}$
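Pick a Card: Sample Space
[Figure: the 52-card sample space laid out in four columns (Hearts, Diamonds, Clubs, Spades), each listing A, K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3, 2, with the events ‘Heart’ and ‘Ace’ circled. The intersection of the events ‘Heart’ and ‘Ace’ comprises the single point circled twice: the ace of hearts.]

• Event ‘Heart’: $P(\text{Heart}) = \frac{n(\text{Heart})}{n(S)} = \frac{13}{52} = \frac{1}{4}$
• Event ‘Ace’: $P(\text{Ace}) = \frac{n(\text{Ace})}{n(S)} = \frac{4}{52} = \frac{1}{13}$
• Union of events ‘Heart’ and ‘Ace’: $P(\text{Heart} \cup \text{Ace}) = \frac{n(\text{Heart} \cup \text{Ace})}{n(S)} = \frac{16}{52} = \frac{4}{13}$
• Intersection of events ‘Heart’ and ‘Ace’: $P(\text{Heart} \cap \text{Ace}) = \frac{n(\text{Heart} \cap \text{Ace})}{n(S)} = \frac{1}{52}$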
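The card probabilities can be checked by enumerating the sample space and counting. The sketch below builds the 52-card deck as a set of (rank, suit) pairs; the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]

# Sample space S: every (rank, suit) pair, 52 equally likely outcomes.
deck = set(product(ranks, suits))

# Events are subsets of the sample space.
heart = {card for card in deck if card[1] == "Hearts"}
ace = {card for card in deck if card[0] == "A"}

def prob(event):
    """P(event) = n(event) / n(S) for equally likely outcomes."""
    return Fraction(len(event), len(deck))

p_union = prob(heart | ace)         # Heart or Ace or both
p_intersection = prob(heart & ace)  # the ace of hearts only

print(prob(heart), prob(ace), p_union, p_intersection)
```

Counting the union directly gives 16 cards rather than 13 + 4 = 17, because the ace of hearts belongs to both events and must not be counted twice.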
2-3 Basic Rules for Probability
• Range of values for P(A): $0 \le P(A) \le 1$
• Complements, probability of not A: $P(\bar{A}) = 1 - P(A)$
• Intersection, probability of both A and B: $P(A \cap B) = \frac{n(A \cap B)}{n(S)}$
• Mutually exclusive events (A and C): $P(A \cap C) = 0$
Basic Rules for Probability (Continued)
• Union, probability of A or B or both (rule of unions):
  $P(A \cup B) = \frac{n(A \cup B)}{n(S)} = P(A) + P(B) - P(A \cap B)$
• Mutually exclusive events: if A and B are mutually exclusive, then
  $P(A \cap B) = 0$ so $P(A \cup B) = P(A) + P(B)$
Sets: P(A Union B)
[Figure: Venn diagram of overlapping sets A and B inside S, with the region $P(A \cup B)$ shaded.]
• Conditional Probability - Probability of A given B
Independent events:
• Conditional Probability - Probability of A given B
Independent events:
0)( ,)(
)()( BPwhereBP
BAPBAP
P A B P A
P B A P B
( ) ( )
( ) ( )
2-4 Conditional Probability
Conditional Probability (continued)
Rules of conditional probability:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ so $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$

If events A and D are statistically independent:
$P(A \mid D) = P(A)$ and $P(D \mid A) = P(D)$ so $P(A \cap D) = P(A)\,P(D)$
Contingency Table - Example 2-2

Counts
                     AT&T    IBM    Total
Telecommunication     40      10      50
Computers             20      30      50
Total                 60      40     100

Probabilities
                     AT&T    IBM    Total
Telecommunication    .40     .10     .50
Computers            .20     .30     .50
Total                .60     .40    1.00

Probability that a project is undertaken by IBM given it is a telecommunications project:
$P(IBM \mid T) = \frac{P(IBM \cap T)}{P(T)} = \frac{0.10}{0.50} = 0.2$
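The conditional probability in Example 2-2 can be computed straight from the table of counts. A minimal sketch, with the dictionary layout chosen here purely for illustration:

```python
from fractions import Fraction

# Counts from the contingency table, keyed by (project type, company).
counts = {
    ("Telecommunication", "AT&T"): 40,
    ("Telecommunication", "IBM"): 10,
    ("Computers", "AT&T"): 20,
    ("Computers", "IBM"): 30,
}
total = sum(counts.values())

# Marginal probability P(T): all telecommunication projects.
p_t = Fraction(
    sum(c for (kind, _), c in counts.items() if kind == "Telecommunication"),
    total,
)

# Joint probability P(IBM and T).
p_ibm_and_t = Fraction(counts[("Telecommunication", "IBM")], total)

# Conditional probability P(IBM | T) = P(IBM and T) / P(T).
p_ibm_given_t = p_ibm_and_t / p_t

print(p_t, p_ibm_and_t, p_ibm_given_t)
```

Dividing by P(T) restricts attention to the 50 telecommunication projects, of which 10 belong to IBM.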
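2-5 Independence of Events
Conditions for the statistical independence of events A and B:
$P(A \mid B) = P(A)$
$P(B \mid A) = P(B)$
and
$P(A \cap B) = P(A)\,P(B)$

$P(\text{Ace} \mid \text{Heart}) = \frac{P(\text{Ace} \cap \text{Heart})}{P(\text{Heart})} = \frac{1/52}{13/52} = \frac{1}{13} = P(\text{Ace})$

$P(\text{Heart} \mid \text{Ace}) = \frac{P(\text{Heart} \cap \text{Ace})}{P(\text{Ace})} = \frac{1/52}{4/52} = \frac{1}{4} = P(\text{Heart})$

$P(\text{Ace} \cap \text{Heart}) = \frac{4}{52} \times \frac{13}{52} = \frac{1}{52} = P(\text{Ace})\,P(\text{Heart})$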
Independence of Events – Example 2-5
Events Television (T) and Billboard (B) are assumed to be independent.

a) $P(T \cap B) = P(T)\,P(B) = 0.04 \times 0.06 = 0.0024$
b) $P(T \cup B) = P(T) + P(B) - P(T \cap B) = 0.04 + 0.06 - 0.0024 = 0.0976$
Product Rules for Independent Events
The probability of the intersection of several independent events is the product of their separate individual probabilities:
$P(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_n) = P(A_1)\,P(A_2)\,P(A_3) \cdots P(A_n)$

The probability of the union of several independent events is 1 minus the product of probabilities of their complements:
$P(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n) = 1 - P(\bar{A}_1)\,P(\bar{A}_2)\,P(\bar{A}_3) \cdots P(\bar{A}_n)$

Example 2-7:
$P(Q_1 \cup Q_2 \cup Q_3 \cup \cdots \cup Q_{10}) = 1 - P(\bar{Q}_1)\,P(\bar{Q}_2)\,P(\bar{Q}_3) \cdots P(\bar{Q}_{10}) = 1 - 0.90^{10} = 1 - 0.3487 = 0.6513$
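Both product rules are one-liners in code. The sketch below defines two illustrative helpers (the function names are assumptions) and reproduces the numbers in Example 2-5 and Example 2-7, taking each $P(Q_i) = 0.10$ so that each complement has probability 0.90.

```python
import math

def prob_intersection(probs):
    """P(A1 and A2 and ... and An) for independent events."""
    return math.prod(probs)

def prob_union(probs):
    """P(A1 or A2 or ... or An) for independent events:
    1 minus the product of the complement probabilities."""
    return 1 - math.prod(1 - p for p in probs)

# Example 2-5: Television (0.04) and Billboard (0.06).
p_both = prob_intersection([0.04, 0.06])
p_either = prob_union([0.04, 0.06])

# Example 2-7: ten independent events, each with probability 0.10.
p_at_least_one = prob_union([0.10] * 10)

print(round(p_both, 4), round(p_either, 4), round(p_at_least_one, 4))
```

For two events the complement form gives the same answer as the rule of unions, since 1 - (1 - a)(1 - b) = a + b - ab.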
2-6 Combinatorial Concepts
Consider a pair of six-sided dice. There are six possible outcomes from throwing the first die {1,2,3,4,5,6} and six possible outcomes from throwing the second die {1,2,3,4,5,6}. Altogether, there are 6*6 = 36 possible outcomes from throwing the two dice.

In general, if there are n events and event i can happen in $N_i$ possible ways, then the number of ways in which the sequence of n events may occur is $N_1 N_2 \cdots N_n$.

• Pick 5 cards from a deck of 52, with replacement: 52*52*52*52*52 = $52^5$ = 380,204,032 different possible outcomes
• Pick 5 cards from a deck of 52, without replacement: 52*51*50*49*48 = 311,875,200 different possible outcomes
Factorial
How many ways can you order the 3 letters A, B, and C?

There are 3 choices for the first letter, 2 for the second, and 1 for the last, so there are 3*2*1 = 6 possible ways to order the three letters A, B, and C.

How many ways are there to order the 6 letters A, B, C, D, E, and F? (6*5*4*3*2*1 = 720)

Factorial: For any positive integer n, we define n factorial as n(n-1)(n-2)...(1). We denote n factorial as n!. The number n! is the number of ways in which n objects can be ordered. By definition 1! = 1 and 0! = 1.
Permutations (Order is important)
Permutations are the possible ordered selections of r objects out of a total of n objects. The number of permutations of n objects taken r at a time is denoted by nPr, where
$nPr = \frac{n!}{(n-r)!}$

What if we chose only 3 out of the 6 letters A, B, C, D, E, and F? There are 6 ways to choose the first letter, 5 ways to choose the second letter, and 4 ways to choose the third letter (leaving 3 letters unchosen). That makes 6*5*4 = 120 possible orderings or permutations.

For example:
$6P3 = \frac{6!}{(6-3)!} = \frac{6!}{3!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1} = 6 \cdot 5 \cdot 4 = 120$
Combinations (Order is not Important)
Combinations are the possible selections of r items from a group of n items regardless of the order of selection. The number of combinations is denoted $\binom{n}{r}$ and is read as n choose r. An alternative notation is nCr. We define the number of combinations of r out of n elements as:
$\binom{n}{r} = nCr = \frac{n!}{r!\,(n-r)!}$

Suppose that when we pick 3 letters out of the 6 letters A, B, C, D, E, and F we chose BCD, or BDC, or CBD, or CDB, or DBC, or DCB. (These are the 6 (3!) permutations or orderings of the 3 letters B, C, and D.) But these are orderings of the same combination of 3 letters. How many combinations of 6 different letters, taking 3 at a time, are there?

For example:
$\binom{6}{3} = 6C3 = \frac{6!}{3!\,(6-3)!} = \frac{6!}{3!\,3!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(3 \cdot 2 \cdot 1)} = \frac{6 \cdot 5 \cdot 4}{3 \cdot 2 \cdot 1} = \frac{120}{6} = 20$
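The counting results on the last few slides can be checked with Python's `math` module (`math.factorial`, `math.perm`, and `math.comb`, available from Python 3.8) and by brute-force enumeration with `itertools`. This is only a verification sketch.

```python
import math
from itertools import combinations, permutations

letters = "ABCDEF"

# Factorial: number of ways to order all 6 letters.
orderings = math.factorial(6)

# Permutations: ordered selections of 3 letters out of 6.
n_perm = math.perm(6, 3)
listed_perm = list(permutations(letters, 3))

# Combinations: selections of 3 letters out of 6, order ignored.
n_comb = math.comb(6, 3)
listed_comb = list(combinations(letters, 3))

# Each combination corresponds to 3! = 6 permutations.
ratio = n_perm // n_comb

# Card draws: with replacement (52^5) and without (52*51*50*49*48).
with_replacement = 52 ** 5
without_replacement = math.perm(52, 5)

print(orderings, n_perm, n_comb, ratio, with_replacement, without_replacement)
```

The ratio nPr / nCr equals r!, which is exactly the over-counting that the combinations formula removes.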
2-7 The Law of Total Probability and Bayes’ Theorem
The law of total probability:
$P(A) = P(A \cap B) + P(A \cap \bar{B})$

In terms of conditional probabilities:
$P(A) = P(A \cap B) + P(A \cap \bar{B}) = P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})$

More generally (where the $B_i$ make up a partition):
$P(A) = \sum_i P(A \cap B_i) = \sum_i P(A \mid B_i)\,P(B_i)$
Bayes’ Theorem
• Bayes’ theorem enables you, knowing just a little more than the probability of A given B, to find the probability of B given A.
• Based on the definition of conditional probability and the law of total probability.

$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$

Applying the law of total probability to the denominator:
$P(B \mid A) = \frac{P(A \cap B)}{P(A \cap B) + P(A \cap \bar{B})}$

Applying the definition of conditional probability throughout:
$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})}$
2-8 The Joint Probability Table
A joint probability table is similar to a contingency table, except that it has probabilities in place of frequencies. The row totals and column totals are called marginal probabilities.

The joint probability table for Example 2-11 is summarized below.

                            GROWTH
                     High    Medium    Low     Total
Appreciates (Re)     0.21    0.2       0.04    0.45
Depreciates (Re)     0.09    0.3       0.16    0.55
Total                0.30    0.5       0.20    1.00
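The marginal probabilities, and a Bayes-style reversal of a conditional probability, can be read off the joint table programmatically. The sketch below uses the table entries above; the dictionary layout and helper names are illustrative assumptions.

```python
from fractions import Fraction as F

# Joint probabilities from the table, keyed by (currency move, growth).
joint = {
    ("Appreciates", "High"): F(21, 100),
    ("Appreciates", "Medium"): F(20, 100),
    ("Appreciates", "Low"): F(4, 100),
    ("Depreciates", "High"): F(9, 100),
    ("Depreciates", "Medium"): F(30, 100),
    ("Depreciates", "Low"): F(16, 100),
}

def row_marginal(move):
    """Row total: P(move), summed over all growth levels."""
    return sum(p for (m, _), p in joint.items() if m == move)

def column_marginal(growth):
    """Column total: P(growth), summed over both currency moves."""
    return sum(p for (_, g), p in joint.items() if g == growth)

# Conditional probability from the joint table.
p_app_given_high = joint[("Appreciates", "High")] / column_marginal("High")

# Bayes' theorem: reverse the conditioning.
p_high_given_app = (
    p_app_given_high * column_marginal("High") / row_marginal("Appreciates")
)

print(row_marginal("Appreciates"), column_marginal("High"))
print(p_app_given_high, p_high_given_app)
```

The denominator used in the Bayes step, the row total 0.45, is itself an instance of the law of total probability over the three growth levels.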
Random Variables

3-1 Using Statistics
Consider the different possible orderings of boy (B) and girl (G) in four sequential births. There are 2*2*2*2 = $2^4$ = 16 possibilities, so the sample space is:

BBBB  BGBB  GBBB  GGBB
BBBG  BGBG  GBBG  GGBG
BBGB  BGGB  GBGB  GGGB
BBGG  BGGG  GBGG  GGGG

If girl and boy are each equally likely [P(G) = P(B) = 1/2], and the gender of each child is independent of that of the previous child, then the probability of each of these 16 possibilities is (1/2)(1/2)(1/2)(1/2) = 1/16.
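Random Variables (Continued)
[Figure: the random variable X (number of girls) maps each point of the Sample Space to a Point on the Real Line.]

X = 0: BBBB
X = 1: BGBB, GBBB, BBBG, BBGB
X = 2: GGBB, GBBG, BGBG, BGGB, GBGB, BBGG
X = 3: BGGG, GBGG, GGGB, GGBG
X = 4: GGGG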
Random Variables (Continued)
Since the random variable X = 3 when any of the four outcomes BGGG, GBGG, GGBG, or GGGB occurs,
P(X = 3) = P(BGGG) + P(GBGG) + P(GGBG) + P(GGGB) = 4/16

The probability distribution of a random variable is a table that lists the possible values of the random variable and their associated probabilities.

x        P(x)
0        1/16
1        4/16
2        6/16
3        4/16
4        1/16
Total    16/16 = 1

The graphical display for this probability distribution is shown on the next slide.
Random Variables (Continued)
[Figure: Probability Distribution of the Number of Girls in Four Births. Bar chart of Probability, P(X), against Number of Girls, X = 0, 1, 2, 3, 4, with bar heights 1/16, 4/16, 6/16, 4/16, 1/16.]
Example 3-1
Consider the experiment of tossing two six-sided dice. There are 36 possible outcomes. Let the random variable X represent the sum of the numbers on the two dice:

1,1  1,2  1,3  1,4  1,5  1,6
2,1  2,2  2,3  2,4  2,5  2,6
3,1  3,2  3,3  3,4  3,5  3,6
4,1  4,2  4,3  4,4  4,5  4,6
5,1  5,2  5,3  5,4  5,5  5,6
6,1  6,2  6,3  6,4  6,5  6,6

Outcomes on the same anti-diagonal share the same sum, from 2 through 12.

x        P(x)*
2        1/36
3        2/36
4        3/36
5        4/36
6        5/36
7        6/36
8        5/36
9        4/36
10       3/36
11       2/36
12       1/36
Total    1

* Note that: $P(x) = \frac{6 - \sqrt{(7 - x)^2}}{36} = \frac{6 - |7 - x|}{36}$

[Figure: Probability Distribution of Sum of Two Dice. Bar chart of p(x) against x = 2, ..., 12.]
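The distribution of the sum, and the closed-form note beneath the table, can both be verified by enumerating the 36 outcomes. A short sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes and tally each sum.
sum_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution of X = sum of the two dice.
distribution = {x: Fraction(c, 36) for x, c in sorted(sum_counts.items())}

# Closed form from the note: P(x) = (6 - |7 - x|) / 36.
closed_form = {x: Fraction(6 - abs(7 - x), 36) for x in range(2, 13)}

for x, p in distribution.items():
    print(x, p)
```

`Fraction` reduces automatically, so 6/36 prints as 1/6; the values are the same as in the table.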
Example 3-2
Probability Distribution of the Number of Switches

x        P(x)
0        0.1
1        0.2
2        0.3
3        0.2
4        0.1
5        0.1
Total    1

Probability of more than 2 switches: P(X > 2) = P(3) + P(4) + P(5) = 0.2 + 0.1 + 0.1 = 0.4
Probability of at least 1 switch: $P(X \ge 1)$ = 1 - P(0) = 1 - 0.1 = 0.9

[Figure: The Probability Distribution of the Number of Switches. Bar chart of P(x) against x = 0, ..., 5.]
Discrete and Continuous Random Variables
A discrete random variable:
• has a countable number of possible values
• has discrete jumps (or gaps) between successive values
• has measurable probability associated with individual values
• counts

A continuous random variable:
• has an uncountably infinite number of possible values
• moves continuously from value to value
• has no measurable probability associated with each value
• measures (e.g.: height, weight, speed, value, duration, length)
Rules of Discrete Probability Distributions
The probability distribution of a discrete random variable X must satisfy the following two conditions.
1. $P(x) \ge 0$ for all values of x.
2. $\sum_{\text{all } x} P(x) = 1$
Corollary: $0 \le P(X) \le 1$
Cumulative Distribution Function
The cumulative distribution function, F(x), of a discrete random variable X is:
$F(x) = P(X \le x) = \sum_{\text{all } i \le x} P(i)$

x        P(x)    F(x)
0        0.1     0.1
1        0.2     0.3
2        0.3     0.6
3        0.2     0.8
4        0.1     0.9
5        0.1     1.0
Total    1.00

[Figure: Cumulative Probability Distribution of the Number of Switches. Step plot of F(x), from 0.0 to 1.0, against x = 0, ..., 5.]
Cumulative Distribution Function
The same table of x, P(x), and F(x) for the number of switches is used in the three calculations below.

The probability that at most three switches will occur:
Note: $P(X \le 3) = F(3) = 0.8 = P(0) + P(1) + P(2) + P(3)$

Using Cumulative Probability Distributions (Figure 3-8)
The probability that more than one switch will occur:
Note: $P(X > 1) = P(X \ge 2) = 1 - P(X \le 1) = 1 - F(1) = 1 - 0.3 = 0.7$

Using Cumulative Probability Distributions (Figure 3-9)
The probability that anywhere from one to three switches will occur:
Note: $P(1 \le X \le 3) = P(X \le 3) - P(X \le 0) = F(3) - F(0) = 0.8 - 0.1 = 0.7$
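The cumulative distribution function is a running sum of the probabilities, and the three interval probabilities follow from it by subtraction. A minimal sketch for the switches distribution, using exact fractions to avoid rounding noise:

```python
from fractions import Fraction as F
from itertools import accumulate

# P(x) for the number of switches, x = 0..5.
pmf = {0: F(1, 10), 1: F(2, 10), 2: F(3, 10), 3: F(2, 10), 4: F(1, 10), 5: F(1, 10)}

# F(x) = P(X <= x): running total of the probabilities.
cdf = dict(zip(pmf, accumulate(pmf.values())))

at_most_three = cdf[3]             # P(X <= 3)
more_than_one = 1 - cdf[1]         # P(X > 1) = 1 - F(1)
one_to_three = cdf[3] - cdf[0]     # P(1 <= X <= 3) = F(3) - F(0)

print(float(at_most_three), float(more_than_one), float(one_to_three))
```

For an integer-valued variable, an interval that includes its lower endpoint a is computed as F(b) - F(a - 1), which is why F(0) rather than F(1) is subtracted in the last line.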
3-2 Expected Values of Discrete Random Variables
The mean of a probability distribution is a measure of its centrality or location, as is the mean or average of a frequency distribution. It is a weighted average, with the values of the random variable weighted by their probabilities.

The mean is also known as the expected value (or expectation) of a random variable, because it is the value that is expected to occur, on average.

The expected value of a discrete random variable X is equal to the sum of each value of the random variable multiplied by its probability.
$\mu = E(X) = \sum_{\text{all } x} x\,P(x)$

x        P(x)    xP(x)
0        0.1     0.0
1        0.2     0.2
2        0.3     0.6
3        0.2     0.6
4        0.1     0.4
5        0.1     0.5
Total    1.0     2.3 = E(X) = $\mu$

[Figure: the mean, 2.3, marked on the axis x = 0, ..., 5.]
Expected Value of a Function of a Discrete Random Variable
The expected value of a function of a discrete random variable X is:
$E[h(X)] = \sum_{\text{all } x} h(x)\,P(x)$

Example 3-3: Monthly sales of a certain product are believed to follow the given probability distribution. Suppose the company has a fixed monthly production cost of $8000 and that each item brings $2. Find the expected monthly profit, h(X), from product sales.

Note: h(X) = 2X – 8000, where X = number of items sold

Number of items, x    P(x)    xP(x)    h(x)     h(x)P(x)
5000                  0.2     1000     2000     400
6000                  0.3     1800     4000     1200
7000                  0.2     1400     6000     1200
8000                  0.2     1600     8000     1600
9000                  0.1     900      10000    1000
Total                 1.0     6700              5400

$E[h(X)] = \sum_{\text{all } x} h(x)\,P(x) = 5400$

The expected value of a linear function of a random variable is: E(aX+b) = aE(X) + b
In this case: E(2X-8000) = 2E(X) - 8000 = (2)(6700) - 8000 = 5400
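Example 3-3 can be reproduced both ways: by summing h(x)P(x) directly and by applying the linear rule to E(X). The sketch below uses exact fractions; the function name `profit` stands for h and is an illustrative choice.

```python
from fractions import Fraction as F

# Monthly sales distribution from Example 3-3.
sales = {5000: F(2, 10), 6000: F(3, 10), 7000: F(2, 10), 8000: F(2, 10), 9000: F(1, 10)}

def profit(x):
    """h(x) = 2x - 8000: $2 per item minus the fixed $8000 cost."""
    return 2 * x - 8000

# E(X): each value weighted by its probability.
expected_sales = sum(x * p for x, p in sales.items())

# E[h(X)] computed directly from the definition.
expected_profit = sum(profit(x) * p for x, p in sales.items())

# E(aX + b) = aE(X) + b gives the same answer without the extra column.
via_linearity = 2 * expected_sales - 8000

print(expected_sales, expected_profit, via_linearity)
```

The shortcut through linearity only works because h is linear; for a non-linear h the sum over h(x)P(x) is required.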
Variance and Standard Deviation of a Random Variable
The variance of a random variable is the expected squared deviation from the mean:
$\sigma^2 = V(X) = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2\,P(x)$
$\sigma^2 = E(X^2) - [E(X)]^2 = \sum_{\text{all } x} x^2\,P(x) - \left[\sum_{\text{all } x} x\,P(x)\right]^2$

The standard deviation of a random variable is the square root of its variance:
$\sigma = SD(X) = \sqrt{V(X)}$
Variance and Standard Deviation of a Random Variable – using Example 3-2
Table 3-8

Number of Switches, x    P(x)    xP(x)    (x-μ)    (x-μ)^2    P(x)(x-μ)^2    x^2 P(x)
0                        0.1     0.0      -2.3     5.29       0.529          0.0
1                        0.2     0.2      -1.3     1.69       0.338          0.2
2                        0.3     0.6      -0.3     0.09       0.027          1.2
3                        0.2     0.6       0.7     0.49       0.098          1.8
4                        0.1     0.4       1.7     2.89       0.289          1.6
5                        0.1     0.5       2.7     7.29       0.729          2.5
Total                            2.3                          2.010          7.3

Recall: $\mu$ = 2.3.

$\sigma^2 = V(X) = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2\,P(x) = 2.01$
$\sigma^2 = E(X^2) - [E(X)]^2 = \sum_{\text{all } x} x^2\,P(x) - \left[\sum_{\text{all } x} x\,P(x)\right]^2 = 7.3 - 2.3^2 = 2.01$
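Both routes to the variance in Table 3-8 can be checked in a few lines. The sketch below uses exact fractions so the two formulas agree exactly rather than up to rounding.

```python
import math
from fractions import Fraction as F

# P(x) for the number of switches (Example 3-2).
pmf = {0: F(1, 10), 1: F(2, 10), 2: F(3, 10), 3: F(2, 10), 4: F(1, 10), 5: F(1, 10)}

# Mean: E(X).
mu = sum(x * p for x, p in pmf.items())

# Variance by definition: E[(X - mu)^2].
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Variance by the computational formula: E(X^2) - [E(X)]^2.
e_x_squared = sum(x * x * p for x, p in pmf.items())
var_shortcut = e_x_squared - mu ** 2

# Standard deviation: square root of the variance.
sd = math.sqrt(var_definition)

print(float(mu), float(e_x_squared), float(var_definition), round(sd, 4))
```

The computational formula needs only the last column of the table, which is why it is usually the quicker one by hand.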
Variance of a Linear Function of a Random Variable
The variance of a linear function of a random variable is:
$V(aX + b) = a^2 V(X) = a^2 \sigma^2$

Example 3-3:

Number of items, x    P(x)    xP(x)    x^2 P(x)
5000                  0.2     1000     5000000
6000                  0.3     1800     10800000
7000                  0.2     1400     9800000
8000                  0.2     1600     12800000
9000                  0.1     900      8100000
Total                 1.0     6700     46500000

$\sigma^2 = V(X) = E(X^2) - [E(X)]^2 = \sum_{\text{all } x} x^2\,P(x) - \left[\sum_{\text{all } x} x\,P(x)\right]^2 = 46500000 - 6700^2 = 1610000$
$\sigma = SD(X) = \sqrt{1610000} = 1268.86$

$V(2X - 8000) = 2^2\,V(X) = (4)(1610000) = 6440000$
$SD(2X - 8000) = 2\,\sigma_X = (2)(1268.86) = 2537.72$
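The rule $V(aX + b) = a^2 V(X)$ can be confirmed on Example 3-3 by computing the variance of the profit directly and comparing. This is a verification sketch; the variable names are illustrative.

```python
import math
from fractions import Fraction as F

# Monthly sales distribution from Example 3-3.
sales = {5000: F(2, 10), 6000: F(3, 10), 7000: F(2, 10), 8000: F(2, 10), 9000: F(1, 10)}
a, b = 2, -8000                      # profit h(X) = 2X - 8000

# V(X) = E(X^2) - [E(X)]^2.
mean_x = sum(x * p for x, p in sales.items())
var_x = sum(x * x * p for x, p in sales.items()) - mean_x ** 2

# Rule for a linear function: the shift b drops out, a is squared.
var_profit_rule = a ** 2 * var_x
sd_profit = abs(a) * math.sqrt(var_x)

# Direct computation on the profit values, for comparison.
mean_profit = sum((a * x + b) * p for x, p in sales.items())
var_profit_direct = sum(
    ((a * x + b) - mean_profit) ** 2 * p for x, p in sales.items()
)

print(var_x, var_profit_rule, round(sd_profit, 2))
```

The fixed cost of $8000 shifts every profit value by the same amount, so it changes the mean but not the spread.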
Some Properties of Means and Variances of Random Variables
The mean or expected value of the sum of random variables is the sum of their means or expected values:
$\mu_{(X+Y)} = E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y$

For example: E(X) = $350 and E(Y) = $200
E(X+Y) = $350 + $200 = $550

The variance of the sum of mutually independent random variables is the sum of their variances:
$\sigma^2_{(X+Y)} = V(X + Y) = V(X) + V(Y) = \sigma^2_X + \sigma^2_Y$
if X and Y are independent.

For example: V(X) = 84 and V(Y) = 60
V(X+Y) = 144
Some Properties of Means and Variances of Random Variables
NOTE:
$E(X_1 + X_2 + \cdots + X_k) = E(X_1) + E(X_2) + \cdots + E(X_k)$
$E(a_1 X_1 + a_2 X_2 + \cdots + a_k X_k) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_k E(X_k)$

The variance of the sum of k mutually independent random variables is the sum of their variances:
$V(X_1 + X_2 + \cdots + X_k) = V(X_1) + V(X_2) + \cdots + V(X_k)$
and
$V(a_1 X_1 + a_2 X_2 + \cdots + a_k X_k) = a_1^2 V(X_1) + a_2^2 V(X_2) + \cdots + a_k^2 V(X_k)$
Chebyshev’s Theorem Applied to Probability Distributions
Chebyshev’s Theorem applies to probability distributions just as it applies to frequency distributions.

For a random variable X with mean $\mu$, standard deviation $\sigma$, and for any number k > 1:
$P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$

At least                                                              Lie within (standard deviations of the mean)
$1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$            2
$1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} = 89\%$            3
$1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} = 94\%$         4
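The table of Chebyshev bounds follows directly from $1 - 1/k^2$. The sketch below defines an illustrative helper (the name `chebyshev_lower_bound` is an assumption) restricted to integer k for exact fractions.

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """Minimum probability of lying within k standard deviations
    of the mean, for an integer k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's bound requires k > 1")
    return 1 - Fraction(1, k * k)

bounds = {k: chebyshev_lower_bound(k) for k in (2, 3, 4)}

for k, bound in bounds.items():
    print(f"within {k} standard deviations: at least {bound} = {float(bound):.0%}")
```

These are lower bounds that hold for any distribution; for bell-shaped data the Empirical Rule gives much higher percentages for the same distances.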
3-3 Bernoulli Random Variable
• If an experiment consists of a single trial and the outcome of the trial can only be either a success* or a failure, then the trial is called a Bernoulli trial.
• The number of successes X in one Bernoulli trial, which can be 1 or 0, is a Bernoulli random variable.
• Note: If p is the probability of success in a Bernoulli experiment, then E(X) = p and V(X) = p(1 – p).

* The terms success and failure are simply statistical terms, and do not have positive or negative implications. In a production setting, finding a defective product may be termed a “success,” although it is not a positive result.
3-4 The Binomial Random Variable
Consider a Bernoulli Process in which we have a sequence of n identical trials satisfying the following conditions:

1. Each trial has two possible outcomes, called success* and failure. The two outcomes are mutually exclusive and exhaustive.
2. The probability of success, denoted by p, remains constant from trial to trial. The probability of failure is denoted by q, where q = 1-p.
3. The n trials are independent. That is, the outcome of any trial does not affect the outcomes of the other trials.

A random variable, X, that counts the number of successes in n Bernoulli trials, where p is the probability of success* in any given trial, is said to follow the binomial probability distribution with parameters n (number of trials) and p (probability of success). We call X the binomial random variable.

* The terms success and failure are simply statistical terms, and do not have positive or negative implications. In a production setting, finding a defective product may be termed a “success,” although it is not a positive result.
11-98
Suppose we toss a single fair and balanced coin five times in succession, and let X represent the number of heads.
There are 2^5 = 32 possible sequences of H and T (S and F) in the sample space for this experiment. Of these, there are 10 in which there are exactly 2 heads (X=2):
HHTTT HTHTT HTTHT HTTTH THHTT THTHT THTTH TTHHT TTHTH TTTHH
The probability of each of these 10 outcomes is p^2 q^3 = (1/2)^2 (1/2)^3 = (1/32), so the probability of 2 heads in 5 tosses of a fair and balanced coin is:
P(X = 2) = 10 * (1/32) = (10/32) = 0.3125
Here 10 is the number of outcomes with 2 heads and (1/32) is the probability of each outcome with 2 heads.
Binomial Probabilities (Introduction)
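The count of 10 sequences can be verified by brute-force enumeration. This is a small sketch using only the Python standard library; the variable names are illustrative.

```python
from itertools import product

# All 2^5 sequences of heads (H) and tails (T) for five tosses
sequences = ["".join(s) for s in product("HT", repeat=5)]

# Keep only the sequences with exactly two heads
two_heads = [s for s in sequences if s.count("H") == 2]

# Each such sequence has probability p^2 * q^3 with p = q = 1/2
p_each = (1 / 2) ** 2 * (1 / 2) ** 3
prob_two_heads = len(two_heads) * p_each

print(len(sequences), len(two_heads), prob_two_heads)
```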
11-99
P(X=2) = 10 * (1/32) = (10/32) = .3125
Notice that this probability has two parts: the number of outcomes with 2 heads (10) and the probability of each outcome with 2 heads (1/32).
In general:
1. The probability of a given sequence of x successes out of n trials with probability of success p and probability of failure q is equal to:
p^x q^(n-x)
2. The number of different sequences of n trials that result in exactly x successes is equal to the number of choices of x elements out of a total of n elements. This number is denoted:
nCx = n! / (x!(n-x)!)
Binomial Probabilities (continued)
11-100
Number of successes, x    Probability P(x)
0                         [n!/(0!(n-0)!)] p^0 q^(n-0)
1                         [n!/(1!(n-1)!)] p^1 q^(n-1)
2                         [n!/(2!(n-2)!)] p^2 q^(n-2)
3                         [n!/(3!(n-3)!)] p^3 q^(n-3)
...                       ...
n                         [n!/(n!(n-n)!)] p^n q^(n-n)
Total                     1.00
The binomial probability distribution:
P(x) = nCx p^x q^(n-x) = [n!/(x!(n-x)!)] p^x q^(n-x)
where: p is the probability of success in a single trial, q = 1-p, n is the number of trials, and x is the number of successes.
The Binomial Probability Distribution
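The binomial formula translates directly into code. A minimal sketch, assuming Python 3.8+ for math.comb; the function name binomial_pmf is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable: nCx * p^x * q^(n-x)."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

# Five tosses of a fair coin: probabilities for x = 0, 1, ..., 5 heads
pmf = [binomial_pmf(x, 5, 0.5) for x in range(6)]
print(pmf[2])    # probability of exactly 2 heads
print(sum(pmf))  # the probabilities over all x add up to 1
```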
The Normal DistributionThe Normal Distribution
11-102
As n increases, the binomial distribution approaches a ...
[Figure: binomial distributions with p = .5 for n = 6, n = 10, and n = 14 (P(x) against x), next to the normal distribution with μ = 0, σ = 1 (f(x) against x).]
Normal Probability Density Function:
f(x) = [1 / √(2πσ^2)] e^(−(x − μ)^2 / (2σ^2)) for −∞ < x < ∞,
where e = 2.7182818... and π = 3.14159265...
4-1 Introduction
11-103
The normal probability density function:
f(x) = [1 / √(2πσ^2)] e^(−(x − μ)^2 / (2σ^2)) for −∞ < x < ∞,
where e = 2.7182818... and π = 3.14159265...
[Figure: normal distribution with μ = 0, σ = 1 (f(x) against x).]
The Normal Probability Distribution
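As a sketch of how the density can be evaluated, the function below codes the normal density formula by hand and compares it with the standard library's statistics.NormalDist (Python 3.8+). The name normal_pdf and the test point are illustrative choices.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

peak = normal_pdf(0.0)               # height of the standard normal curve at its mean
library = NormalDist(2, 3).pdf(1.3)  # library value for mu = 2, sigma = 3
by_hand = normal_pdf(1.3, 2, 3)      # the same point computed from the formula
print(round(peak, 4), abs(library - by_hand) < 1e-12)
```

The peak height of about 0.4 matches the top of the vertical axis in the plots of the standard normal distribution.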
11-104
• The normal is a family of bell-shaped and symmetric distributions. Because the distribution is symmetric, one-half (.50 or 50%) lies on either side of the mean.
• Each is characterized by a different pair of mean, μ, and variance, σ^2. That is: X ~ N(μ, σ^2).
• Each is asymptotic to the horizontal axis.
• The area under any normal probability density function within kσ of μ is the same for any normal distribution, regardless of the mean and variance.
4-2 Properties of the Normal Distribution
11-105
• If several independent random variables are normally distributed then their sum will also be normally distributed.
• The mean of the sum will be the sum of all the individual means.
• The variance of the sum will be the sum of all the individual variances (by virtue of the independence).
4-2 Properties of the Normal Distribution (continued)
11-106
• If X1, X2, …, Xn are independent normal random variables, then their sum S will also be normally distributed with
• E(S) = E(X1) + E(X2) + … + E(Xn)
• V(S) = V(X1) + V(X2) + … + V(Xn)
• Note: It is the variances that can be added above and not the standard deviations.
4-2 Properties of the Normal Distribution (continued)
11-107
Example 4.1: Let X1, X2, and X3 be independent random variables that are normally distributed with means and variances as shown.
4-2 Properties of the Normal Distribution – Example 4-1
      Mean    Variance
X1    10      1
X2    20      2
X3    30      3
Let S = X1 + X2 + X3. Then E(S) = 10 + 20 + 30 = 60 and V(S) = 1 + 2 + 3 = 6. The standard deviation of S is √6 = 2.45.
11-108
• If X1, X2, …, Xn are independent normal random variables, then the random variable Q defined as Q = a1X1 + a2X2 + … + anXn + b will also be normally distributed with
• E(Q) = a1E(X1) + a2E(X2) + … + anE(Xn) + b
• V(Q) = a1^2 V(X1) + a2^2 V(X2) + … + an^2 V(Xn)
• Note: It is the variances that can be added above and not the standard deviations.
4-2 Properties of the Normal Distribution (continued)
11-109
Example 4.3: Let X1, X2, X3 and X4 be independent random variables that are normally distributed with means and variances as shown. Find the mean and variance of Q = X1 - 2X2 + 3X3 - 4X4 + 5
4-2 Properties of the Normal Distribution – Example 4-3
      Mean    Variance
X1    12      4
X2    -5      2
X3    8       5
X4    10      1
E(Q) = 12 – 2(-5) + 3(8) – 4(10) + 5 = 11
V(Q) = 4 + (-2)^2(2) + 3^2(5) + (-4)^2(1) = 73
SD(Q) = √73 = 8.544
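The arithmetic of Example 4-3 can be reproduced with a short helper. This is a sketch only; the function name linear_combination_moments is hypothetical, and it assumes the variables are independent, as the example states.

```python
from math import sqrt

def linear_combination_moments(coeffs, means, variances, constant=0.0):
    """Mean and variance of Q = sum(a_i * X_i) + constant for independent X_i."""
    mean = sum(a * m for a, m in zip(coeffs, means)) + constant
    variance = sum(a * a * v for a, v in zip(coeffs, variances))  # constant adds no variance
    return mean, variance

# Q = X1 - 2*X2 + 3*X3 - 4*X4 + 5
mean_q, var_q = linear_combination_moments(
    coeffs=[1, -2, 3, -4],
    means=[12, -5, 8, 10],
    variances=[4, 2, 5, 1],
    constant=5,
)
sd_q = sqrt(var_q)
print(mean_q, var_q, round(sd_q, 3))
```

Note that the coefficients are squared in the variance, so the negative signs disappear, and the added constant shifts the mean only.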
11-110
Computing the Mean, Variance and Standard Deviation for the Sum of Independent Random Variables Using the Template
11-111
All of these are normal probability density functions, though each has a different mean and variance.
[Figure: four normal probability density functions: Z~N(0,1) with μ = 0, σ = 1; W~N(40,1) with μ = 40, σ = 1; X~N(30,25) with μ = 30, σ = 5; Y~N(50,9) with μ = 50, σ = 3.]
Consider:
P(39 ≤ W ≤ 41)
P(25 ≤ X ≤ 35)
P(47 ≤ Y ≤ 53)
P(-1 ≤ Z ≤ 1)
The probability in each case is an area under a normal probability density function.
Normal Probability Distributions
11-112
The standard normal random variable, Z, is the normal random variable with mean μ = 0 and standard deviation σ = 1: Z~N(0,1^2).
[Figure: standard normal distribution, f(z) against z, with μ = 0 and σ = 1 marked.]
4-4 The Standard Normal Distribution
11-113
z    .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0  0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1  0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2  0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3  0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4  0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5  0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6  0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7  0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8  0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9  0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0  0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1  0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2  0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3  0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4  0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5  0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6  0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7  0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8  0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9  0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0  0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1  0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2  0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3  0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4  0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5  0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6  0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7  0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8  0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9  0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0  0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
[Figure: standard normal distribution, f(z) against z, with the area between 0 and 1.56 marked.]
Standard Normal Probabilities
Look in row labeled 1.5 and column labeled .06 to find P(0 ≤ Z ≤ 1.56) = 0.4406
Finding Probabilities of the Standard Normal Distribution: P(0 < Z < 1.56)
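The table lookup can be cross-checked in software. The sketch below assumes Python 3.8+ (statistics.NormalDist); the helper name table_area is illustrative and returns the same quantity the table tabulates, the area between 0 and z.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_area(z):
    """P(0 < Z < z): cumulative probability up to z minus the half below 0."""
    return Z.cdf(z) - 0.5

area_156 = table_area(1.56)
print(round(area_156, 4))
```

Because the table stores areas measured from 0, cumulative probabilities are obtained by adding .5 for positive z, which is what the cdf call already includes.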
11-114
To find P(Z < -2.47):
Find table area for 2.47:
P(0 < Z < 2.47) = .4932
P(Z < -2.47) = .5 - P(0 < Z < 2.47) = .5 - .4932 = 0.0068
[Figure: standard normal distribution showing the table area for 2.47 (0.4932) and the area to the left of -2.47 (0.0068).]
Finding Probabilities of the Standard Normal Distribution: P(Z < -2.47)
z    ...  .06     .07     .08
. . .
2.3  ...  0.4909  0.4911  0.4913
2.4  ...  0.4931  0.4932  0.4934
2.5  ...  0.4948  0.4949  0.4951
. . .
11-115
Finding Probabilities of the Standard Normal Distribution: P(1< Z < 2)
z    .00     ...
. . .
0.9  0.3159  ...
1.0  0.3413  ...
1.1  0.3643  ...
. . .
1.9  0.4713  ...
2.0  0.4772  ...
2.1  0.4821  ...
. . .
To find P(1 ≤ Z ≤ 2):
1. Find table area for 2.00
F(2) = P(Z ≤ 2.00) = .5 + .4772 = .9772
2. Find table area for 1.00
F(1) = P(Z ≤ 1.00) = .5 + .3413 = .8413
3. P(1 ≤ Z ≤ 2.00) = P(Z ≤ 2.00) - P(Z ≤ 1.00) = .9772 - .8413 = 0.1359
[Figure: standard normal distribution with the area between 1 and 2 shaded, P(1 ≤ Z ≤ 2) = 0.1359.]
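Tail and interval probabilities follow the same pattern: subtract cumulative probabilities. A minimal sketch with statistics.NormalDist (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Left tail: P(Z < -2.47)
left_tail = Z.cdf(-2.47)

# Interval: P(1 <= Z <= 2) = F(2) - F(1)
between_1_and_2 = Z.cdf(2.0) - Z.cdf(1.0)

print(round(left_tail, 4), round(between_1_and_2, 4))
```

Small differences in the fourth decimal place compared with table-based answers come from the rounding of the table entries.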
11-116
Finding Values of the Standard Normal Random Variable: P(0 < Z < z) = 0.40
To find z such that P(0 ≤ Z ≤ z) = .40:
1. Find a probability as close as possible to .40 in the table of standard normal probabilities.
2. Then determine the value of z from the corresponding row and column.
P(0 ≤ Z ≤ 1.28) ≈ .40
Also, since P(Z ≤ 0) = .50,
P(Z ≤ 1.28) ≈ .90
[Figure: standard normal distribution with area = .40 (.3997) between 0 and z = 1.28, and area to the left of 0 = .50.]
z    .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
. . .
1.1  0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2  0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3  0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
. . .
11-117
z    ...  .04     .05     .06     .07     .08     .09
. . .
2.4  ...  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5  ...  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6  ...  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
. . .
To have .99 in the center of the distribution, there should be (1/2)(1-.99) = (1/2)(.01) = .005 in each tail of the distribution, and (1/2)(.99) = .495 in each half of the .99 interval. That is:
P(0 ≤ Z ≤ z.005) = .495
Look to the table of standard normal probabilities to find that:
z.005 = 2.575 and -z.005 = -2.575
P(-2.575 ≤ Z ≤ 2.575) = .99
[Figure: standard normal distribution with area .005 in each tail, area .495 on each side of the center, and total area .99 between -2.575 and 2.575.]
99% Interval around the Mean
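Finding a z value from a probability is the inverse of the table lookup. The sketch below uses NormalDist.inv_cdf from the Python standard library (3.8+), which takes a cumulative probability, so the area measured from 0 has to be converted by adding .50. Variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# P(0 <= Z <= z) = .40 means P(Z <= z) = .50 + .40 = .90
z_for_40 = Z.inv_cdf(0.90)

# .99 in the center leaves .005 in each tail, so P(Z <= z.005) = .995
z_005 = Z.inv_cdf(0.995)

print(round(z_for_40, 3), round(z_005, 3))
```

The results agree with the table values 1.28 and 2.575 to the precision the table allows.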
11-118
The area within kσ of the mean is the same for all normal random variables. So an area under any normal distribution is equivalent to an area under the standard normal. In this example: P(40 ≤ X ≤ 60) = P(-1 ≤ Z ≤ 1) since μ = 50 and σ = 10.
[Figure: normal distribution with μ = 50, σ = 10 (f(x) against x) transformed into the standard normal distribution (f(z) against z).]
Transformation:
(1) Subtraction: (X - μx)
(2) Division by σx
The transformation of X to Z:
Z = (X - μx) / σx
The inverse transformation of Z to X:
X = μx + Zσx
4-5 The Transformation of Normal Random Variables
11-119
Example 4-9
X~N(160,30^2)
P(100 ≤ X ≤ 180) = P((100 − μ)/σ ≤ (X − μ)/σ ≤ (180 − μ)/σ)
= P((100 − 160)/30 ≤ Z ≤ (180 − 160)/30)
= P(−2 ≤ Z ≤ .6667)
= 0.4772 + 0.2475 = 0.7247
Example 4-10
X~N(127,22^2)
P(X < 150) = P((X − μ)/σ < (150 − μ)/σ)
= P(Z < (150 − 127)/22)
= P(Z < 1.045)
= 0.5 + 0.3520 = 0.8520
Using the Normal Transformation
11-120
Example 4-11
X~N(383,12^2)
P(394 ≤ X ≤ 399) = P((394 − 383)/12 ≤ Z ≤ (399 − 383)/12)
= P(0.9166 ≤ Z ≤ 1.333)
= 0.4088 − 0.3203 = 0.0885
[Figure: normal distribution with μ = 383, σ = 12 and the standard normal distribution, with the equivalent areas shaded.]
Using the Normal Transformation - Example 4-11
Template solution
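Examples 4-9 to 4-11 can be checked by standardizing the limits and using the standard normal cdf. This is a sketch, not the template referred to in the slides; the helper names standardize and prob_between are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def standardize(x, mu, sigma):
    """Transformation of X to Z: subtract the mean, divide by the standard deviation."""
    return (x - mu) / sigma

def prob_between(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed via the Z transformation."""
    return Z.cdf(standardize(b, mu, sigma)) - Z.cdf(standardize(a, mu, sigma))

p_4_9 = prob_between(100, 180, mu=160, sigma=30)   # Example 4-9
p_4_10 = Z.cdf(standardize(150, 127, 22))          # Example 4-10: P(X < 150)
p_4_11 = prob_between(394, 399, mu=383, sigma=12)  # Example 4-11

print(round(p_4_9, 4), round(p_4_10, 4), round(p_4_11, 4))
```

The values agree with the table-based answers up to rounding in the last decimal place.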
11-121
The transformation of X to Z:
Z = (X − μx) / σx
The inverse transformation of Z to X:
X = μx + Zσx
The transformation of X to Z, where a and b are numbers:
P(X < a) = P(Z < (a − μ)/σ)
P(X > b) = P(Z > (b − μ)/σ)
P(a < X < b) = P((a − μ)/σ < Z < (b − μ)/σ)
The Transformation of Normal Random Variables
11-122
[Figure: standard normal distribution, f(z) against z.]
• The probability that a normal random variable will be within 1 standard deviation from its mean (on either side) is 0.6826, or approximately 0.68.
• The probability that a normal random variable will be within 2 standard deviations from its mean is 0.9544, or approximately 0.95.
• The probability that a normal random variable will be within 3 standard deviations from its mean is 0.9974.
Normal Probabilities (Empirical Rule)
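The three empirical-rule probabilities can be recomputed directly. A short sketch with statistics.NormalDist (Python 3.8+); the dictionary name within is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# P(-k <= Z <= k) for k = 1, 2, 3 standard deviations
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(f"within {k} standard deviation(s): {prob:.4f}")
```

The slide values 0.6826, 0.9544, and 0.9974 are twice the four-decimal table entries, so they can differ from the computed values by about 0.0001.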
11-123
z    ...  .07     .08     .09
. . .
1.1  ...  0.3790  0.3810  0.3830
1.2  ...  0.3980  0.3997  0.4015
1.3  ...  0.4147  0.4162  0.4177
. . .
The area within kσ of the mean is the same for all normal random variables. To find a probability associated with any interval of values for any normal random variable, all that is needed is to express the interval in terms of numbers of standard deviations from the mean. That is the purpose of the standard normal transformation. If X~N(50,10^2),
P(X > 70) = P((X − μ)/σ > (70 − 50)/10) = P(Z > 2)
That is, P(X > 70) can be found easily because 70 is 2 standard deviations above the mean of X: 70 = μ + 2σ. P(X > 70) is equivalent to P(Z > 2), an area under the standard normal distribution.
Example 4-12
X~N(124,12^2)
P(X > x) = 0.10 and P(Z > 1.28) ≈ 0.10
x = μ + zσ = 124 + (1.28)(12) = 139.36
[Figure: normal distribution with μ = 124, σ = 12, with the right-tail area beyond x = 139.36 marked.]
4-6 The Inverse Transformation
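Example 4-12 can be reproduced with the inverse cdf. A minimal sketch, assuming Python 3.8+; since P(X > x) = 0.10 is the same as P(X ≤ x) = 0.90, the cumulative probability passed to inv_cdf is 0.90.

```python
from statistics import NormalDist

X = NormalDist(mu=124, sigma=12)

# Value x with 10% of the distribution to its right
x_cutoff = X.inv_cdf(0.90)

# The same value via the inverse transformation x = mu + z * sigma
z_cutoff = NormalDist().inv_cdf(0.90)
x_from_z = 124 + z_cutoff * 12

print(round(x_cutoff, 2), round(x_from_z, 2))
```

The result is slightly larger than 139.36 because the slide rounds z to 1.28.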
11-124
1. Draw pictures of the normal distribution in question and of the standard normal distribution.
[Figure: normal distribution with μ = 2450, σ = 400 (f(x) against x) and the standard normal distribution (f(z) against z).]
Finding Values of a Normal Random Variable, Given a Probability
11-127
1. Draw pictures of the normal distribution in question and of the standard normal distribution.
2. Shade the area corresponding to the desired probability.
3. From the table of the standard normal distribution, find the z value or values.
z    ...  .05     .06     .07
. . .
1.8  ...  0.4678  0.4686  0.4693
1.9  ...  0.4744  0.4750  0.4756
2.0  ...  0.4798  0.4803  0.4808
. . .
4. Use the transformation from z to x to get value(s) of the original random variable.
x = μ ± zσ = 2450 ± (1.96)(400) = 2450 ± 784 = (1666, 3234)
Finding Values of a Normal Random Variable, Given a Probability
[Figure: normal distribution with μ = 2450, σ = 400 and the standard normal distribution, each with the central area .9500 shaded (.4750 on either side of the mean), between z = -1.96 and z = 1.96.]
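The four steps can be condensed into a small function that returns the symmetric interval holding a given central probability. This is a sketch; the name central_interval is hypothetical.

```python
from statistics import NormalDist

def central_interval(mu, sigma, probability):
    """Interval (mu - z*sigma, mu + z*sigma) containing the given central probability."""
    tail = (1 - probability) / 2              # area left in each tail
    z = NormalDist().inv_cdf(1 - tail)        # e.g. about 1.96 for probability .95
    return mu - z * sigma, mu + z * sigma

lower, upper = central_interval(mu=2450, sigma=400, probability=0.95)
print(round(lower), round(upper))
```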
11-128
[Figure: normal distribution with μ = 3.5, σ = 1.323 (f(x) against x) and binomial distribution with n = 7, p = 0.50 (P(x) against x).]
The normal distribution with μ = 3.5 and σ = 1.323 is a close approximation to the binomial with n = 7 and p = 0.50.
P(x < 4.5) = 0.7749
MTB > cdf 4.5;
SUBC> normal 3.5 1.323.
Cumulative Distribution Function
Normal with mean = 3.50000 and standard deviation = 1.32300
x P( X <= x)
4.5000 0.7751
MTB > cdf 4;
SUBC> binomial 7,.5.
Cumulative Distribution Function
Binomial with n = 7 and p = 0.500000
x P( X <= x)
4.00 0.7734
P(x ≤ 4) = 0.7734
Finding Values of a Normal Random Variable, Given a Probability
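The same comparison can be made without Minitab. The sketch below computes the exact binomial probability P(X ≤ 4) and the normal approximation evaluated at 4.5 (the half-unit continuity correction), using μ = np and σ = √(npq). Variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 7, 0.5
q = 1 - p

# Exact binomial: P(X <= 4) = sum of P(X = k) for k = 0..4
binomial_cdf_4 = sum(comb(n, k) * p**k * q ** (n - k) for k in range(5))

# Normal approximation with matching mean and standard deviation
mu = n * p                # 3.5
sigma = sqrt(n * p * q)   # about 1.323
normal_approx = NormalDist(mu, sigma).cdf(4.5)

print(round(binomial_cdf_4, 4), round(normal_approx, 4))
```

The two probabilities differ by less than 0.002, even though n = 7 is small, because p = 0.50 makes the binomial symmetric.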
11-129
FOR ANY RESEARCH WE ARE ALWAYS INTERESTED IN UNDERSTANDING THE POPULATION PARAMETER SO THAT DECISIONS CAN BE MADE BASED ON INFORMATION.
EX: A MARKETER MAY BE INTERESTED TO KNOW THE AVERAGE CONSUMPTION OF SUGAR PER HOUSEHOLD PER MONTH IN THE CITY OF DELHI. THIS INFORMATION IS THE POPULATION PARAMETER, WHERE ALL THE HOUSEHOLDS OF THE CITY OF DELHI ARE THE POPULATION AND THE AVERAGE CONSUMPTION OF SUGAR IS THE PARAMETER, REPRESENTED BY ‘µ’.
11-130
HOWEVER, FINDING THIS PARAMETER IS DIFFICULT, AS IT WILL BE IMPRACTICAL TO CONTACT ALL THE HOUSEHOLDS OF DELHI (OR THE TIME TAKEN WOULD BE VERY LARGE) AND THE STUDY ITSELF MAY BE TIME-BARRED.
HENCE WE MUST RESORT TO COLLECTING THE INFORMATION FROM ONLY A SUBSET OF THE POPULATION, WHICH IS CALLED THE SAMPLE. THIS SAMPLE INFORMATION FOR THE SAME VARIABLE IS REFERRED TO AS THE STATISTIC (x WITH A BAR ON THE TOP).
11-131
HOWEVER, THE SAMPLE MEAN IS GENERALLY NOT EQUAL TO THE POPULATION MEAN, AND THE DIFFERENCE BETWEEN THE TWO IS THE ERROR IN ESTIMATING THE PARAMETER (KNOWN AS TOTAL ERROR).
THIS ERROR OCCURS FOR SEVERAL REASONS.
11-132
Sample vs. Census
                                     Conditions Favoring the Use of
Type of Study                        Sample    Census
1. Budget                            Small     Large
2. Time available                    Short     Long
3. Population size                   Large     Small
4. Variance in the characteristic    Small     Large
5. Cost of sampling errors           Low       High
6. Cost of nonsampling errors        High      Low
11-133
THUS IT IS CLEAR THAT SAMPLING IS REQUIRED, AND IF THE SAMPLE SIZE IS PROPERLY CHOSEN THEN THE ERROR CAN ALSO BE KEPT AT A MINIMUM LEVEL.
11-134
SAMPLING DISTRIBUTION
IF THE TARGET SEGMENT (POPULATION) CONTAINS ‘N’ ELEMENTS AND FROM THIS POPULATION WE RANDOMLY PICK ‘n’ ELEMENTS,
IN HOW MANY POSSIBLE WAYS CAN WE PICK THESE ‘n’ ELEMENTS?
N^n ways if done with replacement
NCn ways if done without replacement
FOR EACH OF THESE SAMPLES THERE WILL BE A SAMPLE MEAN. THE WAY THESE SAMPLE MEANS ARE SPREAD IS KNOWN AS THE SAMPLING DISTRIBUTION.
11-135
SAMPLING DISTRIBUTION
Let us illustrate the concept of Sampling Distribution:
Consider a population consisting of only three members (A, B and C). If they are asked how many chocolates they eat in a day, the answers are A = 1 per day, B = 2 per day and C = 3 per day. Hence the variable is the number of chocolates, which takes the values {1, 2, 3}. This gives the population average (µ = 2) and a variance (σ^2) = 2/3.
If samples of size 2 are taken with replacement, let us list all possible samples along with their sample means.
11-136
SAMPLING DISTRIBUTION
Possible samples    Sample mean
(1,1)               1
(1,2)               1.5
(1,3)               2
(2,1)               1.5
(2,2)               2
(2,3)               2.5
(3,1)               2
(3,2)               2.5
(3,3)               3

Possible sample mean    Freq    Prob
1                       1       1/9
1.5                     2       2/9
2                       3       3/9
2.5                     2       2/9
3                       1       1/9

Expected value of sample mean = 2 = population mean
Variance of sample means = σ^2/n = (2/3)/2 = 1/3
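The whole sampling distribution for the chocolate example can be built by enumeration. This sketch uses exact fractions so the results match the table without rounding; all variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # chocolates per day for A, B, C
n = 2                   # sample size, drawn with replacement

# All N^n = 9 ordered samples and their sample means
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: probability of each possible sample mean
counts = Counter(means)
prob = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

expected = sum(m * p for m, p in prob.items())
variance = sum((m - expected) ** 2 * p for m, p in prob.items())

pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

print(expected == pop_mean, variance == pop_var / n)
```

Both comparisons hold exactly: the mean of the sample means equals µ, and their variance equals σ^2/n.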
11-137
SAMPLING DISTRIBUTION
[Figure: bar chart of PROBABILITY against sample mean (1, 1.5, 2, 2.5, 3).]
Does this appear to be normally distributed? Roughly, yes: it is symmetric and peaks at the population mean.
11-138
SAMPLING DISTRIBUTION
Thus the Central Limit Theorem says that the distribution of the sample mean is approximately normal as long as the sample size is large,
such that: expected value of sample mean = population mean
and standard deviation of sample mean = population standard deviation/√n
This is true irrespective of the shape of the population distribution, provided the population has a finite standard deviation.
11-139
• Comparing the population distribution and the sampling distribution of the mean:
The sampling distribution is more bell-shaped and symmetric.
Both have the same center.
The sampling distribution of the mean is more compact, with a smaller variance.
[Figure: uniform distribution (1,8), P(X) against X, and the sampling distribution of the mean, P(X̄) against X̄.]
Properties of the Sampling Distribution of the Sample Mean
11-140
The expected value of the sample mean is equal to the population mean:
E(X̄) = μ_X̄ = μ_X
The variance of the sample mean is equal to the population variance divided by the sample size:
V(X̄) = σ^2_X̄ = σ^2_X / n
The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size:
SD(X̄) = σ_X̄ = σ_X / √n
Relationships between Population Parameters and the Sampling Distribution of the Sample Mean
11-141
When sampling from a normal population with mean μ and standard deviation σ, the sample mean, X̄, has a normal sampling distribution:
X̄ ~ N(μ, σ^2/n)
This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.
[Figure: density f(X) of the normal population and the sampling distributions of the sample mean for n = 2, n = 4, and n = 16.]
Sampling from a Normal Population
11-142
When sampling from a population with mean μ and finite standard deviation σ, the sampling distribution of the sample mean will tend to a normal distribution with mean μ and standard deviation σ/√n as the sample size becomes large (n > 30).
For “large enough” n:
X̄ ~ N(μ, σ^2/n)
[Figure: sampling distribution of X̄ for n = 5, n = 20, and large n.]
The Central Limit Theorem
11-143
[Figure: population distributions (normal, uniform, skewed, general) and the corresponding sampling distributions of X̄ for n = 2 and n = 30.]
The Central Limit Theorem Applies to Sampling Distributions from Any Population
11-144
SAMPLING DISTRIBUTION - EXAMPLE
Let us assume that we are interested in understanding what the average consumption of sugar per household per month is in a given target population.
This means we are interested in obtaining the information (µ = average sugar consumed per month).
We can only estimate it from sample information, i.e. from the sample mean. This can be done as follows.
11-145
SAMPLING DISTRIBUTION - EXAMPLE
For the example let us assume that we randomly sampled 100 households and found that the sample mean was 1890 grams per household per month. Let us also assume that the population standard deviation was known to be 230 grams.
We use the fact that the sample mean obtained was one among the different sample means possible and that the sample means would be normally distributed.
Hence an interval estimate can be obtained as µ = x-bar ± Z σ/√n, where Z = std. normal deviate.
11-146
SAMPLING DISTRIBUTION - EXAMPLE
Substituting the values (sample mean 1890 grams, population standard deviation 230 grams, n = 100) we get
µ = 1890 ± Z x (230 / √100)
For 90% confidence, Z = 1.645 (refer Z table):
µ = 1890 ± 1.645 x (230 / √100) = 1890 ± 37.84
We can be 90% confident that the actual µ is contained within 1852.16 to 1927.84 grams.
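The interval estimate can be computed for any confidence level. This is a sketch assuming a known population standard deviation, as in the example; the function name mean_confidence_interval is hypothetical. For a two-sided 90% interval the standard normal quantile is about 1.645, because 5% is left in each tail.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(xbar, sigma, n, confidence):
    """Two-sided interval xbar +/- z * sigma / sqrt(n) with sigma known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.645 for 0.90
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_confidence_interval(xbar=1890, sigma=230, n=100, confidence=0.90)
print(round(low, 2), round(high, 2))
```

A value of Z = 1.28 would leave 10% in each tail and so corresponds to an 80% two-sided interval.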
11-147
FROM THE EXAMPLE JUST EXPLAINED YOU CAN SEE THAT
(sample mean (x-bar) – µ) = Z σ/√n
The error in estimating µ is a function of σ/√n.
Hence σ/√n is often referred to as the ‘standard error’.
Hence if the acceptable error is known then the sample size can be determined (this is based on sampling error alone).
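Solving error = Z σ/√n for n gives n = (Z σ / error)^2, rounded up to the next whole unit. The sketch below applies this to the sugar example; the acceptable error of 30 grams and the 95% confidence level are assumed values chosen for illustration, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, max_error, confidence):
    """Smallest n for which z * sigma / sqrt(n) does not exceed max_error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / max_error) ** 2)

# sigma = 230 grams from the example; error of 30 grams at 95% is an assumption
n_needed = required_sample_size(sigma=230, max_error=30, confidence=0.95)
print(n_needed)
```

Halving the acceptable error roughly quadruples the required sample size, since n grows with the square of 1/error.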
11-148
Sampling without replacement
When we sample without replacement from a finite population, the standard deviation of sample means (also known as standard error) incorporates a finite population multiplier and is as follows:
(σ/√n) x √((N-n)/(N-1))
where √((N-n)/(N-1)) is the finite population multiplier (always ≤ 1), N = population size and n = sample size.
It can be noted that as N goes to ∞ the multiplier becomes 1, and hence the standard error is the same as if the sampling were done with replacement.
11-149
Sampling distribution for proportion
Consider an example:
The number of times a hotel is unable to accommodate its customers with rooms because the hotel is full.
This can only be expressed in terms of a proportion, e.g. 10% and so on.
In this case also, if the sample size is large, the sampling distribution of the proportion behaves like a normal distribution with expected value of the sample proportion equal to the population proportion and the standard error equal to √(pq/n), where ‘q’ = 100-p if ‘p’ is in percentage.
11-150
Sampling distribution for proportion
Similarly, an interval estimate for a proportion can be found. For example, suppose a sample of 1000 voters is selected and 400 of them decide to vote for a political party (X).
Then the proportion of the population that is expected to vote for the party (X) would be
P = 40% ± Z √(40*60/1000) = 40% ± 3.04 (Z = 1.96 for 95% confidence level); hence the interval would be 36.96% to 43.04%.
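The voter example can be recomputed as follows. This is a sketch working with proportions between 0 and 1 rather than percentages; the function name proportion_confidence_interval is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Large-sample interval p +/- z * sqrt(p*q/n) for a population proportion."""
    p = successes / n
    q = 1 - p
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 0.95
    margin = z * sqrt(p * q / n)
    return p - margin, p + margin

low_p, high_p = proportion_confidence_interval(successes=400, n=1000, confidence=0.95)
print(f"{low_p:.2%} to {high_p:.2%}")
```

With 400 of 1000 voters the margin is about 3.04 percentage points on either side of 40%.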
11-151
Sampling distribution of difference of two means
Consider a group of male employees and a group of female employees in the IT industry at a given level. It is desired to understand the level of difference in their salaries (given there is discrimination).
11-152
Sampling distribution of difference of two means
In this situation there could be many possible samples of size (n1) that can be drawn from the male employees, and similarly many samples of size (n2) that can be drawn from the female employees.
For each pair of samples, one taken from each group, the sample means can be subtracted, which gives us the level of difference in the salary.
This is the difference of two sample means, and its distribution would also behave like a normal distribution for large samples.
11-153
Sampling distribution of difference of two means
Thus the sampling distribution for the difference of two means is normally distributed with expected value of (x-bar (male) – x-bar (female)) = µ (male) – µ (female), which is 0 when the two population means are equal, and variance for the difference of two means = s1^2/n1 + s2^2/n2, so that
Standard error = √(s1^2/n1 + s2^2/n2)
11-154
Sampling distribution for small sample
In our earlier discussion we have always emphasized the need for a large sample for the sample mean to be distributed as a normal distribution.
What is meant by a LARGE sample?
11-155
Sampling distribution for small sample
Gosset was working on samples which were considered small, such as 10, 15, 25 etc., and he found that the distribution of the sample mean was not exactly a normal distribution but was nevertheless symmetric, with a larger variance. He denoted his distribution as the Student ‘t’ distribution.
Thus the mean value of ‘t’ = 0, and the probability density function was not only a function of the mean and variance but also dependent on what he called the “degrees of freedom”.
Confidence Interval for a Mean (µ) with Unknown σ
McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.
Degrees of Freedom
• Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.
• Degrees of freedom tell how many observations are used to calculate σ, less the number of intermediate estimates used in the calculation.
ν = n − 1
• As n increases, the t distribution approaches the shape of the normal distribution.
• For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.
Degree of freedom
To understand degrees of freedom, consider the numbers 1, 2 and 3, whose total is 6.
We are free to change any two of these three numbers without changing the total, because the third number is then fixed by the total. Thus the degrees of freedom are 2. In general, the degrees of freedom are (n − 1), where n is the sample size.
Sampling distribution for small sample
Thus for a small sample, and whenever the population variance is unknown, the distribution of the sample mean behaves like a t distribution.
The t distribution becomes very close to the normal distribution when the degrees of freedom are 29 or more.
Hence a large sample in statistics is defined as one with a sample size of 30 or more. For samples smaller than 30, the t distribution is needed.
Student's t Distribution
• t distributions are symmetric and shaped like the standard normal distribution.
• The t distribution is dependent on the size of the sample.
Thus for all calculations with small samples, the Z value is replaced with the t value.
Small samples are usually not used when proportions are involved.
Use the Student's t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.
The confidence interval for µ (unknown σ) is
$$\bar{x} \pm t\,\frac{s}{\sqrt{n}}$$
that is,
$$\bar{x} - t\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\,\frac{s}{\sqrt{n}}$$
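A minimal sketch of this interval follows. The data are hypothetical bottle weights invented for illustration, and the t value 2.262 (95% confidence, 9 degrees of freedom) is read from a standard t table rather than computed, since the Python standard library has no t distribution.

```python
import math
import statistics

def t_confidence_interval(sample, t_crit):
    """Interval x-bar +/- t * s / sqrt(n) for a small sample with unknown sigma."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)       # sample std. deviation, (n - 1) denominator
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical sample of n = 10 observations (not from the slides).
weights = [236, 238, 241, 235, 240, 239, 237, 242, 238, 234]
T_95_DF9 = 2.262                       # table value: 95% confidence, d.f. = 10 - 1 = 9
low, high = t_confidence_interval(weights, T_95_DF9)
print(f"{low:.2f} < mu < {high:.2f}")
```

Passing the table value in as an argument keeps the sketch dependency-free; with a statistics package installed, the critical value could be computed instead.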
Comparison of z and t
• For very small samples, t-values differ substantially from the normal.
• As degrees of freedom increase, the t-values approach the normal z-values.
• For example, for n = 31, the degrees of freedom are ν = 31 − 1 = 30.
• What would the t-value be for a 90% confidence interval?
For ν = 30, the t-value is 1.697; the corresponding z-value is 1.645.
Confidence Interval for the Difference of Two Means, small sample (µ1 − µ2)
• The procedure for constructing a confidence interval for µ1 − µ2 depends on our assumption about the unknown variances.
Assuming equal variances:
$$(\bar{x}_1 - \bar{x}_2) \pm t \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
with ν = (n1 − 1) + (n2 − 1) degrees of freedom.
The first square root is the pooled standard deviation; multiplied by the second square root, it gives the standard error for the difference of means.
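The pooled interval can be sketched as below. All the numbers are hypothetical summary statistics chosen for illustration, and the t value 2.101 (95% confidence, 18 degrees of freedom) is taken from a standard t table.

```python
import math

def pooled_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2, small samples, equal variances assumed."""
    df = (n1 - 1) + (n2 - 1)
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    pooled_sd = math.sqrt(pooled_var)              # pooled standard deviation
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)    # standard error of the difference
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se, df

# Hypothetical groups: means 52 and 48, both with s = 4 and n = 10.
low, high, df = pooled_ci(52, 4, 10, 48, 4, 10, t_crit=2.101)
print(df, round(low, 3), round(high, 3))
```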
F- Distribution
The F distribution (Fisher's) is the distribution of the ratio of two variances.
If two samples are drawn and we wish to know whether they come from a single population or from two separate populations, an F statistic is calculated.
This F statistic is the ratio of the sample variances of the two samples:
$$F = \frac{s_1^2}{s_2^2}$$
F – distribution curve
The F distribution is a probability density function whose curve has the following shape:
[Figure: F distribution density curve; horizontal axis: F statistic, vertical axis: y]
We will have more occasion to discuss the F statistic later, under hypothesis testing.
Chi-Square Distribution
The chi-square (χ²) distribution is used when we wish to estimate the population variance from a known sample variance.
There are also many non-parametric tests in which a chi-square test is used.
The shape of the chi-square distribution varies with the degrees of freedom.
[Figure: chi-square distribution density curve; horizontal axis: chi-square statistic, vertical axis: y]
The square of a standard normal (Z) variable follows a chi-square distribution.
Similarly, the sum of the squares of several independent standard normal variables also follows a chi-square distribution.
We will have more to say about this distribution when we look at hypothesis testing.
Visual Displays and Correlation Analysis
Visual Displays
• Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.
• A scatter plot
- displays each observed data pair (xi, yi) as a dot on an x-y grid
- indicates visually the strength of the relationship between the two variables
[Figure: scatter plot titled "cost of maintenance per month"; horizontal axis: hrs driven per week (0 to 35), vertical axis: maintenance cost (0 to 25000)]
Correlation Analysis
[Figure: scatter plot panels showing strong positive correlation, weak positive correlation, weak negative correlation, strong negative correlation, no correlation, and a nonlinear relation]
• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.
−1 ≤ r ≤ +1
• r = 0 indicates no linear relationship
[Figure: scale for r running from −1 (strong negative relationship) to +1 (strong positive relationship)]
$$r = \frac{\operatorname{Cov}(x, y)}{s(x)\,s(y)}$$
The correlation coefficient can also be found as follows:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
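The two ways of computing r should agree, which a short sketch can confirm. The data below are hypothetical hours-driven and maintenance-cost pairs invented for illustration; they are not the data set behind the slides.

```python
import math

def r_from_sums(x, y):
    """r from the raw-sums formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

def r_from_covariance(x, y):
    """r as Cov(x, y) / (s(x) s(y)), using (n - 1) denominators throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

hours = [5, 10, 15, 20, 25, 30]                    # hypothetical
cost = [1000, 3000, 6000, 9000, 14000, 18000]      # hypothetical
r1 = r_from_sums(hours, cost)
r2 = r_from_covariance(hours, cost)
print(round(r1, 4), round(r2, 4))
```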
Use of Excel for finding the correlation
For the example data, the correlation coefficient works out to r = 0.9322.
Hence r^2 = 0.869
Properties of correlation coefficient
1. The value of r always lies between −1 and +1.
2. A change of origin and scale does not affect the value of the coefficient. Change of origin means adding or subtracting a constant from every value of a variable; change of scale means multiplying or dividing every value by a positive constant.
3. If x and y are interchanged, the coefficient is not affected, i.e. it remains unaltered. (We usually refer to x as the independent variable and y as the dependent variable.)
4. The fourth property can be explained after we explain regression (hence it is held until then).
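Properties 2 and 3 can be illustrated numerically. The sketch below uses hypothetical data and a helper `pearson_r` written for this example; the shift and scale constants are arbitrary, with positive scale factors.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours = [5, 10, 15, 20, 25, 30]                    # hypothetical
cost = [1000, 3000, 6000, 9000, 14000, 18000]      # hypothetical

r = pearson_r(hours, cost)
# Change of origin and scale: subtract 10 and divide by 5; divide cost by 1000.
r_transformed = pearson_r([(h - 10) / 5 for h in hours], [c / 1000 for c in cost])
# Interchange x and y.
r_swapped = pearson_r(cost, hours)
print(round(r, 4), round(r_transformed, 4), round(r_swapped, 4))
```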
Bivariate Regression
What is Bivariate Regression?
• Bivariate regression analyzes the relationship between two variables.
• It specifies one dependent (response) variable and one independent (predictor) variable.
• This hypothesized relationship may be linear, quadratic, or of some other form.
[Figure: scatter plot with fitted line; horizontal axis: hrs of vehicle driven (0 to 35), vertical axis: cost of maintenance (−5000 to 25000)]
In the equation y = a + bx, how do we find the values of a and b, the intercept and the slope?
How to develop a Regression line
Normal Equations
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
which, when simplified, become
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
$$a = \bar{Y} - b\bar{X}$$
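As a sketch of how these formulas are applied, the function below (a name chosen for illustration) computes a and b from the sums. The data are a made-up exact line y = 1 + 2x, so the result is easy to verify by eye.

```python
def fit_line(x, y):
    """Intercept a and slope b of y = a + bx from the simplified normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)
    a = sy / n - b * sx / n            # a = Y-bar - b * X-bar
    return a, b

x = [1, 2, 3, 4, 5]                    # hypothetical
y = [3, 5, 7, 9, 11]                   # exactly y = 1 + 2x
a, b = fit_line(x, y)
print(a, b)                            # 1.0 2.0
```

The fitted values also satisfy both normal equations: here ∑Y = 35 = 5a + 15b and ∑XY = 125 = 15a + 55b.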
For the problem considered earlier (y is the dependent variable, x the independent variable):
b = 777.32
a = −6115.9
If we want the normal equations for the situation where x is the dependent variable and y is the independent variable, we simply interchange x and y throughout. The numerator of the slope remains unchanged, but the denominator becomes
(n∑Y^2) − (∑Y)^2
Hence the new coefficients, with x as the dependent variable and y as the independent variable, are
a' = 9.393
b' = 0.001118
Properties of correlation coefficient (contd.)
4. The product of the two regression coefficients equals r^2: b × b' = r^2, which here is 777.32 × 0.001118 = 0.869.
From here it follows (since b × b' = r^2 cannot exceed 1):
1. Both regression coefficients must have the same sign (either + or −).
2. If one regression coefficient is greater than 1 in magnitude, the other must be less than 1 in magnitude.
3. If one regression coefficient is less than 1 in magnitude, the other may be greater or less than 1.
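The relation b × b' = r^2 can be verified with a small sketch. The data are hypothetical; only the last line uses the coefficients quoted in the slides.

```python
import math

def slope(x, y):
    """Slope of the regression of y on x."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return (n * sxy - sum(x) * sum(y)) / (n * sxx - sum(x) ** 2)

def pearson_r(x, y):
    """Correlation coefficient from the raw-sums formula."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sum(x) * sum(y)) / math.sqrt(
        (n * sxx - sum(x) ** 2) * (n * syy - sum(y) ** 2)
    )

hours = [5, 10, 15, 20, 25, 30]                    # hypothetical
cost = [1000, 3000, 6000, 9000, 14000, 18000]      # hypothetical

b = slope(hours, cost)          # y on x
b_prime = slope(cost, hours)    # x on y
r = pearson_r(hours, cost)
print(round(b * b_prime, 6), round(r**2, 6))       # the two values agree

slides_product = 777.32 * 0.001118                 # coefficients quoted above
print(round(slides_product, 3))                    # 0.869
```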
Regression Terminology
Fitting a Regression on a Scatter Plot in Excel
• Step 1:
- Highlight the data columns.
- Click on the Chart Wizard and choose Scatter Plot.
- In the completed graph, click once on the points in the scatter plot to select the data.
- Right-click and choose Add Trendline.
- Choose Options and check Display Equation.
WHAT IS A HYPOTHESIS
A hypothesis is a conjectural statement about a certain characteristic of the whole population or target segment.
Ex: What is the average expenditure per month incurred on vehicle maintenance? If it is suggested that this average is Rs. 1200 per month, then this is a hypothesis. What it implies is that if we take all the people who drive their vehicles, find the expenditure of each one and average the results, the average would be Rs. 1200 per month.
Ex: A car dealer claims that a given car model gives, on average, a mileage of at least 15 km per litre of petrol. This implies that if we take all the vehicles of the given model and average their mileage per litre of petrol, it would be at least 15 km per litre.
Ex: An exporter claims that the proportion of defects in his consignments will be at most 2%. This means that if we take all his consignments and find the proportion of defects, the average proportion of defectives will not exceed 2%.
Ex: A manufacturer of refills for ball point pens claims that the average length of a refill is 140 mm. This implies that while each refill may or may not be exactly 140 mm long, on average the length of the refills would be 140 mm.
What do all these hypotheses show?
1. They are all conjectures about a population parameter.
2. They always refer to the population parameter, not to a sample.
3. They are statements made only on the basis of the research question in hand and not on the basis of the data collected.
Why do we need to do hypothesis testing
To verify the statement specified in the hypothesis with certainty, we would have to carry out a census. If a census is carried out, hypothesis testing is not needed.
However, we know that a census is usually not practical and need not be accurate either. Hence we need to comment on the hypothesis on the basis of sample information only.
We therefore draw an inference about the population parameter based on sample information; this drawing of inference is called hypothesis testing.
Characteristics of a good hypothesis
1. The hypothesis should be based on sound previous research.
2. Look for realistic explanations.
3. State the variables clearly.
4. It should be easily amenable to testing.
5. Measure the variables in the correct scale.
Basics of hypothesis testing
As we said before, the hypothesis is about a characteristic of the population. We usually call it the 'parameter'.
The parameter can only be obtained by doing a census, which is not possible or not practical.
Hence our inference about the parameter is based on sample information. Based on sample information we may reject a true hypothesis and, conversely, we may accept a hypothesis when it is actually not true. Both of these are errors in making the inference.
Consider the problem: a car dealer claims that a given car model gives, on average, a mileage of at least 15 km per litre of petrol.
If the average is more than 15 km/litre we are satisfied, but if it is less than what the dealer claims we have difficulty in believing the claim. This is what we wish to verify or infer. The inference must be based on the information available from a single sample of size n; this n can be 30 cars, 40 cars or even about 20 cars.
We can write the hypotheses as follows. Generally we do not disagree with the dealer to begin with unless there is sufficient evidence to disagree. Hence we write
Null hypothesis (Ho): µ ≥ 15
Alternative hypothesis (Ha): µ < 15
Approaches for hypothesis testing
Errors in hypothesis testing
                     Hypothesis true            Hypothesis false
Accept hypothesis    No error                   Type II error or β error
Reject hypothesis    Type I error or α error    No error
Steps in Hypothesis Testing
1. Based on the research question, develop the null and alternative hypotheses.
2. Decide on the level of Type I error (alpha error).
3. Decide whether the test is single tail or two tail.
4. Decide on the appropriate test statistic to be used (Z, t or any other).
5. Calculate the test statistic.
6. Read the table value of the test statistic for the chosen level of Type I error from the table of Z, t, etc.
7. Compare the calculated test statistic with the table value.
8. Draw a conclusion.
Worked Example -1: (single mean)
A manufacturing firm has been averaging shipment of a product within 30 days of receiving the order. Of late it is believed that the average shipping time has increased. To test this, a sample of size 49 is drawn randomly from the shipments made during a given period of time. The sample shows an average shipping time of 36 days. The population standard deviation is believed to be 7 days. Is there sufficient evidence to believe that shipping is getting delayed? A 5% level of significance is thought to be appropriate.
Step 1: Formulate the null and alternative hypotheses:
Ho: µ ≤ 30
Ha: µ > 30
Step 2: Decide on the level of significance or Type I error. This is given in the problem as 5%.
Step 3: Looking at the hypotheses, it is clear that the test is single tail and the problem area is to the right, hence right tail.
Step 4: Decide on the test statistic. Since the sample size is > 30, we can use the Z statistic.
Step 5: Calculate the Z statistic = (x-bar − µ) / S.E. (standard error) = (36 − 30) / (7 / √49) = 6
Step 6: The Z table value at the 5% level of alpha, single tail, = 1.645.
Step 7: Since Z (cal) = 6 > Z (table value) = 1.645, we reject (are unable to accept) the null hypothesis.
Step 8: Conclusion: there appears to be a delay in the shipping time in recent times.
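Steps 5 to 7 of this example can be reproduced with a short sketch. The function name is illustrative; the table value is obtained from the standard library's normal distribution instead of a printed table.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (x-bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

z_cal = z_statistic(sample_mean=36, mu0=30, sigma=7, n=49)
z_table = NormalDist().inv_cdf(0.95)   # right tail, alpha = 5%, about 1.645
reject_h0 = z_cal > z_table
print(z_cal, round(z_table, 3), reject_h0)   # 6.0 1.645 True
```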
Worked example -2: (single mean)
Let us consider the car example: the dealer claims that a particular model gives at least 15 km/litre of fuel. A random sample of 36 cars gives a mean of 14.6 km/litre, and the population standard deviation is assumed known as 0.75 km/litre. Assume a 5% level of significance.
Step 1: Ho: µ ≥ 15
Ha: µ < 15
Step 2: The significance level is given as 5%.
Step 3: This is also a single tail test, but the direction is towards the left.
Step 4: Since the sample size is large (36), we can use the Z test.
Step 5: Calculate the Z statistic: Z = (14.6 − 15) / (0.75 / √36) = −0.4 / (0.75 / 6) = −3.2
Step 6: Read the value of Z at the 5% level of significance. Remember this is now on the left side, hence Z should be negative: Z = −1.645.
Step 7: Compare Z calculated with the Z table value. For a left tail test the rule is: if Z calculated ≤ Z table value, reject Ho, else accept Ho. Here Z (cal) = −3.2 < Z (table value) = −1.645, hence reject Ho.
Step 8: Conclusion: the dealer's claim cannot be accepted.
Worked Example -3: (single mean)
A refill manufacturer claims that the refill for a ball point pen is 140 mm long. A sample of 100 refills is selected, and the mean length is found to be 141.77 mm with a standard deviation of 5.88 mm. At the 5% level of significance, can it be concluded that the refills are of poor quality?
Step 1: Null hypothesis and alternative hypothesis:
Ho: µ = 140 mm
Ha: µ ≠ 140 mm
Step 2: The alpha level is given as 5%.
Step 3: In this case it is a two tail test, as deviations on both sides are unacceptable: the refill will not fit in the pen and hence is of poor quality.
Step 4: Since the sample size is 100, which is large, the Z test can be used.
Step 5: Calculate the Z statistic: Z = (141.77 − 140) / (5.88 / √100) = 1.77 / 0.588 = 3.01
Step 6: Read the Z value from the table at 5%, two tail = 1.96
Step 7: Z (cal) > Z (table value), hence reject the null hypothesis.
Step 8: Conclusion: the refills produced are of poor quality.
Rule for rejecting a Null hypothesis
For a right tail test (single tail): if Z (cal) ≥ Z (table value), reject Ho.
For a left tail test (single tail): if Z (cal) ≤ Z (table value), reject Ho.
For a two tail test: if |Z (cal)| ≥ |Z (table value)|, reject Ho.
Even for a single tail test, if we take the modulus of Z, the same rule as for the two tail test can be used for rejecting Ho. Use caution here: this works only when Z (cal) falls on the side named in the alternative hypothesis, and the table value must still be the single tail value.
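The three rules can be gathered into one small function, sketched below with an illustrative name and tail labels of our own choosing. It is applied to the calculated Z values from Worked Examples 2 and 3, and to one case showing why the modulus shortcut needs caution.

```python
from statistics import NormalDist

def reject_null(z_cal, alpha, tail):
    """Apply the rejection rule for a 'right', 'left' or 'two' tail Z test."""
    nd = NormalDist()
    if tail == "right":
        return z_cal >= nd.inv_cdf(1 - alpha)
    if tail == "left":
        return z_cal <= nd.inv_cdf(alpha)          # table value is negative
    if tail == "two":
        return abs(z_cal) >= nd.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'right', 'left' or 'two'")

example2 = reject_null(-3.2, 0.05, "left")    # car mileage example
example3 = reject_null(3.01, 0.05, "two")     # refill length example
# Caution case: a large negative Z does not reject Ho in a right tail test,
# even though its modulus exceeds the table value.
wrong_side = reject_null(-3.2, 0.05, "right")
print(example2, example3, wrong_side)         # True True False
```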
When population σ is unknown
In hypothesis testing, the population σ is usually unknown. In these situations the sample standard deviation (s) is used instead of σ, and hence the standard error is s/√n.
The sample standard deviation must be calculated with (n − 1) in the denominator, as stated in the earlier subject DRM 01, as only then is s^2 an unbiased estimator of σ^2.
Further, the sample size should be large (a large sample was defined as n ≥ 30).
When sample size is small (< 30)
When the sample size is < 30, it is considered a small sample, and the t distribution should be used instead of z. Further, if the population standard deviation is unknown, it is also recommended that the t distribution be used. The only change is that in
Z statistic = (x-bar − µ) / S.E. (standard error)
Z is replaced with t.
Worked example – single proportion
Insurance companies have recently created difficulty in settling medical claims directly with hospitals. One reason can be attributed to false billing by individuals who have taken medical insurance. A company believes that of late there has been an increase in the number of false medical claims, which has gone up to 5%. A random sample of 100 customers indicated that 7 customers had falsified their claim. Is there any reason to believe that false medical claims have gone up? Use 5% level of significance.
Step 1: Ho: p ≤ 5%, Ha: p > 5%
Step 2: The level of significance is given as 5%.
Step 3: This is a single tail (right tail) test.
Step 4: The Z statistic will be used as the sample size is large.
Step 5: Z = (ṕ - p) / standard error = (ṕ - p) / √(pq/n) = (7% - 5%) / √{(5% x 95%)/100}
= 2 / 2.179 = 0.92
Step 6: Z table value at 5% single tail = 1.645
Step 7: Compare Z (cal) with the Z table value.
Hence Z (cal) < Z (table value), so Ho cannot be rejected (Ho is accepted).
Step 8: We cannot conclude that the proportion of false claims has gone beyond 5%, even though the sample shows 7%.
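The single proportion example can be checked with a short sketch. The function name `one_proportion_z` is illustrative only, and the table value is taken from Python's standard library `statistics.NormalDist` rather than a printed Z table; both are assumptions, not part of the slides.

```python
import math
from statistics import NormalDist


def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """Z statistic for a single proportion, using the hypothesised p0 in the standard error."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# Data from the worked example: 7 false claims out of 100, Ho: p <= 5%.
z_cal = one_proportion_z(successes=7, n=100, p0=0.05)

# Right tail table value at 5% level of significance (about 1.645).
z_table = NormalDist().inv_cdf(1 - 0.05)

reject_ho = z_cal >= z_table
print(f"Z(cal) = {z_cal:.2f}, Z(table) = {z_table:.3f}, reject Ho: {reject_ho}")
```

Working in proportions (0.07 and 0.05) instead of percentages gives the same Z, because the factor of 100 cancels between numerator and standard error.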
Hypothesis test for difference of two means
Let us consider the following situations:
Case-1: Does going to VLCC help in reducing weight ?
Case-2: Does the company always assess the rent for the residential quarter lower than the employee himself does?
Case-3: Is there gender discrimination in wages among employers in a given industry for the same level of job?
Case -4: Is the new drug more effective in treatment of a disease than the existing drug?
In all cases we are talking about the difference of two means.
Case 1: The mean weight before joining VLCC and the mean weight after joining VLCC: µ(before) - µ(after)
Case 2: The mean value of the residential quarter assessed by the company and the mean value assessed by the employee for whom it is meant: µ(company) - µ(employee)
Case 3: The mean wage given to women employees and the mean wage given to men employees: µ(women) - µ(men)
Case 4: The mean time to recover with the new drug and the mean time to recover with the existing drug: µ(new drug) - µ(old drug)
Although each of these cases is a difference of two means, there is one essential difference between them.
Cases 1 & 2:
In both these cases we are talking about the same sample.
Case 1: weight of the same sample before joining VLCC and weight of the same sample after joining VLCC.
Case 2: the same house assessed by the company and the same house assessed by the employee.
Cases 3 & 4:
Case 3: mean wages for a group of women and mean wages for a group of men; each sample is independently drawn.
Case 4: mean time of recovery using the new drug and mean time of recovery using the existing drug; each sample is independently drawn (obviously the same patient cannot be given both drugs).
Difference of two means falls into two situations:
(a) Dependent samples
(b) Samples independently drawn
Both situations are treated differently.
Difference of two means – dependent samples
Ex: It was intended to understand whether there is any difference in the productivity of a worker immediately before a weekly off and immediately after a weekly off. As the weekly off was on a Sunday, it was desired to find whether productivity is different on Saturday and on Monday. Hence productivity was measured on Saturdays and Mondays for the same set of workers. The data are as follows (use 5% level of significance):

| Worker Id | Productivity Sat | Productivity Mon |
|---|---|---|
| 1 | 25 | 28 |
| 2 | 32 | 29 |
| 3 | 20 | 29 |
| 4 | 26 | 36 |
| 5 | 29 | 35 |
| 6 | 21 | 30 |
| 7 | 18 | 32 |
| 8 | 17 | 24 |
| 9 | 27 | 25 |
Dependent sample case:
Step 1: Develop the null and alternative hypotheses.
Ho: µ(sat) = µ(mon)
Ha: µ(sat) ≠ µ(mon)
The problem does not suggest that productivity is higher on Saturdays than on Mondays or vice versa; it could go either way. Hence the ≠ symbol in the alternative hypothesis.
This also implies:
Ho: µ(sat) - µ(mon) = 0, or difference = 0
Ha: µ(sat) - µ(mon) ≠ 0, or difference ≠ 0
The differences (Sat - Mon) for each worker are:

| Worker Id | Sat | Mon | Difference |
|---|---|---|---|
| 1 | 25 | 28 | -3 |
| 2 | 32 | 29 | 3 |
| 3 | 20 | 29 | -9 |
| 4 | 26 | 36 | -10 |
| 5 | 29 | 35 | -6 |
| 6 | 21 | 30 | -9 |
| 7 | 18 | 32 | -14 |
| 8 | 17 | 24 | -7 |
| 9 | 27 | 25 | 2 |

average difference = -5.89, sample standard deviation (s) = 5.622
Dependent sample case:
Step 2: The level of alpha is given as 5%.
Step 3: This is a two tail test.
Step 4: The test statistic will follow a ‘t’ distribution, as the sample size is small and ‘σ’ is unknown.
t (cal) = (mean difference - 0) / (s/√n) = (-5.89 - 0) / (5.622/√9) = -5.89 / 1.874 = -3.14, or modulus = 3.14
Step 5: Find the table value of ‘t’ for 5% two tail with 8 (n-1) degrees of freedom = 2.306.
Step 6: Compare t (cal) with t (table value): if |t (cal)| > |t (table value)|, reject Ho. Here 3.14 > 2.306.
Step 7: Reject Ho.
Step 8: Conclusion: There is a change in productivity between the day before the weekly off and the day after it.
Dependent sample case:
Hence it is clear that a difference of two means with dependent samples is treated as if it were a single mean case, applied to the paired differences.
Further, the data are always obtained in pairs, and the sample sizes are usually less than 30; hence a ‘t’ is normally used.
This test is also called the paired ‘t’ test.
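The paired ‘t’ calculation for the productivity data can be reproduced with the sketch below. It uses only the Python standard library; the variable names are illustrative, and the table value 2.306 is copied from the worked example rather than computed.

```python
import math
from statistics import mean, stdev

# Productivity of the same nine workers on Saturday and Monday.
sat = [25, 32, 20, 26, 29, 21, 18, 17, 27]
mon = [28, 29, 29, 36, 35, 30, 32, 24, 25]

# Paired differences (Sat - Mon), treated as a single sample.
diffs = [s - m for s, m in zip(sat, mon)]

n = len(diffs)
d_bar = mean(diffs)   # average difference
s_d = stdev(diffs)    # sample standard deviation, (n-1) in the denominator
t_cal = (d_bar - 0) / (s_d / math.sqrt(n))

t_table = 2.306       # 5% two tail, 8 degrees of freedom (from the t table)
reject_ho = abs(t_cal) > t_table

print(f"mean diff = {d_bar:.2f}, s = {s_d:.3f}, t(cal) = {t_cal:.2f}, reject Ho: {reject_ho}")
```

Carrying full precision gives t (cal) of about -3.14; truncating the mean difference to -5.88 before dividing gives a slightly smaller magnitude, with the same decision.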
DIFFERENCE OF TWO MEANS – INDEPENDENT SAMPLE
Consider cases 3 and 4 discussed earlier. Both cases are reproduced below for ready reference.
Case 3: mean wages for a group of women and mean wages for a group of men; each sample is independently drawn.
Case 4: mean time of recovery using the new drug and mean time of recovery using the existing drug; each sample is independently drawn (obviously the same patient cannot be given both drugs).
Case 3: mean wages for a group of women and mean wages for a group of men; each sample is independently drawn.
In this case: Ho: µ(women) = µ(men), Ha: µ(women) ≠ µ(men)
Depending on the problem it could also have been a single tail test. One tail or two tail depends on the research question being addressed.
What is the implication of acceptance of the null hypothesis?
It means that both groups belong to the same population, i.e. there is only one population and hence one mean and one variance.
What if the null hypothesis is rejected (not accepted)?
This would imply that the men belong to one population and the women belong to a different population. Since there are two populations, there are also two different means, and the variances may be either equal or unequal.
The above implies Ho: µ(women) - µ(men) = 0 and Ha: µ(women) - µ(men) ≠ 0
Writing the above hypotheses is Step 1.
Now we collect samples from the two groups and find out their wages. The average wage for women and for men is calculated separately, and so are the sample variances.
Step 2: Decide on the level of significance.
Step 3: Decide whether it is a single tail or a two tail test.
Step 4: Decide on the test statistic: Z for large samples and ‘t’ for small samples.
Step 5: Calculate Z statistic = [{x-bar(women) - x-bar(men)} - {µ(women) - µ(men)}] / S.Error, where µ(women) - µ(men) = 0 under Ho. Recall that in the course on sampling methods we have indicated the standard error for the difference of two means, independent case.
Step 6: Find Z for the alpha level of significance from the table.
Step 7: Compare Z (cal) with Z (alpha): if |Z (cal)| ≥ |Z (alpha)|, reject Ho (see the rule for rejecting a null hypothesis).
Step 8: Conclude your result.
DIFFERENCE OF TWO MEANS – INDEPENDENT SAMPLE
Example: A firm is interested in understanding whether there is any difference in the stress levels of employees working in the HR department and in the Marketing department. A random sample of 30 HR employees showed a mean stress level of 5.36 on a scale of 10, and a random sample of 40 marketing personnel showed a mean stress level of 6.23 on a scale of 10. The variance in the stress levels for HR was 2.3 and that of marketing was 1.87. At 5% level of significance, can we conclude that the stress levels are different for the two groups?
Step 1: Ho: µ(mktg) = µ(HR), i.e. µ(mktg) - µ(HR) = 0
Ha: µ(mktg) ≠ µ(HR), i.e. µ(mktg) - µ(HR) ≠ 0
Step2: Alpha level is specified as 5%
Step 3: This is a two tail test
Step 6: Calculate Z = (6.23 - 5.36) / standard error
Recall that the standard error for the difference of two means, independent sample case, is
standard error = √{(σ1^2/n1) + (σ2^2/n2)}
If the population variance σ^2 is unknown, use its unbiased estimator s^2, the sample variance.
Step 6 (contd.): Z = (6.23 - 5.36) / standard error = 0.87 / √{(1.87/40) + (2.3/30)} = 0.87 / 0.3513 = 2.48
Step 7: Z table value (two tail) at 5% alpha = 1.96. Since Z (cal) = 2.48 > 1.96, reject Ho.
Step 8: Conclusion: Stress levels are different for marketing and HR.
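The stress level example works from summary statistics only, so it can be sketched as a function of the two means, variances and sample sizes. The name `two_sample_z` is illustrative, and the two tail table value is taken from the standard library `statistics.NormalDist` rather than a printed table; both are assumptions.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1: float, var1: float, n1: int,
                 mean2: float, var2: float, n2: int) -> float:
    """Z statistic for the difference of two means, independent large samples, Ho: difference = 0."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - 0) / standard_error


# Marketing: mean 6.23, variance 1.87, n = 40; HR: mean 5.36, variance 2.3, n = 30.
z_cal = two_sample_z(6.23, 1.87, 40, 5.36, 2.3, 30)

# Two tail table value at 5% level of significance (about 1.96).
z_table = NormalDist().inv_cdf(1 - 0.05 / 2)

reject_ho = abs(z_cal) >= z_table
print(f"Z(cal) = {z_cal:.2f}, Z(table) = {z_table:.2f}, reject Ho: {reject_ho}")
```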
Difference of two means – independent samples, small sample
If the null hypothesis is accepted, it would imply that both group samples came from the same population, and hence for the one population there can be only one mean and one variance.
However, if the null hypothesis is rejected, it implies that the two group samples belong to different populations, and therefore each population will have a different mean. But we assume that the two groups have the same variance, that is, homogeneity of variances is assumed. When this assumption is made, the standard error is calculated from a pooled estimate.
Recall that in the case of small samples for the independent samples case, the standard error was calculated using the pooled estimate as follows:
standard error = s(pooled) x √{(1/n1) + (1/n2)}
where s^2(pooled) = {(n1-1)s1^2 + (n2-1)s2^2} / (n1+n2-2)
and s(pooled) = √{s^2(pooled)} (assuming homogeneity of variance)
Worked example: independent small sample ‘t’ - test
A car manufacturer intends to procure batteries for a given model from two different vendors. However, before procuring, they wish to know if the life of the two batteries would be similar. A sample of batteries from each manufacturer is selected randomly, and the life (in months) was found as follows:
Brand A: 38, 37, 42, 44, 36, 39, 40, 41
Brand B: 42, 41, 37, 39, 40, 43, 44, 45, 46, 48, 39
Is there reason to believe that the life of batteries of the two brands is different? Use 5% level of significance.
Step 1: Ho: µ(a) = µ(b), Ha: µ(a) ≠ µ(b)
Step 2: The level of significance is given as 5%.
Step 3: This is a two tail test as the question is only asking if the life of the two brands of batteries are different.
Step 4: Choosing the test statistic. Since the sample sizes are 8 and 11 respectively (small), a t-statistic is used to test the hypothesis.
Step 5: Calculate the ‘t’ statistic: t = {(mean for brand A - mean for brand B) - 0} / standard error
Step 5 (contd.):
mean life for brand A = 39.625
mean life for brand B = 42.18182
variance for brand A = 7.125
variance for brand B = 11.3636
pooled variance = 9.6183, so s(pooled) = 3.1013
t = {(39.625 - 42.18182) - 0} / {3.1013 x √(1/8 + 1/11)} = -2.55682 / 1.44 = -1.7742
The absolute value of t (cal) = 1.7742.
Step 6: Tabulated value of ‘t’ for 5% alpha (two tail) at 17 df (n1+n2-2) = 2.1098
Step 7: Compare absolute values: ‘t’ (cal) < ‘t’ (tabulated), hence we are unable to reject the null hypothesis.
Step 8: Conclusion: There is no evidence that the mean life of batteries differs between the two brands; hence both vendors can be considered for selection based on other considerations such as price, delivery, etc.
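The pooled ‘t’ test for the battery data can be reproduced with the sketch below. Variable names are illustrative, and the table value 2.1098 is copied from the worked example rather than computed; the sketch assumes equal population variances, as the example does.

```python
import math
from statistics import mean, variance

# Battery life in months for the two brands.
brand_a = [38, 37, 42, 44, 36, 39, 40, 41]
brand_b = [42, 41, 37, 39, 40, 43, 44, 45, 46, 48, 39]

n1, n2 = len(brand_a), len(brand_b)
var_a = variance(brand_a)  # sample variance, (n-1) in the denominator
var_b = variance(brand_b)

# Pooled variance under the homogeneity-of-variance assumption.
pooled_var = ((n1 - 1) * var_a + (n2 - 1) * var_b) / (n1 + n2 - 2)
standard_error = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)

t_cal = ((mean(brand_a) - mean(brand_b)) - 0) / standard_error

t_table = 2.1098           # 5% two tail, 17 degrees of freedom (from the t table)
reject_ho = abs(t_cal) > t_table

print(f"pooled variance = {pooled_var:.4f}, t(cal) = {t_cal:.4f}, reject Ho: {reject_ho}")
```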
One of the assumptions made to solve this problem is that the population variances are equal even if the alternative hypothesis is accepted. However, we have not checked this assumption. Hence it is necessary to check it, which we shall take up now.
CHECKING FOR HOMOGENEITY OF POPULATION VARIANCE
HOMOGENEITY OF POPULATION VARIANCE IS CHECKED BY CARRYING OUT A HYPOTHESIS TEST, AS GIVEN BELOW:
Step 1: Ho: σ1^2 = σ2^2, Ha: σ1^2 ≠ σ2^2
Step 2: Decide the level of significance: assume 5%.
Step 3: This is a two tail test, based on the alternative hypothesis.
Step 4: Decide on the test statistic: the statistic for this test is the ratio of the two sample variances, which is the ‘F’ test, also known as Fisher’s test.
Step 5: Calculate ‘F’ statistic = s1^2 / s2^2. For the previous problem it is 7.125 / 11.3636 = 0.627.
Step 6: Read the table value for ‘F’ from the table. This requires the degrees of freedom for the numerator and the denominator, which are (n1-1) and (n2-1), i.e. 7 and 10 respectively.
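F-table – how to read
For a two tail test at 5%, each tail holds 0.025. The upper table value is F (0.025) with 7, 10 df = 3.95. For the lower table value, i.e. the value of F for which the blue shaded portion is 0.975, we take the reciprocal of F (0.025) with the degrees of freedom interchanged. Hence F (0.025) with 10, 7 df = 4.76, and 1/4.76 = 0.21.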
Step 7: The calculated ‘F’ statistic (0.627) lies between the table values of 0.21 and 3.95. Accept Ho.
Step 8: Hence homogeneity of variance is established.
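The variance ratio check can be sketched as follows. The function name `variance_ratio_test` is illustrative, and the two table values (1/4.76 and 3.95) are copied from the F table reading in the worked example, not computed; the Python standard library has no F distribution.

```python
def variance_ratio_test(var1: float, var2: float,
                        lower: float, upper: float) -> tuple[float, bool]:
    """Return F(cal) = var1/var2 and whether it lies between the two table values.

    True means Ho (equal variances) is not rejected.
    """
    f_cal = var1 / var2
    return f_cal, lower < f_cal < upper


# Sample variances of brand A (7 df) and brand B (10 df) from the battery example.
f_cal, variances_homogeneous = variance_ratio_test(
    7.125, 11.3636,
    lower=1 / 4.76,  # reciprocal of F(0.025) with 10, 7 df
    upper=3.95,      # F(0.025) with 7, 10 df
)

print(f"F(cal) = {f_cal:.3f}, homogeneity of variance holds: {variances_homogeneous}")
```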
Difference of two proportions
A candidate who was interested in filing his nomination papers for an election wanted to understand whether he was equally popular in two adjacent constituencies or more popular in one particular constituency. He then availed the services of a research agency to check his popularity in the two constituencies.
In this problem we cannot check the difference of two means; instead we check the difference of two proportions. Two independent samples are drawn from the two constituencies to find out how many voters support his candidature.
We can take a worked example to explain this test.
Random samples were drawn from each constituency, A and B, and the preference for his candidature was measured by a survey. In constituency A, out of a sample of 800, 390 favored him; in constituency B, out of a sample of 900, 490 favored him. Can we say that constituency B is more favorable for this candidate? Use 5% level of significance.
Step 1: Ho: p(b) = p(a), Ha: p(b) > p(a)
Step 2: The level of alpha is given as 5%.
Step 3: This is a single tail test, based on the question.
Step 4: Choose the test statistic: the samples are large and hence Z can be used.
Step 5: Calculate Z = [{p(b) - p(a)} - 0] / standard error for the difference of two proportions
The standard error for the difference of two proportions is given by
standard error = √{p1(1 - p1)/n1 + p2(1 - p2)/n2}
Hence in this problem p(a) = 390/800 = 0.4875 or 48.75% and p(b) = 490/900 = 0.5444 or 54.44%
Hence standard error = √{(48.75 x 51.25/800) + (54.44 x 45.56/900)} = 2.4246 (in percentage points)
Step 5 (contd.): Z = {(54.44 - 48.75) - 0} / 2.4246 = 2.347
Step 6: Read the table value of Z at 5% alpha, single tail = 1.645
Step 7: Compare Z (cal) with Z (table value): 2.347 > 1.645, reject Ho.
Step 8: Conclusion: The candidate is more popular in constituency B than in constituency A.
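The two proportion example can be checked with the sketch below. The function name `two_proportion_z` is illustrative. It follows the worked example in using the unpooled standard error built from the two sample proportions; many textbooks instead pool the two proportions under Ho, which would give a slightly different Z.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z statistic for p1 - p2 with the unpooled standard error, Ho: difference = 0."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((p1 - p2) - 0) / standard_error


# Constituency B: 490 of 900 in favor; constituency A: 390 of 800 in favor.
z_cal = two_proportion_z(490, 900, 390, 800)

# Single (right) tail table value at 5% level of significance (about 1.645).
z_table = NormalDist().inv_cdf(1 - 0.05)

reject_ho = z_cal >= z_table
print(f"Z(cal) = {z_cal:.3f}, Z(table) = {z_table:.3f}, reject Ho: {reject_ho}")
```

The result differs from 2.347 only in the third decimal place, because the worked example rounds the percentages before dividing.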
Analysis of variance
Consider the following research question: Three different types of seeds are sown in exactly similar types of soil, and the same type of fertilizer is added for each type of plant. The yield of the product is as follows:

Yield (million tons)

| Plot | Seed A | Seed B | Seed C |
|---|---|---|---|
| plot-1 | 8 | 12 | 9 |
| plot-2 | 9 | 10 | 10 |
| plot-3 | 10 | 10 | 9 |
| plot-4 | 9 | 13 | 8 |
| plot-5 | 9 | 11 | 8 |

If all the seed varieties are similar they should give similar yields, and if some of them are superior then one or more types would give a larger yield.
Step 1: Ho: µa = µb = µc, Ha: at least two means are unequal
In this case a direct comparison of two means is not feasible, as it is not clear which two should be compared. Three different comparisons would be needed: A with B, A with C and B with C. With repeated comparisons the overall type I error would become very large. Hence we must adopt another method.
Basics of ANOVA
Variance calculation
A) Calculate the correction factor (c/f) = GT^2 / total sample size, where GT is the grand total of all observations.
B) Calculate the total sum of squares (TSS) = (square each value and sum them up) - c/f
C) Calculate the sum of squares between samples (SSB) = (total for seed A)^2/n(a) + (total for seed B)^2/n(b) + (total for seed C)^2/n(c) - c/f
In the problem stated above, the values are as follows:
c/f = 145^2/15 = 1401.67
TSS = 1431 - 1401.67 = 29.33
SSB = 1419.4 - 1401.67 = 17.73
SSW (sum of squares within samples) = TSS - SSB = 11.6
Now we can construct the ANOVA table.
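The sums of squares for the seed data can be reproduced with the sketch below. The dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Yield per plot for each seed type.
yields = {
    "A": [8, 9, 10, 9, 9],
    "B": [12, 10, 10, 13, 11],
    "C": [9, 10, 9, 8, 8],
}

all_values = [value for group in yields.values() for value in group]
grand_total = sum(all_values)                      # GT
correction_factor = grand_total ** 2 / len(all_values)

# Total sum of squares: square each value, sum them up, subtract c/f.
tss = sum(value ** 2 for value in all_values) - correction_factor

# Sum of squares between samples: (group total)^2 / group size, summed, minus c/f.
ssb = sum(sum(group) ** 2 / len(group) for group in yields.values()) - correction_factor

# Sum of squares within samples follows from TSS = SSB + SSW.
ssw = tss - ssb

print(f"c/f = {correction_factor:.2f}, TSS = {tss:.2f}, SSB = {ssb:.2f}, SSW = {ssw:.2f}")
```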
ANOVA table (step 2)

| Source of variation | Sum of squares | Degree of freedom | Mean square | F (cal) | F (table value) |
|---|---|---|---|---|---|
| Between treatment | SSB | k-1 | MSB = SSB/df | MSB/MSW | |
| Within treatment | SSW | n(a)+n(b)+n(c)-k | MSW = SSW/df | | |
| Total | TSS = SSB+SSW | n(a)+n(b)+n(c)-1 | | | |
ANOVA table ( filled in table )
Source of variation   Sum of squares   Degrees of freedom   Mean square   F (cal)   F (table value)
Between treatments    17.74            2                    8.866         9.178     6.93
Within treatments     11.6             12                   0.966
Total                 29.34            14
Mean square is the variance. Hence MSB = variance between seeds and MSW = variance within seeds.
11-300
ANOVA
Step 2: The level of alpha is to be decided.
Step 3: This is always a single tail test.
Step 4: Since a ratio of variances is being considered, it is an ‘F’ test, or Fisher’s test.
Step 5: Calculate F(cal) as stated earlier.
Step 6: Read the F table value for the alpha level and the df for between and within.
Step 7: Compare F(cal) with F(table value). If F(cal) ≥ F(table value), reject Ho.
Step 8: Conclusion.
11-301
F(cal) = 9.178 > F(table value) = 6.93, hence reject Ho. (The table value 6.93 is the F value for alpha = 1% with df 2 and 12.)
Conclusion: The three types of seeds do not give the same yield.
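The whole test can also be run from the definitions of the between and within variances, instead of the correction-factor shortcut. The sketch below is one way to do this; `one_way_anova` is an illustrative name, and the table value 6.93 is taken from the slide rather than computed. Unrounded arithmetic gives F of about 9.17, against 9.178 from the rounded mean squares.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-treatment sum of squares: group means around the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment sum of squares: values around their own group mean
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n_total - k
    msb, msw = ssb / df_between, ssw / df_within
    return msb / msw, df_between, df_within

seed_a = [8, 9, 10, 9, 9]
seed_b = [12, 10, 10, 13, 11]
seed_c = [9, 10, 9, 8, 8]

f_cal, df_b, df_w = one_way_anova([seed_a, seed_b, seed_c])
f_table = 6.93  # table value used on the slide for df (2, 12)

print(f"F (cal) = {f_cal:.3f} with df ({df_b}, {df_w})")
print("Reject Ho" if f_cal >= f_table else "Do not reject Ho")
```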
11-302
Significance testing for correlation coefficient
Let us recall from the previous course on Sampling methods, where we calculated the correlation coefficient based on sample information. The sample contained only a few values of the independent variable and the corresponding values of the dependent variable. Even if a correlation exists for these values, how can we be sure that a correlation would exist if all the values in the population were known? To answer this question, it is necessary to test the significance of the correlation coefficient. The procedure is discussed below:
11-305
Significance testing for correlation coefficient
Step 1: Define the null and alternative hypotheses: Ho: ρ = 0 Ha: ρ ≠ 0, where ρ = population correlation
Step 2: Decide on the level of alpha (Type I error); let us say it is 5%.
Step 3: This is a two tail test, based on the sign of the alternative hypothesis.
Step 4: Decide on the test statistic: since the sample is usually small, we use a ‘t’ test.
Step 5: t = (r - ρ) / standard error for correlation, where standard error = √{(1 - r^2)/(n - 2)} and ‘n’ = number of pairs (x, y) of sample data
Step 6: Read the table value of ‘t’ for the significance level and df (n - 2).
Step 7: Compare ‘t’ (cal) with ‘t’ (table value). If |t(cal)| ≥ t(table value), reject Ho.
Step 8: Conclude whether the correlation exists in the population.
11-307
Worked example: test for correlation
Consider sample data collected on the number of hours of study per day and the marks obtained by each student. The data are as follows:
Student   Hours of study/day   Marks obtained (%)
a         12                   63
b         10                   68
c         8                    53
d         9                    60
e         15                   75
f         14                   80
g         11                   68
h         13                   53
Correlation coefficient ‘r’ (sample) = 0.613. You can refer to the lectures on Sampling methods for the details of how to calculate this value.
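For reference, the value r = 0.613 can be reproduced from the table with the usual product-moment formula. This is a minimal sketch; the function name `pearson_r` is illustrative and not taken from the lectures.

```python
import math

hours = [12, 10, 8, 9, 15, 14, 11, 13]   # hours of study per day
marks = [63, 68, 53, 60, 75, 80, 68, 53]  # marks obtained (%)

def pearson_r(x, y):
    """Sample correlation coefficient from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(hours, marks)
print(round(r, 3))  # 0.613
```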
11-309
Worked example: test for correlation
Step 1: Ho: ρ = 0 Ha: ρ ≠ 0
Step 2: Assume the alpha level is 5%.
Step 3: This is a two tail test.
Step 4: Decide on the test statistic, which is ‘t’ in this case.
Step 5: Calculate ‘t’ = (0.613 - 0) / √{(1 - 0.613^2)/6} = 0.613/0.3225 = 1.90
Step 6: ‘t’ (table value) at 5%, two tail, df = 6 is 2.447
Step 7: Compare ‘t’ (cal) with ‘t’ (table value): 1.90 < 2.447, so do not reject Ho.
Step 8: Conclusion: There is no significant correlation between number of hours of study and marks obtained.
The above example clearly brings out that even though there is a non-zero correlation in the sample data, we cannot conclude that a correlation exists in the population. If the correlation had been larger, or if the sample size had been larger for the same correlation coefficient, we may have concluded that a correlation exists in the population. This is important to understand.
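The effect of sample size described above can be illustrated numerically. The sketch below computes ‘t’ for the worked example (n = 8) and for a hypothetical sample of n = 12 with the same r = 0.613; the n = 12 case is an assumption made only for illustration. The critical values 2.447 (df = 6) and 2.228 (df = 10) are standard two tail table values at 5%.

```python
import math

def correlation_t(r, n):
    """t statistic for Ho: rho = 0, with n - 2 degrees of freedom."""
    standard_error = math.sqrt((1 - r ** 2) / (n - 2))
    return r / standard_error

r = 0.613
cases = [
    (8, 2.447),    # worked example: df = 6
    (12, 2.228),   # hypothetical larger sample: df = 10
]

results = {}
for n, t_table in cases:
    t_cal = correlation_t(r, n)
    results[n] = (t_cal, abs(t_cal) >= t_table)
    decision = "Reject Ho" if results[n][1] else "Do not reject Ho"
    print(f"n = {n}: t (cal) = {t_cal:.2f}, t (table) = {t_table}, {decision}")
```

With the same correlation coefficient, the larger hypothetical sample crosses the table value while the actual sample of 8 does not.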
11-310
Testing for the Regression coefficient
Recall that we had developed a regression equation to estimate the dependent variable (y) when we know the value of the independent variable (x):
y = a + bx
where y = dependent variable, x = independent variable, a = intercept, b = regression coefficient (also known as slope)
Just like the correlation, the regression coefficient is based on sample data only, and in order to use it to estimate the regression coefficient in the population we need to test its significance. The population equation is given as Y = ßo + ß1(X), where ß1 = regression coefficient in the population.
11-311
Testing for the Regression coefficient
Step 1: Null and alternative hypotheses: Ho: ß1 = 0 Ha: ß1 ≠ 0
Step 2: Decide on the level of alpha.
Step 3: This is a two tail test, based on the sign of Ha.
Step 4: Decide on the test statistic: usually ‘t’, because the sample size would be small.
Step 5: Calculate the ‘t’ statistic = (b1 - ß1) / standard error (b1), where b1 = regression coefficient (b) calculated from the sample
Step 6: Read the table value of ‘t’ for the alpha level and df = n - 2.
Step 7: Compare ‘t’ (cal) with ‘t’ (table value). If |‘t’ (cal)| ≥ ‘t’ (table value), reject Ho.
Step 8: Conclusion.
11-312
Worked example for testing of regression coefficient
An ice cream vendor wants to estimate the sales of his product from the maximum temperature during the day. He collects the following data:
(y) Sales (kg): 223, 252, 230, 195, 185, 170, 272, 222, 215, 235
(x) Temp (°C): 27, 30, 31, 28, 26, 23, 32, 29, 28, 30
The equation developed from these data is y = a + b(x), where a = -76.57 and b = 10.44. You can use the formulas given in the subject on Sampling methods to obtain these values.
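The values of a and b can be reproduced with the least squares formulas b = ∑(x - x(bar))(y - y(bar)) / ∑(x - x(bar))^2 and a = y(bar) - b x(bar). The sketch below is one way to write this; `fit_line` is an illustrative name.

```python
sales = [223, 252, 230, 195, 185, 170, 272, 222, 215, 235]  # y, in kg
temp = [27, 30, 31, 28, 26, 23, 32, 29, 28, 30]             # x, in deg C

def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

a, b = fit_line(temp, sales)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = -76.57, b = 10.44
```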
11-313
Understanding the regression equation
We first need to understand what the regression equation that we have developed means. Getting a regression equation only means that we have minimized the error in estimating the value of ‘y’, not made it zero. Therefore, for the problem stated earlier, we can calculate the error made if we use the equation developed and the error made if we did not know the equation.
If we did not know the equation, we would have used the mean value of ‘y’ to estimate ‘y’. Then the error would have been ∑{y - y(bar)}^2.
If we know the equation, then the error would be ∑{y(actual) - y(est)}^2.
Let us calculate both these values for the problem just given.
11-314
Understanding the regression equation
Total error (equation not known) = ∑{y - y(bar)}^2 = 8440.9
Error if the equation is known = ∑{y(actual) - y(est)}^2 = 1640.86
This means that an error of (total error - error still remaining) = 8440.9 - 1640.86 = 6800.04 has been explained by the regression equation.
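Both error sums can be verified from the data. The sketch below refits the line with unrounded coefficients, so the residual and explained errors come out as about 1640.87 and 6800.03, within rounding of the slide values; the variable names are illustrative.

```python
sales = [223, 252, 230, 195, 185, 170, 272, 222, 215, 235]  # y, in kg
temp = [27, 30, 31, 28, 26, 23, 32, 29, 28, 30]             # x, in deg C

n = len(sales)
x_bar, y_bar = sum(temp) / n, sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temp, sales))
s_xx = sum((x - x_bar) ** 2 for x in temp)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Error when only the mean of y is used as the estimate
total_error = sum((y - y_bar) ** 2 for y in sales)
# Error that remains when the regression equation is used
residual_error = sum((y - (a + b * x)) ** 2 for x, y in zip(temp, sales))
# Part of the total error explained by the regression equation
explained_error = total_error - residual_error

print(f"total = {total_error:.2f}")
print(f"residual = {residual_error:.2f}")
print(f"explained = {explained_error:.2f}")
```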
11-315
Understanding the regression equation
Source of error                          Sum of squares of error   df   Mean square   F(cal)   F(table value)
Due to regression                        6800.04                   1    6800.04       33.15    11.26
Error still remaining (residual error)   1640.86                   8    205.10
Total error                              8440.9                    9
The residual mean square, 205.10, is the mean square error (error variance).
11-316
Understanding the regression equation
The square root of the mean square error gives the standard error of the estimate (Se): Se = √205.1 = 14.32
Standard error for (b1) = Se / √{∑(x - x(bar))^2}
∑(x - x(bar))^2 = 62.4 and √62.4 = 7.89
Standard error (b1) = 14.32 / 7.89 = 1.814
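These two standard errors can be checked as follows. The sketch takes the residual sum of squares (1640.86) and its df (8) from the table on the previous slide and computes ∑(x - x(bar))^2 from the temperature data; unrounded arithmetic gives a standard error for b1 of about 1.813, against 1.814 on the slide.

```python
import math

temp = [27, 30, 31, 28, 26, 23, 32, 29, 28, 30]  # x, in deg C
residual_ss = 1640.86   # residual sum of squares, from the ANOVA table
residual_df = 8         # n - 2

mse = residual_ss / residual_df          # mean square error
se_estimate = math.sqrt(mse)             # standard error of the estimate (Se)

x_bar = sum(temp) / len(temp)
s_xx = sum((x - x_bar) ** 2 for x in temp)

se_b1 = se_estimate / math.sqrt(s_xx)    # standard error of the slope

print(f"Se = {se_estimate:.2f}, sum of squares of x = {s_xx:.1f}")
print(f"standard error (b1) = {se_b1:.3f}")
```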
11-318
Worked example for testing of regression coefficient
An ice cream vendor wants to estimate the sales of his product from the maximum temperature during the day. He collects the following data:
(y) Sales (kg): 223, 252, 230, 195, 185, 170, 272, 222, 215, 235
(x) Temp (°C): 27, 30, 31, 28, 26, 23, 32, 29, 28, 30
Step 1: Ho: ß1 = 0 Ha: ß1 ≠ 0
Step 2: Assume an alpha level of 5%.
Step 3: This is a two tail test.
Step 4: Decide on the test statistic: ‘t’ in this case.
Step 5: Calculate ‘t’ = (b - ß1) / standard error (b1) = (10.44 - 0) / 1.814 = 5.755
Step 6: Read the table value of ‘t’ at 5% alpha, df = 8: 2.306
Step 7: ‘t’ (cal) > ‘t’ (table value), so reject Ho.
Step 8: The regression coefficient calculated in the sample is significant and can be used to find the interval estimate for the population parameter.
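Steps 5 to 8 can be sketched in a few lines. The inputs (b = 10.44, standard error = 1.814, table value 2.306) are the slide values; the interval b ± t(table) × standard error (b1) is the usual form of the interval estimate mentioned in Step 8 and is included here as an assumption about what the slide intends, not as a quotation from it.

```python
b = 10.44        # sample regression coefficient
beta_1 = 0.0     # value of the population coefficient under Ho
se_b1 = 1.814    # standard error of b1
t_table = 2.306  # two tail table value, alpha = 5%, df = 8

t_cal = (b - beta_1) / se_b1
reject_ho = abs(t_cal) >= t_table

# 95% interval estimate for the population coefficient (assumed form)
margin = t_table * se_b1
lower, upper = b - margin, b + margin

print(f"t (cal) = {t_cal:.3f}")
print("Reject Ho" if reject_ho else "Do not reject Ho")
print(f"interval estimate: {lower:.2f} to {upper:.2f}")
```

Because the interval does not contain zero, it agrees with the decision to reject Ho.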