
GM – 03 QUANTITATIVE TECHNIQUES FOR MANAGERS

Making Decisions

Data, Information, Knowledge

1. Data: specific observations of measured numbers.
2. Information: processed and summarized data yielding facts and ideas.
3. Knowledge: selected and organized information that provides understanding, recommendations, and the basis for decisions.

WHAT DOES STATISTICS ACHIEVE

Making Decisions

Descriptive and Inferential Statistics

Descriptive Statistics include graphical and numerical procedures that summarize and process data and are used to transform data into information.

BRANCHES OF STATISTICS


Inferential Statistics provide the bases for predictions, forecasts, and estimates that are used to transform information into knowledge.

The Journey to Making Decisions

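Begin here: identify the problem, then move to information, then to knowledge, and finally to the decision.

• Tools used to reach information: descriptive statistics, probability, computers
• Tools used to reach knowledge: experience, theory, literature, inferential statistics, computers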

Describing Data

Summarizing and Describing Data

• Tables and Graphs
• Numerical Measures

Frequency Distributions

A frequency distribution is a table used to organize data. The left column (called classes or groups) includes numerical intervals on a variable being studied. The right column is a list of the frequencies, or number of observations, for each class.

Intervals are normally of equal size, must cover the range of the sample observations, and must be non-overlapping.

Example of a Frequency Distribution

A Frequency Distribution for the Shampoo Example

Weights (in mL)         Number of Bottles
220 to less than 225      1
225 to less than 230      4
230 to less than 235     29
235 to less than 240     34
240 to less than 245     26
245 to less than 250      6

Cumulative Frequency Distributions

A cumulative frequency distribution contains the number of observations whose values are less than the upper limit of each interval. It is constructed by adding the frequencies of all frequency distribution intervals up to and including the present interval.

Relative Cumulative Frequency Distributions

A relative cumulative frequency distribution converts all cumulative frequencies to cumulative percentages.

Example of a Cumulative Frequency Distribution

A Cumulative Frequency Distribution for the Shampoo Example

Weights (in mL)     Number of Bottles
less than 225         1
less than 230         5
less than 235        34
less than 240        68
less than 245        94
less than 250       100
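The construction described above can be sketched in a few lines of Python. This is only an illustration: the variable names are our own, and the class frequencies are taken from the shampoo frequency distribution.

```python
from itertools import accumulate

# Shampoo example: upper class limits (mL) and class frequencies.
upper_limits = [225, 230, 235, 240, 245, 250]
frequencies = [1, 4, 29, 34, 26, 6]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Relative cumulative frequency: cumulative count as a percentage of all observations.
total = sum(frequencies)
relative_cumulative = [100 * c / total for c in cumulative]

for limit, c, pct in zip(upper_limits, cumulative, relative_cumulative):
    print(f"less than {limit}: {c:3d} bottles ({pct:.0f}%)")
```

The running totals reproduce the right-hand column of the cumulative table.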

Parameters and Statistics

A statistic is a descriptive measure computed from a sample of data. A parameter is a descriptive measure computed from an entire population of data.

Measures of Central Tendency- Arithmetic Mean -

The arithmetic mean of a set of data is the sum of the data values divided by the number of observations.

Sample Mean

If the data set is from a sample of n observations, then the sample mean, $\bar{X}$, is:

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population Mean

If the data set is from a population of N observations, then the population mean, $\mu$, is:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Measures of Central Tendency- Median -

An ordered array is an arrangement of data in either ascending or descending order. Once the data are arranged in ascending order, the median is the value such that 50% of the observations are smaller and 50% of the observations are larger.

If the sample size n is an odd number, the median, Xm, is the middle observation. If the sample size n is an even number, the median, Xm, is the average of the two middle observations. The median will be located in the 0.50(n+1)th ordered position.

Measures of Central Tendency- Mode -

The mode, if one exists, is the most frequently occurring observation in the sample or population.
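As a small illustration of the three measures of central tendency, the sketch below uses Python's standard `statistics` module. The sample values are hypothetical and chosen only so that the results are easy to check by hand.

```python
import statistics

# Hypothetical sample, used only for illustration.
sample = [3, 5, 5, 6, 8, 9, 13]

mean = statistics.mean(sample)      # sum of the values divided by n
median = statistics.median(sample)  # middle value of the ordered array
mode = statistics.mode(sample)      # most frequently occurring value

# Ordered position of the median: 0.50(n + 1)
position = 0.50 * (len(sample) + 1)

print(mean, median, mode, position)
```

With n = 7 the median sits in ordered position 0.50(7 + 1) = 4, which is the value 6.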

Shape of the Distribution

The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the mean. In a symmetric distribution the mean and median are equal.

Shape of the Distribution

A distribution is skewed if the observations are not symmetrically distributed above and below the mean. A positively skewed (or skewed to the right) distribution has a tail that extends to the right in the direction of positive values. A negatively skewed (or skewed to the left) distribution has a tail that extends to the left in the direction of negative values.

Shapes of the Distribution

[Figure: three histograms on a scale of 1 to 9, showing a Symmetric Distribution, a Positively Skewed Distribution, and a Negatively Skewed Distribution]

Measures of Variability- The Range -

The range in a set of data is the difference between the largest and smallest observations.

MEASURES OF DISPERSION

Measures of Variability- Sample Variance -

The sample variance, $s^2$, is the sum of the squared differences between each observation and the sample mean divided by the sample size minus 1.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$

MOST IMPORTANT MEASURE OF DISPERSION

Measures of Variability- Short-cut Formulas for Sample Variance -

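Short-cut formulas for the sample variance are:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n - 1}$$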

Measures of Variability- Population Variance -

The population variance, $\sigma^2$, is the sum of the squared differences between each observation and the population mean divided by the population size, N.

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

DISTINGUISH BETWEEN POPULATION VARIANCE AND SAMPLE VARIANCE (IMPORTANT)

Measures of Variability- Sample Standard Deviation -

The sample standard deviation, s, is the positive square root of the sample variance, and is defined as:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$$

Measures of Variability- Population Standard Deviation-

The population standard deviation, $\sigma$, is

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

For a set of data with a bell-shaped histogram, the Empirical Rule is:

• approximately 68% of the observations are contained within a distance of one standard deviation around the mean: $\mu \pm 1\sigma$

• approximately 95% of the observations are contained within a distance of two standard deviations around the mean: $\mu \pm 2\sigma$

• almost all of the observations are contained within a distance of three standard deviations around the mean: $\mu \pm 3\sigma$
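The sample and population measures of dispersion differ only in the divisor (n − 1 versus N). A minimal sketch, using a hypothetical data set and our own variable names, makes the distinction concrete:

```python
import math
import statistics

# Hypothetical data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # sum of squared differences from the mean

sample_variance = sum_sq_dev / (n - 1)   # treat the data as a sample: divide by n - 1
population_variance = sum_sq_dev / n     # treat the data as a population: divide by N

sample_sd = math.sqrt(sample_variance)
population_sd = math.sqrt(population_variance)

# Cross-check against the standard library.
print(sample_variance, statistics.variance(data))
print(population_variance, statistics.pvariance(data))
```

For the same data the sample variance is always the larger of the two, because it divides the same sum of squares by a smaller number.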

Coefficient of Variation

The Coefficient of Variation, CV, is a measure of relative dispersion that expresses the standard deviation as a percentage of the mean (provided the mean is positive).

The sample coefficient of variation is

$$CV = \frac{s}{\bar{X}} \times 100\% \quad \text{if } \bar{X} > 0$$

The population coefficient of variation is

$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{if } \mu > 0$$

Five-Number Summary

The Five-Number Summary refers to the five descriptive measures: minimum, first quartile, median, third quartile, and the maximum.

$$X_{\min} < Q_1 < \text{Median} < Q_3 < X_{\max}$$
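A five-number summary can be sketched with the standard library. The data are hypothetical, and the quartile convention is an assumption: `statistics.quantiles` with its default "exclusive" method places quartiles at (n + 1)-based positions, in line with the 0.50(n+1)th position used for the median earlier. Other textbooks and tools use slightly different quartile rules.

```python
import statistics

# Hypothetical data, used only for illustration.
data = sorted([2, 4, 4, 5, 7, 8, 9, 10, 12])

# Quartiles from the default 'exclusive' method, i.e. (n + 1)-based positions.
q1, median, q3 = statistics.quantiles(data, n=4)

five_number_summary = (data[0], q1, median, q3, data[-1])
print(five_number_summary)
```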

Grouped Data Mean

For a population of N observations the mean is

$$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$$

For a sample of n observations, the mean is

$$\bar{X} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$$

where the data set contains observation values m1, m2, . . ., mK occurring with frequencies f1, f2, . . ., fK respectively.

Grouped Data Variance

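For a population of N observations the variance is

$$\sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}$$

For a sample of n observations, the variance is

$$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{X})^2}{n - 1}$$

where the data set contains observation values m1, m2, . . ., mK occurring with frequencies f1, f2, . . ., fK respectively.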
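The grouped-data formulas can be tried on the shampoo frequency distribution. Note the assumption: the slides do not say which values to use for the m_i, so this sketch takes the class midpoints (222.5, 227.5, ...) as the representative values, which is a common convention but only an approximation of the raw data.

```python
# Assumed representative values: midpoints of the shampoo classes (mL).
midpoints = [222.5, 227.5, 232.5, 237.5, 242.5, 247.5]
frequencies = [1, 4, 29, 34, 26, 6]

n = sum(frequencies)

# Grouped mean: frequency-weighted average of the values m_i.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Frequency-weighted sum of squared deviations from the mean.
sum_sq_dev = sum(f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints))

population_variance = sum_sq_dev / n    # divide by N
sample_variance = sum_sq_dev / (n - 1)  # divide by n - 1

print(grouped_mean, population_variance, sample_variance)
```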

• Pie Charts: categories represented as percentages of total
• Bar Graphs: heights of rectangles represent group frequencies
• Frequency Polygons: height of line represents frequency
• Ogives: height of line represents cumulative frequency
• Time Plots: represent values over time

1-8 Methods of Displaying Data

Pie Chart

Figure 1-10: Twentysomethings split on job satisfaction

Categories:
• Happy with career
• Don't like my job, but it is on my career path
• Job is OK, but it is not on my career path
• Enjoy job, but it is not on my career path
• My job just pays the bills

Bar Chart

Figure 1-11: SHIFTING GEARS

[Figure: bar chart of quarterly net income for General Motors (in billions), by quarter (1Q to 4Q), for 2003 and 2004]

Frequency Polygon and Ogive

[Figure: relative frequency polygon and ogive (cumulative frequency or relative frequency graph) for Sales, horizontal axis from 0 to 50]

Time Plot

[Figure: time plot of Monthly Steel Production, plotted by month]

Example 1-8: Stem-and-Leaf Display

Figure 1-17: Task Performance Times

1 | 122355567
2 | 0111222346777899
3 | 012457
4 | 11257
5 | 0236
6 | 02
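A stem-and-leaf display is easy to build by hand or in code. The sketch below is our own helper (not part of the original example) and assumes two-digit values, with the tens digit as the stem and the units digit as the leaf. The sample values are read back from the last two rows of the display.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines with the tens digit as stem and the units digit as leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return [f"{stem} | {''.join(map(str, leaves))}" for stem, leaves in sorted(stems.items())]

# Values read back from the last two rows of the display.
times = [50, 52, 53, 56, 60, 62]
for line in stem_and_leaf(times):
    print(line)
```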

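Elements of a Box Plot

• Box: drawn from Q1 to Q3, with the median marked inside; its length is the Interquartile Range (IQR)
• Inner fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)
• Outer fences: Q1 - 3(IQR) and Q3 + 3(IQR)
• Whiskers: extend to the smallest data point not below the inner fence and to the largest data point not exceeding the inner fence
• Suspected outlier (*): a point between the inner and outer fences
• Outlier (o): a point beyond the outer fences

Box Plot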
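The fence rules can be sketched as follows. The function names and the quartile values in the usage lines are hypothetical; only the 1.5(IQR) and 3(IQR) multipliers come from the box plot description.

```python
def fences(q1, q3):
    """Return the IQR and the inner and outer fences for given quartiles."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3 * iqr, q3 + 3 * iqr),
    }

def classify(x, q1, q3):
    """Label a data point relative to the fences."""
    f = fences(q1, q3)
    inner_low, inner_high = f["inner"]
    outer_low, outer_high = f["outer"]
    if x < outer_low or x > outer_high:
        return "outlier"
    if x < inner_low or x > inner_high:
        return "suspected outlier"
    return "within inner fences"

# Hypothetical quartiles Q1 = 4 and Q3 = 8, so IQR = 4.
print(fences(4, 8))
print(classify(7, 4, 8), classify(15, 4, 8), classify(25, 4, 8))
```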

Probability

• Using Statistics
• Basic Definitions: Events, Sample Space, and Probabilities
• Basic Rules for Probability
• Conditional Probability
• Independence of Events
• Combinatorial Concepts
• The Law of Total Probability and Bayes' Theorem
• Joint Probability Table
• Using the Computer

2-1 Probability is:

A quantitative measure of uncertainty

A measure of the strength of belief in the occurrence of an uncertain event

A measure of the degree of chance or likelihood of occurrence of an uncertain event

Measured by a number between 0 and 1 (or between 0% and 100%)

Types of Probability

Objective or Classical Probability
• based on equally-likely events
• based on long-run relative frequency of events
• not based on personal beliefs
• is the same for all observers (objective)
• examples: toss a coin, throw a die, pick a card

Types of Probability (Continued)

Subjective Probability
• based on personal beliefs, experiences, prejudices, intuition - personal judgment
• different for all observers (subjective)
• examples: Super Bowl, elections, new product introduction, snowfall

• Set: a collection of elements or objects of interest
• Empty set (denoted by ∅): a set containing no elements
• Universal set (denoted by S): a set containing all possible elements
• Complement (Not): the complement of A is a set containing all elements of S not in A

2-2 Basic Definitions

Complement of a Set

[Figure: Venn diagram illustrating the complement of an event]

• Intersection (And), $A \cap B$: a set containing all elements in both A and B

• Union (Or), $A \cup B$: a set containing all elements in A or B or both

Basic Definitions (Continued)

[Figure: Venn diagrams of sets A intersecting with B ($A \cap B$) and A union B ($A \cup B$)]

• Mutually exclusive or disjoint sets: sets having no elements in common, having no intersection, whose intersection is the empty set

• Partition: a collection of mutually exclusive sets which together include all possible elements, whose union is the universal set
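These set definitions map directly onto Python's built-in `set` type. The sets A, B, and C below are hypothetical examples on the universal set of die faces, and `is_partition` is a helper written for this sketch.

```python
from itertools import combinations

S = {1, 2, 3, 4, 5, 6}   # universal set
A = {2, 4, 6}
B = {4, 5, 6}
C = {1, 3}

complement_A = S - A     # all elements of S not in A
intersection = A & B     # elements in both A and B
union = A | B            # elements in A or B or both
disjoint = A.isdisjoint(C)  # True when the intersection is the empty set

def is_partition(parts, universe):
    """True if the parts are pairwise disjoint and their union is the universe."""
    pairwise_disjoint = all(p.isdisjoint(q) for p, q in combinations(parts, 2))
    return pairwise_disjoint and set().union(*parts) == universe

print(complement_A, intersection, union, disjoint)
print(is_partition([A, complement_A], S))
```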

Basic Definitions (Continued)

Mutually Exclusive or Disjoint Sets

Sets have nothing in common

• Process that leads to one of several possible outcomes*, e.g.:
  – Coin toss: Heads, Tails
  – Throw die: 1, 2, 3, 4, 5, 6
  – Pick a card: AH, KH, QH, ...
  – Introduce a new product
• Each trial of an experiment has a single observed outcome.
• The precise outcome of a random experiment is unknown before a trial.

* Also called a basic outcome, elementary event, or simple event

Experiment

• Sample Space or Event Set: set of all possible outcomes (universal set) for a given experiment
  – E.g.: Roll a regular six-sided die: S = {1,2,3,4,5,6}
• Event: collection of outcomes having a common characteristic
  – E.g.: Even number: A = {2,4,6}
  – Event A occurs if an outcome in the set A occurs
• Probability of an event: sum of the probabilities of the outcomes of which it consists
  – P(A) = P(2) + P(4) + P(6)

Events : Definition

• For example: Throw a die
  – Six possible outcomes {1,2,3,4,5,6}
  – If each is equally likely, the probability of each is 1/6 = 0.1667 = 16.67%
  – Probability of each equally-likely outcome is 1 divided by the number of possible outcomes:

$$P(e) = \frac{1}{n(S)}$$

• Event A (even number)
  – P(A) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 1/2
  – In general, summing over the outcomes e in A:

$$P(A) = \sum_{e \in A} P(e) = \frac{n(A)}{n(S)}$$

Equally-likely Probabilities (Hypothetical or Ideal Experiments)
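The die example can be checked with exact fractions. This is a minimal sketch with our own variable names: each outcome gets probability 1/n(S), and the event probability is the sum over the outcomes in the event.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {2, 4, 6}            # event: even number

p_outcome = Fraction(1, len(S))          # P(e) = 1 / n(S)
p_A = sum(p_outcome for _ in A)          # P(A) = sum of P(e) over e in A

print(p_outcome, p_A, Fraction(len(A), len(S)))
```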

Pick a Card: Sample Space

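The sample space consists of the 52 cards of a standard deck: the ranks A, K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3, 2 in each of the four suits Hearts, Diamonds, Clubs, and Spades.

• Event 'Heart':

$$P(\text{Heart}) = \frac{n(\text{Heart})}{n(S)} = \frac{13}{52} = \frac{1}{4}$$

• Event 'Ace':

$$P(\text{Ace}) = \frac{n(\text{Ace})}{n(S)} = \frac{4}{52} = \frac{1}{13}$$

• The intersection of the events 'Heart' and 'Ace' comprises the single point circled twice in the figure, the ace of hearts:

$$P(\text{Heart} \cap \text{Ace}) = \frac{n(\text{Heart} \cap \text{Ace})}{n(S)} = \frac{1}{52}$$

• Union of the events 'Heart' and 'Ace':

$$P(\text{Heart} \cup \text{Ace}) = \frac{n(\text{Heart} \cup \text{Ace})}{n(S)} = \frac{16}{52} = \frac{4}{13}$$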

• Range of values for P(A):

$$0 \le P(A) \le 1$$

• Complements - probability of not A:

$$P(\bar{A}) = 1 - P(A)$$

• Intersection - probability of both A and B:

$$P(A \cap B) = \frac{n(A \cap B)}{n(S)}$$

• Mutually exclusive events (A and C):

$$P(A \cap C) = 0$$

2-3 Basic Rules for Probability

• Union - Probability of A or B or both (rule of unions):

$$P(A \cup B) = \frac{n(A \cup B)}{n(S)} = P(A) + P(B) - P(A \cap B)$$

• Mutually exclusive events: If A and B are mutually exclusive, then

$$P(A \cap B) = 0 \quad \text{so} \quad P(A \cup B) = P(A) + P(B)$$

Basic Rules for Probability (Continued)

[Figure: Venn diagram of sets showing $P(A \cup B)$]

• Conditional Probability - Probability of A given B:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{where } P(B) \ne 0$$

• Independent events:

$$P(A \mid B) = P(A)$$

$$P(B \mid A) = P(B)$$

2-4 Conditional Probability

Rules of conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{so} \quad P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

If events A and D are statistically independent:

$$P(A \mid D) = P(A)$$

$$P(D \mid A) = P(D)$$

so

$$P(A \cap D) = P(A)\,P(D)$$

Conditional Probability (continued)

Counts

                     AT&T    IBM    Total
Telecommunication     40      10      50
Computers             20      30      50
Total                 60      40     100

Probabilities

                     AT&T    IBM    Total
Telecommunication    .40     .10     .50
Computers            .20     .30     .50
Total                .60     .40    1.00

Probability that a project is undertaken by IBM given it is a telecommunications project:

$$P(IBM \mid T) = \frac{P(IBM \cap T)}{P(T)} = \frac{0.10}{0.50} = 0.2$$

Contingency Table - Example 2-2
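The step from counts to a conditional probability can be sketched as below. The dictionary layout and names are our own. The counts are those of the contingency table, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Counts keyed by (industry, company), as in the contingency table.
counts = {
    ("Telecommunication", "AT&T"): 40,
    ("Telecommunication", "IBM"): 10,
    ("Computers", "AT&T"): 20,
    ("Computers", "IBM"): 30,
}

total = sum(counts.values())
joint = {key: Fraction(c, total) for key, c in counts.items()}  # joint probabilities

# Marginal probability of a telecommunications project: sum across the row.
p_T = sum(p for (industry, _), p in joint.items() if industry == "Telecommunication")

# Conditional probability: P(IBM | T) = P(IBM and T) / P(T)
p_IBM_given_T = joint[("Telecommunication", "IBM")] / p_T

print(p_T, p_IBM_given_T)
```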

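Conditions for the statistical independence of events A and B:

$$P(A \mid B) = P(A)$$

$$P(B \mid A) = P(B)$$

$$P(A \cap B) = P(A)\,P(B)$$

For example, when picking a card:

$$P(\text{Ace} \mid \text{Heart}) = \frac{P(\text{Ace} \cap \text{Heart})}{P(\text{Heart})} = \frac{1/52}{13/52} = \frac{1}{13} = P(\text{Ace})$$

$$P(\text{Heart} \mid \text{Ace}) = \frac{P(\text{Heart} \cap \text{Ace})}{P(\text{Ace})} = \frac{1/52}{4/52} = \frac{1}{4} = P(\text{Heart})$$

$$P(\text{Ace} \cap \text{Heart}) = \frac{4}{52} \cdot \frac{13}{52} = \frac{1}{52} = P(\text{Ace})\,P(\text{Heart})$$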

2-5 Independence of Events

Events Television (T) and Billboard (B) are assumed to be independent.

$$\text{a) } P(T \cap B) = P(T)\,P(B) = 0.04 \times 0.06 = 0.0024$$

$$\text{b) } P(T \cup B) = P(T) + P(B) - P(T \cap B) = 0.04 + 0.06 - 0.0024 = 0.0976$$

Independence of Events – Example 2-5

The probability of the union of several independent events is 1 minus the product of probabilities of their complements:

$$P(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n) = 1 - P(\bar{A}_1)\,P(\bar{A}_2)\,P(\bar{A}_3) \cdots P(\bar{A}_n)$$

Example 2-7:

$$P(Q_1 \cup Q_2 \cup Q_3 \cup \cdots \cup Q_{10}) = 1 - P(\bar{Q}_1)\,P(\bar{Q}_2)\,P(\bar{Q}_3) \cdots P(\bar{Q}_{10}) = 1 - 0.90^{10} = 1 - 0.3487 = 0.6513$$

The probability of the intersection of several independent events is the product of their separate individual probabilities:

$$P(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_n) = P(A_1)\,P(A_2)\,P(A_3) \cdots P(A_n)$$

Product Rules for Independent Events
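Both product rules are one-liners in code. In this sketch the function names are our own, and the individual probability of 0.10 for each of the ten events in Example 2-7 is inferred from the 0.90 that appears in the worked calculation, not stated directly in the slides.

```python
import math

def p_union_independent(probs):
    """P(A1 or ... or An) for independent events: 1 - product of complement probabilities."""
    return 1 - math.prod(1 - p for p in probs)

def p_intersection_independent(probs):
    """P(A1 and ... and An) for independent events: product of the probabilities."""
    return math.prod(probs)

# Example 2-7 (assumed: ten independent events, each with probability 0.10).
print(round(p_union_independent([0.10] * 10), 4))

# Example 2-5: Television (0.04) and Billboard (0.06), assumed independent.
print(round(p_intersection_independent([0.04, 0.06]), 4))
print(round(p_union_independent([0.04, 0.06]), 4))
```

For two events the union rule agrees with the rule of unions: 1 − (0.96)(0.94) = 0.04 + 0.06 − 0.0024.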

Consider a pair of six-sided dice. There are six possible outcomes from throwing the first die {1,2,3,4,5,6} and six possible outcomes from throwing the second die {1,2,3,4,5,6}. Altogether, there are 6*6 = 36 possible outcomes from throwing the two dice.

In general, if there are n events and the event i can happen in Ni possible ways, then the number of ways in which the

sequence of n events may occur is N1N2...Nn.

• Pick 5 cards from a deck of 52 - with replacement: 52*52*52*52*52 = 52^5 = 380,204,032 different possible outcomes

• Pick 5 cards from a deck of 52 - without replacement: 52*51*50*49*48 = 311,875,200 different possible outcomes

2-6 Combinatorial Concepts

How many ways can you order the 3 letters A, B, and C?

There are 3 choices for the first letter, 2 for the second, and 1 for the last, so there are 3*2*1 = 6 possible ways to order the three letters A, B, and C.

How many ways are there to order the 6 letters A, B, C, D, E, and F? (6*5*4*3*2*1 = 720)

Factorial: For any positive integer n, we define n factorial as: n(n-1)(n-2)...(1). We denote n factorial as n!. The number n! is the number of ways in which n objects can be ordered. By definition 1! = 1 and 0! = 1.

Factorial

Permutations are the possible ordered selections of r objects out of a total of n objects. The number of permutations of n objects taken r at a time is denoted by nPr, where

$$_nP_r = \frac{n!}{(n-r)!}$$

What if we chose only 3 out of the 6 letters A, B, C, D, E, and F? There are 6 ways to choose the first letter, 5 ways to choose the second letter, and 4 ways to choose the third letter (leaving 3 letters unchosen). That makes 6*5*4 = 120 possible orderings or permutations.

For example:

$$_6P_3 = \frac{6!}{(6-3)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1} = 6 \cdot 5 \cdot 4 = 120$$

Permutations (Order is important)

Combinations are the possible selections of r items from a group of n items regardless of the order of selection. The number of combinations is denoted $\binom{n}{r}$ and is read as n choose r. An alternative notation is nCr. We define the number of combinations of r out of n elements as:

$$\binom{n}{r} = {}_nC_r = \frac{n!}{r!\,(n-r)!}$$

Suppose that when we pick 3 letters out of the 6 letters A, B, C, D, E, and F we chose BCD, or BDC, or CBD, or CDB, or DBC, or DCB. (These are the 6 (3!) permutations or orderings of the 3 letters B, C, and D.) But these are orderings of the same combination of 3 letters. How many combinations of 6 different letters, taking 3 at a time, are there?

For example:

$$\binom{6}{3} = \frac{6!}{3!\,(6-3)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(3 \cdot 2 \cdot 1)} = \frac{6 \cdot 5 \cdot 4}{3 \cdot 2 \cdot 1} = \frac{120}{6} = 20$$

Combinations (Order is not Important)
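The counting results in this section can be verified with Python's `math` module (`math.perm` and `math.comb` are available from Python 3.8). This is only a check of the worked numbers, not part of the original slides.

```python
import math

# Factorial: number of ways to order n objects.
orderings_of_6 = math.factorial(6)

# Permutations: ordered selections of r out of n, n! / (n - r)!
permutations_6_3 = math.perm(6, 3)

# Combinations: unordered selections of r out of n, n! / (r! (n - r)!)
combinations_6_3 = math.comb(6, 3)

# Five cards from a deck of 52, with and without replacement.
with_replacement = 52 ** 5
without_replacement = math.perm(52, 5)

print(orderings_of_6, permutations_6_3, combinations_6_3)
print(with_replacement, without_replacement)
```

Each combination of 3 letters corresponds to 3! = 6 permutations, which is why 120 / 6 = 20.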

The law of total probability:

$$P(A) = P(A \cap B) + P(A \cap \bar{B})$$

In terms of conditional probabilities:

$$P(A) = P(A \cap B) + P(A \cap \bar{B}) = P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})$$

More generally (where the B_i make up a partition):

$$P(A) = \sum_i P(A \cap B_i) = \sum_i P(A \mid B_i)\,P(B_i)$$

2-7 The Law of Total Probability and Bayes’ Theorem

• Bayes’ theorem enables you, knowing just a little more than the probability of A given B, to find the probability of B given A.

• Based on the definition of conditional probability and the law of total probability.

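$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Applying the law of total probability to the denominator:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A \cap B) + P(A \cap \bar{B})}$$

Applying the definition of conditional probability throughout:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})}$$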

Bayes’ Theorem
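A minimal sketch of Bayes' theorem for an event B and its complement follows. The function name and the input probabilities are hypothetical, chosen only so the arithmetic is easy to follow. The denominator is the law of total probability.

```python
def bayes(p_b, p_a_given_b, p_a_given_not_b):
    """Return P(B | A) from P(B), P(A | B), and P(A | not B)."""
    p_not_b = 1 - p_b
    # Law of total probability: P(A) = P(A|B)P(B) + P(A|not B)P(not B)
    p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b
    return p_a_given_b * p_b / p_a

# Hypothetical inputs: P(B) = 0.3, P(A | B) = 0.7, P(A | not B) = 0.2
posterior = bayes(0.3, 0.7, 0.2)
print(round(posterior, 4))
```

Here P(A) = 0.21 + 0.14 = 0.35, so P(B | A) = 0.21 / 0.35 = 0.6.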

2-8 The Joint Probability Table

A joint probability table is similar to a contingency table, except that it has probabilities in place of frequencies.

The joint probability table for Example 2-11 is shown below.

The row totals and column totals are called marginal probabilities.

The Joint Probability Table

                           GROWTH
                    High    Medium    Low     Total
Appreciates (Re)    0.21    0.20      0.04    0.45
Depreciates         0.09    0.30      0.16    0.55
Total               0.30    0.50      0.20    1.00

Random Variables

Consider the different possible orderings of boy (B) and girl (G) in four sequential births. There are 2*2*2*2 = 2^4 = 16 possibilities, so the sample space is:

BBBB  BGBB  GBBB  GGBB
BBBG  BGBG  GBBG  GGBG
BBGB  BGGB  GBGB  GGGB
BBGG  BGGG  GBGG  GGGG

If girl and boy are each equally likely [P(G) = P(B) = 1/2], and the gender of each child is independent of that of the previous child, then the probability of each of these 16 possibilities is: (1/2)(1/2)(1/2)(1/2) = 1/16.

3-1 Using Statistics

Random Variables (Continued)

[Figure: the 16 outcomes of the sample space mapped by the random variable X onto points on the real line]

Since the random variable X = 3 when any of the four outcomes BGGG, GBGG, GGBG, or GGGB occurs,

P(X = 3) = P(BGGG) + P(GBGG) + P(GGBG) + P(GGGB) = 4/16

The probability distribution of a random variable is a table that lists the possible values of the random variables and their associated probabilities.

x        P(x)
0        1/16
1        4/16
2        6/16
3        4/16
4        1/16
Total    16/16 = 1

[Figure: Probability Distribution of the Number of Girls in Four Births; horizontal axis: Number of Girls, X; vertical axis: Probability]
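The distribution of the number of girls can be rebuilt by brute-force enumeration of the sample space. This sketch uses our own variable names and assumes, as in the text, that all 16 sequences are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^4 = 16 birth sequences, e.g. 'BGGB'.
outcomes = ["".join(seq) for seq in product("BG", repeat=4)]

# X = number of girls in each sequence; count how many sequences give each value.
counts = Counter(outcome.count("G") for outcome in outcomes)

# Each sequence has probability 1/16, so P(X = x) = (number of sequences with x girls) / 16.
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```

Fractions print in lowest terms, so 4/16 appears as 1/4 and 6/16 as 3/8.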

Consider the experiment of tossing two six-sided dice. There are 36 possible outcomes. Let the random variable X represent the sum of the numbers on the two dice:

The 36 outcomes (first die, second die) are listed below; outcomes on the same diagonal have the same sum, from 2 to 12:

1,1  1,2  1,3  1,4  1,5  1,6
2,1  2,2  2,3  2,4  2,5  2,6
3,1  3,2  3,3  3,4  3,5  3,6
4,1  4,2  4,3  4,4  4,5  4,6
5,1  5,2  5,3  5,4  5,5  5,6
6,1  6,2  6,3  6,4  6,5  6,6

x      P(x)*
2      1/36
3      2/36
4      3/36
5      4/36
6      5/36
7      6/36
8      5/36
9      4/36
10     3/36
11     2/36
12     1/36

[Figure: Probability Distribution of Sum of Two Dice, for x = 2 to 12]

* Note that:

$$P(x) = \frac{6 - \sqrt{(7 - x)^2}}{36}$$

Example 3-1
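The table and the closed-form note for the sum of two dice can be checked against each other by enumeration. In this sketch the names are our own, and the square root of (7 − x)² is written as the absolute value |7 − x|.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
distribution = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

def p_formula(x):
    """Closed form from the note: (6 - |7 - x|) / 36."""
    return Fraction(6 - abs(7 - x), 36)

for x, p in distribution.items():
    print(x, p, p == p_formula(x))
```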

Probability Distribution of the Number of Switches

x      P(x)
0      0.1
1      0.2
2      0.3
3      0.2
4      0.1
5      0.1
Total  1.0

Probability of at least 1 switch: P(X ≥ 1) = 1 - P(0) = 1 - 0.1 = 0.9

Probability of more than 2 switches: P(X > 2) = P(3) + P(4) + P(5) = 0.2 + 0.1 + 0.1 = 0.4

[Figure: The Probability Distribution of the Number of Switches, for x = 0 to 5]

Example 3-2

A discrete random variable:
• has a countable number of possible values
• has discrete jumps (or gaps) between successive values
• has measurable probability associated with individual values
• counts

A continuous random variable:
• has an uncountably infinite number of possible values
• moves continuously from value to value
• has no measurable probability associated with each value
• measures (e.g.: height, weight, speed, value, duration, length)

Discrete and Continuous Random Variables

The probability distribution of a discrete random variable X must satisfy the following two conditions.

$$1.\; P(x) \ge 0 \text{ for all values of } x.$$

$$2.\; \sum_{\text{all } x} P(x) = 1$$

Corollary:

$$0 \le P(x) \le 1$$

Rules of Discrete Probability Distributions

The cumulative distribution function, F(x), of a discrete random variable X is:

$$F(x) = P(X \le x) = \sum_{\text{all } i \le x} P(i)$$

x      P(x)    F(x)
0      0.1     0.1
1      0.2     0.3
2      0.3     0.6
3      0.2     0.8
4      0.1     0.9
5      0.1     1.0
Total  1.0

[Figure: Cumulative Probability Distribution of the Number of Switches, for x = 0 to 5]

Cumulative Distribution Function

The probability that at most three switches will occur:

Note: P(X ≤ 3) = F(3) = 0.8 = P(0) + P(1) + P(2) + P(3)

The probability that more than one switch will occur:

Using Cumulative Probability Distributions (Figure 3-8)

Note: P(X > 1) = P(X ≥ 2) = 1 – P(X ≤ 1) = 1 – F(1) = 1 – 0.3 = 0.7

The probability that anywhere from one to three switches will occur:

Using Cumulative Probability Distributions (Figure 3-9)

Note: P(1 ≤ X ≤ 3) = P(X ≤ 3) – P(X ≤ 0) = F(3) – F(0) = 0.8 – 0.1 = 0.7
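The three cumulative-probability calculations can be reproduced from the switches distribution. This sketch uses our own names and exact fractions so that the results match the table without rounding issues.

```python
from fractions import Fraction
from itertools import accumulate

# P(x) for the number of switches, x = 0..5, in tenths.
p = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10),
     3: Fraction(2, 10), 4: Fraction(1, 10), 5: Fraction(1, 10)}

# F(x) = P(X <= x): running sum of P(i) for all i <= x.
xs = sorted(p)
F = dict(zip(xs, accumulate(p[x] for x in xs)))

at_most_three = F[3]          # P(X <= 3)
more_than_one = 1 - F[1]      # P(X > 1) = 1 - P(X <= 1)
one_to_three = F[3] - F[0]    # P(1 <= X <= 3) = F(3) - F(0)

print(float(at_most_three), float(more_than_one), float(one_to_three))
```

Because X is discrete, the inequality signs matter: P(1 ≤ X ≤ 3) subtracts F(0), not F(1).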

The mean of a probability distribution is a measure of its centrality or location, as is the mean or average of a frequency distribution. It is a weighted average, with the values of the random variable weighted by their probabilities.

The mean is also known as the expected value (or expectation) of a random variable, because it is the value that is expected to occur, on average.

The expected value of a discrete random variable X is equal to the sum of each value of the random variable multiplied by its probability.

$$\mu = E(X) = \sum_{\text{all } x} x\,P(x)$$

x      P(x)    xP(x)
0      0.1     0.0
1      0.2     0.2
2      0.3     0.6
3      0.2     0.6
4      0.1     0.4
5      0.1     0.5
Total  1.0     2.3 = E(X) = μ

3-2 Expected Values of Discrete Random Variables

The expected value of a function of a discrete random variable X is:

$$E[h(X)] = \sum_{\text{all } x} h(x)\,P(x)$$

Example 3-3: Monthly sales of a certain product are believed to follow the given probability distribution. Suppose the company has a fixed monthly production cost of $8000 and that each item brings $2. Find the expected monthly profit, h(X), from product sales.

Note: h(X) = 2X – 8000, where X = number of items sold

Number of items, x    P(x)    xP(x)    h(x)     h(x)P(x)
5000                  0.2     1000     2000      400
6000                  0.3     1800     4000     1200
7000                  0.2     1400     6000     1200
8000                  0.2     1600     8000     1600
9000                  0.1      900    10000     1000
Total                 1.0     6700              5400

$$E[h(X)] = \sum_{\text{all } x} h(x)\,P(x) = 5400$$

The expected value of a linear function of a random variable is: E(aX+b) = aE(X) + b

In this case: E(2X-8000) = 2E(X) - 8000 = (2)(6700) - 8000 = 5400

Expected Value of a Function of a Discrete Random Variable
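Example 3-3 can be recomputed both ways, directly as the sum of h(x)P(x) and through the linear rule E(aX + b) = aE(X) + b. The variable names below are our own, and the numbers are those of the example.

```python
# Monthly sales distribution from Example 3-3: number of items -> probability.
sales = {5000: 0.2, 6000: 0.3, 7000: 0.2, 8000: 0.2, 9000: 0.1}

def h(x):
    """Monthly profit: $2 per item minus the $8000 fixed cost."""
    return 2 * x - 8000

expected_items = sum(x * p for x, p in sales.items())      # E(X)
expected_profit = sum(h(x) * p for x, p in sales.items())  # E[h(X)], direct sum
via_linear_rule = 2 * expected_items - 8000                # aE(X) + b

print(round(expected_items, 2), round(expected_profit, 2), round(via_linear_rule, 2))
```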

The variance of a random variable is the expected squared deviation from the mean:

$$\sigma^2 = V(X) = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 P(x)$$

$$\sigma^2 = E(X^2) - [E(X)]^2 = \sum_{\text{all } x} x^2 P(x) - \left[\sum_{\text{all } x} x\,P(x)\right]^2$$

The standard deviation of a random variable is the square root of its variance:

$$\sigma = SD(X) = \sqrt{V(X)}$$

Variance and Standard Deviation of a Random Variable

Variance and Standard Deviation of a Random Variable – using Example 3-2

Recall: μ = 2.3.

Table 3-8

Number of
Switches, x   P(x)   xP(x)   (x-μ)   (x-μ)^2   P(x)(x-μ)^2   x^2 P(x)
0             0.1    0.0     -2.3    5.29      0.529         0.0
1             0.2    0.2     -1.3    1.69      0.338         0.2
2             0.3    0.6     -0.3    0.09      0.027         1.2
3             0.2    0.6      0.7    0.49      0.098         1.8
4             0.1    0.4      1.7    2.89      0.289         1.6
5             0.1    0.5      2.7    7.29      0.729         2.5
Total                2.3                       2.010         7.3

$$\sigma^2 = V(X) = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 P(x) = 2.01$$

$$\sigma^2 = E(X^2) - [E(X)]^2 = \sum_{\text{all } x} x^2 P(x) - \left[\sum_{\text{all } x} x\,P(x)\right]^2 = 7.3 - 2.3^2 = 2.01$$
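The two routes to the variance in Table 3-8, the definition and the computational shortcut, can be compared numerically. The names below are our own, and the probabilities are those of the switches distribution.

```python
import math

# Number of switches -> probability (Example 3-2).
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.2, 4: 0.1, 5: 0.1}

mu = sum(x * px for x, px in p.items())  # E(X)

# Definition: expected squared deviation from the mean.
var_definition = sum((x - mu) ** 2 * px for x, px in p.items())

# Shortcut: E(X^2) - [E(X)]^2
e_x2 = sum(x ** 2 * px for x, px in p.items())
var_shortcut = e_x2 - mu ** 2

sd = math.sqrt(var_definition)

print(round(mu, 2), round(var_definition, 3), round(var_shortcut, 3), round(sd, 4))
```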

The variance of a linear function of a random variable is:

$$V(aX + b) = a^2 V(X) = a^2 \sigma^2$$

Example 3-3:

Number of items, x    P(x)    xP(x)    x^2 P(x)
5000                  0.2     1000      5000000
6000                  0.3     1800     10800000
7000                  0.2     1400      9800000
8000                  0.2     1600     12800000
9000                  0.1      900      8100000
Total                 1.0     6700     46500000

$$\sigma^2 = V(X) = E(X^2) - [E(X)]^2 = \sum_{\text{all } x} x^2 P(x) - \left[\sum_{\text{all } x} x\,P(x)\right]^2 = 46500000 - 6700^2 = 1610000$$

$$\sigma = SD(X) = \sqrt{1610000} = 1268.86$$

$$V(2X - 8000) = 2^2\,V(X) = (4)(1610000) = 6440000$$

$$SD(2X - 8000) = 2\sigma = (2)(1268.86) = 2537.72$$

Variance of a Linear Function of a Random Variable
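The worked numbers for the variance of the profit function can be checked as follows. This is a sketch with our own variable names. Note that the constant b = −8000 shifts the mean but has no effect on the variance.

```python
import math

# Monthly sales distribution from Example 3-3: number of items -> probability.
sales = {5000: 0.2, 6000: 0.3, 7000: 0.2, 8000: 0.2, 9000: 0.1}
a, b = 2, -8000  # profit h(X) = aX + b

e_x = sum(x * p for x, p in sales.items())
e_x2 = sum(x ** 2 * p for x, p in sales.items())

var_x = e_x2 - e_x ** 2   # V(X) = E(X^2) - [E(X)]^2
sd_x = math.sqrt(var_x)

var_profit = a ** 2 * var_x   # V(aX + b) = a^2 V(X)
sd_profit = abs(a) * sd_x     # SD(aX + b) = |a| SD(X)

print(round(var_x), round(sd_x, 2), round(var_profit), round(sd_profit, 2))
```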

The mean or expected value of the sum of random variables is the sum of their means or expected values:

$$\mu_{(X+Y)} = E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y$$

For example: E(X) = $350 and E(Y) = $200

E(X+Y) = $350 + $200 = $550

The variance of the sum of mutually independent random variables is the sum of their variances:

$$\sigma^2_{(X+Y)} = V(X + Y) = V(X) + V(Y) = \sigma^2_X + \sigma^2_Y$$

if X and Y are independent.

For example: V(X) = 84 and V(Y) = 60, so V(X+Y) = 144

Some Properties of Means and Variances of Random Variables

Some Properties of Means and Variances of Random Variables

NOTE:

$$E(X_1 + X_2 + \cdots + X_k) = E(X_1) + E(X_2) + \cdots + E(X_k)$$

$$E(a_1 X_1 + a_2 X_2 + \cdots + a_k X_k) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_k E(X_k)$$

The variance of the sum of k mutually independent random variables is the sum of their variances:

$$V(X_1 + X_2 + \cdots + X_k) = V(X_1) + V(X_2) + \cdots + V(X_k)$$

and

$$V(a_1 X_1 + a_2 X_2 + \cdots + a_k X_k) = a_1^2 V(X_1) + a_2^2 V(X_2) + \cdots + a_k^2 V(X_k)$$

Chebyshev’s Theorem applies to probability distributions just as it applies to frequency distributions.

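For a random variable X with mean μ, standard deviation σ, and for any number k > 1:

$$P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$$

At least                        Lie within    Standard deviations of the mean
1 - 1/2^2 = 3/4 = 75%           k = 2
1 - 1/3^2 = 8/9 ≈ 89%           k = 3
1 - 1/4^2 = 15/16 ≈ 94%         k = 4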

Chebyshev’s Theorem Applied to Probability Distributions
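As an illustration, the Chebyshev bound can be compared with the actual probability for the switches distribution used earlier. The helper names are our own. The bound is a guaranteed minimum, so the actual probability should never fall below it.

```python
import math

# Number of switches -> probability (Example 3-2).
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.2, 4: 0.1, 5: 0.1}

mu = sum(x * px for x, px in p.items())
sigma = math.sqrt(sum((x - mu) ** 2 * px for x, px in p.items()))

def chebyshev_bound(k):
    """Lower bound on P(|X - mu| < k*sigma), valid for k > 1."""
    return 1 - 1 / k ** 2

def actual_probability(k):
    """P(|X - mu| < k*sigma) computed directly from the distribution."""
    return sum(px for x, px in p.items() if abs(x - mu) < k * sigma)

for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 4), round(actual_probability(k), 4))
```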

• If an experiment consists of a single trial and the outcome of the trial can only be either a success* or a failure, then the trial is called a Bernoulli trial.

• The number of successes X in one Bernoulli trial, which can be 1 or 0, is a Bernoulli random variable.

• Note: If p is the probability of success in a Bernoulli experiment, then E(X) = p and V(X) = p(1 – p).

* The terms success and failure are simply statistical terms, and do not have positive or negative implications. In a production setting, finding a defective product may be termed a “success,” although it is not a positive result.

3-3 Bernoulli Random Variable

Consider a Bernoulli Process in which we have a sequence of n identical trials satisfying the following conditions:

1. Each trial has two possible outcomes, called success* and failure. The two outcomes are mutually exclusive and exhaustive.

2. The probability of success, denoted by p, remains constant from trial to trial. The probability of failure is denoted by q, where q = 1-p.

3. The n trials are independent. That is, the outcome of any trial does not affect the outcomes of the other trials.

A random variable, X, that counts the number of successes in n Bernoulli trials, where p is the probability of success* in any given trial, is said to follow the binomial probability distribution with parameters n (number of trials) and p (probability of success). We call X the binomial random variable.

* The terms success and failure are simply statistical terms, and do not have positive or negative implications. In a production setting, finding a defective product may be termed a “success,” although it is not a positive result.

3-4 The Binomial Random Variable

Suppose we toss a single fair and balanced coin five times in succession, and let X represent the number of heads.

There are 2^5 = 32 possible sequences of H and T (S and F) in the sample space for this experiment. Of these, there are 10 in which there are exactly 2 heads (X=2):

HHTTT HTHTT HTTHT HTTTH THHTT THTHT THTTH TTHHT TTHTH TTTHH

The probability of each of these 10 outcomes is p^2 q^3 = (1/2)^2 (1/2)^3 = (1/32), so the probability of 2 heads in 5 tosses of a fair and balanced coin is:

P(X = 2) = 10 * (1/32) = (10/32) = 0.3125

Binomial Probabilities (Introduction)

P(X=2) = 10 * (1/32) = (10/32) = .3125

Notice that this probability has two parts:

• 10: the number of outcomes with 2 heads
• (1/32): the probability of each outcome with 2 heads

In general:

1. The probability of a given sequence of x successes out of n trials with probability of success p and probability of failure q is equal to:

$$p^x q^{(n-x)}$$

2. The number of different sequences of n trials that result in exactly x successes is equal to the number of choices of x elements out of a total of n elements. This number is denoted:

$$_nC_x = \binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

Binomial Probabilities (continued)

The binomial probability distribution:

$$P(x) = \binom{n}{x} p^x q^{(n-x)} = \frac{n!}{x!\,(n-x)!}\, p^x q^{(n-x)}$$

where:
p is the probability of success in a single trial,
q = 1-p,
n is the number of trials, and
x is the number of successes.

Number of successes, x    Probability P(x)
0                         [n! / (0!(n-0)!)] p^0 q^(n-0)
1                         [n! / (1!(n-1)!)] p^1 q^(n-1)
...                       ...
n                         [n! / (n!(n-n)!)] p^n q^(n-n)

The Binomial Probability Distribution
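The binomial formula can be checked against the coin example by counting sequences directly. The function name is our own, and `math.comb` requires Python 3.8 or later.

```python
from itertools import product
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): (number of sequences with x successes) * (probability of each sequence)."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Five tosses of a fair coin: enumerate all 2^5 sequences and count those with 2 heads.
sequences = ["".join(s) for s in product("HT", repeat=5)]
two_heads = [s for s in sequences if s.count("H") == 2]

print(len(sequences), len(two_heads))
print(binomial_pmf(2, 5, 0.5))
```

The count of sequences with exactly 2 heads equals 5 choose 2, and each such sequence has probability 1/32.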

The Normal Distribution

As n increases, the binomial distribution approaches a ...

[Figure: binomial distributions for n = 6 (p = .5), n = 10, and n = 14, shown next to the normal distribution with μ = 0, σ = 1]

4-1 Introduction

The normal probability density function:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty$$

where e = 2.7182818... and π = 3.14159265...

The Normal Probability Distribution
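The density formula can be written out directly and cross-checked against the standard library's `statistics.NormalDist` (Python 3.8 or later). The function name and the test points below are our own choices for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density at x for mean mu and standard deviation sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / math.sqrt(2 * math.pi * sigma ** 2)

# Standard normal (mu = 0, sigma = 1) at its peak, and a shifted, wider normal.
print(normal_pdf(0.0))
print(normal_pdf(12.0, mu=10.0, sigma=2.0), NormalDist(10.0, 2.0).pdf(12.0))
```

The density is highest at x = μ and is symmetric about it, which is the bell shape described in the properties that follow.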

• The normal is a family of bell-shaped and symmetric distributions. Because the distribution is symmetric, one-half (.50 or 50%) lies on either side of the mean.

• Each is characterized by a different pair of mean, μ, and variance, σ². That is: [X ~ N(μ, σ²)].

• Each is asymptotic to the horizontal axis.

• The area under any normal probability density function within kσ of μ is the same for any normal distribution, regardless of the mean and variance.

4-2 Properties of the Normal Distribution


• If several independent random variables are normally distributed then their sum will also be normally distributed.

• The mean of the sum will be the sum of all the individual means.

• The variance of the sum will be the sum of all the individual variances (by virtue of the independence).

4-2 Properties of the Normal Distribution (continued)

• If X1, X2, …, Xn are independent normal random variables, then their sum S will also be normally distributed with

• E(S) = E(X1) + E(X2) + … + E(Xn)

• V(S) = V(X1) + V(X2) + … + V(Xn)

• Note: It is the variances that can be added above and not the standard deviations.


Example 4.1: Let X1, X2, and X3 be independent random variables that are normally distributed with means and variances as shown.

4-2 Properties of the Normal Distribution – Example 4-1

Mean Variance

X1 10 1

X2 20 2

X3 30 3

Let S = X1 + X2 + X3. Then E(S) = 10 + 20 + 30 = 60 and V(S) = 1 + 2 + 3 = 6. The standard deviation of S is √6 = 2.45.

• If X1, X2, …, Xn are independent normal random variables, then the random variable Q defined as Q = a1X1 + a2X2 + … + anXn + b will also be normally distributed with

• E(Q) = a1E(X1) + a2E(X2) + … + anE(Xn) + b

• V(Q) = a1^2 V(X1) + a2^2 V(X2) + … + an^2 V(Xn)

• Note: It is the variances that can be added above and not the standard deviations.

Example 4.3: Let X1, X2, X3 and X4 be independent random variables that are normally distributed with means and variances as shown. Find the mean and variance of Q = X1 - 2X2 + 3X3 - 4X4 + 5

4-2 Properties of the Normal Distribution – Example 4-3

Mean Variance

X1 12 4

X2 -5 2

X3 8 5

X4 10 1

E(Q) = 12 - 2(-5) + 3(8) - 4(10) + 5 = 11

V(Q) = 4 + (-2)^2 (2) + 3^2 (5) + (-4)^2 (1) = 73

SD(Q) = √73 = 8.544
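The rules for a linear combination of independent random variables can be sketched in a few lines of Python. The helper name `linear_combination` is hypothetical and is used here only to reproduce Example 4-3.

```python
from math import sqrt

def linear_combination(coeffs, means, variances, b=0.0):
    """Mean, variance and SD of Q = a1*X1 + ... + an*Xn + b for independent Xi."""
    mean_q = sum(a * m for a, m in zip(coeffs, means)) + b
    # Variances add with squared coefficients; the constant b adds no variance.
    var_q = sum(a**2 * v for a, v in zip(coeffs, variances))
    return mean_q, var_q, sqrt(var_q)

# Example 4-3: Q = X1 - 2*X2 + 3*X3 - 4*X4 + 5
mean_q, var_q, sd_q = linear_combination(
    [1, -2, 3, -4], [12, -5, 8, 10], [4, 2, 5, 1], b=5
)
print(mean_q, var_q, round(sd_q, 3))  # 11 73 8.544
```

Setting every coefficient to 1 and b to 0 gives the plain sum S, so the same helper also reproduces Example 4-1 (mean 60, variance 6).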


Computing the Mean, Variance and Standard Deviation for the Sum of Independent Random Variables Using the Template

Normal Probability Distributions

All of these are normal probability density functions, though each has a different mean and variance.

[Figure: density curves for Z ~ N(0,1), W ~ N(40,1), X ~ N(30,25), and Y ~ N(50,9)]

Consider:

P(39 ≤ W ≤ 41)
P(25 ≤ X ≤ 35)
P(47 ≤ Y ≤ 53)
P(-1 ≤ Z ≤ 1)

The probability in each case is an area under a normal probability density function.

The standard normal random variable, Z, is the normal random variable with mean μ = 0 and standard deviation σ = 1: Z ~ N(0, 1^2).

[Figure: Standard Normal Distribution]

4-4 The Standard Normal Distribution

Standard Normal Probabilities

z    .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0  0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1  0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2  0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3  0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4  0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5  0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6  0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7  0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8  0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9  0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0  0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1  0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2  0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3  0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4  0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5  0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6  0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7  0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8  0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9  0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0  0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1  0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2  0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3  0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4  0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5  0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6  0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7  0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8  0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9  0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0  0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

Look in the row labeled 1.5 and the column labeled .06 to find P(0 ≤ Z ≤ 1.56) = 0.4406.

Finding Probabilities of the Standard Normal Distribution: P(0 < Z < 1.56)

Finding Probabilities of the Standard Normal Distribution: P(Z < -2.47)

To find P(Z < -2.47):

1. Find the table area for 2.47: P(0 < Z < 2.47) = .4932

2. Subtract it from .5 to get the area to the left of -2.47: P(Z < -2.47) = .5 - P(0 < Z < 2.47) = .5 - .4932 = 0.0068

z    ...  .06     .07     .08
2.3  ...  0.4909  0.4911  0.4913
2.4  ...  0.4931  0.4932  0.4934
2.5  ...  0.4948  0.4949  0.4951

Finding Probabilities of the Standard Normal Distribution: P(1 < Z < 2)

z    .00     ...
0.9  0.3159  ...
1.0  0.3413  ...
1.1  0.3643  ...
...
1.9  0.4713  ...
2.0  0.4772  ...
2.1  0.4821  ...

To find P(1 ≤ Z ≤ 2):

1. Find the table area for 2.00: F(2) = P(Z ≤ 2.00) = .5 + .4772 = .9772

2. Find the table area for 1.00: F(1) = P(Z ≤ 1.00) = .5 + .3413 = .8413

3. P(1 ≤ Z ≤ 2.00) = P(Z ≤ 2.00) - P(Z ≤ 1.00) = .9772 - .8413 = 0.1359
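The table lookups above can be cross-checked in software. This is a small sketch using only the Python standard library; the helper name `phi` is an assumption made for illustration, and it returns the cumulative area P(Z ≤ z) rather than the table's area from 0 to z.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Table area from 0 to z is phi(z) - 0.5.
print(round(phi(1.56) - 0.5, 4))       # table value: 0.4406
print(round(phi(-2.47), 4))            # table-based answer: 0.0068
print(round(phi(2.0) - phi(1.0), 4))   # table-based answer: 0.1359
```

Because the printed table is rounded to four decimals, software results may differ from table-based answers in the last digit.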

Finding Values of the Standard Normal Random Variable: P(0 < Z < z) = 0.40

To find z such that P(0 ≤ Z ≤ z) = .40:

1. Find a probability as close as possible to .40 in the table of standard normal probabilities.

2. Then determine the value of z from the corresponding row and column.

The closest table entry is .3997, in row 1.2 and column .08, so z = 1.28:

P(0 ≤ Z ≤ 1.28) ≈ .40

Also, since P(Z ≤ 0) = .50,

P(Z ≤ 1.28) ≈ .90

z    ...  .07     .08     .09
1.1  ...  0.3790  0.3810  0.3830
1.2  ...  0.3980  0.3997  0.4015
1.3  ...  0.4147  0.4162  0.4177

99% Interval around the Mean

z    ...  .04     .05     .06     .07     .08     .09
2.4  ...  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5  ...  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6  ...  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964

To have .99 in the center of the distribution, there should be (1/2)(1-.99) = (1/2)(.01) = .005 in each tail of the distribution, and (1/2)(.99) = .495 in each half of the .99 interval. That is:

P(0 ≤ Z ≤ z_.005) = .495

Look to the table of standard normal probabilities to find that .495 falls between the entries for 2.57 and 2.58, so:

z_.005 = 2.575 and -z_.005 = -2.575

P(-2.575 ≤ Z ≤ 2.575) = .99

[Figure: standard normal curve with area .495 on each side of the center between -2.575 and 2.575 (total area in center = .99) and area .005 in each tail]

4-5 The Transformation of Normal Random Variables

The area within kσ of the mean is the same for all normal random variables. So an area under any normal distribution is equivalent to an area under the standard normal. In this example: P(40 ≤ X ≤ 60) = P(-1 ≤ Z ≤ 1) since μ = 50 and σ = 10.

[Figure: normal distribution of X (axis 0 to 100) transformed to the standard normal distribution (axis -5 to 5)]

The transformation of X to Z:

$$Z = \frac{X - \mu_x}{\sigma_x}$$

The transformation has two steps: (1) subtraction: (X - μx); (2) division by σx.

The inverse transformation of Z to X:

$$X = \mu_x + Z\sigma_x$$

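Using the Normal Transformation

Example 4-9

X ~ N(160, 30^2)

$$P(100 \le X \le 180) = P\left(\frac{100-160}{30} \le Z \le \frac{180-160}{30}\right) = P(-2 \le Z \le 0.6667) = 0.4772 + 0.2475 = 0.7247$$

X ~ N(127, 22^2)

$$P(X < 150) = P\left(Z < \frac{150-127}{22}\right) = P(Z < 1.045) = 0.5 + 0.3520 = 0.8520$$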

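Using the Normal Transformation - Example 4-11

X ~ N(383, 12^2)

$$P(394 \le X \le 399) = P\left(\frac{394-383}{12} \le Z \le \frac{399-383}{12}\right) = P(0.9166 \le Z \le 1.333) = 0.4088 - 0.3203 = 0.0885$$

[Figure: equivalent areas under the distribution of X and under the standard normal distribution; template solution]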
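The transformation examples above can be reproduced directly, without a table, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `prob_between` is a hypothetical convenience and not part of the slides.

```python
from statistics import NormalDist

def prob_between(mu: float, sigma: float, a: float, b: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma^2)."""
    x = NormalDist(mu, sigma)
    return x.cdf(b) - x.cdf(a)

print(round(prob_between(160, 30, 100, 180), 4))  # slides (table-based): 0.7247
print(round(NormalDist(127, 22).cdf(150), 4))     # slides (table-based): 0.8520
print(round(prob_between(383, 12, 394, 399), 4))  # slides (table-based): 0.0885
```

The library standardizes internally, so the result is the same as converting to z by hand and then looking up the area; small last-digit differences come from table rounding and interpolation.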

The Transformation of Normal Random Variables

The transformation of X to Z:

$$Z = \frac{X - \mu_x}{\sigma_x}$$

The inverse transformation of Z to X:

$$X = \mu_x + Z\sigma_x$$

The transformation of X to Z, where a and b are numbers:

$$P(X < a) = P\left(Z < \frac{a-\mu}{\sigma}\right)$$

$$P(X > b) = P\left(Z > \frac{b-\mu}{\sigma}\right)$$

$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)$$

Normal Probabilities (Empirical Rule)

• The probability that a normal random variable will be within 1 standard deviation from its mean (on either side) is 0.6826, or approximately 0.68.

• The probability that a normal random variable will be within 2 standard deviations from its mean is 0.9544, or approximately 0.95.

• The probability that a normal random variable will be within 3 standard deviations from its mean is 0.9974.

[Figure: Standard Normal Distribution]

4-6 The Inverse Transformation

z    ...  .07     .08     .09
1.1  ...  0.3790  0.3810  0.3830
1.2  ...  0.3980  0.3997  0.4015
1.3  ...  0.4147  0.4162  0.4177

The area within kσ of the mean is the same for all normal random variables. To find a probability associated with any interval of values for any normal random variable, all that is needed is to express the interval in terms of numbers of standard deviations from the mean. That is the purpose of the standard normal transformation. If X ~ N(50, 10^2),

$$P(X > 70) = P\left(\frac{X-\mu}{\sigma} > \frac{70-50}{10}\right) = P(Z > 2)$$

That is, P(X > 70) can be found easily because 70 is 2 standard deviations above the mean of X: 70 = μ + 2σ. P(X > 70) is equivalent to P(Z > 2), an area under the standard normal distribution.

Example 4-12

X ~ N(124, 12^2)
P(X > x) = 0.10 and P(Z > 1.28) ≈ 0.10
x = μ + zσ = 124 + (1.28)(12) = 139.36

[Figure: normal curve for X (axis 80 to 180) with x = 139.36 marked]

Finding Values of a Normal Random Variable, Given a Probability

1. Draw pictures of the normal distribution in question and of the standard normal distribution.

2. Shade the area corresponding to the desired probability.

3. From the table of the standard normal distribution, find the z value or values.

4. Use the transformation from z to x to get value(s) of the original random variable.

[Figure: normal distribution of X (axis 1000 to 4000) and the standard normal distribution, each with an area of .4750 shaded on either side of the mean, between z = -1.96 and z = 1.96]

z    ...  .05     .06     .07
1.8  ...  0.4678  0.4686  0.4693
1.9  ...  0.4744  0.4750  0.4756
2.0  ...  0.4798  0.4803  0.4808

x = μ ± zσ = 2450 ± (1.96)(400) = 2450 ± 784 = (1666, 3234)
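The four steps above amount to inverting the normal cumulative distribution. A minimal sketch follows, assuming Python 3.8+ and a hypothetical helper name `central_interval`; the mean 2450 and standard deviation 400 are the values used in the slide's final calculation.

```python
from statistics import NormalDist

def central_interval(mu: float, sigma: float, prob: float):
    """Symmetric interval around mu containing `prob` of a N(mu, sigma^2) distribution."""
    tail = (1.0 - prob) / 2.0
    z = NormalDist().inv_cdf(1.0 - tail)  # e.g. about 1.96 for prob = 0.95
    return mu - z * sigma, mu + z * sigma

low, high = central_interval(2450, 400, 0.95)
print(round(low), round(high))  # 1666 3234

# One-sided case (Example 4-12): x such that P(X > x) = 0.10 for X ~ N(124, 12^2).
print(round(NormalDist(124, 12).inv_cdf(0.90), 2))
```

The one-sided result is slightly above the slide's 139.36 because the slide rounds z to 1.28, while `inv_cdf` uses the unrounded value.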

[Figure: binomial distribution with n = 7, p = 0.50 overlaid with the normal distribution with μ = 3.5, σ = 1.323]

The normal distribution with μ = 3.5 and σ = 1.323 is a close approximation to the binomial with n = 7 and p = 0.50.

Normal approximation: P(X < 4.5) = 0.7749

MTB > cdf 4.5;
SUBC> normal 3.5 1.323.

Cumulative Distribution Function

Normal with mean = 3.50000 and standard deviation = 1.32300

     x    P( X <= x)
4.5000        0.7751

Binomial: P(X ≤ 4) = 0.7734

MTB > cdf 4;
SUBC> binomial 7,.5.

Cumulative Distribution Function

Binomial with n = 7 and p = 0.500000

   x    P( X <= x)
4.00        0.7734
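The comparison between the exact binomial probability and its normal approximation can also be sketched in Python instead of Minitab. The function name `binomial_cdf` is an illustrative assumption; the 4.5 cut-off is the continuity correction used on the slide.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for a binomial random variable."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 7, 0.5
mu = n * p                      # 3.5
sigma = sqrt(n * p * (1 - p))   # about 1.323

exact = binomial_cdf(4, n, p)
approx = NormalDist(mu, sigma).cdf(4.5)  # P(X <= 4) approximated by P(X < 4.5)
print(round(exact, 4), round(approx, 4))
```

The two numbers should agree to about two decimal places, which is the sense in which the normal curve is a "close approximation" here.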

FOR ANY RESEARCH WE ARE ALWAYS INTERESTED IN UNDERSTANDING THE POPULATION PARAMETER SO THAT DECISIONS CAN BE MADE BASED ON INFORMATION.

EX: A MARKETER MAY BE INTERESTED IN KNOWING THE AVERAGE CONSUMPTION OF SUGAR PER HOUSEHOLD PER MONTH IN THE CITY OF DELHI. THIS INFORMATION IS THE POPULATION PARAMETER, WHERE ALL THE HOUSEHOLDS OF DELHI ARE THE POPULATION AND THE AVERAGE CONSUMPTION OF SUGAR IS THE PARAMETER, REPRESENTED BY ‘µ’.

HOWEVER, FINDING THIS PARAMETER IS DIFFICULT, AS IT WOULD BE VIRTUALLY IMPRACTICAL TO CONTACT ALL THE HOUSEHOLDS OF DELHI (OR THE TIME TAKEN WOULD BE VERY LARGE), AND THE STUDY ITSELF MAY BE TIME-BOUND.

HENCE WE MUST RESORT TO COLLECTING THE INFORMATION FROM ONLY A SUBSET OF THE POPULATION, WHICH IS CALLED THE SAMPLE. THE SAMPLE VALUE OF THE SAME VARIABLE IS REFERRED TO AS THE STATISTIC (x WITH A BAR ON TOP).

HOWEVER, THE SAMPLE MEAN IS GENERALLY NOT EQUAL TO THE POPULATION MEAN, AND THE DIFFERENCE BETWEEN THE TWO IS THE ERROR IN ESTIMATING THE PARAMETER (KNOWN AS TOTAL ERROR).

THIS ERROR OCCURS FOR SEVERAL REASONS.

Sample vs. Census

Conditions Favoring the Use of

Type of Study                        Sample   Census
1. Budget                            Small    Large
2. Time available                    Short    Long
3. Population size                   Large    Small
4. Variance in the characteristic    Small    Large
5. Cost of sampling errors           Low      High
6. Cost of nonsampling errors        High     Low

THUS IT IS CLEAR THAT SAMPLING IS REQUIRED, AND IF THE SAMPLE SIZE IS PROPERLY CHOSEN THEN THE ERROR CAN ALSO BE KEPT AT A MINIMUM LEVEL.

SAMPLING DISTRIBUTION

SUPPOSE THE TARGET SEGMENT (POPULATION) CONTAINS ‘N’ ELEMENTS AND FROM THIS POPULATION WE RANDOMLY PICK ‘n’ ELEMENTS.

IN HOW MANY POSSIBLE WAYS CAN WE PICK THESE ‘n’ ELEMENTS?

N^n ways if done with replacement
NCn ways if done without replacement

FOR EACH OF THESE SAMPLES THERE WILL BE A SAMPLE MEAN. THE WAY THESE SAMPLE MEANS ARE SPREAD IS KNOWN AS THE SAMPLING DISTRIBUTION.

Let us illustrate the concept of a sampling distribution:

Consider a population consisting of only three members (A, B and C). If they are asked how many chocolates they eat in a day, the answers are A = 1 per day, B = 2 per day and C = 3 per day. Hence the variable is the number of chocolates, which takes the values {1, 2, 3}. This gives the population average (µ = 2) and a variance (σ^2) = 2/3.

If samples of size 2 are taken with replacement, let us list all possible samples along with their sample means.

Possible sample    Sample mean
(1,1)              1
(1,2)              1.5
(1,3)              2
(2,1)              1.5
(2,2)              2
(2,3)              2.5
(3,1)              2
(3,2)              2.5
(3,3)              3

Possible sample mean    Freq    Prob
1                       1       1/9
1.5                     2       2/9
2                       3       3/9
2.5                     2       2/9
3                       1       1/9

Expected value of sample mean = 2 = population mean
Variance of sample means = σ^2/n = (2/3)/2 = 1/3
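The chocolate example is small enough to enumerate by computer. The sketch below lists every sample of size 2 drawn with replacement and works in exact fractions; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # chocolates per day for A, B and C
n = 2                    # sample size, drawn with replacement

samples = list(product(population, repeat=n))          # N^n = 9 ordered samples
means = [Fraction(sum(s), n) for s in samples]
dist = {m: Fraction(c, len(samples)) for m, c in sorted(Counter(means).items())}

expected = sum(m * p for m, p in dist.items())
variance = sum((m - expected) ** 2 * p for m, p in dist.items())
print(len(samples), expected, variance)  # 9 2 1/3
```

The variance of the sample means comes out as 1/3, which equals the population variance 2/3 divided by the sample size 2.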

[Figure: probability plotted against sample mean (1, 1.5, 2, 2.5, 3)]

Does this appear to be roughly bell-shaped, like a normal distribution? Yes indeed!

Thus the Central Limit Theorem says that the distribution of the sample mean is approximately normal as long as the sample size is large, such that:

Expected value of sample mean = population mean
Standard deviation of sample mean = population standard deviation / √n

This is true irrespective of the distribution of the population.

• Comparing the population distribution and the sampling distribution of the mean:

  The sampling distribution is more bell-shaped and symmetric.

  Both have the same center.

  The sampling distribution of the mean is more compact, with a smaller variance.

[Figure: Uniform Distribution (1,8) and the Sampling Distribution of the Mean]

Properties of the Sampling Distribution of the Sample Mean

The expected value of the sample mean is equal to the population mean:

$$E(\bar{X}) = \mu_{\bar{X}} = \mu_X$$

The variance of the sample mean is equal to the population variance divided by the sample size:

$$V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$$

The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size:

$$SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$

Relationships between Population Parameters and the Sampling Distribution of the Sample Mean

Sampling from a Normal Population

When sampling from a normal population with mean μ and standard deviation σ, the sample mean, X̄, has a normal sampling distribution:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.

[Figure: Sampling Distribution of the Sample Mean, showing the normal population and the sampling distributions for n = 2, n = 4, and n = 16]

When sampling from a population with mean μ and finite standard deviation σ, the sampling distribution of the sample mean will tend to a normal distribution with mean μ and standard deviation σ/√n as the sample size becomes large (n > 30).

For “large enough” n:

$$\bar{X} \sim N(\mu, \sigma^2/n)$$

[Figure: density f(X̄) of the sample mean for n = 20 and for large n]

The Central Limit Theorem
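A quick simulation can make the theorem concrete. The sketch below is an illustration under assumptions that are not in the slides: it draws samples of size 30 from a uniform population on (1, 8), uses 5000 repetitions, and fixes a random seed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_sample_means(n: int, reps: int, seed: int = 0):
    """Means of `reps` random samples of size n from a Uniform(1, 8) population."""
    rng = random.Random(seed)
    return [sum(rng.uniform(1, 8) for _ in range(n)) / n for _ in range(reps)]

n = 30
sample_means = simulate_sample_means(n, reps=5000)

pop_mean = (1 + 8) / 2            # mean of Uniform(1, 8)
pop_sd = (8 - 1) / sqrt(12)       # standard deviation of Uniform(1, 8)
print(round(mean(sample_means), 2), "vs population mean", pop_mean)
print(round(pstdev(sample_means), 3), "vs sigma/sqrt(n) =", round(pop_sd / sqrt(n), 3))
```

Even though the population is flat rather than bell-shaped, the simulated sample means should cluster around 4.5 with a spread close to σ/√n.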

[Figure: normal, uniform, skewed, and general populations, each with the sampling distribution of the mean for n = 30]

The Central Limit Theorem Applies to Sampling Distributions from Any Population

SAMPLING DISTRIBUTION - EXAMPLE

Let us assume that we are interested in understanding the average consumption of sugar per household per month in a given target population.

What this means is that we are interested in the parameter µ = average sugar consumed per household per month.

We can only estimate it based on sample information, i.e. based on the sample mean. This can be done as follows.

For the example, let us assume that we randomly sampled 100 households and found that the sample mean was 1890 grams per household per month. Let us also assume that the population standard deviation is known to be 230 grams.

We use the fact that the sample mean obtained is one among the many different sample means possible, and that the sample means are normally distributed.

Hence an interval estimate can be obtained as µ = x-bar ± Z σ/√n, where Z = standard normal deviate.

Substituting the values we get

µ = 1890 ± Z × (230 / √100)

For 90% confidence, Z = 1.645 (refer to the Z table):

µ = 1890 ± 1.645 × (230 / √100) = 1890 ± 37.8

We can be 90% confident that the actual µ is contained within about 1852.2 to 1927.8 grams.
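The interval estimate can be sketched in code as follows. The function name `z_interval` is hypothetical. Note that a two-sided 90% interval leaves 5% in each tail, which corresponds to z ≈ 1.645; a value of z = 1.28 would leave 10% in each tail and so gives an 80% two-sided interval.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, confidence: float):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)   # z times the standard error
    return xbar - margin, xbar + margin

# Sugar example: n = 100 households, sample mean 1890 g, sigma = 230 g.
low, high = z_interval(1890, 230, 100, 0.90)
print(round(low, 1), round(high, 1))  # 1852.2 1927.8
```

Raising the confidence level widens the interval, while increasing n narrows it through the standard error σ/√n.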

FROM THE EXAMPLE JUST EXPLAINED YOU CAN SEE THAT

(x-bar – µ) = Z σ/√n

The error in estimating µ is a function of σ/√n. Hence σ/√n is often referred to as the ‘standard error’.

Hence, if the acceptable error E = (x-bar – µ) is specified, the sample size can be determined as n = (Zσ/E)^2 (this is based on sampling error alone).
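Solving the error relation E = Zσ/√n for n gives n = (Zσ/E)^2. A small sketch follows; the function name `sample_size_for_mean` and the acceptable error of 30 grams are assumptions chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma: float, error: float, confidence: float) -> int:
    """Smallest n for which z * sigma / sqrt(n) does not exceed the acceptable error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)   # round up to a whole observation

# Sugar example with sigma = 230 g, hypothetical acceptable error of 30 g, 90% confidence.
print(sample_size_for_mean(230, 30, 0.90))  # 160
```

Halving the acceptable error roughly quadruples the required sample size, because the error enters the formula squared.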

Sampling without replacement

When we sample without replacement from a finite population, the standard deviation of the sample means (also known as the standard error) incorporates a finite population multiplier, as follows:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$

where √((N-n)/(N-1)) is the finite population multiplier (always ≤ 1), N = population size, and n = sample size.

It can be noted that as N goes to ∞ the multiplier tends to 1, and hence the standard error is the same as if the sampling were done with replacement.

Sampling distribution for proportion

Consider an example:

The number of times a hotel is unable to accommodate its customers because the hotel is full.

This can only be expressed in terms of a proportion, i.e. 10% and so on.

In this case also, if the sample size is large, the sampling distribution of the proportion behaves like a normal distribution, with the expected value of the sample proportion equal to the population proportion and the standard error equal to √(pq/n), where ‘q’ = 100 - p if ‘p’ is expressed as a percentage.

Sampling distribution for proportion

Similarly, an interval estimate for a proportion can be found. For example, suppose a sample of 1000 voters is selected and 400 of them decide to vote for a political party (X).

Then the proportion of the population that is expected to vote for party (X) would be

P = 40% ± Z √(40 × 60 / 1000) = 40% ± 3.04 (Z = 1.96 for 95% confidence level); hence the interval would be 36.96% to 43.04%.
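The same calculation can be sketched in Python, working with proportions (0 to 1) rather than percentages. The function name `proportion_interval` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes: int, n: int, confidence: float):
    """Large-sample two-sided confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # z times sqrt(pq/n)
    return p_hat - margin, p_hat + margin

# Voter example: 400 of 1000 sampled voters favour the party, 95% confidence.
low, high = proportion_interval(400, 1000, 0.95)
print(round(100 * low, 2), round(100 * high, 2))  # 36.96 43.04
```

The margin of about 3 percentage points is typical for a sample of 1000 when the proportion is near the middle of its range.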

Sampling distribution of difference of two means

Consider a group of male employees and a group of female employees in the IT industry at a given level. It is desired to understand the level of difference in their salaries (if there is any discrimination).

In this situation there are many possible samples of size (n1) that can be drawn from the male employees, and similarly many samples of size (n2) that can be drawn from the female employees.

For each pair of samples, one taken from each group, the sample means can be subtracted, which gives us the level of difference in the salary.

This is the difference of two sample means, and its distribution also behaves like a normal distribution for large samples.

Thus the sampling distribution for the difference of two means is normally distributed, with expected value of (x-bar(male) – x-bar(female)) = µ1 – µ2 (which is 0 if there is no difference between the groups) and variance:

$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

The standard error of the difference is the square root of this variance.

Sampling distribution for small sample

In our earlier discussion we have always emphasized the need for a large sample for the sample mean to be distributed as a normal distribution.

What is meant by a LARGE sample?

Gosset was working on samples which were considered small, such as 10, 15, 25 etc., and he found that the distribution of the sample mean was not exactly a normal distribution; it was nevertheless symmetric, but with a larger variance. He called his distribution the Student ‘t’ distribution.

The mean value of ‘t’ is 0, and the probability density function is not only a function of the mean and variance but also depends on what he called the “degree of freedom”.

Confidence Interval for a Mean (μ) with Unknown σ

Degrees of Freedom

• Degrees of Freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.

• Degrees of freedom tell how many observations are used to calculate σ, less the number of intermediate estimates used in the calculation.

ν = n - 1

• As n increases, the t distribution approaches the shape of the normal distribution.

• For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.

Degree of freedom

To understand the degree of freedom, let us consider the numbers 1, 2, 3, with total 6.

We have the freedom to change any two of these three numbers without changing the total; the third is then fixed. Thus the degree of freedom would be 2. In general the degree of freedom would be (n-1), where n = sample size.

Thus for a small sample, and whenever the population variance is unknown, the distribution of the sample means behaves like a ‘t’ distribution.

This ‘t’ distribution becomes very close to the normal distribution when the degree of freedom is 29 and above.

Hence the definition of a large sample in statistics is a sample size of 30 or more. For samples smaller than 30 the distribution needed would be ‘t’.

Student’s t Distribution

• t distributions are symmetric and shaped like the standard normal distribution.

• The t distribution is dependent on the size of the sample.

Thus for all calculations with small samples the Z value will be substituted with the ‘t’ value.

Usually small samples are not used when proportions are involved.

Confidence Interval for a Mean (μ) with Unknown σ

Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.

Student’s t Distribution

The confidence interval for μ (unknown σ) is

$$\bar{x} \pm t \frac{s}{\sqrt{n}}$$

that is,

$$\bar{x} - t \frac{s}{\sqrt{n}} < \mu < \bar{x} + t \frac{s}{\sqrt{n}}$$

Comparison of z and t

• For very small samples, t-values differ substantially from the normal.

• As degrees of freedom increase, the t-values approach the normal z-values.

• For example, for n = 31, the degrees of freedom are: ν = 31 - 1 = 30

• What would the t-value be for a 90% confidence interval?

For ν = 30, the t-value from a t table is 1.697; the corresponding z-value is 1.645.
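A small sketch shows how the t-based interval compares with the z-based one. The Python standard library has no t quantile function, so the critical value is passed in from a t table; the sample figures (mean 50, standard deviation 10) are hypothetical and chosen only for illustration.

```python
from math import sqrt

def t_interval(xbar: float, s: float, n: int, t_crit: float):
    """Confidence interval xbar +/- t * s / sqrt(n); t_crit comes from a t table with n - 1 d.f."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: n = 31 (30 degrees of freedom), mean 50, sample SD 10.
t_low, t_high = t_interval(50.0, 10.0, 31, t_crit=1.697)   # 90% interval using t
z_low, z_high = t_interval(50.0, 10.0, 31, t_crit=1.645)   # same formula with z, for comparison
print(round(t_high - t_low, 3), round(z_high - z_low, 3))
```

The first width is slightly larger than the second, which illustrates that a confidence interval based on t is wider than one based on z.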

Confidence Interval for the Difference of Two Means, small sample (μ1 – μ2)

• The procedure for constructing a confidence interval for μ1 – μ2 depends on our assumption about the unknown variances.

Assuming equal variances:

$$(\bar{x}_1 - \bar{x}_2) \pm t \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

with ν = (n1 – 1) + (n2 – 1) degrees of freedom.

The first square root is the pooled standard deviation. The product of the two square roots is the standard error for the difference of means.
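The pooled calculation can be sketched as follows, assuming equal population variances. The function name `pooled_t_interval` and all sample figures are hypothetical; the critical value 2.086 is the t-table entry for 95% confidence with 20 degrees of freedom.

```python
from math import sqrt

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Equal-variance confidence interval for mu1 - mu2; returns (low, high, degrees of freedom)."""
    df = (n1 - 1) + (n2 - 1)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    std_error = pooled_sd * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - t_crit * std_error, diff + t_crit * std_error, df

# Hypothetical salary samples (in thousands): 10 men and 12 women.
low, high, df = pooled_t_interval(52.0, 6.0, 10, 48.0, 5.0, 12, t_crit=2.086)
print(df, round(low, 2), round(high, 2))
```

If the resulting interval contains 0, the sample does not show a clear difference between the two group means at that confidence level.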

F- Distribution

The F – distribution (Fisher’s) is the distribution of the ratio of two variances.

If two samples are drawn and we wish to know whether the samples are drawn from a single population or from two separate populations, then an F- Statistic is calculated.

This F- Statistic = the ratio of the sample variances of the two samples (S1^2 / S2^2).

F – distribution curve

The F – distribution is a probability density function; the shape of its curve is as follows:

[Figure: F-distribution curve; horizontal axis: F- Statistic]

F- distribution

We will have more occasions to talk about this F- statistic later, while discussing hypothesis testing.

Chi-Square - Distribution

The Chi-Square distribution (χ^2) is the distribution used when we wish to estimate the population variance from a known sample variance.

Similarly, there are many nonparametric tests where we would use a Chi-Square test.

The shape of the Chi-Square distribution varies with the degree of freedom.

Chi-Square distribution

[Figure: Chi-square distribution curve; horizontal axis: Chi-square statistic]

Chi-square distribution

The square of a standard normal variable Z follows a Chi-Square distribution.

Similarly, the sum of the squares of several independent standard normal variables also follows a Chi-Square distribution.

We will have more to say about this distribution when we look at hypothesis testing.

Visual Displays and Correlation Analysis

• Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.

• A scatter plot
  - displays each observed data pair (xi, yi) as a dot on an x-y grid
  - indicates visually the strength of the relationship between the two variables

Visual Displays

cost of maintenance per month

0 5 10 15 20 25 30 35

hrs driven per week

Correlation Analysis

[Figure: six scatter-plot panels: Strong Positive Correlation, Weak Positive Correlation, Weak Negative Correlation, Strong Negative Correlation, No Correlation, Nonlinear Relation]

• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.

-1 ≤ r ≤ +1

• r = 0 indicates no linear relationship; values near -1 indicate a strong negative relationship and values near +1 a strong positive relationship.

r = Cov(x, y) / [s(x) s(y)]

The correlation coefficient can also be found as follows:

r = [n∑xy – (∑x)(∑y)] / √{[n∑x^2 – (∑x)^2] × [n∑y^2 – (∑y)^2]}
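The computational formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pearson_r` and the data (hours driven per week against maintenance cost) are hypothetical and chosen only to show the arithmetic.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, NOT the data set used in the slides
hours = [5, 10, 15, 20, 25, 30]
cost = [900, 1500, 5200, 9800, 13000, 17500]

r = pearson_r(hours, cost)
print("r =", round(r, 4), " r^2 =", round(r * r, 4))
```

Because the formula is symmetric in x and y, calling `pearson_r(cost, hours)` returns the same value.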


Use of Excel for finding the correlation


For the example data, the formula

r = [n∑xy – (∑x)(∑y)] / √{[n∑x^2 – (∑x)^2] × [n∑y^2 – (∑y)^2]}

gives r = 0.9322.

Hence r^2 = 0.869

Properties of the correlation coefficient

1. The value of r always lies between -1 and +1.

2. A change of origin and scale does not affect the value of the coefficient.

(What this means is as follows.)

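Change of origin and scale means replacing x by u = (x – A)/h and y by v = (y – B)/k, for constants A and B and positive constants h and k.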

3. If x and y are interchanged, the coefficient is not affected, i.e. it remains unaltered. (We usually refer to x as the independent variable and y as the dependent variable.)

4. The fourth property can be explained only after regression has been explained, so it is held over until then.

Bivariate Regression

What is Bivariate Regression?

• Bivariate regression analyzes the relationship between two variables.

• It specifies one dependent (response) variable and one independent (predictor) variable.

• This hypothesized relationship may be linear, quadratic, or of some other form.

[Figure: chart; horizontal axis: hrs of vehicle driven]

In the equation y = a + bx, how do we find the values of a and b, which are the intercept and the slope?

How to develop a Regression line


Normal Equations

∑Y = na + b∑X
∑XY = a∑X + b∑X^2

which, when simplified, become

b = [n∑XY – (∑X)(∑Y)] / [n∑X^2 – (∑X)^2]

a = Y(bar) – b X(bar)
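The simplified normal equations translate directly into code. The sketch below is illustrative only: `fit_line` is a hypothetical helper name, and the data are made up so that the exact answer (a = 1, b = 2) is easy to check by eye.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x,
    using the simplified normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)  # a = Y(bar) - b * X(bar)
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
print("intercept a =", a, " slope b =", b)

# The fitted a and b satisfy both normal equations
n = len(xs)
lhs1, rhs1 = sum(ys), n * a + b * sum(xs)
lhs2 = sum(x * y for x, y in zip(xs, ys))
rhs2 = a * sum(xs) + b * sum(x * x for x in xs)
```

The last lines evaluate both sides of each normal equation, which is a quick way to confirm that a fitted pair (a, b) is the least-squares solution.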

For the problem considered earlier (y as the dependent variable, x as the independent variable):

b = 777.32

and a = -6115.9

If instead we want the normal equations for the situation in which x is the dependent variable and y is the independent variable, they change simply by replacing x with y and y with x throughout. The numerator of the slope remains unchanged, but the denominator becomes

(n∑Y^2) – (∑Y)^2

Hence, when x is the dependent variable and y is the independent variable, the new coefficients are

a' = 9.393

b' = 0.001118

Returning to the properties of the correlation coefficient:

3. If x and y are interchanged, the coefficient is not affected, i.e. it remains unaltered. (We usually refer to x as the independent variable and y as the dependent variable.)

4. b × b' = r^2, which here is 777.32 × 0.001118 = 0.869

From here it follows (since b × b' = r^2 ≤ 1):

1. Both regression coefficients must have the same sign (either + or -).

2. If one regression coefficient is greater than 1 in magnitude, then the other regression coefficient must be less than 1 in magnitude.

3. If one regression coefficient is less than 1 in magnitude, then the other regression coefficient may be greater or less than 1.
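The relation between the two regression coefficients and r^2 can be checked numerically. This is a small sketch under stated assumptions: the helper `slope` and the data are hypothetical and are not the data set behind the figures quoted in the slides.

```python
import math

def slope(dep, indep):
    """Slope of the least-squares regression of `dep` on `indep`."""
    n = len(dep)
    s_d, s_i = sum(dep), sum(indep)
    s_di = sum(d * i for d, i in zip(dep, indep))
    s_i2 = sum(i * i for i in indep)
    return (n * s_di - s_d * s_i) / (n * s_i2 - s_i ** 2)

# Hypothetical data, for illustration only
x = [2, 4, 6, 8, 10]
y = [30, 70, 80, 150, 160]

b = slope(y, x)        # regression of y on x
b_prime = slope(x, y)  # regression of x on y

# Correlation coefficient computed independently of the two slopes
n = len(x)
num = n * sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y)
den = math.sqrt((n * sum(p * p for p in x) - sum(x) ** 2) *
                (n * sum(q * q for q in y) - sum(y) ** 2))
r = num / den

print("b =", b, " b' =", round(b_prime, 5), " b*b' =", round(b * b_prime, 4))
print("r^2 =", round(r * r, 4))
```

Here b is greater than 1 and b' is less than 1, both are positive, and their product agrees with r^2.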

Regression Terminology

Fitting a Regression on a Scatter Plot in Excel

• Step 1:
  - Highlight the data columns.
  - Click on the Chart Wizard and choose Scatter Plot.
  - In the completed graph, click once on the points in the scatter plot to select the data.
  - Right-click and choose Add Trendline.
  - Choose Options and check Display Equation.

WHAT IS A HYPOTHESIS

A hypothesis is a conjectural statement about a certain characteristic of the whole population or target segment.

Ex: What is the average monthly expenditure incurred on vehicle maintenance? If it is suggested that this average is Rs. 1200 per month, then this is a hypothesis. It implies that if we took all the people who drive their vehicles, found the expenditure of each one and averaged the results, the average would be Rs. 1200 per month.

Ex: A car dealer claims that a given model of car gives, on average, a mileage of at least 15 km per litre of petrol.

This implies that if we took all the vehicles of the given model and found their average mileage per litre of petrol, it would be at least 15 km per litre.

Ex: An exporter claims that the proportion of defects in his consignments will be at most 2%.

This means that if we took all his consignments and found the proportion of defects, the proportion of defectives would not exceed 2%.

Ex: A manufacturer of refills for ball point pens claims that the length of a refill is, on average, 140 mm.

This implies that while each individual refill may (or may not) be exactly 140 mm long, the average length of the refills would be 140 mm.

What do all these hypotheses show?

1. They are all conjectures about the population parameter.

2. They always talk about the population parameter.

3. They are statements made only on the basis of the research question in hand and not on the basis of the data collected.

Why do we need to do hypothesis testing?

We want to verify the statement specified in the hypothesis, but it would be impossible to do so with certainty without doing a census. If a census is carried out, hypothesis testing is not needed.

However, we know that a census is not practical and also need not be accurate. Hence we need to comment on the hypothesis on the basis of sample information only.

We therefore draw an inference about the population parameter based on sample information. This drawing of inference is called hypothesis testing.

Characteristics of a good hypothesis

1. The hypothesis should be based on sound previous research.

2. Look for realistic explanations.

3. State the variables clearly.

4. It should be easily amenable to testing.

5. Measure the variables on the correct scale.

Basics of hypothesis testing

As we said before, the hypothesis is about a characteristic of the population. We usually call this characteristic the 'parameter'.

The parameter can only be obtained by doing a census, which is either not possible or not practical.

Hence our inference about the parameter is based on sample information. Because it is based on a sample, we may reject a hypothesis that is true and, conversely, we may accept a hypothesis that is actually not true. Both of these are errors in making the inference.

Consider the problem: A car dealer claims that a given model of car gives, on average, a mileage of at least 15 km per litre of petrol.

If the average is more than 15 km/litre we are satisfied, but if it is less than what the dealer claims we have difficulty in believing the claim. This is what we wish to verify or infer. The inference must be based on the information available from a single sample of size n. This n can be 30 cars, 40 cars or even about 20 cars.

Hence we can write the hypotheses for the dealer's claim as follows:

Generally we do not disagree with the dealer to begin with, unless there is sufficient evidence to disagree. Hence we write

Null hypothesis (Ho): µ ≥ 15
Alternative hypothesis (Ha): µ < 15

Approaches for hypothesis testing

Errors in hypothesis testing

                      Hypothesis true             Hypothesis false
Accept hypothesis     No error                    Type II error (β error)
Reject hypothesis     Type I error (α error)      No error

Steps in Hypothesis Testing

1. Based on the research question, develop the null and alternative hypotheses.

2. Decide on the level of Type I error (alpha error).

3. Decide whether the test is single tail or two tail.

4. Decide on the appropriate test statistic to be used (Z, t or any other).

5. Calculate the test statistic.

6. Read the table value of the test statistic for the chosen level of Type I error from the table of Z, t etc.

7. Compare the calculated test statistic with the table value.

8. Make a conclusion.

Worked Example 1 (single mean):

A manufacturing firm has been shipping a product, on average, within 30 days of receiving the order. Of late it is believed that the average shipping time has increased. To test this, a sample of size 49 is drawn randomly from the shipments made during a given period of time. The sample shows an average shipping time of 36 days. The population standard deviation is believed to be 7 days. Is there sufficient evidence to believe that shipping is getting delayed? A test at the 5% level of significance is considered adequate.

Step 1: Formulate the null and alternative hypotheses:

Ho: µ ≤ 30
Ha: µ > 30

Step 2: Decide on the level of significance (Type I error). This is given in the problem as 5%.

Step 3: Looking at the hypotheses, it is clear that this is a single tail test and that the rejection region is to the right; hence it is a right tail test.

Step 4: Decide on the test statistic. Since the sample size is > 30, we can use the Z statistic.

Step 5: Calculate the Z statistic = (x-bar – µ) / S.E. (standard error) = (36 – 30) / (7 / √49) = 6

Step 6: The Z table value at the 5% level of alpha, single tail, = 1.645.

Step 7: Since Z(cal) > Z(table value), reject (i.e. we are unable to accept) the null hypothesis.

Step 8: Conclusion: There appears to be a delay in the shipping time in recent times.
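The arithmetic of this worked example can be reproduced with the Python standard library. This is a minimal sketch: the variable names are illustrative, and `statistics.NormalDist` is used here in place of a printed Z table.

```python
import math
from statistics import NormalDist

# Values taken from the shipping-time example
mu0 = 30      # hypothesised mean shipping time (days)
x_bar = 36    # sample mean
sigma = 7     # population standard deviation (assumed known)
n = 49        # sample size
alpha = 0.05  # level of significance

standard_error = sigma / math.sqrt(n)
z_cal = (x_bar - mu0) / standard_error

# Right tail test: critical value with area alpha to its right
z_table = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z_cal > z_table
print("Z(cal) =", z_cal, " Z(table) =", round(z_table, 3), " reject Ho:", reject_h0)
```

The critical value computed this way is approximately 1.645, matching the table value used in Step 6.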

Worked Example 2 (single mean):

Let us consider the car example. The dealer claims that a particular model gives at least 15 km per litre of fuel. A random sample of 36 cars gives a mean of 14.6 km/litre, and the population standard deviation is assumed known to be 0.75 km/litre. Assume a 5% level of significance.

Step 1: Ho: µ ≥ 15
        Ha: µ < 15

Step 2: The significance level is given as 5%.

Step 3: This is also a single tail test, but the direction is towards the left.

Step 4: Since the sample size is large (36), we can use the Z test.

Step 5: Calculate the Z statistic: Z = (14.6 – 15) / (0.75 / √36) = -0.4 / (0.75 / 6) = -3.2

Step 6: Find the table value of Z at the 5% level of significance. Remember that the rejection region is now on the left side, hence Z should be negative: Z = -1.645.

Step 7: Compare Z calculated with the Z table value. For a left tail test the rule is: if Z calculated ≤ Z table value, reject Ho; else accept Ho. Here Z(cal) = -3.2 < Z(table value) = -1.645, hence reject Ho.

Step 8: Conclusion: The dealer's claim cannot be accepted.

Worked Example 3 (single mean):

A refill manufacturer claims that the refill for a ball point pen is 140 mm long. A sample of 100 refills is selected, and the mean length is found to be 141.77 mm with a standard deviation of 5.88 mm. At the 5% level of significance, can it be concluded that the refills are of poor quality?

Step 1: Null hypothesis and alternative hypothesis:

Ho: µ = 140 mm
Ha: µ ≠ 140 mm

Step 2: The alpha level is given as 5%.

Step 3: In this case it is a two tail test, as a deviation on either side is unacceptable: the refill will not fit in the pen and is hence of poor quality.

Step 4: Since the sample size is 100, which is large, the Z test can be used.

Step 5: Calculate the Z statistic: Z = (141.77 – 140) / (5.88 / √100) = 1.77 / 0.588 = 3.01

Step 6: Read the Z value from the table at 5%, two tail: 1.96.

Step 7: Z(cal) > Z(table value), hence reject the null hypothesis.

Step 8: Conclusion: The refills produced are of poor quality.

Rule for rejecting a null hypothesis

For a right tail test (single tail): if Z(cal) ≥ Z(table value), reject Ho.

For a left tail test (single tail): if Z(cal) ≤ Z(table value), reject Ho.

For a two tail test: if |Z(cal)| ≥ |Z(table value)|, reject Ho.

It can be observed that even for a single tail test, if we consider the modulus of Z, the same rule as for the two tail test can be used for rejecting Ho. (Use caution here: the sign of Z(cal) must still point towards the rejection tail.)
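The three rejection rules can be collected into one small function. This is a sketch only: the name `reject_null` and the `tail` labels are hypothetical, and `statistics.NormalDist` stands in for the Z table.

```python
from statistics import NormalDist

def reject_null(z_cal, alpha, tail):
    """Return True if Ho is rejected at level alpha.

    tail is "right", "left" or "two" (labels chosen for this sketch).
    """
    z = NormalDist()  # standard normal distribution
    if tail == "right":
        return z_cal >= z.inv_cdf(1 - alpha)
    if tail == "left":
        return z_cal <= z.inv_cdf(alpha)
    if tail == "two":
        return abs(z_cal) >= z.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'right', 'left' or 'two'")

# Calculated Z values from the three worked examples
print(reject_null(6.0, 0.05, "right"))   # shipping time, right tail
print(reject_null(-3.2, 0.05, "left"))   # car mileage, left tail
print(reject_null(3.01, 0.05, "two"))    # refill length, two tail

# Why the modulus shortcut needs caution: a large negative Z does not
# support a right tail alternative, although its modulus exceeds 1.645.
print(reject_null(-3.2, 0.05, "right"))
```

The last call shows the caution mentioned above: taking the modulus of Z in a single tail test would wrongly reject Ho when the sample mean lies on the opposite side from the alternative hypothesis.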

When population σ is unknown

While conducting hypothesis tests, the population σ is usually unknown. In these situations the sample standard deviation (s) is used instead of σ, and hence the standard error is s/√n. The sample standard deviation must be calculated with (n – 1) in the denominator, as stated in the earlier subject DRM 01, as only then is the sample variance an unbiased estimator of the population variance. Further, it must be ensured that the sample size is large (large was defined as n > 30).

When sample size is small (< 30)

When the sample size is < 30, it is considered a small sample, and hence the t distribution should be used instead of Z. Further, if the population standard deviation is unknown, it is also recommended that the t distribution be used. Hence the only change is that in

Z statistic = (x-bar – µ) / S.E. (standard error)

Z is replaced with t.

Worked example – single proportion

Insurance companies have recently faced difficulty in settling medical claims directly with hospitals. One reason can be attributed to false billing by individuals who have taken medical insurance. A company believes that of late there has been an increase in the number of false medical claims, which has gone up to 5%. A random sample of 100 customers indicated that 7 customers had falsified their claims. Is there any reason to believe that false medical claims have gone up? Use a 5% level of significance.

Step 1: Ho: p ≤ 5%
        Ha: p > 5%

Step 2: The level of significance is given as 5%.

Step 3: This is a single tail (right tail) test.

Step 4: The Z statistic will be used, as the sample size is large.

Step 5: Z = (p̂ – p) / standard error = (p̂ – p) / √(pq/n) = (7% – 5%) / √{(5% × 95%)/100} = 2 / 2.179 = 0.92

Step 6: Z table value at 5%, single tail = 1.645

Step 7: Compare Z(cal) with the Z table value: Z(cal) < Z(table value), hence accept Ho.

Step 8: We cannot conclude that the proportion of false claims has gone beyond 5%, even though the sample shows 7%.
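The single proportion test can be checked with a few lines of Python. This is a minimal sketch with illustrative variable names; proportions are held as fractions (0.05 rather than 5%), and `statistics.NormalDist` replaces the Z table.

```python
import math
from statistics import NormalDist

p0 = 0.05          # hypothesised proportion of false claims
n = 100            # sample size
false_claims = 7   # false claims found in the sample
alpha = 0.05       # level of significance

p_hat = false_claims / n
q0 = 1 - p0
standard_error = math.sqrt(p0 * q0 / n)  # uses the hypothesised p, not p_hat
z_cal = (p_hat - p0) / standard_error

z_table = NormalDist().inv_cdf(1 - alpha)  # right tail critical value
reject_h0 = z_cal >= z_table

print("Z(cal) =", round(z_cal, 2), " Z(table) =", round(z_table, 3),
      " reject Ho:", reject_h0)
```

Note that the standard error is computed from the hypothesised proportion p, as in Step 5, because the test is carried out assuming Ho is true.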

Hypothesis test for difference of two means

Let us consider the following situations:

Case 1: Does going to VLCC help in reducing weight?

Case 2: Does the company always assess the rent for a residential quarter lower than the employee himself does?

Case 3: Is there gender discrimination in wages among employers in a given industry for the same level of job?

Case 4: Is the new drug more effective in the treatment of a disease than the existing drug?

In all these cases we are talking about the difference of two means.

Case 1: The mean weight before joining VLCC and the mean weight after joining VLCC: µ(before) – µ(after)

Case 2: The mean value of the residential quarter as assessed by the company and the mean value as assessed by the employee for whom it is meant: µ(company) – µ(employee)

Case 3: The mean wage given to women employees and the mean wage given to men employees: µ(women) – µ(men)

Case 4: The mean time to recover with the new drug and the mean time to recover with the existing drug: µ(new drug) – µ(old drug)

Although each of these cases involves a difference of two means, there is one essential difference between them.

Cases 1 & 2: In both these cases we are talking about the same sample.

Case 1: The same sample's weight before joining VLCC and the same sample's weight after joining VLCC.

Case 2: The same house assessed by the company and the same house assessed by the employee.

Cases 3 & 4: In both these cases each sample is independently drawn.

Case 3: Mean wages for a group of women and mean wages for a group of men; each sample is independently drawn.

Case 4: Mean time of recovery using the new drug and mean time of recovery using the existing drug; each sample is independently drawn. (Obviously the same patient cannot be given both drugs.)

Difference of two means:
  - Dependent samples
  - Samples independently drawn

The two situations are treated differently.

Difference of two means – dependent samples

Ex: It was intended to understand whether there is any difference in the productivity of a worker immediately after a weekly off and immediately before a weekly off. With the weekly off on Sunday, it was desired to find whether productivity differs between Saturday and Monday. Hence productivity was measured on Saturdays and Mondays for the same set of workers. The data are as follows (use a 5% level of significance):

Worker Id    Productivity
             Sat    Mon
1            25     28
2            32     29
3            20     29
4            26     36
5            29     35
6            21     30
7            18     32
8            17     24
9            27     25

Dependent sample case:

Step 1: Develop the null and alternative hypotheses.

Ho: µ(sat) = µ(mon)
Ha: µ(sat) ≠ µ(mon)

The problem does not suggest that productivity is higher on Saturdays than on Mondays or vice versa; it could go either way. Hence the ≠ symbol in the alternative hypothesis.

This also implies:

Ho: µ(sat) – µ(mon) = 0, or difference = 0
Ha: µ(sat) – µ(mon) ≠ 0, or difference ≠ 0

Worker Id    Sat    Mon    Difference (Sat – Mon)
1            25     28      -3
2            32     29       3
3            20     29      -9
4            26     36     -10
5            29     35      -6
6            21     30      -9
7            18     32     -14
8            17     24      -7
9            27     25       2

average difference = -5.89
sample standard deviation (s) = 5.622

Step 2: The level of alpha is given as 5%.

Step 3: This is a two tail test.

Step 4: The test statistic follows a t distribution, as the sample size is small and σ is unknown.

t(cal) = (mean difference – 0) / (s/√n) = (-5.89 – 0) / (5.622/√9) = -5.89 / 1.874 = -3.14, or in modulus 3.14

Step 5: Find the table value of t for 5%, two tail, with 8 (= n – 1) degrees of freedom: 2.306.

Step 6: Compare t(cal) with t(table value): if |t(cal)| > |t(table value)|, reject Ho. Here |t(cal)| exceeds 2.306.

Step 7: Reject Ho.

Step 8: Conclusion: There is a change in productivity between the day before and the day after the weekly off.
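The paired calculation can be reproduced from the productivity data with the Python standard library. This is a minimal sketch: the variable names are illustrative, and the critical value 2.306 is simply the table value quoted for 8 degrees of freedom, since the standard library has no t distribution.

```python
import math
from statistics import mean, stdev

# Productivity of the same nine workers on Saturday and Monday
sat = [25, 32, 20, 26, 29, 21, 18, 17, 27]
mon = [28, 29, 29, 36, 35, 30, 32, 24, 25]

diffs = [s - m for s, m in zip(sat, mon)]  # Sat - Mon for each worker
n = len(diffs)

d_bar = mean(diffs)   # average difference
s_d = stdev(diffs)    # sample standard deviation, (n - 1) in the denominator
standard_error = s_d / math.sqrt(n)
t_cal = (d_bar - 0) / standard_error

t_table = 2.306       # table value: 5% two tail, 8 df
reject_h0 = abs(t_cal) > t_table

print("mean diff =", round(d_bar, 2), " s =", round(s_d, 3),
      " t(cal) =", round(t_cal, 2), " reject Ho:", reject_h0)
```

Working on the column of differences is what makes this a single mean problem: once `diffs` is formed, the rest is the ordinary one-sample t calculation.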

Hence it is clear that a difference of two means in the dependent sample case is treated as if it were a single mean case.

Further, the data are always obtained in pairs, and the sample sizes are usually less than 30; hence a t statistic is normally used.

This test is also called the paired t test.

DIFFERENCE OF TWO MEANS – INDEPENDENT SAMPLE

Consider Cases 3 and 4 discussed earlier. Both cases are reproduced below for ready reference.

Case 3: Mean wages for a group of women and mean wages for a group of men; each sample is independently drawn.

Case 4: Mean time of recovery using the new drug and mean time of recovery using the existing drug; each sample is independently drawn. (Obviously the same patient cannot be given both drugs.)

For Case 3, the hypotheses are:

Ho: µ(women) = µ(men)
Ha: µ(women) ≠ µ(men)

Depending on the problem, it could instead have been a single tail test. Whether the test is one tail or two tail depends on the research question being addressed.

What is the implication of accepting the null hypothesis?

It means that both groups belong to the same population, i.e. there is only one population and hence one mean and one variance.

What if the null hypothesis is rejected (not accepted)?

This would imply that all men belong to one population and all women belong to a different population. Since there are two populations, there are also two different means, and the variances may be either equal or unequal.

The above implies

Ho: µ(women) – µ(men) = 0
Ha: µ(women) – µ(men) ≠ 0

Writing the above hypotheses is Step 1.

Now we collect samples from the two groups and find their wages. The average wage and the sample variance are calculated separately for women and for men.

Step 2: Decide on the level of significance.

Step 3: Decide whether it is a single tail or a two tail test.

Step 4: Decide on the test statistic: Z for large samples and t for small samples.

Step 5: Calculate the Z statistic = {[x-bar(women) – x-bar(men)] – [µ(women) – µ(men)]} / S.Error. Recall that in the course on sampling methods we indicated the standard error for the difference of two means, independent case.

Step 6: Find Z for the alpha level of significance from the table.

Step 7: Compare Z(cal) with Z(alpha): for a two tail test, if |Z(cal)| ≥ |Z(alpha)|, reject Ho.

Step 8: Conclude your result.

Example: A firm is interested in understanding whether there is any difference in the stress level of employees working in the HR department and in the Marketing department. A random sample of 30 HR employees showed a mean stress level of 5.36 on a scale of 10, and a random sample of 40 marketing personnel showed a mean stress level of 6.23 on a scale of 10. The variance of the stress levels was 2.3 for HR and 1.87 for marketing. At the 5% level of significance, can we conclude that the stress levels are different for the two groups?

Step 1: Ho: µ(mktg) = µ(HR), i.e. µ(mktg) – µ(HR) = 0
        Ha: µ(mktg) ≠ µ(HR), i.e. µ(mktg) – µ(HR) ≠ 0

Step 2: The alpha level is specified as 5%.

Step 3: This is a two tail test.

Recall that the standard error for the difference of two means, independent sample case, is

S.E. = √{(σ1^2/n1) + (σ2^2/n2)}

If the population variance σ^2 is unknown, use its unbiased estimator s^2, the sample variance.

Step 6: Calculate Z = (6.23 – 5.36) / standard error = 0.87 / √{(1.87/40) + (2.3/30)} = 0.87 / 0.3513 = 2.476

Step 7: Z table value (two tail) at 5% alpha = 1.96. Z(cal) > Z(table value), hence reject Ho.

Step 8: Conclusion: Stress levels are different for marketing and HR.
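The stress-level comparison can be reproduced as follows. This is a minimal sketch with illustrative variable names; the sample variances are used as estimates of the population variances, and `statistics.NormalDist` replaces the Z table.

```python
import math
from statistics import NormalDist

# Summary figures from the stress-level example
mean_mktg, var_mktg, n_mktg = 6.23, 1.87, 40
mean_hr, var_hr, n_hr = 5.36, 2.3, 30
alpha = 0.05

# Standard error for the difference of two independent means
standard_error = math.sqrt(var_mktg / n_mktg + var_hr / n_hr)
z_cal = ((mean_mktg - mean_hr) - 0) / standard_error

# Two tail test: alpha is split equally between the two tails
z_table = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z_cal) >= z_table

print("S.E. =", round(standard_error, 4), " Z(cal) =", round(z_cal, 2),
      " Z(table) =", round(z_table, 2), " reject Ho:", reject_h0)
```

Carrying the standard error at full precision avoids the small drift that appears when it is rounded to two decimals before dividing.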

Difference of two means – independent samples, small sample

If the null hypothesis is accepted, it would imply that both samples came from the same population, and hence for the one population there can be only one mean and one variance.

However, if the null hypothesis is rejected, it implies that the two samples belong to different populations, and therefore each population has a different mean. But we assume that the two groups have the same variance; that is, homogeneity of variances is assumed. Under this assumption, recall that for small independent samples the standard error is calculated using the pooled estimate, as follows:

standard error = s(pooled) √{1/n1 + 1/n2}

where s^2(pooled) = {(n1 – 1)s1^2 + (n2 – 1)s2^2} / (n1 + n2 – 2)

or s(pooled) = √{s^2(pooled)}

(homogeneity of variance)

Worked example: independent small sample t test

A car manufacturer intends to procure batteries for a given model from two different vendors. Before procuring, however, the manufacturer wishes to know whether the lives of the two batteries are similar. A sample of batteries from each vendor is selected randomly, and the life (in months) was found to be as follows:

Brand A: 38, 37, 42, 44, 36, 39, 40, 41

Brand B: 42, 41, 37, 39, 40, 43, 44, 45, 46, 48, 39

Is there reason to believe that the lives of the batteries of the two brands are different? Use a 5% level of significance.

Step 1: Ho: µ(a) = µ(b)
        Ha: µ(a) ≠ µ(b)

Step 2: The level of significance is given as 5%.

Step 3: This is a two tail test, as the question only asks whether the lives of the two brands of batteries are different.

Step 4: Choosing the test statistic. Since the sample sizes are 8 and 11 respectively (small), a t statistic is used to test the hypothesis.

Step 5: Calculate the t statistic:

t = {(mean for brand A – mean for brand B) – 0} / standard error

mean life for brand A = 39.625
mean life for brand B = 42.18182
variance for brand A = 7.125
variance for brand B = 11.3636
pooled variance = 9.6183

t = {(39.625 – 42.18182) – 0} / {3.1013 √(1/8 + 1/11)} = -2.55682 / 1.44 = -1.7742

Taking the absolute value, |t(cal)| = 1.7742.

Step 6: The tabulated value of t for 5% alpha (two tail) at 17 df = 2.1098.

Step 7: Compare absolute values: t(cal) < t(tabulated), hence we are unable to reject the null hypothesis.

Step 8: Conclusion: There is no evidence that the mean lives of the batteries of the two brands differ, and hence both vendors can be considered for selection based on other considerations such as price, delivery etc.
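The pooled-variance calculation for the battery data can be reproduced with the Python standard library. This is a minimal sketch: the variable names are illustrative, and the critical value 2.1098 is the table value quoted for 17 degrees of freedom, since the standard library has no t distribution.

```python
import math
from statistics import mean, variance

# Battery life in months
brand_a = [38, 37, 42, 44, 36, 39, 40, 41]
brand_b = [42, 41, 37, 39, 40, 43, 44, 45, 46, 48, 39]

n1, n2 = len(brand_a), len(brand_b)
var_a = variance(brand_a)  # sample variance, (n - 1) in the denominator
var_b = variance(brand_b)

# Pooled variance under the homogeneity-of-variance assumption
pooled_var = ((n1 - 1) * var_a + (n2 - 1) * var_b) / (n1 + n2 - 2)
standard_error = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)

t_cal = ((mean(brand_a) - mean(brand_b)) - 0) / standard_error

t_table = 2.1098           # table value: 5% two tail, 17 df
reject_h0 = abs(t_cal) > t_table

print("pooled variance =", round(pooled_var, 4),
      " t(cal) =", round(t_cal, 3), " reject Ho:", reject_h0)
```

The degrees of freedom, n1 + n2 – 2 = 17, are the same quantity that appears in the denominator of the pooled variance.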

One of the assumptions made in solving this problem is that the population variances are equal, even if the alternative hypothesis is accepted. However, we have not checked this assumption. Hence it is necessary to check it, which we shall take up now.

CHECKING FOR HOMOGENEITY OF POPULATION VARIANCE

Homogeneity of population variance is checked by carrying out a hypothesis test, as given below:

Step 1: Ho: σ1^2 = σ2^2   Ha: σ1^2 ≠ σ2^2

Step 2: Decide the level of significance: assume 5%.

Step 3: This is a two-tail test, based on the alternative hypothesis.

Step 4: Decide on the test statistic: the test uses the ratio of the two sample variances and is the ‘F’ test, also known as Fisher’s test.

Step 5: Calculate the ‘F’ statistic = s1^2 / s2^2. For the previous problem it is 7.125 / 11.3636 = 0.627.


Step 6: Read the table value for ‘F’ from the table. This requires the degrees of freedom for the numerator and the denominator, which are (n1-1) and (n2-1), i.e. 7 and 10 respectively.

F-table: how to read

For a two-tail test at 5%, each tail holds 0.025. The upper table value is F(0.025) with 7, 10 df = 3.95. For the lower value (the F value with an area of 0.975 to its right), we take the reciprocal of the F value for 0.025 with the degrees of freedom interchanged. Hence F(0.025) with 10, 7 df = 4.76, and 1/4.76 = 0.21.

Step 7: The calculated ‘F’ statistic (0.627) lies between the table values of 0.21 and 3.95. Hence we cannot reject Ho.

Step 8: Conclusion: The data give no reason to doubt homogeneity of variance, so the equal-variance assumption used in the ‘t’ test stands.
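
The variance check can be sketched in the same way. This is an illustrative sketch only: the two table values are copied from the slides (3.95 and 1/4.76) rather than computed, and the variable names are ours.

```python
from statistics import variance

brand_a = [38, 37, 42, 44, 36, 39, 40, 41]
brand_b = [42, 41, 37, 39, 40, 43, 44, 45, 46, 48, 39]

# F statistic: ratio of the two sample variances
f_cal = variance(brand_a) / variance(brand_b)
df_num, df_den = len(brand_a) - 1, len(brand_b) - 1

# Table values quoted in the slides (not computed here)
F_UPPER = 3.95      # F(0.025) with 7, 10 df
F_LOWER = 1 / 4.76  # reciprocal of F(0.025) with 10, 7 df

# Two-tail test: reject Ho only if F falls outside the two table values
reject_h0 = f_cal < F_LOWER or f_cal > F_UPPER
print(f"F = {f_cal:.3f} with {df_num}, {df_den} df; "
      f"limits {F_LOWER:.2f} to {F_UPPER:.2f}; reject Ho: {reject_h0}")
```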


Difference of two proportions

A candidate who was interested in filing his nomination papers for an election wanted to understand whether he was equally popular in two adjacent constituencies or more popular in one particular constituency. He then availed the services of a research agency to check his popularity in the two constituencies.

In this problem we cannot test the difference of two means; we test the difference of two proportions instead. Two independent samples are drawn from the two constituencies to find out how many voters in each support his candidature.

We can take a worked example to explain this test.

Random samples were drawn from constituencies A and B, and preference for his candidature was measured by a survey. In constituency A the sample size was 800 and 390 favored him; in constituency B the sample size was 900 and 490 favored him. Can we say that constituency B is more favorable for this candidate? Use a 5% level of significance.

Step 1: Ho: p(b) = p(a)   Ha: p(b) > p(a)

Step 2: The level of alpha is given as 5%.

Step 3: This is a single-tail test, based on the question.

Step 4: Choose the test statistic: the samples are large, hence Z can be used.

Step 5: Calculate Z = [{p(b) - p(a)} - 0] / standard error for the difference of two proportions

The standard error for the difference of two proportions is given by

standard error = √{ p1(1 - p1)/n1 + p2(1 - p2)/n2 }

Hence in this problem
p(a) = 390/800 = 0.4875 or 48.75%
p(b) = 490/900 = 0.5444 or 54.44%

Working in percentages, standard error = √{(48.75 × 51.25/800) + (54.44 × 45.56/900)} = 2.4246

Step 5 (contd.): Z = [{p(b) - p(a)} - 0] / standard error = {(54.44 - 48.75) - 0} / 2.4246 = 2.347

Step 6: Read the table value of Z at 5% alpha, single tail = 1.645

Step 7: Compare Z(cal) with Z(table value): 2.347 > 1.645, reject Ho.

Step 8: Conclusion: The candidate is more popular in constituency B than in constituency A.
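
The Z calculation can be sketched in Python as follows. This is an illustrative sketch: the function name `two_proportion_z` is ours, and the table value 1.645 is hard-coded from Step 6. The code works with proportions rather than percentages, so the standard error comes out as about 0.0242 instead of 2.4246; Z is the same apart from rounding.

```python
import math

def two_proportion_z(x_a, n_a, x_b, n_b):
    """Z statistic for Ho: p(b) = p(a), using the unpooled standard error from the slides."""
    p_a, p_b = x_a / n_a, x_b / n_b
    std_error = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_b - p_a) / std_error, std_error

Z_TABLE = 1.645  # single-tail 5% value, copied from Step 6

z_cal, std_error = two_proportion_z(390, 800, 490, 900)
reject_h0 = z_cal > Z_TABLE  # single-tail test in the direction p(b) > p(a)
print(f"standard error = {std_error:.4f}, Z = {z_cal:.3f}, reject Ho: {reject_h0}")
```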


Analysis of variance

Consider the following research question: Three different types of seeds are sown in exactly similar types of soil, and the same type of fertilizer is added for each type of plant. The yield for the product is as follows:

Yield (million tons)
          seed A   seed B   seed C
plot-1       8       12        9
plot-2       9       10       10
plot-3      10       10        9
plot-4       9       13        8
plot-5       9       11        8

If all the seed varieties are similar they should give similar yields, and if some of them are superior then one or more types would give a larger yield.


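Step 1: Ho: µa = µb = µc   Ha: at least two means are unequal

In this case a direct comparison of two means is not feasible, as it is not clear which two should be compared. You would need 3 different comparisons: A with B, A with C and B with C. The overall type I error would then become large: with three comparisons at 5% each, the chance of at least one wrong rejection is roughly 1 - 0.95^3 ≈ 14% if the comparisons are treated as independent. Hence we must adopt another method.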

Basics of ANOVA

Variance calculation

A) Calculate the correction factor (c/f) = GT^2 / total sample size, where GT is the grand total of all observations.
B) Calculate the total sum of squares (TSS) = square each value and sum them up - c/f
C) Calculate the sum of squares between samples (SSB) = (total for seed A)^2/n(a) + (total for seed B)^2/n(b) + (total for seed C)^2/n(c) - c/f

In the problem stated above the values for each of these are as follows:
c/f = 145^2/15 = 1401.66
TSS = 1431 - 1401.66 = 29.34
SSB = 1419.4 - 1401.66 = 17.74

Now we can construct the ANOVA table.

ANOVA table (step 2)

Source of variation   Sum of squares    Degree of freedom    Mean square     F (cal)    F (table value)
Between treatment     SSB               k-1                  MSB = SSB/df    MSB/MSW
Within treatment      SSW               n(a)+n(b)+n(c)-k     MSW = SSW/df
Total                 TSS = SSB+SSW     n(a)+n(b)+n(c)-1

ANOVA table (filled in table)

Source of variation   Sum of squares   Degree of freedom   Mean square   F (cal)   F (table value)
Between treatment     17.74            2                   8.866         9.178     6.93
Within treatment      11.6             12                  0.966
Total                 29.34            14

The F table value of 6.93 corresponds to 2 and 12 df at the 1% level of alpha.

Mean square is the variance. Hence MSB = variance between seeds and MSW = variance within seeds.

Step 2: The level of alpha is to be decided.
Step 3: This is always a single-tail test.
Step 4: Since a ratio of variances is being considered, it is an ‘F’ test, or Fisher’s test.
Step 5: Calculate F(cal) as stated earlier.
Step 6: Read the F table value for the alpha level and the df for between and within.
Step 7: Compare F(cal) with F(table value). If F(cal) ≥ F(table value), reject Ho.
Step 8: Conclusion.

Hence F(cal) = 9.178 > F(table value) = 6.93. Reject Ho.

Conclusion: The three types of seeds do not give the same yield.
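
The sums of squares and the F ratio for the seed example can be sketched in Python as follows. This is an illustrative sketch: the variable names are ours, and the table value 6.93 is copied from the slides rather than looked up.

```python
yields = {
    "seed A": [8, 9, 10, 9, 9],
    "seed B": [12, 10, 10, 13, 11],
    "seed C": [9, 10, 9, 8, 8],
}

all_values = [v for group in yields.values() for v in group]
n_total, k = len(all_values), len(yields)

correction = sum(all_values) ** 2 / n_total          # c/f = GT^2 / total sample size
tss = sum(v ** 2 for v in all_values) - correction   # total sum of squares
ssb = sum(sum(g) ** 2 / len(g) for g in yields.values()) - correction  # between seeds
ssw = tss - ssb                                      # within seeds

msb = ssb / (k - 1)          # variance between seeds
msw = ssw / (n_total - k)    # variance within seeds
f_cal = msb / msw

F_TABLE = 6.93  # table value used in the slides for 2 and 12 df
reject_h0 = f_cal >= F_TABLE
print(f"TSS = {tss:.2f}, SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_cal:.3f}, reject Ho: {reject_h0}")
```

Without intermediate rounding the ratio comes out as about 9.17 rather than the 9.178 in the table; the difference arises only from rounding of the mean squares and does not change the decision.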

Significance testing for correlation coefficient

Let us recall from the previous course on Sampling methods, where we calculated the correlation coefficient based on sample information. The sample contained only a few values of the independent variable and the corresponding few values of the dependent variable. Even if a correlation exists for these values, how can we be sure that a correlation would exist if all the values in the population were known? To answer this question, it is necessary to test the significance of the correlation coefficient. The procedure is discussed below:

Step 1: Define the null and alternative hypotheses: Ho: ρ = 0   Ha: ρ ≠ 0, where ρ = population correlation
Step 2: Decide on the level of alpha (type I error); let us say it is 5%.
Step 3: This is a two-tail test, based on the sign of the alternative hypothesis.
Step 4: Decide on the test statistic: since the sample is usually small we use a ‘t’ test.
Step 5: t = (r - ρ) / standard error for correlation, where standard error = √{(1-r^2)/(n-2)} and ‘n’ = number of pairs (x,y) of sample data
Step 6: Read the table value of ‘t’ for the significance level and degrees of freedom (n-2).
Step 7: Compare ‘t’(cal) with ‘t’(table value). If the absolute value of t(cal) ≥ t(table value), reject Ho.
Step 8: Conclude whether the correlation exists in the population.

Worked example: test for correlation

Consider sample data collected on the number of hours of study done by students and the marks obtained by them. The data is as follows:

Student   Hours of study/day   Marks obtained (%)
a               12                   63
b               10                   68
c                8                   53
d                9                   60
e               15                   75
f               14                   80
g               11                   68
h               13                   53

Correlation coefficient ‘r’ (sample) = 0.613. You can refer to the lectures on Sampling methods for the details of how to calculate this value.

Step 1: Ho: ρ = 0   Ha: ρ ≠ 0
Step 2: Assume the alpha level is 5%.
Step 3: This is a two-tail test.
Step 4: Decide the test statistic, which is ‘t’ in this case.
Step 5: Calculate ‘t’ = (0.613 - 0) / √{(1 - 0.613^2)/6} = 0.613 / 0.3225 = 1.90
Step 6: ‘t’(table value) at 5% two tail, df = 6, is 2.447
Step 7: Compare ‘t’(cal) with ‘t’(table value): 1.90 < 2.447, so Ho cannot be rejected.
Step 8: Conclusion: There is no significant correlation between number of hours of study and marks obtained.

The above example clearly brings out that even though there is a non-zero correlation in the sample data, we cannot conclude that a correlation exists in the population. If the correlation had been larger, or if the sample size had been larger for the same correlation coefficient, we might have concluded that a correlation exists in the population. This is important to understand.
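
The worked example can be reproduced with a short Python sketch. It is illustrative only: the helper `pearson_r` is our own name for the usual product-moment formula, and the table value 2.447 is copied from Step 6.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Sample correlation coefficient from deviations about the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [12, 10, 8, 9, 15, 14, 11, 13]
marks = [63, 68, 53, 60, 75, 80, 68, 53]

r = pearson_r(hours, marks)
n = len(hours)
std_error = math.sqrt((1 - r ** 2) / (n - 2))  # standard error for correlation
t_cal = (r - 0) / std_error                    # rho = 0 under Ho

T_TABLE = 2.447  # two-tail 5% value at 6 df, copied from Step 6
reject_h0 = abs(t_cal) >= T_TABLE
print(f"r = {r:.3f}, t = {t_cal:.2f}, reject Ho: {reject_h0}")
```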

Testing for the Regression coefficient

Recall that we had developed a regression equation to estimate the dependent variable (y) when we know the value of the independent variable (x):

y = a + bx
where y = dependent variable
      x = independent variable
      a = intercept
      b = regression coefficient (also known as slope)

Just like the correlation, the regression coefficient is based on sample data only, and in order to use it to estimate the regression coefficient in the population we need to test its significance. The population equation is given as

Y = ßo + ß1(X)
where ß1 = regression coefficient in the population

Step 1: Null and alternative hypotheses: Ho: ß1 = 0   Ha: ß1 ≠ 0
Step 2: Decide on the level of alpha.
Step 3: This is a two-tail test, based on the sign of Ha.
Step 4: Decide on the test statistic. Usually ‘t’, because the sample size would be small.
Step 5: Calculate the ‘t’ statistic = (b1 - ß1) / standard error(b1), where b1 is the sample regression coefficient (the slope b above).
Step 6: Read the table value of ‘t’ for the alpha level and df = n-2.
Step 7: Compare ‘t’(cal) with ‘t’(table value). If the absolute value of ‘t’(cal) ≥ ‘t’(table value), reject Ho.
Step 8: Conclusion.

Worked example for testing of regression coefficient

An ice cream vendor wants to estimate the sale of his product based on the maximum temperature during the day. He collects data which are as follows:

(y) Sales (kg): 223, 252, 230, 195, 185, 170, 272, 222, 215, 235
(x) Temp (°C): 27, 30, 31, 28, 26, 23, 32, 29, 28, 30

The equation developed for this data is y = a + b(x), where a = -76.57 and b = 10.44. You can use the formulas given in the subject on Sampling methods to obtain these values.

Understanding the regression equation

We first need to understand what the regression equation that we have developed means. Getting a regression equation only means that we have minimized the error in estimating the value of ‘y’, not made it zero. Therefore, for the problem stated earlier, we can calculate the error that would be made if we use the equation developed and the error that would be made if we did not know the equation.

If we did not know the equation, we would have used the mean value of ‘y’ to estimate ‘y’. The error would then have been ∑{y - y(bar)}^2.
If we know the equation, the error would be ∑{y(actual) - y(est)}^2.

Let us calculate both these values for the problem just given:

Total error = ∑{y - y(bar)}^2 = 8440.9
Error if the equation is known = ∑{y(actual) - y(est)}^2 = 1640.86

This means that an error of (total error - error still there) = 8440.9 - 1640.86 = 6800.04 has been explained by the regression equation.
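
The coefficients and the two error sums can be checked with a short Python sketch. It is illustrative only and assumes the usual least-squares formulas (b = Sxy/Sxx, a = y(bar) - b × x(bar)), which we take to be the formulas referred to in the Sampling methods subject; the variable names are ours.

```python
from statistics import mean

temps = [27, 30, 31, 28, 26, 23, 32, 29, 28, 30]            # x: maximum temperature
sales = [223, 252, 230, 195, 185, 170, 272, 222, 215, 235]  # y: sales in kg

x_bar, y_bar = mean(temps), mean(sales)
sxx = sum((x - x_bar) ** 2 for x in temps)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, sales))

b = sxy / sxx           # regression coefficient (slope)
a = y_bar - b * x_bar   # intercept

# Error if only the mean of y is used, and error if the equation is used
total_error = sum((y - y_bar) ** 2 for y in sales)
residual_error = sum((y - (a + b * x)) ** 2 for x, y in zip(temps, sales))
explained = total_error - residual_error

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"total = {total_error:.2f}, residual = {residual_error:.2f}, explained = {explained:.2f}")
```

Small differences in the second decimal place against the slide values come from rounding of a and b.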

Source of error                          Sum of squares of error   df   Mean square   F(cal)   F(table value)
Due to regression                        6800.04                   1    6800.04       33.15    11.26
Error still remaining (residual error)   1640.86                   8    205.10
Total error                              8440.9                    9

The residual mean square, 205.10, is the mean square error, i.e. the error variance. The F table value of 11.26 corresponds to 1 and 8 df at the 1% level of alpha.

The square root of the mean square error gives the standard error of the estimate (Se): Se = √205.1 = 14.32.

Standard error for (b1) = Se / √{∑(x - x(bar))^2}
∑(x - x(bar))^2 = 62.4 and √62.4 = 7.89
Standard error (b1) = 14.32 / 7.89 = 1.814

Step 1: Ho: ß1 = 0   Ha: ß1 ≠ 0
Step 2: Assume an alpha level of 5%.
Step 3: This is a two-tail test.
Step 4: Decide on the test statistic: ‘t’ in this case.
Step 5: Calculate ‘t’ = (b1 - ß1) / standard error(b1) = (10.44 - 0) / 1.814 = 5.755
Step 6: Read the table value of ‘t’ at 5% alpha, df = 8: 2.306
Step 7: ‘t’(cal) > ‘t’(table value), so reject Ho.
Step 8: The regression coefficient calculated from the sample is significant and can be used to construct an interval estimate for the population parameter.
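
The test of the regression coefficient can be sketched in Python from the summary values given in the slides. It is an illustrative sketch: the inputs (b1 = 10.44, residual sum of squares 1640.86, ∑(x - x(bar))^2 = 62.4) and the table value 2.306 are copied from the worked example, and the variable names are ours.

```python
import math

n = 10                  # number of (x, y) pairs
b1 = 10.44              # sample regression coefficient
residual_ss = 1640.86   # sum of squared residuals (error still remaining)
sxx = 62.4              # sum of (x - x_bar)^2

se_estimate = math.sqrt(residual_ss / (n - 2))  # Se: root of the mean square error
se_b1 = se_estimate / math.sqrt(sxx)            # standard error of b1
t_cal = (b1 - 0) / se_b1                        # beta1 = 0 under Ho

T_TABLE = 2.306  # two-tail 5% value at 8 df, copied from Step 6
reject_h0 = abs(t_cal) >= T_TABLE
print(f"Se = {se_estimate:.2f}, SE(b1) = {se_b1:.3f}, t = {t_cal:.3f}, reject Ho: {reject_h0}")
```

Carrying full precision gives t of about 5.76 rather than 5.755; the slides round √62.4 and Se before dividing, and the decision is the same.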

