The Normal Distribution, Estimation, and Correlation

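THE NORMAL DISTRIBUTION

DEFINITION: A continuous random variable X is said to be normally distributed if its density function is given by:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

for $-\infty < x < \infty$ and for constants $\mu$ and $\sigma$, where $-\infty < \mu < \infty$ and $\sigma > 0$.

Notation: If X follows the above distribution, we write $X \sim N(\mu, \sigma^2)$.

The graph of the normal distribution is called the normal curve.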

Properties of the normal curve:

1. The curve is bell-shaped and symmetric about a vertical axis through the mean $\mu$.

2. The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean.

3. The total area under the curve and above the horizontal axis is equal to 1.

[Figure: normal curve; horizontal axis marked from -3 to 3]

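DEFINITION: The distribution of a normal random variable with mean zero and standard deviation equal to 1 is called a standard normal distribution.

If $X \sim N(\mu, \sigma^2)$, then X can be transformed into a standard normal random variable Z through the following transformation:

$Z = \frac{X - \mu}{\sigma}$

If X is between the values $x_1$ and $x_2$, the random variable Z will fall between the corresponding values:

$z_1 = \frac{x_1 - \mu}{\sigma}$ and $z_2 = \frac{x_2 - \mu}{\sigma}$

Therefore,

$P(x_1 < X < x_2) = P(z_1 < Z < z_2)$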
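The transformation to Z can be sketched in a few lines of Python. This is only an illustration: the function name prob_between and the values 50, 10, 40 and 60 are assumptions made for the example, not part of the handout. The standard library's NormalDist plays the role of the z-table.

```python
from statistics import NormalDist

def prob_between(x1, x2, mu, sigma):
    """Return P(x1 < X < x2) for a normal X with mean mu and standard deviation sigma."""
    # Standardize both endpoints: z = (x - mu) / sigma
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    # Look up the areas under the standard normal curve (this replaces the z-table)
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(z2) - std_normal.cdf(z1)

# Hypothetical values chosen only for illustration: mean 50, standard deviation 10
p = prob_between(40, 60, mu=50, sigma=10)
print(round(p, 4))
```

Because the endpoints 40 and 60 lie one standard deviation on either side of the mean, the result is the area between z = -1 and z = 1.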

Examples:

1. Let Z be a standard normal random variable. That is, $Z \sim N(0, 1)$. Find the following probabilities (see the z-table for the probabilities):

A.

B.

C.

D.

2. Let Z be a standard normal random variable. That is, $Z \sim N(0, 1)$. Find the value of a.

A.

B.

C.

3. Let X be a normal random variable with . Find the following

probabilities:

A.

Therefore, the

B.

Therefore, the

C.

Therefore, the

4. Given a test with a mean of 84 and a standard deviation of 12.

A. What is the probability of an individual obtaining a score of 100 or above in this

test?

B. What score includes 50% of all the individuals who took the test?

C. If 654 students took the examination, then how many students got a score below

60?

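Solution: Given: $\mu = 84$, $\sigma = 12$

A. $P(X \ge 100) = P\left(Z \ge \frac{100 - 84}{12}\right) = P(Z \ge 1.33) = 1 - 0.9082 = 0.0918$

Therefore, the probability of an individual obtaining a score of 100 or above on this test is 0.0918 or 9.18%.

B. In notation form, the statement is equivalent to: $P(X \le x) = 0.50$

Finding the corresponding z-score of the probability 0.50, z = 0.00

From the transformation formula, $x = \mu + z\sigma = 84 + (0.00)(12) = 84$

Therefore, the score that includes 50% of those who took the exam is 84.

C. Given: $\mu = 84$, $\sigma = 12$, N = 654

$P(X < 60) = P\left(Z < \frac{60 - 84}{12}\right) = P(Z < -2.00) = 0.0228$

The number of students who got a score lower than 60 is equal to the product of the probability and the total number of students:

$654 \times 0.0228 \approx 14.9$, or about 15 students.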
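The three parts of this example can be checked with a short Python sketch, assuming the scores are normally distributed with mean 84 and standard deviation 12. The variable names are illustrative. The software does not round z to two decimal places, so part A comes out near 0.0912 rather than the table-based 0.0918.

```python
from statistics import NormalDist

# Test scores modeled as normal with mean 84 and standard deviation 12
scores = NormalDist(mu=84, sigma=12)

# A. Probability of a score of 100 or above
p_100_or_above = 1 - scores.cdf(100)

# B. Score below which 50% of the examinees fall
median_score = scores.inv_cdf(0.50)

# C. Expected number of the 654 examinees scoring below 60
expected_below_60 = 654 * scores.cdf(60)

print(round(p_100_or_above, 4), round(median_score, 2), round(expected_below_60, 1))
```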

Exercise 6.2

1. Let Z be a standard normal variable. Find the following probabilities:

a.

b.

c.

d.

2. Given a normal distribution with $\mu$ = 82 and $\sigma$ = …, find the probability that X assumes a value

a. Less than 78

b. More than 90

c. Between 75 and 80

3. The mean weight of 500 male students at a certain college is 151 pounds, and the standard deviation is 15 pounds. Assume that the weights are normally distributed.

a. How many students weigh between 120 and 155 pounds?

b. What is the probability that a randomly selected male student weighs less than 128

pounds?

ESTIMATION

Basic Concepts of Estimation

Definition of terms:

Estimator- any statistic whose value is used to estimate an unknown parameter.

Estimate- a realized value of an estimator.

Point Estimate- a single value used to represent the parameter of interest.

Interval Estimator- a rule that tells us how to calculate two numbers based on sample data, forming an interval within which the parameter is expected to lie. The pair of numbers (a, b) is called an interval estimate or confidence interval.

Level of Confidence or confidence coefficient- the degree of certainty attached to an interval estimate for the unknown parameter.

Point Estimation of the mean and the Standard Deviation

A statistic is used to estimate a parameter. The following statistics are used to estimate the parameters given below:

Parameter                                   Statistic

Population mean ($\mu$)                     Sample mean ($\bar{x}$)

Population standard deviation ($\sigma$)    Sample standard deviation ($s$)
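As a small illustration of point estimation, the sketch below computes the sample mean and the sample standard deviation with Python's statistics module. The five observations are hypothetical values invented for this example only. Note that statistics.stdev divides by n - 1, which is the usual sample standard deviation.

```python
from statistics import mean, stdev

# Hypothetical sample, invented purely for illustration
sample = [4, 8, 6, 5, 7]

x_bar = mean(sample)   # point estimate of the population mean
s = stdev(sample)      # point estimate of the population standard deviation (n - 1 divisor)

print(x_bar, round(s, 4))
```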

Interval Estimation of the Mean for a Single Population

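Confidence Interval for $\mu$, $\sigma$ is known

If $\bar{x}$ is the mean of a random sample of size n from a population with known variance $\sigma^2$, a $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

where $z_{\alpha/2}$ is the z value leaving an area of $\alpha/2$ to the right.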

Note:

For small samples selected from nonnormal populations, we cannot expect our degree of confidence to be accurate. However, for samples of size $n \ge 30$, regardless of the shape of most populations, sampling theory guarantees good results.

To compute a $(1-\alpha)100\%$ confidence interval for $\mu$, it was assumed that $\sigma$ is known. Since this is generally not the case, $\sigma$ shall be estimated by s, provided $n \ge 30$.

Example:

A survey of the delivery time of 100 orders worth P20,000 from WILLIAMS PIZZA yielded a mean of 55 minutes with a standard deviation of 12 minutes. Assuming that the delivery times follow a normal distribution, construct a 95% confidence interval for the true mean.

Solution:

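Given: $\bar{x} = 55$ minutes, $s = 12$ minutes, n = 100 orders, $\alpha$ = 5%, so $z_{\alpha/2} = z_{0.025} = 1.96$

Substituting the values in the formula:

$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$

we obtained:

$55 - 1.96\left(\frac{12}{\sqrt{100}}\right) < \mu < 55 + 1.96\left(\frac{12}{\sqrt{100}}\right)$

$52.648 < \mu < 57.352$

Conclusion: WILLIAMS PIZZA is 95% confident that the true mean delivery time is between 52.648 minutes and 57.352 minutes.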
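The large-sample interval can be sketched in Python as follows. The function name z_interval is an assumption made for this illustration. The critical value comes from NormalDist.inv_cdf rather than the z-table, so it is 1.95996... instead of the rounded 1.96, and the endpoints agree with the hand computation to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sd, n, confidence):
    """Confidence interval for the mean using the standard normal critical value."""
    alpha = 1 - confidence
    # z value leaving an area of alpha/2 in the right tail
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

# Delivery-time example: mean 55, standard deviation 12, n = 100, 95% confidence
low, high = z_interval(55, 12, 100, 0.95)
print(round(low, 3), round(high, 3))
```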

Error in Estimating the Population Mean

If $\bar{x}$ is used as an estimate of $\mu$, we can be $(1-\alpha)100\%$ confident that the error will not exceed

$e = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Example:

The heights of a random sample of 50 college students showed a mean of 174.5 cm and

a standard deviation of 6.9 cm. What can we assert with 98% confidence about the possible size

of our error if we estimate the mean height of all college students to be 174.5?

Solution:

Given: $\bar{x}$ = 174.5 cm, $s$ = 6.9 cm, n = 50 students, $\alpha$ = 2%, so $z_{\alpha/2} = z_{0.01} = 2.33$

The possible size of the error can be obtained by using

$e = z_{\alpha/2}\frac{s}{\sqrt{n}}$

Substituting the values in the formula:

$e = 2.33\left(\frac{6.9}{\sqrt{50}}\right) \approx 2.27$ cm

Conclusion: We are 98% confident that the sample mean differs from the true mean height by at most 2.27 cm.
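A minimal Python sketch of the maximum-error computation follows. The name max_error is hypothetical, and the sample standard deviation is used in place of the population value, as permitted for samples of at least 30.

```python
from math import sqrt
from statistics import NormalDist

def max_error(sd, n, confidence):
    """Largest likely error when the sample mean is used to estimate the population mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    return z * sd / sqrt(n)

# Height example: s = 6.9 cm, n = 50, 98% confidence
e = max_error(6.9, 50, 0.98)
print(round(e, 2))
```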

Sample Size for Estimating the Population Mean

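If $\bar{x}$ is used as an estimate of $\mu$, we can be $(1-\alpha)100\%$ confident that the error will not exceed a specified amount e when the sample size is

$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$

rounded up to the next whole number.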

Example:

The monthly wage of new employees at a certain broadcasting company is said to follow a normal distribution with a standard deviation of P1,000. How large a sample would be needed to be 99% confident that the sample mean will be within P300 of the true mean?

Solution:

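Given: $\sigma$ = P1,000, $e$ = P300, $\alpha$ = 1%, so $z_{\alpha/2} = z_{0.005} = 2.575$

by substitution:

$n = \left(\frac{(2.575)(1000)}{300}\right)^2 \approx 73.67$, which is rounded up to 74.

Conclusion: The sample size should be 74 employees to be 99% confident that the sample mean will be within P300 of the true mean wage.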
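The sample-size rule can be sketched in Python as shown below. The function name required_sample_size is an assumption for this illustration. The result is always rounded up, since rounding down would leave the error slightly larger than the specified amount.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, e, confidence):
    """Smallest n for which the error of the sample mean is at most e."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    return ceil((z * sigma / e) ** 2)         # always round up

# Wage example: sigma = 1000, e = 300, 99% confidence
n = required_sample_size(1000, 300, 0.99)
print(n)
```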

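Small-Sample Confidence Interval for $\mu$, $\sigma$ is unknown

If $\bar{x}$ and s are the mean and standard deviation, respectively, of a random sample of size $n < 30$ from an approximately normal population with unknown variance $\sigma^2$, a $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$

where $t_{\alpha/2}$ is the t value with $v = n - 1$ degrees of freedom.

Note: Values for t are found in the Table of T-values.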

Example:

A random sample of 8 cigarettes of a certain brand has average nicotine content of 3.6

milligrams and a standard deviation of 0.9 milligrams. Construct a 99% confidence interval for

the true average nicotine content of this particular brand of cigarettes, assuming an

approximate normal distribution.

Solution:

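Given: $\bar{x} = 3.6$ milligrams, $s = 0.9$ milligrams, n = 8 cigarettes, $\alpha$ = 1%

$t_{\alpha/2} = t_{0.005} = 3.499$ with $v = n - 1 = 7$ degrees of freedom

by substitution:

$3.6 - 3.499\left(\frac{0.9}{\sqrt{8}}\right) < \mu < 3.6 + 3.499\left(\frac{0.9}{\sqrt{8}}\right)$

we obtained:

$2.4866 < \mu < 4.7134$

Conclusion: We are 99% confident that the true average nicotine content of this brand of cigarettes is between 2.4866 milligrams and 4.7134 milligrams.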
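The small-sample formula can be sketched in Python as follows. Python's standard library has no t distribution, so this sketch assumes the critical value is read from a t-table and passed in by hand: 3.499 is the tabled value for an upper-tail area of 0.005 with 7 degrees of freedom. The function name t_interval is hypothetical.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_value):
    """Small-sample confidence interval; t_value is looked up in a t-table with n - 1 df."""
    margin = t_value * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Nicotine example: mean 3.6 mg, s = 0.9 mg, n = 8, t(0.005, 7 df) = 3.499 from the table
low, high = t_interval(3.6, 0.9, 8, 3.499)
print(round(low, 4), round(high, 4))
```

The margin here is the full product of the t value and s divided by the square root of n, about 1.113 mg. Leaving out the t value would give only 0.318 mg and a far narrower interval.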

Exercise 7.

1. An electrical firm manufactures light bulbs that have a length of life that is

approximately normally distributed, with a standard deviation of 40 hours. If a random

sample of 30 bulbs has an average life of 780 hours, find a 96% confidence interval for

the population mean of all bulbs produced by this firm. How large a sample is needed if

we wish to be 96% confident that our sample mean will be within 10 hours of the true

mean?

2. The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2

and 9.6 liters. Find a 95% confidence interval for the mean content of all such

containers, assuming an approximate normal distribution for container contents.

3. A random sample of 100 PUJ (Public utility jeep) shows that a jeepney is driven on the

average 24,500 km per year, with a standard deviation of 3,900 km.

a. Construct a 99% confidence interval for the average number of kilometers a jeepney is driven annually.

b. What can we assert with 99% confidence about the possible size of our error if we estimate the average number of km driven by jeepney drivers to be 24,500 km per year?

4. Suppose that the time allotted for commercials on a primetime TV program is known to

have a normal distribution with a standard deviation of 1.5 minutes. A study of 35

showings gave an average commercial time of 10 minutes. Compute the maximum

error. Construct a 95% confidence interval for the true mean.

5. A random sample of 12 female students in a certain dorm showed an average weekly

expenditure of P750 for snack foods, with a standard deviation of P175. Construct a 90%

confidence interval for the average amount spent each week on snack foods by female

students living in this dormitory, assuming the expenditures to be approximately

normally distributed.

6. The mean and standard deviation for the quality grade point averages of a random

sample of 28 college seniors are calculated to be 2.6 and 0.3 respectively. Find the 95%

confidence interval for the mean of the entire senior class. How large a sample is

required if we want to be 95% confident that our estimate of $\mu$ is not off by more than

0.05?

7. To estimate the average serving time at a fast food restaurant, a consultant noted the

time taken by 40 counter servers to complete a standard order (consisting of 2 burgers,

2 large fries and 2 drinks). The servers averaged 78.4 seconds with a standard deviation

of 13.2 seconds to complete the orders. What can the consultant assert with 95%

confidence about the maximum error if he uses $\bar{x} = 78.4$ seconds as an estimate of the

true average time required to complete this standard order?

8. A company surveyed 4400 college graduates about the lengths of time required to earn

their bachelor's degrees. The mean is 5.15 years, and the standard deviation is 1.68

years. Based on these sample data, construct the 99% confidence interval for the mean

time required by all college graduates.

9. In a time-use study, 20 randomly selected managers were found to spend an average of

2.4 hours each day on paperwork. The standard deviation of the 20 observations is 1.30

hours. Construct a 95% confidence interval for the mean time spent on paperwork by

managers.

10. In a study of physical attractiveness and mental disorders 231 subjects were rated for

attractiveness, and the resulting sample mean and standard deviation are 3.94 and 0.75,

respectively. Determine the sample size necessary to estimate the population mean,

assuming you want a 95% confidence and a margin of error of 0.05.

11. The number of incorrect answers on a true-false test for a sample of 15 students was

recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2. Estimate the variance.

12. In a study of the use of hypnosis to relieve pain, sensory ratings were measured for 16

subjects, with the results given below. Use these sample data to estimate the mean.

8.8 6.2 7.7 7.4 6.4 6.1 6.8 9.8 8.3 11.9 8.5 5.2

6.1 11.3 6.0 10.6

CORRELATION ANALYSIS

A correlation exists between two variables when one of them is related to the other in some way.

Correlation Analysis attempts to measure the strength of relationships between two variables by means of a single number called a correlation coefficient r.

The linear correlation coefficient r measures the strength of the linear relationship between the paired x and y values in the sample. This is also referred to as the Pearson product moment correlation coefficient in honor of Karl Pearson who originally developed it. The formula is given below:

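$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$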

Since r is computed from the sample data, it is a sample statistic.
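The computational formula for r can be sketched directly in Python. The function name pearson_r is an assumption made for this illustration. The data are the third data set from the examples later in this handout, where every Y value is exactly three times the X value, so the points lie on a straight line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient computed from the raw sums of the paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and contain at least two points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if all x values or all y values are identical
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [2, 4, 6, 8, 10, 12]
y = [6, 12, 18, 24, 30, 36]
r = pearson_r(x, y)
print(round(r, 3))
```

For these data the numerator and the denominator are both 1260, so r = 1, a perfect positive correlation.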

Interpretation of the values of r

$r = 1$ : perfect positive correlation between X and Y

$0.5 \le r < 1$ : strong positive correlation between X and Y

$0 < r < 0.5$ : positive correlation between X and Y

$r = 0$ : zero correlation

$-0.5 < r < 0$ : negative correlation between X and Y

$-1 < r \le -0.5$ : strong negative correlation between X and Y

$r = -1$ : perfect negative correlation between X and Y

Zero correlation means lack of linearity and not lack of association.

r measures the strength of the linear relationship. It is not designed to measure the strength of a relationship that is not linear.

The value of r is always between -1 and 1, that is, $-1 \le r \le 1$. (Round r to at least 3 decimal places.)
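The interpretation scale above can be written as a small lookup function. This is a sketch only: the name interpret_r is hypothetical, and the cut-off of 0.5 is the one used in this handout rather than a universal standard.

```python
def interpret_r(r):
    """Describe a correlation coefficient using the scale given in this handout."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 1:
        return "perfect positive correlation"
    if r >= 0.5:
        return "strong positive correlation"
    if r > 0:
        return "positive correlation"
    if r == 0:
        return "zero correlation"
    if r > -0.5:
        return "negative correlation"
    if r > -1:
        return "strong negative correlation"
    return "perfect negative correlation"

print(interpret_r(0.575))
print(interpret_r(-0.5))
```

A label of this kind describes only the strength and direction of the linear relationship. It says nothing about cause and effect.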

Common errors in interpreting the results:

1. We must be careful to avoid concluding that a significant linear correlation between two variables is proof that there is a cause-effect relationship between them.

2. No significant linear correlation does not mean X and Y are not related in any way.

3. Rounding errors can wreak havoc with the results. Round the linear correlation coefficient to three decimal places.

Examples:

For numbers 1 to 4, identify the error in the stated conclusion and write the correct conclusion.

1. Given: The paired sample data result in a linear correlation coefficient very close to zero.

Conclusion: The two variables are not related in any way.

2. Given: There is a strong positive linear correlation between smoking and cancer.

Conclusion: Smoking causes cancer.

3. Given: x = age, y = test score, r = 0.40

Conclusion: Older people tend to get lower scores.

4. Given: There is a strong positive linear correlation between income and spending.

Conclusion: Increased spending is caused by increased income.

5. Ten students from the College of Business Administration were chosen as respondents in a study conducted to determine the relationship between the grades of students (X) and their number of hours spent studying (Y). The computed correlation coefficient was 0.575. What would be the conclusion?

6. The data on yearly consumption of cigarettes in the Philippines and the percentage of the

country's population admitted to mental institutions as psychiatric cases were collected for 8

years. The correlation coefficient r = 0.61. What can we conclude about the data?

7. The temperature in a certain locality and number of pregnant women were found to have a

strong negative correlation. What would be the right conclusion?

EXAMPLES: Construct a scatter diagram, find r and interpret the results.

1. X 2 3 7 12 16 20 22

Y 14 20 9 14 5 1 15

2. X 9 4 5 4 2 6 3 7 2 8

Y 8 5 8 4 3 4 4 10 4 10

3. X 2 4 6 8 10 12

Y 6 12 18 24 30 36

4. X 25 64 75 35 86 15 19 66 37 9 12 9 47

Y 90 3 85 70 67 45 22 12 85 66 54 16 24

5. X 3 4 3 4 5 6 5 6 7 8 7 8 9 11 9 10

Y 15 17 3 4 5 21 23 13 11 12 25 6 7 9 16 7

EXERCISES

A. Construct a scatter diagram, find r and interpret the results.

1. Grades of 6 students selected at random

MATH GRADE ( X ) 70 92 80 74 65 83

ENGLISH GRADE (Y) 74 84 63 87 78 90

2. The data below consists of weights in pounds of discarded paper and size of households

X (paper) 2.41 7.57 9.55 8.82 8.72 6.96 6.83 11.42

Y (household size) 2 3 3 6 4 2 1 5

3. The data below consists of number of persons in the household and the number of cars they

own

X (household size) 2 4 4 2 2 1 2 3 5

Y (cars) 2 0 2 2 1 1 3 0 2

4. The data below consists of age and the income in thousands of dollars

Age 60 63 51 25 47 56 19 24 25 20 66 19 48 52 27

Income 43.4 18.8 14.4 29.4 19.4 83 10.4 12.6 36.4 29.6 17.2 17.2 67 33 37.4

5. A teacher is interested in knowing whether or not two IQ tests produce linearly related

scores. A sample of 10 students was taken randomly. Five students took Test 1 and 5 students

took Test 2 in the morning. In the afternoon, those who took Test 1 took Test 2 and vice versa.

The results are shown in the table below:

STUDENT TEST 1 (X) TEST 2 (Y)

A 125 114

B 145 127

C 110 126

D 120 116

E 124 108

F 110 100

G 121 129

H 142 131

I 100 96

J 126 113

a. Plot a scatter diagram for these data.

b. Solve for r.

c. How well do the two tests relate linearly? Explain.

6. In a study of factors that affect success in a calculus course, data were collected for 10 different persons. Scores on an algebra placement test are given, along with calculus achievement scores.

a. Plot a scatter diagram for these data.

b. Find the value of the linear correlation coefficient r.

c. Test the significance of r at $\alpha$ = 0.05.

ALGEBRA SCORE (X)

17 21 11 16 15 11 24 27 19 8

CALCULUS SCORE (Y)

73 66 64 61 70 71 90 68 84 52

7. One study was conducted to determine the relationship between the age and systolic blood pressure of 12 women.

Age ( X ) Systolic Blood Pressure ( Y )

56 147

42 125

72 160

36 118

63 149

47 128

55 150

49 145

38 115

42 140

68 152

60 155

a. Plot a scatter diagram for these data.

b. Solve for r and interpret.

c. What can you conclude about the relationship between age and systolic blood pressure of women? Explain statistically.
