The Normal Distribution Estimation Correlation (1)


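  • THE NORMAL DISTRIBUTION DEFINITION: A continuous random variable X is said to be normally distributed if its density function is given by:

    $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

    for $-\infty < x < \infty$ and for constants $\mu$ and $\sigma$, where $-\infty < \mu < \infty$ and $\sigma > 0$.

    Notation: If X follows the above distribution, we write $X \sim N(\mu, \sigma^2)$.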

    The graph of the normal distribution is called the normal curve.

    Properties of the normal curve:

    1. The curve is bell-shaped and symmetric about a vertical axis through the mean $\mu$.

    2. The normal curve approaches the horizontal axis asymptotically as we proceed in either

    direction away from the mean.

    3. The total area under the curve and above the horizontal axis is equal to 1.
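
    The three properties can be checked numerically. The sketch below is only an illustration: the standard normal parameters, the evaluation points, and the helper name area_under_curve are assumptions chosen for the demonstration, and the area is approximated with a simple trapezoidal sum over the range -8 to 8.

```python
from statistics import NormalDist

def area_under_curve(dist, lo, hi, steps=20_000):
    """Trapezoidal approximation of the area under dist.pdf between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (dist.pdf(lo) + dist.pdf(hi))
    for i in range(1, steps):
        total += dist.pdf(lo + i * width)
    return total * width

# Illustrative parameters (assumed, not taken from the notes).
curve = NormalDist(mu=0.0, sigma=1.0)

# Property 1: the density is the same at equal distances on either side of the mean.
symmetric = abs(curve.pdf(curve.mean - 1.5) - curve.pdf(curve.mean + 1.5)) < 1e-12

# Property 2: the density gets very close to zero far from the mean.
tail_height = curve.pdf(curve.mean + 8 * curve.stdev)

# Property 3: the total area under the curve is (numerically) 1.
total_area = area_under_curve(curve, -8.0, 8.0)

print(symmetric, tail_height, round(total_area, 6))
```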

    [Figure: the normal curve, with the horizontal axis marked from -3 to 3.]

  • DEFINITION: The distribution of a normal random variable with mean zero and standard deviation equal to 1 is called a standard normal distribution.

    If $X \sim N(\mu, \sigma^2)$, then X can be transformed into a standard normal random variable Z through the following transformation:

    $Z = \frac{X - \mu}{\sigma}$

    If X is between the values $x_1$ and $x_2$, the random variable Z will fall between the corresponding values:

    $z_1 = \frac{x_1 - \mu}{\sigma}$ and $z_2 = \frac{x_2 - \mu}{\sigma}$

    Therefore,

    $P(x_1 < X < x_2) = P(z_1 < Z < z_2)$
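
    As a rough sketch of the transformation, the snippet below standardizes two values and reads the probability from the standard normal distribution instead of a z-table. The function names to_z and prob_between and the values mu = 50, sigma = 10 are assumptions made for illustration.

```python
from statistics import NormalDist

def to_z(x, mu, sigma):
    """Standardize x using z = (x - mu) / sigma."""
    return (x - mu) / sigma

def prob_between(x1, x2, mu, sigma):
    """P(x1 < X < x2), computed as P(z1 < Z < z2) on the standard normal."""
    standard = NormalDist()  # mean 0, standard deviation 1
    z1, z2 = to_z(x1, mu, sigma), to_z(x2, mu, sigma)
    return standard.cdf(z2) - standard.cdf(z1)

# Assumed illustrative values: mu = 50, sigma = 10, interval (40, 60),
# which corresponds to -1 < Z < 1.
p = prob_between(40, 60, mu=50, sigma=10)
print(round(p, 4))  # about 0.6827
```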

    Examples:

    1. Let Z be a standard normal random variable. That is, $Z \sim N(0, 1)$. Find the following

    probabilities: (see the z-table for the probabilities)

    A.

    B.

    C.

    D.

    2. Let Z be a standard normal random variable. That is, $Z \sim N(0, 1)$. Find the value of a.

    A.

  • B.

    C.

    3. Let X be a normal random variable with . Find the following

    probabilities:

    A.

    Therefore, the

    B.

    Therefore, the

  • C.

    Therefore, the

    4. A test has a mean of 84 and a standard deviation of 12. Assume that the scores are normally distributed.

    A. What is the probability of an individual obtaining a score of 100 or above on this test?

    B. What score includes 50% of all the individuals who took the test?

    C. If 654 students took the examination, how many students got a score below 60?

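    Solution: Given: $\mu = 84$, $\sigma = 12$

    A. $z = \frac{100 - 84}{12} \approx 1.33$, so $P(X \ge 100) = P(Z \ge 1.33) = 1 - 0.9082 = 0.0918$.

    Therefore, the probability of an individual obtaining a score of 100 or above on this test is 0.0918 or 9.18%.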

    B. In notation form, the statement is equivalent to:

    $P(X < x) = 0.50$

    Finding the corresponding z-score of the probability 0.50, z = 0.00

  • From the transformation formula,

    $x = \mu + z\sigma = 84 + (0.00)(12) = 84$

    Therefore, the score that includes 50% of those who took the exam is 84.

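    C. Given: $\mu = 84$, $\sigma = 12$, N = 654

    $z = \frac{60 - 84}{12} = -2.00$, so $P(X < 60) = P(Z < -2.00) = 0.0228$.

    The number of students who got a score lower than 60 is equal to the product of the probability and the total number of students: $(0.0228)(654) \approx 15$ students.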
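
    The three parts of this example can be cross-checked without a z-table. The sketch below uses Python's statistics.NormalDist with the values given in the problem; the variable names are illustrative. Because it does not round z to two decimal places, part A comes out near 0.0912 rather than the table-based 0.0918.

```python
from statistics import NormalDist

scores = NormalDist(mu=84, sigma=12)  # mean and standard deviation from Example 4
n_students = 654

# A. P(X >= 100), as the complement of the cumulative probability at 100.
p_100_or_above = 1 - scores.cdf(100)

# B. The score below which 50% of the examinees fall.
median_score = scores.inv_cdf(0.50)

# C. Expected number of students scoring below 60.
expected_below_60 = scores.cdf(60) * n_students

print(round(p_100_or_above, 4), median_score, round(expected_below_60))
```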

    Exercise 6.2

    1. Let Z be a standard normal variable. Find the following probabilities:

    a.

    b.

    c.

    d.

    2. Given a normal distribution with $\mu = 82$ and standard deviation $\sigma$, find the probability that X assumes a value

    a. Less than 78

  • b. More than 90

    c. Between 75 and 80

    3. The mean weight of 500 male students at a certain college is 151 pounds, and the standard deviation is 15 pounds. Assume that the weights are normally distributed.

    a. How many students weigh between 120 and 155 pounds?

    b. What is the probability that a randomly selected male student weighs less than 128

    pounds?

    ESTIMATION

    Basic Concepts of Estimation

    Definition of terms:

    Estimator: any statistic whose value is used to estimate an unknown parameter.

    Estimate: a realized value of an estimator.

    Point Estimate: a single value used to represent the parameter of interest.

    Interval Estimator: a rule that tells us how to calculate two numbers from sample data, forming an interval within which the parameter is expected to lie. The pair of numbers (a, b) is called an interval estimate or confidence interval.

    Level of Confidence or Confidence Coefficient: the degree of certainty attached to an interval estimate for the unknown parameter.

    Point Estimation of the mean and the Standard Deviation

    Statistics are used to estimate parameters. The following statistics are used to estimate the parameters given below:

  • Parameter                                      Statistic

    Population mean ($\mu$)                        Sample mean ($\bar{x}$)

    Population standard deviation ($\sigma$)       Sample standard deviation (s)
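
    As a small illustration of the two point estimates, the sketch below computes a sample mean and a sample standard deviation with Python's statistics module. The data values are hypothetical and were invented only to show the calculation.

```python
from statistics import mean, stdev

# Hypothetical sample, invented only to illustrate the calculation.
sample = [12.0, 15.0, 11.0, 14.0, 13.0]

x_bar = mean(sample)  # point estimate of the population mean
s = stdev(sample)     # point estimate of the population standard deviation (n - 1 divisor)

print(x_bar, round(s, 4))
```

    Note that stdev divides by n - 1, which is the usual choice when a sample is used to estimate a population standard deviation.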

    Interval Estimation of the Mean for a Single Population

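    Confidence Interval for $\mu$, $\sigma$ is known

    If $\bar{x}$ is the mean of a random sample of size n from a population with known variance $\sigma^2$, a $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

    $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

    where $z_{\alpha/2}$ is the z value leaving an area of $\alpha/2$ to the right.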

    Note:

    For small samples selected from nonnormal populations, we cannot expect our degree of confidence to be accurate. However, for samples of size $n \ge 30$, regardless of the shape of most populations, sampling theory guarantees good results.

    To compute a $(1-\alpha)100\%$ confidence interval for $\mu$, it was assumed that $\sigma$ is known. Since this is generally not the case, $\sigma$ shall be estimated by s, provided $n \ge 30$.

    Example:

    A survey of the delivery times of 100 orders worth P20,000 from WILLIAMS PIZZA yielded a mean of 55 minutes with a standard deviation of 12 minutes. Assuming that the delivery times follow a normal distribution, construct a 95% confidence interval for the true mean.

    Solution:

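    Given: $\bar{x} = 55$ minutes, $s = 12$ minutes, n = 100 orders, $\alpha$ = 5%, so $z_{\alpha/2} = z_{0.025} = 1.96$

    Substituting the values in the formula:

    $55 - (1.96)\frac{12}{\sqrt{100}} < \mu < 55 + (1.96)\frac{12}{\sqrt{100}}$

  • we obtained:

    $52.648 < \mu < 57.352$

    Conclusion: WILLIAMS PIZZA can be 95% confident that the true mean delivery time is between 52.648 minutes and 57.352 minutes.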
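
    The interval in this example can be reproduced with a few lines of code. This is a minimal sketch, assuming the large-sample z interval is appropriate; the function name z_confidence_interval is illustrative, and the critical value is taken from statistics.NormalDist rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Return (lower, upper) for the mean when sigma is known or n is large."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z value leaving alpha/2 in the upper tail
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Values from the WILLIAMS PIZZA example: mean 55, s = 12, n = 100, 95% confidence.
lower, upper = z_confidence_interval(55, 12, 100, 0.95)
print(round(lower, 3), round(upper, 3))
```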

    Error in Estimating the Population Mean

    If $\bar{x}$ is used as an estimate of $\mu$, we can be $(1-\alpha)100\%$ confident that the error will not exceed $e = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.

    Example:

    The heights of a random sample of 50 college students showed a mean of 174.5 cm and

    a standard deviation of 6.9 cm. What can we assert with 98% confidence about the possible size

    of our error if we estimate the mean height of all college students to be 174.5?

    Solution:

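    Given: $\bar{x}$ = 174.5 cm, s = 6.9 cm, n = 50 students, $\alpha$ = 2%, so $z_{\alpha/2} = z_{0.01} = 2.33$

    The possible size of the error can be obtained by using

    $e = z_{\alpha/2}\frac{s}{\sqrt{n}}$

    Substituting the values in the formula:

    $e = (2.33)\frac{6.9}{\sqrt{50}} \approx 2.27$

    Conclusion: We can therefore conclude that we are 98% confident that the sample mean differs from the true mean height by at most 2.27 cm.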
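
    The same bound can be computed directly. The sketch below is one possible helper (the name max_error is an assumption) that evaluates the z value times the standard deviation divided by the square root of n, using the values from the height example.

```python
from math import sqrt
from statistics import NormalDist

def max_error(sigma, n, confidence):
    """Largest likely estimation error: z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)

# Values from the height example: s = 6.9 cm, n = 50, 98% confidence.
e = max_error(6.9, 50, 0.98)
print(round(e, 2))
```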

    Sample Size for Estimating the Population Mean

    If $\bar{x}$ is used as an estimate of $\mu$, we can be $(1-\alpha)100\%$ confident that the error will not exceed a specified amount e when the sample size is $n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$.

  • Example:

    The monthly wage of new employees at a certain broadcasting company is said to follow a normal distribution with a standard deviation of P1,000. How large a sample would be needed to be 99% confident that the sample mean will be within P300 of the true mean?

    Solution:

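    Given: $\sigma$ = P1,000, e = P300, $\alpha$ = 1%, so $z_{\alpha/2} = z_{0.005} = 2.575$

    by substitution:

    $n = \left(\frac{(2.575)(1000)}{300}\right)^2 \approx 73.67$, which is rounded up to 74.

    Conclusion: Therefore we can conclude that the sample size should be 74 employees to be 99% confident that the sample mean will be within P300 of the true mean wage.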
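
    A short sketch of the sample-size calculation follows. The function name required_sample_size is illustrative; the result is always rounded up, because rounding down would give a sample too small to meet the stated error bound.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, e, confidence):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) does not exceed e."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / e) ** 2)

# Values from the wage example: sigma = 1000, e = 300, 99% confidence.
n = required_sample_size(1000, 300, 0.99)
print(n)
```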

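    Small-Sample Confidence Interval for $\mu$, $\sigma$ is unknown

    If $\bar{x}$ and s are the mean and standard deviation, respectively, of a random sample of size $n < 30$ from an approximately normal population with unknown variance $\sigma^2$, a $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

    $\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$

    where $t_{\alpha/2}$ is the t value with $v = n - 1$ degrees of freedom.

    Note: Values for t are found in the Table of T-values.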

    Example:

    A random sample of 8 cigarettes of a certain brand has average nicotine content of 3.6

    milligrams and a standard deviation of 0.9 milligrams. Construct a 99% confidence interval for

    the true average nicotine content of this particular brand of cigarettes, assuming an

    approximate normal distribution.

    Solution:

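  • Given: $\bar{x} = 3.6$ milligrams, $s = 0.9$ milligrams, n = 8 cigarettes, $\alpha$ = 1%

    $t_{\alpha/2} = t_{0.005} = 3.499$ with $v = 8 - 1 = 7$ degrees of freedom

    by substitution:

    $3.6 - (3.499)\frac{0.9}{\sqrt{8}} < \mu < 3.6 + (3.499)\frac{0.9}{\sqrt{8}}$

    we obtained:

    $2.4866 < \mu < 4.7134$

    Conclusion: Therefore we can conclude that we are 99% confident that the true average nicotine content of this brand of cigarettes is between 2.4866 milligrams and 4.7134 milligrams.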
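
    The small-sample interval can be sketched in code as well. Python's standard library has no t-distribution quantile function, so this sketch assumes the critical value is looked up in a t-table and passed in by hand; the function name t_confidence_interval is illustrative. With t = 3.499 for 7 degrees of freedom, the margin is about 1.113, so the interval runs from roughly 2.487 to 4.713.

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) using a t critical value with n - 1 degrees of freedom.

    t_crit must be looked up by the caller (for example, in a printed t-table).
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Values from the nicotine example: mean 3.6, s = 0.9, n = 8,
# and t_{0.005} = 3.499 for 7 degrees of freedom (table value).
lower, upper = t_confidence_interval(3.6, 0.9, 8, t_crit=3.499)
print(round(lower, 4), round(upper, 4))
```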

    Exercise 7.

    1. An electrical firm manufactures light bulbs that have a length of life that is

    approximately normally distributed, with a standard deviation of 40 hours. If a random

    sample of 30 bulbs has an average life of 780 hours, find a 96% confidence interval for

    the population mean of all bulbs produced by this firm. How large a sample is needed if

    we wish to be 96% confident that our sample mean will be within 10 hours of the true

    mean?

    2. The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2

    and 9.6 liters. Find a 95% confidence interval for the mean content of all such

    containers, assuming an approximate normal distribution for container contents.

    3. A random sample of 100 PUJ (Public utility jeep) shows that a jeepney is driven on the

    average 24,500 km per year, with a standard deviation of 3,900 km.

    a. Construct a 99% confidence interval for the average number of kilometers a jeepney

    is driven annually.

    b. What can we assert with 99% confidence about the possible size of our error if we

    estimate the average number of km driven by jeepney drivers to be 23,500 km per

    year?

    4. Suppose that the time allotted for commercials on a primetime TV program is known to

    have a normal distribution with a standard deviation of 1.5 minutes. A study of 35

    showings gave an average commercial time of 10 minutes. Compute for the maximum

    error. Construct a 95% confidence interval for the true mean.

  • 5. A random sample of 12 female students in a certain dorm showed an average weekly

    expenditure of P750 for snack foods, with a standard deviation of P175. Construct a 90%

    confidence interval for the average amount spent each week on snack foods by female

    students living in this dormitory, assuming the expenditures to be approximately

    normally distributed.

    6. The mean and standard deviation for the quality grade point averages of a random

    sample of 28 college seniors are calculated to be 2.6 and 0.3 respectively. Find the 95%

    confidence interval for the mean of the entire senior class. How large a sample is

    required if we want to be 95% confident that our estimate of $\mu$ is not off by more than

    0.05?

    7. To estimate the average serving time at a fast food restaurant, a consultant noted the

    time taken by 40 counter servers to complete a standard order (consisting of 2 burgers,

    2 large fries and 2 drinks). The servers averaged 78.4 seconds with a standard deviation

    of 13.2 seconds to complete the orders. What can the consultant assert with 95%

    confidence about the maximum error if he uses $\bar{x} = 78.4$ seconds as an estimate of the

    true average time required to complete this standard order?

    8. A company surveyed 4400 college graduates about the lengths of time required to earn

    their bachelor's degrees. The mean is 5.15 years, and the standard deviation is 1.68

    years. Based on these sample data, construct the 99% confidence interval for the mean

    time required by all college graduates.

    9. In a time-use study, 20 randomly selected managers were found to spend an average of

    2.4 hours each day on paperwork. The standard deviation of the 20 observations is 1.30

    hours. Construct a 95% confidence interval for the mean time spent on paperwork by

    managers.

    10. In a study of physical attractiveness and mental disorders 231 subjects were rated for

    attractiveness, and the resulting sample mean and standard deviation are 3.94 and 0.75,

    respectively. Determine the sample size necessary to estimate the population mean,

    assuming you want a 95% confidence and a margin of error of 0.05.

    11. The number of incorrect answers on a true-false test for a sample of 15 students was

    recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2. Estimate the variance.

    12. In a study of the use of hypnosis to relieve pain, sensory ratings were measured for 16

    subjects, with the results given below. Use these sample data to estimate the mean.

    8.8 6.2 7.7 7.4 6.4 6.1 6.8 9.8 8.3 11.9 8.5 5.2

    6.1 11.3 6.0 10.6

  • CORRELATION ANALYSIS

    A correlation exists between two variables when one of them is related to the other in some way.

    Correlation Analysis attempts to measure the strength of relationships between two variables by means of a single number called a correlation coefficient r.

    The linear correlation coefficient r measures the strength of the linear relationship between the paired x and y values in the sample. This is also referred to as the Pearson product moment correlation coefficient in honor of Karl Pearson who originally developed it. The formula is given below:

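    $r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$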

    Since r is computed from the sample data, it is a sample statistic.
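
    A minimal sketch of the computation follows, using the sums-based form of the Pearson coefficient (n times the sum of products minus the product of sums, divided by the square root of the two corresponding variance terms). The function name pearson_r is illustrative. The data are the third scatter-diagram data set from the examples later in these notes, where every Y is exactly three times X.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient from raw sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data set 3 from the scatter-diagram examples in these notes.
xs = [2, 4, 6, 8, 10, 12]
ys = [6, 12, 18, 24, 30, 36]
r = pearson_r(xs, ys)
print(round(r, 3))  # 1.0, a perfect positive correlation
```

    The function divides by zero if either variable is constant, since r is undefined in that case.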

    Interpretation of the values of r

    $r = 1$ : perfect positive correlation between X and Y

    $0.5 \le r < 1$ : strong positive correlation between X and Y

    $0 < r < 0.5$ : positive correlation between X and Y

    $r = 0$ : zero correlation

    $-0.5 < r < 0$ : negative correlation between X and Y

    $-1 < r \le -0.5$ : strong negative correlation between X and Y

    $r = -1$ : perfect negative correlation between X and Y
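
    The interpretation table can be expressed as a small lookup function. This is only a sketch: the name interpret_r is an assumption, and the boundary values r = 0.5 and r = -0.5 are placed in the "strong" categories. The value 0.575 comes from the study-hours example later in these notes.

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal label in the interpretation table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 1:
        return "perfect positive correlation"
    if r >= 0.5:
        return "strong positive correlation"
    if r > 0:
        return "positive correlation"
    if r == 0:
        return "zero correlation"
    if r > -0.5:
        return "negative correlation"
    if r > -1:
        return "strong negative correlation"
    return "perfect negative correlation"

print(interpret_r(0.575))  # strong positive correlation
```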

    Zero correlation means lack of linearity and not lack of association.

    r measures the strength of the linear relationship. It is not designed to measure the strength of a relationship that is not linear.

    The value of r is always between -1 and 1, that is, $-1 \le r \le 1$. (Rounding off should be at least up to 3 decimal places.)

    Common errors in interpreting the results:

    1. We must be careful to avoid concluding that a significant linear correlation between two variables is proof that there is a cause-effect relationship between them.

    2. No significant linear correlation does not mean X and Y are not related in any way.

    3. Rounding errors can wreak havoc with the results. Round the linear correlation coefficient to three decimal places.

  • Examples:

    For numbers 1 to 4, identify the error in the stated conclusion and write the correct conclusion.

    1. Given: The paired sample data result in a linear correlation coefficient very close to zero.

    Conclusion: The two variables are not related in any way.

    2. Given: There is a strong positive linear correlation between smoking and cancer.

    Conclusion: Smoking causes cancer.

    3. Given: x = age y = test score r = 0.40

    Conclusion: Older people tend to get lower scores.

    4. Given: There is a strong positive linear correlation between income and spending.

    Conclusion: Increased spending is caused by increased income.

    5. Ten students from the College of Business Administration were chosen as respondents in a study conducted to determine the relationship between the grades of students ( X ) and the number of hours they spend studying ( Y ). The computed degree of relationship was 0.575. What would be the conclusion?

    6. The data on yearly consumption of cigarettes in the Philippines and the percentage of the country's population admitted to mental institutions as psychiatric cases were collected for 8 years. The correlation coefficient is r = 0.61. What can we conclude about the data?

    7. The temperature in a certain locality and number of pregnant women were found to have a

    strong negative correlation. What would be the right conclusion?

    EXAMPLES: Construct a scatter diagram, find r and interpret the results.

    1. X 2 3 7 12 16 20 22

    Y 14 20 9 14 5 1 15

    2. X 9 4 5 4 2 6 3 7 2 8

    Y 8 5 8 4 3 4 4 10 4 10

  • 3. X 2 4 6 8 10 12

    Y 6 12 18 24 30 36

    4. X 25 64 75 35 86 15 19 66 37 9 12 9 47

    Y 90 3 85 70 67 45 22 12 85 66 54 16 24

    5.

    X 3 4 3 4 5 6 5 6 7 8 7 8 9 11 9 10

    Y 15 17 3 4 5 21 23 13 11 12 25 6 7 9 16 7

    EXERCISES

    A. Construct a scatter diagram, find r and interpret the results.

    1. Grades of 6 students selected at random

    MATH GRADE ( X ) 70 92 80 74 65 83

    ENGLISH GRADE (Y) 74 84 63 87 78 90

    2. The data below consist of weights in pounds of discarded paper and sizes of households

    X (paper) 2.41 7.57 9.55 8.82 8.72 6.96 6.83 11.42

    Y (household size) 2 3 3 6 4 2 1 5

    3. The data below consist of the number of persons in the household and the number of cars they own

    X (household size) 2 4 4 2 2 1 2 3 5

    Y (cars) 2 0 2 2 1 1 3 0 2

    4. The data below consist of age and income in thousands of dollars

  • Age 60 63 51 25 47 56 19 24 25 20 66 19 48 52 27

    Income 43.4 18.8 14.4 29.4 19.4 83 10.4 12.6 36.4 29.6 17.2 17.2 67 33 37.4

    5. A teacher is interested in knowing whether or not two IQ tests produce linearly related

    scores. A sample of 10 students was taken randomly. Five students took Test 1 and 5 students

    took Test 2 in the morning. In the afternoon, those who took Test 1 took Test 2 and vice versa.

    The results are shown in the table below:

    STUDENT TEST 1 (X) TEST 2 (Y)

    A 125 114

    B 145 127

    C 110 126

    D 120 116

    E 124 108

    F 110 100

    G 121 129

    H 142 131

    I 100 96

    J 126 113

    a. Plot a scatter diagram for these data.

    b. Solve for r.

    c. How well do the two tests relate linearly? Explain.

    6. In a study of factors that affect success in calculus, data were collected for 10 different persons. Scores on an Algebra placement test are given, along with Calculus achievement scores.

    a. Plot a scatter diagram for these data.

    b. Find the value of the linear correlation coefficient r.

  • c. Test the significance of r at $\alpha$ = 0.05.

    ALGEBRA SCORE (X)

    17 21 11 16 15 11 24 27 19 8

    CALCULUS SCORE (Y)

    73 66 64 61 70 71 90 68 84 52

    7. One study was conducted to determine the relationship between the age and systolic blood pressure of 12 women.

    Age ( X ) Systolic Blood Pressure ( Y )

    56 147

    42 125

    72 160

    36 118

    63 149

    47 128

    55 150

    49 145

    38 115

    42 140

    68 152

    60 155

    a. Plot a scatter diagram for these data.

    b. Solve for r and interpret.

    c. What can you conclude about the relationship between age and systolic blood pressure of women? Explain statistically.