A4 sheet for exam


  • 8/7/2019 A4 sheet for exam


    A large Darwin-based company has employed you to determine whether there is any association between the average volume of sales per month and the period of selling experience of its sales representatives. You have been supplied with the following information for a random sample of 10 of the company's salespersons:

    Sales (Y)                     Period of experience (X)
    (average number per month)    (years)
     5                            4
     2                            1
     8                            5
     4                            2
    10                            6
    12                            7
     7                            4
     7                            5
     5                            3
     6                            2

    $\sum Y = 66$, $\sum X = 39$, $\sum XY = 304$, $\sum Y^2 = 512$, $\sum X^2 = 185$

    On the basis of the above data:
    a) Draw a scatter diagram of these data.
    b) Find the equation that describes the regression line for these data.
    c) Does the value of $b_1$ show sufficient strength to conclude that $\beta_1$ is greater than zero at the $\alpha = 0.05$ level?
    d) Find the correlation coefficient and the coefficient of determination, and comment on the strength of the linear relationship.
    e) Find the 95% confidence interval for the estimation of $\beta_1$.

    Solution
    $SS_x = 32.9$, $SS_y = 76.4$, $SS_{xy} = 46.6$, $\bar{x} = 3.9$, $\bar{y} = 6.6$

    a) Scatter diagram to be drawn on the board.

    b) The linear (regression) equation is given by $\hat{y} = b_0 + b_1 X$, where $b_1 = SS_{xy}/SS_x$ and $b_0 = \bar{y} - b_1\bar{x}$.

    c) $H_0$: $\beta_1 = 0$, no significant relationship. $H_1$: $\beta_1 > 0$, there is a significant relationship.
    Test statistic: $t = (b_1 - \beta_1)/S_{b_1}$
    Significance level: $\alpha = 0.05$
    Critical value: $t_{\alpha,\,n-2}$
    Decision rule: Reject $H_0$ if $t > 1.86$
    Value of the test statistic: $t = b_1/S_{b_1}$, where $S_{b_1} = S_e/\sqrt{SS_x}$, $S_e = \sqrt{SSE/(n-2)}$ and $SSE = SS_y - SS_{xy}^2/SS_x$, so $t = (b_1 - \beta_1)/S_{b_1} = 1.42/0.20 = 7.1$
    Conclusion: Since $t = 7.1 > 1.86$, we reject $H_0$. There is sufficient evidence to indicate that $\beta_1$ is greater than zero.

    d) Correlation coefficient: $r = SS_{xy}/\sqrt{SS_x\,SS_y} = 0.93$. There is a very strong linear relationship between period of experience and sales.
    Coefficient of determination: $r^2 = (0.93)^2 = 0.86$. Therefore 86% of the variation in sales is explained by the variation in period of experience.

    e) 95% confidence interval for $\beta_1$: $b_1 \pm t_{n-2,\,\alpha/2}\,S_{b_1}$, with $\alpha = 0.05$
    LCL $= b_1 - t_{n-2,\,\alpha/2}\,S_{b_1}$
    UCL $= b_1 + t_{n-2,\,\alpha/2}\,S_{b_1}$
    LCL $< \beta_1 <$ UCL
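The regression steps above can be checked numerically. The sketch below is one possible way to do it in plain Python. It assumes the ten (Y, X) pairs read off the question's table (they reproduce the stated sums), and it takes the critical values 1.860 and 2.306 from a standard t table for 8 degrees of freedom rather than computing them. The variable names are illustrative only.

```python
import math

# (Y, X) pairs as read from the question's table (assumed pairing;
# it reproduces sum Y = 66, sum X = 39, sum XY = 304).
y = [5, 2, 8, 4, 10, 12, 7, 7, 5, 6]
x = [4, 1, 5, 2, 6, 7, 4, 5, 3, 2]
n = len(x)

# Sums of squares and cross-products
ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

# b) slope and intercept of the fitted line
b1 = ss_xy / ss_x
b0 = sum(y) / n - b1 * sum(x) / n

# c) t statistic for H0: beta1 = 0 against H1: beta1 > 0
sse = ss_y - ss_xy ** 2 / ss_x
s_e = math.sqrt(sse / (n - 2))
s_b1 = s_e / math.sqrt(ss_x)
t_stat = b1 / s_b1
T_05_8 = 1.860   # table value t(0.05, 8), one-tailed (assumed from tables)

# d) correlation and coefficient of determination
r = ss_xy / math.sqrt(ss_x * ss_y)

# e) 95% confidence interval for beta1
T_025_8 = 2.306  # table value t(0.025, 8) (assumed from tables)
lcl = b1 - T_025_8 * s_b1
ucl = b1 + T_025_8 * s_b1

print(f"b1 = {b1:.2f}, b0 = {b0:.2f}")
print(f"t = {t_stat:.1f}, reject H0: {t_stat > T_05_8}")
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
print(f"95% CI for beta1: ({lcl:.2f}, {ucl:.2f})")
```

Working from the sums of squares keeps the code aligned with the hand calculation, so each intermediate value can be compared with the worked solution.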

    A point estimate for a parameter is a single number designed to estimate a quantitative parameter of a population, usually the value of the corresponding sample statistic.
    An interval estimate is an interval bounded by two values and used to estimate the value of a population parameter. The values that bound the interval are statistics calculated from the sample that is being used as the basis for the estimation.
    A confidence interval is an interval estimate with a specified level of confidence.

    A management consultant has been asked to investigate the punctuality of domestic passenger flights of a major Australian airline arriving at Darwin International Airport. The consultant has taken a random sample of 20 incoming flights during a particular month and found that on average the planes are 2.73 minutes late with a standard deviation of 4.74 minutes. What is the 95 percent confidence interval for the mean delay for the month as a whole?
    Given: $n = 20$, $\bar{x} = 2.73$ minutes, $s = 4.74$ minutes, $\alpha = 0.05$, $\alpha/2 = 0.025$
    Required interval: $\bar{x} \pm t_{(\alpha/2,\,n-1)}\,s/\sqrt{n}$
    LCL $= \bar{x} - t_{(\alpha/2,\,n-1)}\,s/\sqrt{n}$
    UCL $= \bar{x} + t_{(\alpha/2,\,n-1)}\,s/\sqrt{n}$
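As a rough numerical check of the flight-delay interval, the following sketch plugs the given figures into the formula. The critical value 2.093 is taken from a standard t table for 19 degrees of freedom and is an assumption of this example, not a value stated in the sheet.

```python
import math

n, x_bar, s = 20, 2.73, 4.74   # sample size, mean delay, standard deviation
T_025_19 = 2.093               # table value t(0.025, 19) (assumed from tables)

std_error = s / math.sqrt(n)   # estimated standard error of the mean
margin = T_025_19 * std_error  # half-width of the interval

lcl = x_bar - margin
ucl = x_bar + margin
print(f"95% CI for mean delay: ({lcl:.2f}, {ucl:.2f}) minutes")
```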

    The following data consist of a random sample of 75 individuals who were involved in bicycle accidents during a one-year period in a regional city. Of the 75 individuals, 20 were wearing a safety helmet at the time of the accident, and 55 were not. 25 of the individuals sustained head injuries and 50 did not. The data are presented in the table below.
    Using either the classical approach or the p-value approach, test the hypothesis that head injury is independent of the wearing of a helmet. Use $\alpha = 0.05$.

                       Wearing helmet
    Head injury     Yes      No    Total
    Yes               8      17       25
    No               12      38       50
    Total            20      55       75

    Solution
    Expected frequency table, e.g. calculation of expected frequency: $e_{11} = (25)(20)/75 = 6.67$

                       Wearing helmet
    Head injury     Yes       No    Total
    Yes            6.67    18.33       25
    No            13.33    36.67       50
    Total            20       55       75

    $H_0$: Head injury is independent of the wearing of a helmet. $H_a$: There is a dependency.
    Degrees of freedom: $\nu = (r-1)(c-1)$
    Level of significance: $\alpha = 0.05$
    Critical value: $\chi^2_{(df,\,\alpha)}$
    Decision rule: Reject $H_0$ if $\chi^2_c \ge \chi^2_{(df,\,\alpha)}$
    Value of the test statistic: $\chi^2_c = \sum[(O-E)^2/E]$
    Conclusion (classical approach): Since $\chi^2_c < \chi^2_{(df,\,\alpha)}$, we do not reject $H_0$, and conclude that the data provide insufficient evidence (at the 0.05 level) that there is any dependency between helmet wearing and head injury.
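The chi-square calculation for the helmet data can be sketched as follows. The observed counts come from the table in the question; the critical value 3.841 is the usual table value for 1 degree of freedom at the 0.05 level and is assumed here rather than computed.

```python
# Observed counts: rows = head injury (yes, no), columns = helmet (yes, no)
observed = [[8, 17],
            [12, 38]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell = (row total)(column total) / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
CHI2_05_1 = 3.841  # table value chi-square(1, 0.05) (assumed from tables)

print(f"chi-square = {chi2:.3f}, df = {df}")
print("reject H0" if chi2 >= CHI2_05_1 else "do not reject H0")
```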

    The mean distance travelled per shopping trip for local residents has historically been 8.32 kilometres. In 2008 a random sample of 25 shopping trips gave a mean distance travelled of 7.84 kilometres with a standard deviation of 5.12 kilometres. Do the data provide sufficient evidence to conclude that the 2008 mean distance travelled per shopping trip has changed from the historical mean of 8.32 kilometres? Assume distance travelled is approximately normal. Use $\alpha = 0.01$ for the hypothesis test.
    Solution
    Given: $\mu = 8.32$, $n = 25$, $\bar{x} = 7.84$ km, $s = 5.12$ km, and $\alpha = 0.01$
    Since $\sigma$ is unknown and n is small, use the t distribution.
    $H_0$: $\mu = 8.32$ km. $H_1$: $\mu \ne 8.32$ km.
    $\alpha = 0.01$, $\alpha/2 = 0.005$
    Critical value: $t_{0.005,\,24}$
    Decision rule: Reject $H_0$ if $|t_c| > t_{0.005,\,24}$
    Value of the test statistic: $t_c = (\bar{x} - \mu)/s_{\bar{x}}$, where $s_{\bar{x}} = s/\sqrt{n}$
    Conclusion: Since $|t_c| < t_{0.005,\,24}$, we do not reject $H_0$, and conclude that there is insufficient evidence (at the 0.01 level) that the 2008 mean distance has changed from the historical mean.
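A short sketch of the two-tailed t test for the shopping-trip data is given below. The critical value 2.797 is the standard table value for 24 degrees of freedom with 0.005 in each tail and is assumed here, not derived.

```python
import math

mu_0 = 8.32                    # historical mean (km)
n, x_bar, s = 25, 7.84, 5.12   # 2008 sample size, mean, standard deviation

std_error = s / math.sqrt(n)        # standard error of the sample mean
t_c = (x_bar - mu_0) / std_error    # calculated test statistic

T_005_24 = 2.797  # table value t(0.005, 24) (assumed from tables)

print(f"t_c = {t_c:.3f}")
print("reject H0" if abs(t_c) > T_005_24 else "do not reject H0")
```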

    What sample size would be needed to estimate the population mean to within one-half standard deviation with 95% confidence?
    Maximum error: $E = 0.5\sigma$
    $Z_{(\alpha/2)} = Z_{(0.025)} = 1.96$
    $E = Z_{(\alpha/2)}(\sigma/\sqrt{n})$, so $n = [Z_{(\alpha/2)}\,\sigma/E]^2$
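The sample-size formula above can be evaluated directly. In this sketch the function name `sample_size_for_mean` is hypothetical, and the standard deviation is expressed in units of itself (sigma = 1), so the maximum error is 0.5. The result is rounded up because a sample size must be a whole number.

```python
import math

def sample_size_for_mean(z, sigma, max_error):
    """Smallest whole n satisfying n >= (z * sigma / max_error)^2."""
    return math.ceil((z * sigma / max_error) ** 2)

# Error bound of half a standard deviation, 95% confidence (z = 1.96)
n_required = sample_size_for_mean(z=1.96, sigma=1.0, max_error=0.5)
print(n_required)
```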

    In estimating average waiting time in an outpatients department based on a sample of patients, it is proposed that the standard error of the estimated time should not be greater than 5% of the estimated mean waiting time. On the basis of a pilot study, a rough estimate of the average waiting time of 75 minutes with a standard deviation of 27 minutes has been made. What size of sample of patients would be required to meet the above requirements?
    Solution
    Average waiting time = 75 minutes
    Standard error $= s/\sqrt{n}$
    $(0.05)(75) = 27/\sqrt{n}$
    $\sqrt{n} = 27/3.75 = 7.2$
    $n = (7.2)^2 = 51.84$, so a sample of at least 52 patients is required.
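The waiting-time requirement can be verified with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are illustrative.

```python
import math

mean_wait = 75.0   # pilot estimate of mean waiting time (minutes)
s = 27.0           # pilot estimate of standard deviation (minutes)

max_std_error = 0.05 * mean_wait                    # 5% of the mean
n_patients = math.ceil((s / max_std_error) ** 2)    # round up to a whole patient

# Standard error actually achieved with that sample size
achieved = s / math.sqrt(n_patients)
print(n_patients, round(achieved, 3))
```

Rounding up matters here: one patient fewer would leave the standard error slightly above the 5% limit.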

    The Central Limit Theorem states that whenever a random sample of size n is taken from any distribution with mean $\mu$ and variance $\sigma^2$, then the sample mean will be approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$. The larger the value of the sample size n, the better the approximation to the normal.
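A small simulation can illustrate the theorem. The sketch below draws repeated samples from an exponential distribution (chosen here only as an example of a skewed distribution, with mean 1 and variance 1) and compares the mean and variance of the sample means with the values the theorem predicts. The sample size, number of repetitions and seed are arbitrary choices for this illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # size of each sample
repetitions = 5000       # number of samples drawn

# Exponential distribution with rate 1: mean = 1, variance = 1
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means     = {mean_of_means:.3f} (theory: 1.000)")
print(f"variance of sample means = {var_of_means:.4f} (theory: {1 / n:.4f})")
```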

    The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values.

    A discrete probability distribution is defined over a set of separate values (such as a value of 1 or 2 or 3, etc.). A continuous probability distribution is defined over an infinite number of points (such as all values between 1 and 3, inclusive).

    Criteria for a Binomial Probability Experiment: An experiment is said to be a binomial experiment provided that it satisfies the following.
    a. The experiment is performed a fixed number of times. Each repetition of the experiment is called a trial. Denote the number of trials as n.
    b. Each trial has only two mutually exclusive outcomes: success, denoted by s, and failure, denoted by f. (Note that the term "success" does not necessarily imply a good thing.)
    c. The trials are independent. That is, the outcome of one trial will not affect the outcome of the other trials.
    d. The probability of success remains the same from trial to trial. It is called the success probability, denoted by p. (Thus, the probability of failure is $(1 - p)$.)
    4. The random variable X is called a binomial random variable and denotes the number of successes in n Bernoulli trials. The possible values of this random variable are $0 \le x \le n$.
    5. Note that since each trial is independent, the probability of each possible outcome of a binomial experiment (a combination of s's and f's) can be found by multiplying the corresponding number of p's and $(1 - p)$'s. A tree diagram can help keep track of the outcomes and their probabilities.
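The multiplication rule in the last point leads to the usual binomial probability formula: each outcome with x successes has probability p^x (1 - p)^(n - x), and there are C(n, x) such outcomes. A minimal sketch follows; the function name `binomial_pmf` and the example values n = 3, p = 0.5 are hypothetical, chosen only for illustration.

```python
import math

def binomial_pmf(n, p, x):
    """P(X = x) for n independent trials with success probability p."""
    # math.comb(n, x) counts the arrangements of x successes among n trials
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 3, 0.5   # hypothetical example: three fair coin tosses
distribution = [binomial_pmf(n, p, x) for x in range(n + 1)]

for x, prob in enumerate(distribution):
    print(f"P(X = {x}) = {prob:.3f}")
```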

    Properties of a Normal Distribution: the normal curve is symmetrical about the mean $\mu$; the mean is at the middle and divides the area into halves; the total area under the curve is equal to 1; it is completely determined by its mean $\mu$ and standard deviation $\sigma$ (or variance $\sigma^2$).
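These properties can be checked numerically with the standard library's `statistics.NormalDist`. The mean and standard deviation used below are arbitrary values picked for the illustration.

```python
from statistics import NormalDist

mu, sigma = 50.0, 10.0          # arbitrary example parameters
dist = NormalDist(mu, sigma)

# The mean divides the area under the curve into two halves
area_below_mean = dist.cdf(mu)

# Symmetry: area below mu - d equals area above mu + d
d = 7.5
left_tail = dist.cdf(mu - d)
right_tail = 1 - dist.cdf(mu + d)

print(area_below_mean, round(left_tail, 4), round(right_tail, 4))
```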

    The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve. The better the correlation, the tighter the points will hug the line.

    Regression analysis involves finding the best straight-line relationship to explain how the variation in an outcome (or dependent) variable, Y, depends on the variation in a predictor (or independent or explanatory) variable, X. Once the relationship has been estimated we will be able to use the equation $\hat{y} = b_0 + b_1 X$ in order to predict the value of the outcome variable for different values of the explanatory variable.

    Correlation analysis is concerned with determining the extent to which the variables of interest are related. It is a procedure that provides a measure of the relative strength of the relationship.

    Student's t-statistic: when s is used as an estimate for $\sigma$, the test statistic has two sources of variation: $\bar{x}$ and s.


    The chi-square statistic is a nonparametric statistical technique used to determine if a distribution of observed frequencies differs from the theoretical expected frequencies. Chi-square statistics use nominal (categorical) or ordinal level data; thus, instead of using means and variances, this test uses frequencies. The value of the chi-square statistic is given by $\chi^2_c = \sum[(O-E)^2/E]$. Generally the chi-squared statistic summarizes the discrepancies between the expected number of times each outcome occurs (assuming that the model is true) and the observed number of times each outcome occurs, by summing the squares of the discrepancies, normalized by the expected numbers, over all the categories. Data used in a chi-square analysis have to satisfy the following conditions: (1) randomly drawn from the population, (2) reported in raw counts of frequency, (3) measured variables must be independent, (4) observed frequencies cannot be too small, and (5) values of independent and dependent variables must be mutually exclusive.