Stat 372 MT S05 Solution

University of Waterloo Statistics 372 Fall 2005

Midterm Tuesday, November 1st, 1-2:20 pm

Family Name (please print): ____________________________

Given Name (please print): ____________________________

Student ID number: _______________________

Aids permitted: calculator. Tables of the Gaussian distribution and control chart constants are provided.

Time permitted: 80 minutes

Instructions:

1. Check that your quiz has a total of 5 pages.

2. Answer all questions in the space provided. Use the back of the preceding page if necessary, indicating clearly that you have done so.

Question   Mark   Possible
1                 8
2                 5
3                 9
4                 8
TOTAL             30

Control Chart Constants

r (subgroup size)   c4      A3      B3      B4
2                   .7979   2.659   -       3.267
3                   .8862   1.954   -       2.568
4                   .9213   1.628   -       2.266
5                   .9400   1.427   -       2.089
6                   .9515   1.287   0.030   1.970
7                   .9594   1.182   0.118   1.882
8                   .9650   1.099   0.185   1.815
9                   .9693   1.032   0.239   1.761
10                  .9727   0.975   0.284   1.716

Constants for Xbar and s Charts

Xbar chart: $\hat{\mu} \pm A_3 \bar{s}$

s chart: $B_3 \bar{s}$, $B_4 \bar{s}$

X chart: $\bar{y} \pm 2.66\,\bar{r}$


1. Briefly answer each of the following unrelated questions.

a) Explain the purpose of phase I and II in control charting. [2 marks]

In phase I, data are collected from a (hopefully) in-control process. The data are used to determine control limits and set up a control chart. In phase II we use the control chart produced in phase I for ongoing monitoring of the process. The control limits are extended into the future, and new data are collected from the process and added to the control chart.
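The two phases can be sketched in Python as follows. This is a minimal illustration, not part of the original solution: the function names and the phase I measurements are made up, and the constants are the subgroup-size-5 row of the table of control chart constants (with the missing B3 entry treated as a lower limit of 0).

```python
from statistics import mean, stdev

# Constants for subgroups of size 5, from the exam's table of constants.
# B3 is shown as "-" there; here it is treated as 0 (an assumption).
A3, B3, B4 = 1.427, 0.0, 2.089

def phase_one_limits(subgroups):
    """Phase I: estimate Xbar- and s-chart limits from in-control subgroups."""
    xbars = [mean(g) for g in subgroups]
    sds = [stdev(g) for g in subgroups]
    centre, s_bar = mean(xbars), mean(sds)
    return {
        "xbar": (centre - A3 * s_bar, centre + A3 * s_bar),
        "s": (B3 * s_bar, B4 * s_bar),
    }

def phase_two_signal(limits, subgroup):
    """Phase II: True if a new subgroup plots outside either chart's limits."""
    xbar, s = mean(subgroup), stdev(subgroup)
    lo_x, hi_x = limits["xbar"]
    lo_s, hi_s = limits["s"]
    return not (lo_x <= xbar <= hi_x) or not (lo_s <= s <= hi_s)

# Hypothetical phase I data, for illustration only.
phase_one = [
    [10.1, 9.8, 10.0, 10.2, 9.9],
    [10.0, 10.3, 9.7, 10.1, 9.9],
    [9.9, 10.0, 10.2, 9.8, 10.1],
]
limits = phase_one_limits(phase_one)
near_centre = phase_two_signal(limits, [10.0, 10.1, 9.9, 10.0, 10.2])
shifted = phase_two_signal(limits, [11.0, 11.2, 10.9, 11.1, 11.0])
print(limits, near_centre, shifted)
```

The limits are computed once from the phase I data and then reused unchanged for every phase II subgroup, which is the point of separating the two phases.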

b) Suppose we fit the following regression model to a time series of quarterly sales data:

$Y = \beta_0 + \beta_1 Q_1 + \beta_2 Q_2 + \beta_3 Q_3 + R$,

where $Q_1 = 1$ if first quarter and 0 otherwise, $Q_2 = 1$ if second quarter and 0 otherwise, $Q_3 = 1$ if third quarter and 0 otherwise, and $R \sim G(0, \sigma)$ independent. Interpret the model parameter $\beta_2$. [2 marks]

$\beta_2$ represents the expected difference in sales between the second quarter and Q4 (the baseline quarter with this model), since the expected sales are $\beta_0 + \beta_2$ in the second quarter and $\beta_0$ in the fourth.

c) For an AR(1) model, i.e. $Y_t - \mu = \phi (Y_{t-1} - \mu) + A_t$, where $A_t \sim G(0, \sigma)$, derive the lag $k$ autocorrelation, denoted $\rho_k$. [4 marks]

With the restriction on the parameter $|\phi| < 1$ we have

$$
\begin{aligned}
\mathrm{Cov}(Y_t, Y_{t-k}) &= \mathrm{Cov}(Y_t - \mu,\; Y_{t-k} - \mu) \\
&= \mathrm{Cov}(A_t + \phi A_{t-1} + \phi^2 A_{t-2} + \dots,\; A_{t-k} + \phi A_{t-k-1} + \phi^2 A_{t-k-2} + \dots) \\
&= \phi^k \,\mathrm{Cov}(A_{t-k}, A_{t-k}) + \phi^{k+2} \,\mathrm{Cov}(A_{t-k-1}, A_{t-k-1}) + \dots \\
&= \frac{\sigma^2 \phi^k}{1 - \phi^2}, \quad k = 1, 2, \dots
\end{aligned}
$$

and similarly $\mathrm{Var}(Y_t) = \dfrac{\sigma^2}{1 - \phi^2}$.

So the lag $k$ autocorrelation, denoted $\rho_k$, between $Y_t$ and $Y_{t-k}$ is

$$
\rho_k = \frac{\mathrm{Cov}(Y_t, Y_{t-k})}{\sqrt{\mathrm{Var}(Y_t)\,\mathrm{Var}(Y_{t-k})}} = \phi^k, \quad k = 1, 2, \dots
$$
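The result can be checked numerically with a short Python sketch. It sums the truncated moving-average representation used in the derivation and compares the ratio with $\phi^k$. The function names, the truncation length, and the value $\phi = 0.6$ are illustrative assumptions, not part of the original solution.

```python
def ar1_autocovariance(phi, sigma, k, terms=2000):
    """Cov(Y_t, Y_{t-k}) from the truncated representation
    Y_t - mu = sum_j phi**j * A_{t-j}.

    Only shocks shared by Y_t and Y_{t-k} contribute, so the covariance is
    sigma**2 * sum_j phi**(k + j) * phi**j.
    """
    return sigma**2 * sum(phi**(k + j) * phi**j for j in range(terms))

def ar1_autocorrelation(phi, k, sigma=1.0):
    """Lag-k autocorrelation as covariance divided by variance (the lag-0 case)."""
    return ar1_autocovariance(phi, sigma, k) / ar1_autocovariance(phi, sigma, 0)

phi = 0.6  # hypothetical value satisfying |phi| < 1
for k in range(1, 5):
    # The two printed values should agree up to floating-point error.
    print(k, ar1_autocorrelation(phi, k), phi**k)
```

The truncation is harmless here because the terms decay geometrically when $|\phi| < 1$; without that restriction the sum would not converge.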


2. Assume that the quality characteristic of interest X follows a Gaussian (Normal) distribution with mean $\mu$ and standard deviation $\sigma$ (assume $\mu$ and $\sigma$ are known). We monitor the process using an $\bar{X}$ control chart with subgroups of size 5.

a) If the process mean shifts up to $\mu + \sigma$, what is the probability of detecting this magnitude of shift with one subgroup? [2 marks]

The control limits for the $\bar{X}$ control chart are set at $\mu \pm 3\sigma/\sqrt{5}$. Denoting the plotted subgroup average as $Y$ and assuming the process mean has shifted, we have $Y \sim G(\mu + \sigma, \sigma/\sqrt{5})$. Then the chance of a signal is

$p = \Pr(Y > \mu + 3\sigma/\sqrt{5}) + \Pr(Y < \mu - 3\sigma/\sqrt{5})$.

Standardizing, the probability is

$p = \Pr\left(Z > \dfrac{3\sigma/\sqrt{5} - \sigma}{\sigma/\sqrt{5}}\right) + \Pr\left(Z < \dfrac{-3\sigma/\sqrt{5} - \sigma}{\sigma/\sqrt{5}}\right) = \Pr(Z > 3 - \sqrt{5}) + \Pr(Z < -3 - \sqrt{5})$,

where $Z \sim G(0,1)$. Using the Gaussian tables, p = 0.222.
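The table lookup can be reproduced with Python's standard library. This is a small sketch of the same calculation; the function name and its parameters are illustrative, not part of the original solution.

```python
from math import sqrt
from statistics import NormalDist

def detection_probability(shift_in_sigma, n, limit=3.0):
    """Chance that one subgroup average falls outside mu +/- limit*sigma/sqrt(n)
    after the process mean shifts by shift_in_sigma * sigma."""
    z = NormalDist()                      # standard Gaussian, G(0, 1)
    delta = shift_in_sigma * sqrt(n)      # shift measured in units of sigma/sqrt(n)
    upper = 1 - z.cdf(limit - delta)      # Pr(Z > 3 - sqrt(5)) in this question
    lower = z.cdf(-limit - delta)         # Pr(Z < -3 - sqrt(5)), essentially zero
    return upper + lower

p = detection_probability(1.0, 5)
print(round(p, 3))
```

Almost all of the probability comes from the upper limit, since the shifted mean lies far from the lower limit.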

b) We may supplement the $\bar{X}$ chart in 2. a) with a runs rule that generates a signal if we see 2 out of 3 points in a row falling in zone A, where a point falls in zone A if it lies within the control limits and in the upper or lower third of the in-control region (see figure).

[Figure: Xbar chart with the centreline, upper control limit, and lower control limit marked, and two bands labelled zone A.]

With this runs rule what is the chance of a false alarm? [3 marks]

Prob(falling in zone A) = $q = 2\Pr(\mu + 2\sigma/\sqrt{5} < Y < \mu + 3\sigma/\sqrt{5})$. In control, $Y \sim G(\mu, \sigma/\sqrt{5})$. So,

Prob(falling in zone A) = $q = 2\Pr(2 < Z < 3) = 0.0428$, since $Z \sim G(0,1)$. To get 2 out of 3 in a row in zone A the possibilities are: AA (signals before the next observation), NAA, or ANA, where N denotes a point not in zone A. So the chance of a false alarm (i.e. a signal when the process is in control) is $q^2 + 2q^2(1 - q) = 0.005$.
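The same arithmetic in Python, as a quick check under the solution's definition of zone A (between 2 and 3 standard errors from the centreline, on either side). The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard Gaussian, G(0, 1)

# Chance that an in-control point lands in either zone A band.
q = 2 * (z.cdf(3) - z.cdf(2))

# Patterns that give 2 of 3 consecutive points in zone A:
#   AA  -> q * q
#   NAA -> (1 - q) * q * q
#   ANA -> q * (1 - q) * q
false_alarm = q**2 + 2 * q**2 * (1 - q)

print(round(q, 4), round(false_alarm, 3))
```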


3. At a delivery firm 40 drivers deliver packages. The data collected over the last year are:

Driver     1   2   3   4   5   6   7   8   9  10  11  12  13  14
Mistakes   6   1   0  14   0   2  18   2   5  13   1   4   6   5

Driver    15  16  17  18  19  20  21  22  23  24  25  26  27  28
Mistakes   0   0   1   3  15  24   3   4   1   2   3  22   4   8

Driver    29  30  31  32  33  34  35  36  37  38  39  40
Mistakes   2   6   8   0   9  20   9   0   3  14   1   1

Some data summaries of the number of mistakes you may find useful include: sd(mistakes) = 6.57, average moving range = 7.2, and directly from R:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00    1.00    3.50    6.00    8.25   24.00

[Plot: mistakes (vertical axis, -20 to 30) against Index (horizontal axis, 10 to 40).]

a) Add appropriate control limits to the above plot. Explain your choice of control chart and show your calculations. [3 marks]

Notice that the given data are not a time series, so the given order is arbitrary. Also, the number of mistakes is a count. As a result, the best choice is probably a control chart based on a Poisson assumption. As in the exercises, using ±3 sigma limits we get control limits at $\bar{c} \pm 3\sqrt{\bar{c}}$, where $\bar{c} = 6$ is the average number of mistakes. Since fewer than zero mistakes is not possible we use control limits at 0 and 13.3.

Failing that, an X chart for individuals is the next-best choice. The control limits (derived from the data summary) would be $6 \pm 2.66(7.2)$, i.e. (0, 25.2), again with the lower limit truncated at zero.
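As a rough check of these limits, the following Python sketch recomputes them from the table of mistakes and lists the drivers above each upper limit. The variable names are illustrative, and the moving range is taken over the arbitrary driver order, exactly as in the data summary.

```python
from math import sqrt
from statistics import mean

# Mistakes for drivers 1..40, copied from the table in the question.
mistakes = [6, 1, 0, 14, 0, 2, 18, 2, 5, 13, 1, 4, 6, 5,
            0, 0, 1, 3, 15, 24, 3, 4, 1, 2, 3, 22, 4, 8,
            2, 6, 8, 0, 9, 20, 9, 0, 3, 14, 1, 1]

c_bar = mean(mistakes)

# Poisson-based limits: c_bar +/- 3*sqrt(c_bar), lower limit truncated at 0.
c_lcl = max(0.0, c_bar - 3 * sqrt(c_bar))
c_ucl = c_bar + 3 * sqrt(c_bar)

# Individuals (X) chart: c_bar +/- 2.66 * average moving range.
moving_ranges = [abs(b - a) for a, b in zip(mistakes, mistakes[1:])]
mr_bar = mean(moving_ranges)
x_lcl = max(0.0, c_bar - 2.66 * mr_bar)
x_ucl = c_bar + 2.66 * mr_bar

# Drivers (numbered from 1) whose counts exceed each upper limit.
outside_c = [d for d, m in enumerate(mistakes, start=1) if m > c_ucl]
outside_x = [d for d, m in enumerate(mistakes, start=1) if m > x_ucl]

print(round(c_ucl, 1), round(x_ucl, 1), outside_c, outside_x)
```

With these data the Poisson-based upper limit flags drivers 4, 7, 19, 20, 26, 34 and 38, while no driver exceeds the much wider individuals-chart limit, so the choice of chart matters for the conclusions.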

b) The manager in charge of this operation has been issuing a “disciplinary citation” to drivers for each mistake. What do you think of this manager’s approach? [2 marks]

The manager’s approach is poor. The manager is overreacting to “natural variation” in the system, whose causes are not under the control of the individual drivers. Punishing drivers for problems that are likely not their fault makes no sense and will upset them.

c) Explain how the control chart you created could help this manager analyze the performance of this group of drivers. [2 marks]

The control chart puts the number of mistakes for each driver into a larger context. The manager should concentrate attention on any drivers that fall outside the control limits. Otherwise, to improve the overall process s/he must address the system.

d) In this context explain what would be considered a special cause. [2 marks]

A special cause is an input that has a large effect on the performance of one (or a few) drivers but not all the drivers. The special cause must act within a driver and not across all drivers.


4. In a machining process, problems occurred due to excess variation in the measurement system for the diameter of a precision ground shaft. To explore the measurement system further the team conducted an investigation where the diameter of the same (master) shaft was measured each hour for four days, giving a total of 32 diameter measurements. Since the true diameter of the master shaft is (assumed) known, the results are recorded as the measurement error (i.e. observed diameter minus true diameter). A plot of the 32 measurement errors (denoted $y_t$) by time is given below (along with a table of the data).

Day 1    Day 2    Day 3    Day 4
 0.18    -0.55    -1.53     0.49
 0.21    -1.15    -1.43     0.05
-0.36    -1.94    -1.71    -0.02
 0.26    -1.86    -1.27     0.32
 0.39    -2.53    -1.23    -0.16
-0.13    -2.46    -1.26     0.16
-0.27    -1.59    -0.75     0.47
-0.35    -2.01    -0.65     0.22

The team could not find the cause of the pattern (they speculated the cause was in the environment). They decided to improve the measurement system using a feedback controller. This required forecasting the measurement error (for the master shaft) in the future and then recalibrating the measurement system to compensate for large predicted measurement errors.

a) Given the observed data would you recommend forecasting with a regression model or a smoothing method (e.g. moving average or exponentially weighted moving average - EWMA)? Explain. [2 marks]

There is no clear trend or seasonal pattern in the plot that is likely to continue into the future. As such, a regression model is not well suited to these data. Forecasting with a smoothing method is preferred.

b) Suppose the team decided to use a feedback controller based on EWMA forecasts with the smoothing parameter alpha equal to 0.2. In other words, the predicted measurement error at time t+1 (denoted $\hat{y}_{t+1}$) made at time t is $\hat{y}_{t+1} = 0.2 y_t + 0.8 \hat{y}_t$. Using this approach, explain how you could determine an approximate prediction interval for your prediction of the measurement error at time t+1. Note a numerical answer is not required. [3 marks]

Apply the EWMA forecast to the existing data and calculate the mean squared error

$$MSE = \sqrt{\frac{\sum_{i=1}^{32} (y_i - \hat{y}_i)^2}{32}},$$

where $\hat{y}_i$ is the EWMA forecast for time i. Then an approximate prediction interval for time t+1 is $\hat{y}_{t+1} \pm 2\,MSE$. [Note in the notes MSE was defined with a square root.]
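The procedure can be sketched in Python as follows. Two assumptions are not in the original: the table is read column by column (each day's eight hourly values in order), and the first forecast is initialised at the first observation. The function name is illustrative.

```python
from math import sqrt

# Measurement errors in time order, reading the table one day (column) at a time.
errors = [
    0.18, 0.21, -0.36, 0.26, 0.39, -0.13, -0.27, -0.35,      # Day 1
    -0.55, -1.15, -1.94, -1.86, -2.53, -2.46, -1.59, -2.01,  # Day 2
    -1.53, -1.43, -1.71, -1.27, -1.23, -1.26, -0.75, -0.65,  # Day 3
    0.49, 0.05, -0.02, 0.32, -0.16, 0.16, 0.47, 0.22,        # Day 4
]

def ewma_forecasts(y, alpha=0.2):
    """One-step-ahead forecasts: yhat[t+1] = alpha*y[t] + (1 - alpha)*yhat[t].

    Returns len(y) + 1 values; the last one is the forecast for the next,
    not yet observed, time point.
    """
    yhat = [y[0]]  # assumption: start the recursion at the first observation
    for obs in y:
        yhat.append(alpha * obs + (1 - alpha) * yhat[-1])
    return yhat

yhat = ewma_forecasts(errors)

# Root of the average squared one-step forecast error over the 32 observations
# (zip stops at the 32nd pair, so the forecast for time 33 is not used here).
rmse = sqrt(sum((obs - f) ** 2 for obs, f in zip(errors, yhat)) / len(errors))

forecast = yhat[-1]  # forecast for time 33
interval = (forecast - 2 * rmse, forecast + 2 * rmse)
print(forecast, interval)
```

The numerical interval depends on the initialisation choice, which is one reason the question does not ask for a number.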

c) Another possible forecasting approach is to fit an AR(1) model, i.e. $Y_t - \mu = \phi (Y_{t-1} - \mu) + A_t$, where $A_t \sim G(0, \sigma)$ independent. Fitting the model in R gives

         ar1  intercept
      0.8758    -0.4260
s.e.  0.0771     0.5341

sigma^2 estimated as 0.1899: log likelihood = -19.55, aic = 45.1

Use the above results to derive a prediction for $y_{33}$ and $y_{34}$. [3 marks]

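Prediction for $y_{33}$: To make a forecast we take the expected value of the model, so the future error term is replaced by its mean of 0. Using the AR(1) model and the R results:

$\hat{y}_{33} = \hat{\mu} + \hat{\phi}(y_{32} - \hat{\mu}) + 0 = -0.426 + 0.876(0.22 + 0.426) = 0.14$

Prediction for $y_{34}$:

$\hat{y}_{34} = \hat{\mu} + \hat{\phi}(\hat{y}_{33} - \hat{\mu}) + 0 = -0.426 + 0.876(0.14 + 0.426) = 0.07$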
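The recursion is easy to iterate in Python. The sketch below follows the solution in treating the reported intercept as the estimate of $\mu$ and uses the last observation, 0.22, as $y_{32}$; the function name is illustrative.

```python
def ar1_forecasts(last_obs, mu, phi, steps):
    """Iterate yhat = mu + phi * (previous - mu), starting from the last
    observation and feeding each forecast into the next step."""
    forecasts = []
    prev = last_obs
    for _ in range(steps):
        prev = mu + phi * (prev - mu)
        forecasts.append(prev)
    return forecasts

# Estimates from the R output: ar1 = 0.8758, intercept (taken as mu) = -0.4260.
y33, y34 = ar1_forecasts(0.22, mu=-0.4260, phi=0.8758, steps=2)
print(round(y33, 2), round(y34, 2))
```

Each further step shrinks the forecast towards the estimated mean by another factor of $\hat{\phi}$, so long-range forecasts approach -0.426.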

[Plot: measurement error (vertical axis, -2 to 0) against hour (horizontal axis, 0 to 30).]
