
MATH1725 Revision Questions

File last updated at 21:13 on May 18, 2015

How this file is arranged

I answer general questions first. These are arranged in date order, with the most recent

questions answered first.

Then I answer questions about the examination past papers starting with the most recent

examination paper (2014) and ending with the oldest examination paper asked about. For

any given examination paper, I answer questions about section A first and then questions

about section B.

General questions

General question: 18/5/2015

Bit last minute but would you be able to explain the differences when testing paired data

vs unpaired data for differences? I understand how to tell which kind of data it is but then

I’m not 100% what the differences in testing the data are.

Response: If paired (matched) data, you form the differences $d_i = x_i - y_i$. You then want to test $H_0$, whether the mean difference $\mu_d$ equals zero. Thus your test statistic is
\[ t = \frac{\bar{d}}{\sqrt{s_d^2/n}} \]
which has a $t_{n-1}$ distribution if $H_0$ is true.

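If you have two independent samples (with unknown variances that are assumed to equal a common value $\sigma^2$), you use the test statistic
\[ t = \frac{\bar{x} - \bar{y}}{\sqrt{s^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \]
where $s^2$ is the pooled sample variance. This has a $t_{n_x+n_y-2}$ distribution if $H_0$ is true, that the means are equal.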
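As a rough sketch of how the two test statistics differ in practice, the R code below computes both by hand and compares them with the built-in `t.test()`. The data values are hypothetical, made up purely for illustration; in a real problem only one of the two analyses would be appropriate.

```r
# Hypothetical measurements, made up purely for illustration.
x <- c(5.1, 4.8, 6.0, 5.5, 5.9, 6.2)
y <- c(4.9, 4.5, 5.8, 5.6, 5.4, 5.9)

# Paired analysis: work with the differences d_i = x_i - y_i.
d <- x - y
n <- length(d)
t_paired <- mean(d) / sqrt(var(d) / n)        # compare with t on n - 1 df

# Two independent samples with a common variance: pooled s^2.
nx <- length(x)
ny <- length(y)
s2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
t_indep <- (mean(x) - mean(y)) / sqrt(s2 * (1 / nx + 1 / ny))  # nx + ny - 2 df

# The built-in t.test() gives the same two statistics.
t_paired_builtin <- unname(t.test(x, y, paired = TRUE)$statistic)
t_indep_builtin  <- unname(t.test(x, y, var.equal = TRUE)$statistic)
print(c(t_paired, t_paired_builtin, t_indep, t_indep_builtin))
```

The numerator is the same in both cases; only the estimated standard deviation of the numerator, and hence the degrees of freedom, changes.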


In several of the past papers we are asked to define what is meant by the “P-value of the test” and “5% significance level”. Am I right to say that the 5% significance level means the probability of rejecting the null hypothesis when it is true is 0.05? And is the P-value of the test the probability of X being as or more extreme than the x observed if the null hypothesis is true? I’m not sure if maybe I’m saying the same thing here.

Response: A 5% significance level means, as you correctly state, that:
\[ \mathrm{pr}\{\text{Reject } H_0 \text{ when } H_0 \text{ true}\} = 0.05. \]
The P-value of the test is essentially the probability of obtaining a value as or more extreme than the one observed when assuming the null hypothesis is true. Thus if testing $H_0$: $\mu = 0$ against $H_1$: $\mu > 0$, then
\[ P = \mathrm{pr}\{\bar{X} > \bar{x} \text{ when } H_0 \text{ is true}\}. \]
If testing $H_0$: $\mu = \mu_0$ against $H_1$: $\mu \neq \mu_0$, then
\[ P = \mathrm{pr}\{|\bar{X} - \mu_0| > |\bar{x} - \mu_0| \text{ when } H_0 \text{ is true}\}. \]


A question from a past paper is “Use a chi-squared-goodness-of-fit test to determine whether

the fitted normal distribution gives a good fit to these data”. I was wondering, would I choose

the level of significance myself or would a small chi-squared value (such as 0.551) be enough

to conclude that it is a good fit?

Response: I would expect you to choose a significance level, say 10% or 5% or 1%.

You could report the P -value of the test, but you would need to use a computer (say)

to determine the P -value of your test.

Though a small value of $\chi^2_{\text{obs}}$ would be indicative of accepting the null hypothesis, strictly you should do a formal test with a pre-specified significance level.

For example, for $k = 1$ degree of freedom, $\chi^2_{\text{obs}} = 0.551$ would correspond to $P = 0.4579$.

Using R you would have:

1-pchisq(0.551,1) # pchisq command gives cumulative probability.

[1] 0.4579094


Am I correct in thinking that $r_{XY}$ and $\rho_{XY}$ are the same in principle, but the former is for a sample and the latter is for a population?

Response: Yes. Sample correlation is $r_{XY}$. For random variables $X$ and $Y$, $\mathrm{corr}(X, Y) = \rho_{XY}$.


When do you use $z = \frac{x - \mu}{\sigma}$ and when to use $z = \frac{\bar{x} - \mu}{\sqrt{\sigma^2/n}}$, if that makes sense. I have a general idea but I recently used the wrong one in a past paper.

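Response: If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$. Thus
\[ \mathrm{pr}\{X < 1\} = \mathrm{pr}\left\{Z < \frac{1 - \mu}{\sigma}\right\}. \]
If $X_i \sim N(\mu, \sigma^2)$ for $i = 1, 2, \ldots, n$, with the $X_i$ independent, then the mean $\bar{X}$ satisfies $\bar{X} \sim N(\mu, \sigma^2/n)$. In this case $Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1)$. Thus
\[ \mathrm{pr}\{\bar{X} < 1\} = \mathrm{pr}\left\{Z < \frac{1 - \mu}{\sqrt{\sigma^2/n}}\right\}. \]
However this latter case is really the same as the former case! For put $v^2 = \sigma^2/n$. Then you can see that $\bar{X} \sim N(\mu, v^2)$ and so in this case $Z = \frac{\bar{X} - \mu}{v} \sim N(0, 1)$. Thus
\[ \mathrm{pr}\{\bar{X} < 1\} = \mathrm{pr}\left\{Z < \frac{1 - \mu}{v}\right\}. \]
Don’t try to learn lots of different formulae. Try to understand the concepts:

(1) Standardization: If $X \sim N(\mu, \text{variance} = \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.

(2) Distribution of the mean: If $X_i \overset{\text{ind}}{\sim} N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \text{variance} = \sigma^2/n)$.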
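The two concepts can be checked numerically. The sketch below uses hypothetical values of `mu`, `sigma` and `n` (they are not taken from any exam question) and confirms that standardizing by hand gives the same probability as asking R for the normal probability directly.

```r
# Hypothetical parameter values, chosen only for illustration.
mu <- 2
sigma <- 3
n <- 25

# (1) Single observation X ~ N(mu, sigma^2): pr{X < 1}.
p_direct <- pnorm(1, mean = mu, sd = sigma)
p_std    <- pnorm((1 - mu) / sigma)           # standardize, then use N(0, 1)

# (2) Sample mean Xbar ~ N(mu, sigma^2 / n): pr{Xbar < 1}.
v <- sqrt(sigma^2 / n)                        # standard deviation of the mean
pbar_direct <- pnorm(1, mean = mu, sd = v)
pbar_std    <- pnorm((1 - mu) / v)

print(c(p_direct, p_std, pbar_direct, pbar_std))
```

The only thing that changes between the two cases is which standard deviation you divide by.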


When calculating the sample variance $s^2$, which is more appropriate, the exact value of the sample mean, like a fraction, or a value rounded to the nearest decimal point?

Response: Consider using the formula
\[ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right). \]
In rounding values of $\bar{x}$ you need to be aware of rounding errors in your calculations.

As an example, suppose we have $n = 9$ values 1.1, 1.2, 1.4, 1.5, 1.6, 1.8, 1.9, 2.1, 2.3. Here
\[ \sum_{i=1}^{9} x_i = 14.9, \qquad \bar{x} = \frac{1}{9}\sum_{i=1}^{9} x_i = \frac{14.9}{9} = 1.65555\ldots, \qquad \sum_{i=1}^{9} x_i^2 = 25.97. \]
Exactly
\[ s^2 = \frac{1}{8}\left(25.97 - 9\left(\frac{14.9}{9}\right)^2\right) = 0.1627777\ldots. \]
If you use $\bar{x} = 1.65556$, you get $s^2 = 0.1627612$, with a relative percentage error
\[ \frac{100 \times (0.1627612 - 0.1627778)}{0.1627778} = -0.0102\%. \]
The table below shows what happens when using different values for $\bar{x}$.

Value used for $\bar{x}$    $s^2$          Relative percentage error
14.9/9                      0.1627778         0.0000 %
1.655556                    0.1627761        −0.0010 %
1.65556                     0.1627612        −0.0102 %
1.6556                      0.1626122        −0.1017 %
1.656                       0.1611220        −1.0172 %
1.66                        0.1462000       −10.1843 %
1.7                        −0.0050000      −103.0717 %

Even in this simple case the message is clear: be very precise in your value for $\bar{x}$!
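The effect of rounding the mean can be reproduced in R. This is only a sketch: the helper name `s2_formula` is made up for illustration, and it simply applies the computational formula for the sample variance with whatever value of the mean it is given.

```r
x <- c(1.1, 1.2, 1.4, 1.5, 1.6, 1.8, 1.9, 2.1, 2.3)
n <- length(x)

# Computational formula for s^2, using a supplied (possibly rounded) mean.
s2_formula <- function(xbar) (sum(x^2) - n * xbar^2) / (n - 1)

# Increasingly crude roundings of the exact mean 14.9/9.
xbar_values <- c(14.9 / 9, 1.655556, 1.65556, 1.6556, 1.656, 1.66, 1.7)
s2 <- sapply(xbar_values, s2_formula)

# Relative percentage error compared with R's own var().
rel_err <- 100 * (s2 - var(x)) / var(x)

print(data.frame(xbar = xbar_values,
                 s2 = round(s2, 7),
                 rel_err_pct = round(rel_err, 4)))
```

Rounding the mean to one decimal place even produces a negative "variance", which is impossible for a true variance.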


I was just wondering if you could tell me what sampling without replacement and testing homogeneity mean?

Response: Sampling without replacement: Imagine a population of $N$ objects and you want to sample $n$ of them. Pick one population member. This leaves $N - 1$ to choose from for the second sample member. Having chosen the second sample member you now have $N - 2$ to choose from for the third sample member. And so on.

Sampling with replacement: Imagine a population of N objects and you want to sample

n of them. Pick one population member. Look at it. Replace it into the population. You

now have N to choose from for the second sample member. Look at it. Replace it into the

population. You now have N to choose from for the third sample member. And so on.

Testing homogeneity: covered in lecture 20 for a contingency table with fixed marginal totals. Suppose you have $m$ rows and $n$ columns with fixed row totals. The hypothesis is of the form $H_0$: $\pi_{1,j} = \pi_{2,j} = \pi_{3,j} = \cdots = \pi_{m,j} (= \pi_j)$ for $j = 1, 2, \ldots, n$. You are testing whether the proportions in column $j$ are the same for each row. As an example, suppose we have three customer surveys in which we sample 100 in the North, 120 in the Midlands, and 110 in the South. Customers can like the product (1), be indifferent (2), or dislike it (3). We have three surveys with fixed totals so conduct a test of homogeneity. Our hypothesis is thus $H_0$: $\pi_{N,j} = \pi_{M,j} = \pi_{S,j} (= \pi_j)$ for $j = 1, 2, 3$.


I am struggling with an aspect of your course, which is finding the 95% confidence intervals.

I do not understand how to get the piece of the formula after the mean ±. I hope this makes

sense. I understand it is to do with the t-tables, but I assumed it was always 1.96?

Response: Recall lecture 4. Here $X_1, X_2, \ldots, X_n \overset{\text{ind}}{\sim} N(\mu, \sigma^2)$ with known variance $\sigma^2$, so that $\bar{X} \sim N(\mu, \sigma^2/n)$. Thus
\[ Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1). \]
Since for a standard normal distribution we know that $\mathrm{pr}\{-1.96 < Z < +1.96\} = 0.95$ this gives
\[ \mathrm{pr}\left\{-1.96 < \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} < +1.96\right\} = 0.95. \]
This can be re-arranged to give the 95% confidence interval for $\mu$ in the form
\[ \mu \in \bar{x} \pm 1.96\sqrt{\sigma^2/n}. \]
Now suppose that the variance $\sigma^2$ is unknown. Here we estimate $\sigma^2$ using the sample variance $s^2$ and we then know that
\[ T = \frac{\bar{X} - \mu}{\sqrt{s^2/n}} \sim t_{n-1}. \]
Since for a $t_{n-1}$-distribution we know that $\mathrm{pr}\{-t_{n-1}(2.5\%) < T < +t_{n-1}(2.5\%)\} = 0.95$ this gives
\[ \mathrm{pr}\left\{-t_{n-1}(2.5\%) < \frac{\bar{X} - \mu}{\sqrt{s^2/n}} < +t_{n-1}(2.5\%)\right\} = 0.95. \]
This can be re-arranged to give the 95% confidence interval for $\mu$ in the form
\[ \mu \in \bar{x} \pm t_{n-1}(2.5\%)\sqrt{s^2/n}. \]
Notice that in both cases the formula is of the form

point estimate for parameter ± constant × Stdev[point estimate]

where the constant term depends upon the distribution of the point estimate. In the first case $\mathrm{Var}[\bar{X}] = \sigma^2/n$ and a normal distribution is used. In the second case the estimated variance is $s^2/n$ and this leads to a t-distribution with appropriate degrees of freedom.
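To see that the constant is not always 1.96, here is a small sketch in R. The sample `x` is hypothetical, made up only to have some numbers to work with; the interval built by hand from `qt()` is compared with the one reported by `t.test()`.

```r
# Hypothetical sample, for illustration only.
x <- c(9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.9, 10.6)
n <- length(x)

const_z <- qnorm(0.975)                # constant when sigma^2 is known
const_t <- qt(0.975, df = n - 1)       # t_{n-1}(2.5%) when sigma^2 is estimated

# 95% confidence interval: estimate +/- constant * Stdev[estimate].
ci_t <- mean(x) + c(-1, 1) * const_t * sqrt(var(x) / n)

# The built-in t.test() reports the same interval.
ci_builtin <- as.numeric(t.test(x)$conf.int)

print(c(const_z = const_z, const_t = const_t))
print(rbind(by_hand = ci_t, builtin = ci_builtin))
```

The t constant is always larger than the normal constant, reflecting the extra uncertainty from estimating the variance, and it approaches 1.96 as n grows.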


I’m a bit confused on hypothesis tests. I am working through the worked example questions in the back of the exercises handout, and on one question the test statistic is $Z = (X - \mu)/\sigma$ rather than $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ like all the other ones. Why do you not divide by $\sqrt{n}$? I’m guessing it’s to do with the binomial distribution of the data but I’m not sure.

Response: In all cases the test of hypothesis was based on a test statistic $U$ which satisfied $E[U] = m$ and $\mathrm{Var}[U] = v^2$. A test of hypothesis about the parameter $m$ would typically be based on the test statistic $Z = \frac{U - m}{v}$. If $U \sim N(m, v^2)$, then $Z \sim N(0, 1)$.

For example, in lecture 5 we had $X_i \overset{\text{ind}}{\sim} N(\mu, \sigma^2)$ for $i = 1, 2, \ldots, n$. Then $\bar{X} \sim N(\mu, \sigma^2/n)$ so that $Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1)$ and this was used to derive tests of hypotheses about $\mu$ with $\sigma^2$ known. Here $U = \bar{X}$, $m = \mu$ and $v^2 = \sigma^2/n$.

In lecture 6, with variance $\sigma^2$ unknown, we estimated $\sigma^2$ using the sample variance $s^2$. Our test of hypothesis about $\mu$ was based on the result $T = \frac{\bar{X} - \mu}{\sqrt{S^2/n}} \sim t_{n-1}$. Here $U = \bar{X}$, $m = \mu$ and $v^2 = S^2/n$.

In lecture 10 we wanted to test the slope $\beta$ of a regression line and used the result
\[ \hat{\beta} \sim N\left(\beta, \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right) \quad\text{so that}\quad Z = \frac{\hat{\beta} - \beta}{\sqrt{\dfrac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}} \sim N(0, 1). \]
In practice we have to estimate $\sigma^2$ and use the result
\[ T = \frac{\hat{\beta} - \beta}{\sqrt{\dfrac{\hat{\sigma}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}} \sim t_{n-2}. \]
Here $U = \hat{\beta}$, $m = \beta$, $v^2 = \sigma^2 \Big/ \sum_{i=1}^{n}(x_i - \bar{x})^2$.

Similar comments can be made about the tests of hypothesis developed in lectures 15, 16 and 17.

In the worked examples binomial example we observed the number of sixes in 100 throws of a die and wanted to test if the probability of a six occurring was 1/6. Let $X$ be the number of sixes in $n = 100$ throws, so $X \sim \mathrm{Bin}(n = 100, \theta = 1/6)$ if $H_0$ true. In this case $X \approx N(n\theta, n\theta(1 - \theta))$. The test statistic uses $Z = \frac{X - n\theta}{\sqrt{n\theta(1 - \theta)}} \approx N(0, 1)$. In this form for the test we have $U = X$, $m = n\theta$ and $v^2 = n\theta(1 - \theta)$.

Suppose in this latter example we wanted to work with proportions $\bar{X} = X/n$; then if $H_0$ true we have $\bar{X} \approx N(\theta, \theta(1 - \theta)/n)$. The test statistic then uses $Z = \frac{\bar{X} - \theta}{\sqrt{\theta(1 - \theta)/n}} \approx N(0, 1)$. In this form for the test we have $U = \bar{X}$, $m = \theta$ and $v^2 = \theta(1 - \theta)/n$.

Recall the Bernoulli trial model of MATH1715. If $X_1, X_2, \ldots, X_n$ are independent Bernoulli random variables taking values 0 or 1 with probability $(1 - \theta)$ and $\theta$ respectively, then mean $\mu = E[X_i] = \theta$, $E[X_i^2] = \theta$, variance $\sigma^2 = \mathrm{Var}[X_i] = \theta(1 - \theta)$. Their sum satisfies $X = X_1 + X_2 + \cdots + X_n \sim \mathrm{Bin}(n, \theta) \approx N(n\theta, n\sigma^2 = n\theta(1 - \theta))$ and their mean $\bar{X} = X/n$ satisfies $\bar{X} \approx N(\theta, \sigma^2/n = \theta(1 - \theta)/n)$ and the test statistic satisfies $Z = \frac{\bar{X} - \theta}{\sqrt{\sigma^2/n}} \approx N(0, 1)$.


I’m reading over my notes and in lecture 4 (confidence intervals for normal distribution), it

says that for a 95% confidence interval you always use the value 1.96. Is there any way of

working this out or is it just something that you have to remember? I ask because what if

I had to construct a 90% confidence interval. How do you work out which value to use?

Response: The lecture uses the fact that if $X_1, X_2, \ldots, X_n \overset{\text{ind}}{\sim} N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$. Thus
\[ Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1) \]
for which $\mathrm{pr}\{-1.96 < Z < 1.96\} = 0.95$ and so giving
\[ \mathrm{pr}\left\{-1.96 < \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} < +1.96\right\} = 0.95. \]
This can be re-arranged to give the 95% confidence interval for $\mu$.

For a 90% confidence interval, you would repeat the lecture but note that
\[ \mathrm{pr}\{-1.645 < Z < 1.645\} = 0.90. \]
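In an exam these values come from normal tables, but they can be checked in R with `qnorm()`. In the sketch below the helper name `z_value` is made up for illustration: for a given confidence level it returns the point that leaves half of the remaining probability in each tail.

```r
# For a confidence level such as 0.95, put (1 - level)/2 in each tail.
z_value <- function(level) qnorm(1 - (1 - level) / 2)

z90 <- z_value(0.90)
z95 <- z_value(0.95)
z99 <- z_value(0.99)

print(round(c(z90, z95, z99), 3))   # 1.645 1.960 2.576
```

The same idea works for the t-distribution by replacing `qnorm(p)` with `qt(p, df)`.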


I noticed in the answers to Q2 in Exercises I you used coding $z = (x - m)/c$ for the sample mean and variance. I would like to know why you did this and what are $m$ and $c$?

Response: Because the mid points $x_i$ were complicated values 2.95, 3.35, 3.75, 4.15 and so on. Coding makes the numbers simpler to use, so the computation is less difficult and you are less likely to make a mistake!

Choose $m$ as a class mid point near the centre of the data. Here the class with greatest frequency was the 4.0-4.3 class with mid-point 4.15, so I chose $m = 4.15$. Choose $c$ as the class width, here $c = 0.4$.

The coding $z = (x - 4.15)/0.4$ gives coded values −3, −2, −1 and so on. It is a lot easier working out, for example,
\[ \sum_i f_i z_i^2 = 19(-3)^2 + 67(-2)^2 + 141(-1)^2 + \cdots \]
than working out
\[ \sum_i f_i x_i^2 = 19(2.95)^2 + 67(3.35)^2 + 141(3.75)^2 + \cdots. \]
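The sketch below illustrates that coding does not change the answer. It uses only the first three classes quoted above (mid points 2.95, 3.35, 3.75 with frequencies 19, 67, 141), so the numbers it prints are not the answer to the exercise. It relies on the standard decoding rules, mean of x = m + c × mean of z and variance of x = c² × variance of z, which are assumed here rather than quoted from the handout.

```r
# First three classes only (the full table is not reproduced here).
mid <- c(2.95, 3.35, 3.75)     # class mid points x_i
f   <- c(19, 67, 141)          # class frequencies f_i
m <- 4.15                      # chosen origin
c_width <- 0.4                 # class width

z <- (mid - m) / c_width       # coded values -3, -2, -1
N <- sum(f)

# Mean and variance of the coded values.
zbar <- sum(f * z) / N
s2z  <- (sum(f * z^2) - N * zbar^2) / (N - 1)

# Decode back to the original scale.
xbar <- m + c_width * zbar
s2x  <- c_width^2 * s2z

# Direct calculation on the original scale, for comparison.
xbar_direct <- sum(f * mid) / N
s2x_direct  <- (sum(f * mid^2) - N * xbar_direct^2) / (N - 1)

print(c(xbar, xbar_direct, s2x, s2x_direct))
```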


I just have a quick question about hypothesis testing. I don’t quite understand it and what

it is used for. Could you just briefly explain because I don’t understand why one would

want to test out different µ values if one already knows the mean for that particular set of

data.

Response: Knowing the sample mean $\bar{x}$ is not the same as knowing the population mean $\mu$.

Suppose you want to test whether a new drug has any effect on blood pressure. You take 10 people and measure their blood pressure. Then you give them the drug and measure their blood pressure again. Suppose the differences, “before − after”, are:

−0.40, 0.35, 1.28, −0.66, −0.57, −1.27, 0.48, 0.69, 1.74, 0.16.

The sample mean is $\bar{x} = 0.18$ and the sample variance is $s^2 = 0.859$.

If the drug really has no effect, then the distribution of all possible observed differences ought to have a mean $\mu = 0$. If the drug does have an effect, then $\mu \neq 0$. All you have is the sample mean 0.18.

It looks like the sample mean is close to 0, but could it really be significantly different from 0? You want to test the hypothesis $H_0$: $\mu = 0$ against the alternative $H_1$: $\mu \neq 0$. If $H_0$ is true, then the sample mean $\bar{x}$ can be modelled as coming from a normal distribution with mean 0 and variance $\sigma^2/n$ where $\sigma^2$ is the population variance for the differences, and where $n = 10$ values. Thus
\[ Z = \frac{\bar{X}}{\sqrt{\sigma^2/n}} \sim N(0, 1). \]
Since you do not know $\sigma^2$, you estimate it using the sample variance $s^2 = 0.859$. Then
\[ T = \frac{\bar{X}}{\sqrt{S^2/n}} \sim t_{n-1}. \]
Here you have observed $t = \frac{0.18}{\sqrt{0.859/10}} = 0.18/0.293 = 0.614$. This is close to 0, well inside the 5% two-sided critical value $t_9(2.5\%) = 2.262$, so you accept $H_0$.

But suppose your sample mean was $\bar{x} = 0.67$. This looks close to 0 too, but is it really close to 0? Here, if $s^2 = 0.859$ still, then $t = \frac{0.67}{\sqrt{0.859/10}} = 0.67/0.293 = 2.287$ which is quite a long way in the tail of the $t_9$ distribution (beyond 2.262). You would thus reject $H_0$.
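The blood pressure example can be reproduced in R. This is a sketch of the calculation rather than a required method: it computes the observed t by hand, compares it with the 2.5% point of the t-distribution on 9 degrees of freedom, and repeats the comparison for the alternative sample mean 0.67.

```r
# Differences "before - after" from the example.
d <- c(-0.40, 0.35, 1.28, -0.66, -0.57, -1.27, 0.48, 0.69, 1.74, 0.16)
n <- length(d)

# Observed test statistic for H0: mu = 0.
t_obs <- mean(d) / sqrt(var(d) / n)

# Two-sided 5% critical value: the 2.5% point of t on n - 1 = 9 df.
t_crit <- qt(0.975, df = n - 1)
reject <- abs(t_obs) > t_crit

# The "what if" case: sample mean 0.67 with the same s^2 = 0.859.
t_alt <- 0.67 / sqrt(0.859 / n)
reject_alt <- abs(t_alt) > t_crit

print(c(t_obs = t_obs, t_alt = t_alt, t_crit = t_crit))
print(c(reject = reject, reject_alt = reject_alt))
```

Calling `t.test(d)` gives the same observed statistic together with a P-value.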



I know this is a bit late, but I’m getting confused on how you find the expected values for a $\chi^2$-test when you’re only given one data set and not a table. Is it that you find the probability of it being a certain value and then multiply that by the number in the sample?

Response: The answer to your question is yes. If a null hypothesis $H_0$ is true, then given a probability $\pi$ of an observation being in a particular cell, with $n$ observations in total we would expect $n\pi$ observations in that cell. Recall the binomial distribution: if $X \sim \mathrm{Bin}(n, \pi)$, then $E[X] = n\pi$.

Thus consider the cat example: we test if cats choose at random, so a probability $\pi = 0.5$ for each category. With $n = 10$ cats in total, we expect to see $0.5n = 5$ cats in each category.

Similarly for goodness-of-fit testing. We have $n$ observations in total. Assuming some null hypothesis $H_0$ is true, determine the probabilities $\pi_i$ with which an observation is in category $i$. If $H_0$ is true, we then expect to see $n\pi_i$ observations in category $i$.


Just a question about which degree of freedom to use with the t-distribution. In some examples such as testing the slope $\beta$ for the regression line we’ve used $t_{n-2}$ whereas we’ve used $t_{n-1}$ in other examples. I’m just wondering how to distinguish which is to be used at certain points and why?

Response: Crudely you can think of the degrees of freedom as equal to

$N - p$ = Number of observations − Number of parameters estimated for the mean

when we are estimating the variance.

Example 1: in testing the mean $\mu = 0$ we estimate the variance $\sigma^2$ using
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2. \]
Here $N = n$ and $p = 1$ in that we estimate $\mu$ by $\bar{x}$ when estimating $\sigma^2$.

Example 2: In testing the slope $\beta$ of a regression line we estimate the variability about the line using
\[ \hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta}x_i)^2. \]
Here $N = n$ and $p = 2$ as we estimate $\alpha$ and $\beta$ using $\hat{\alpha}$ and $\hat{\beta}$.

Example 3: In testing whether two independent samples have equal means we estimate the common variability $\sigma^2$ using
\[ s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{1}{n_1 + n_2 - 2}\left(\sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2 + \sum_{i=1}^{n_2}(x_{2i} - \bar{x}_2)^2\right). \]
Here $N = n_1 + n_2$ and $p = 2$ as we estimate $\mu_1$ by $\bar{x}_1$ and $\mu_2$ by $\bar{x}_2$.

A more “proper” explanation (which you do not need to know at this level) is as follows: if you have a random variable $Z \sim N(0, 1)$ and a random variable $U \sim \chi^2_{n-k}$ and $Z$ and $U$ are independent, then it turns out that
\[ T = \frac{Z}{\sqrt{\dfrac{U}{n-k}}} \sim t_{n-k}. \]
As an example: if $\bar{X} \sim N(\mu, \sigma^2/n)$, then you know that
\[ Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1). \]
It can be shown that
\[ U = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \]
and also $Z$ and $U$ are independent. Thus
\[ T = \frac{Z}{\sqrt{\dfrac{U}{n-1}}} = \frac{\dfrac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}}{\sqrt{\dfrac{S^2}{\sigma^2}}} = \frac{\bar{X} - \mu}{\sqrt{S^2/n}} \sim t_{n-1}. \]


Do we need to know about linear interpolation for the exam as I noticed a question on

lecture 3 in the homework booklet requires linear interpolation yet to my recollection you

have never mentioned this in the lectures.

Response: I assumed people would know how to do this!

If I tell you that y = y1 at x = x1 and y = y2 at x = x2, then, assuming a straight

line between (x1, y1) and (x2, y2), can you tell me the estimated y value at x = x3 where

x1 < x3 < x2?

For example, if y = 2.1 at x = 1.1 and y = 3.2 at x = 1.2, then, assuming a straight line

between (1.1, 2.1) and (1.2, 3.2), can you tell me the estimated y value at x = 1.15?

That’s easy!1 But what about x = 1.11?

When we estimated the median M from a cumulative frequency polygon in lecture 2,

we used (linear) interpolation.

However I would in general try to set examination questions which do not require inter-

polation to be used.
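For checking a hand calculation, linear interpolation is available in R through `approx()`. The sketch below uses the two points from the example, (1.1, 2.1) and (1.2, 3.2); the helper name `y_at` is made up for illustration.

```r
# Straight line through (1.1, 2.1) and (1.2, 3.2), evaluated at x3.
y_at <- function(x3) approx(x = c(1.1, 1.2), y = c(2.1, 3.2), xout = x3)$y

y_mid <- y_at(1.15)    # half way between the two points
y_111 <- y_at(1.11)    # one tenth of the way along

# The same thing by hand: y1 + (x3 - x1)/(x2 - x1) * (y2 - y1).
y_manual <- 2.1 + (1.11 - 1.1) / (1.2 - 1.1) * (3.2 - 2.1)

print(c(y_mid, y_111, y_manual))
```

At x = 1.11 you are one tenth of the way from 1.1 to 1.2, so the estimate is one tenth of the way from 2.1 to 3.2.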


I am having trouble identifying when to use application of match pairs and two independent

samples. I was just wondering if you could please tell me a way of distinguishing between

the two?

Footnote 1: $\frac{1}{2}(2.1 + 3.2)$

Response: You need to consider each case on its merits. Consider the following examples:

a) You measure the height of 10 students in the morning and in the evening.2

b) You measure the weight of 10 male students and 10 female students chosen at random.3

c) You measure the heights of 10 male students and their sisters.4

d) You measure the heights of 10 husbands and their wives.5

e) You are told the heights of 10 husbands and the heights of 10 wives but not which husband

corresponds to which wife.6

f) You measure the height for 10 students on the front row of a lecture and 10 students sat

at the back of a lecture.7

g) You number the students sat on the front row 1, 2, 3, . . . , 10 and number the students

sat on the back row 1, 2, 3, . . . , 10 and measure their height.8

h) You measure the height of 10 husbands and the height of the 10 judges who hear their

divorce proceedings.9

i) You measure the weekly output of two different factories producing goods in the same 10

weeks.10

j) You measure the weekly output of two different factories producing goods in 10 different

weeks.11


I have a couple of questions about least squares regression which I am sure are really obvious and trivial. When minimising $S$, the sum of squared deviations, you differentiate with respect to $\alpha$ and $\beta$. When equating the differentials to zero, $\alpha$ and $\beta$ change to $\hat{\alpha}$ and $\hat{\beta}$. Why do they change to $\hat{\alpha}$ and $\hat{\beta}$ at that stage? Also, when you differentiate you end up with “the sum of $\hat{\alpha}$ from $k = 1$ to $n$”, and the next step is that this equals $n\hat{\alpha}$. I don’t understand why you multiply by $n$ because isn’t $\hat{\alpha}$ a constant?

Footnote 2: Same student in each case so matched pairs. The heights are different! People shrink during the day due to gravity, but the spine stretches out during the night!

Footnote 3: No evidence here that the data are paired, so two independent samples.

Footnote 4: The heights of the sisters might be related in some way to that of their brothers, so matched pairs.

Footnote 5: At first glance you might think that the height of husbands and wives are not related and so you have here two independent samples. However, it is conceivable that men and women subconsciously select a mate on the basis of similarity of height! If so, then analyse as though matched paired data! There is here a potential correlation between the husband height and the wife height so analyse as matched pairs.

Footnote 6: You do not know the pairings here so analyse this as two independent samples.

Footnote 7: There is no pairing suggested here so analyse as two independent samples.

Footnote 8: The numberings are essentially random so the pairing is illusionary. Analyse these data as two independent samples.

Footnote 9: There is a pairing here, but no rational reason to suggest that court officials assign judges on the basis of height. I think the pairing is random and would analyse the data as two independent samples.

Footnote 10: The factory outputs may be affected by some common factors in the different weeks. There is a potential for some correlation to exist; for example, the weather in week 1 was bad but good in week 2. Analyse as matched pairs.

Footnote 11: There is nothing to link the output in different weeks so analyse the data as two independent samples.

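Response: To obtain the least squares estimates you want to minimise
\[ S = \sum_{k=1}^{n}(y_k - \alpha - \beta x_k)^2 \]
with respect to $\alpha$ and $\beta$. Thus you differentiate with respect to $\alpha$ and $\beta$ and obtain
\[ \frac{\partial S}{\partial\alpha} = -2\sum_{k=1}^{n}(y_k - \alpha - \beta x_k), \qquad \frac{\partial S}{\partial\beta} = -2\sum_{k=1}^{n}x_k(y_k - \alpha - \beta x_k). \]
Here $\alpha$ and $\beta$ can take any value. These equations just tell us about the rate of change of $S$ with respect to $\alpha$ and $\beta$ at any value of $\alpha$ and $\beta$.

However, once we equate these to zero we are finding the precise values of $\alpha$ and $\beta$ which minimise $S$. To distinguish these particular $\alpha$ and $\beta$ values from any other value we add the “hat”. Thus
\[ \frac{\partial S}{\partial\alpha} = 0 \Rightarrow -2\sum_{k=1}^{n}(y_k - \hat{\alpha} - \hat{\beta}x_k) = 0 \quad\text{and}\quad \frac{\partial S}{\partial\beta} = 0 \Rightarrow -2\sum_{k=1}^{n}x_k(y_k - \hat{\alpha} - \hat{\beta}x_k) = 0. \]
Now consider the first of these equations,
\[ \sum_{k=1}^{n}(y_k - \hat{\alpha} - \hat{\beta}x_k) = 0 \quad\text{which gives}\quad \sum_{k=1}^{n}y_k - \sum_{k=1}^{n}\hat{\alpha} - \sum_{k=1}^{n}\hat{\beta}x_k = 0. \]
The second term in the right-hand equation is just adding up the constant term $\hat{\alpha}$ a total of $n$ times and so equals $n\hat{\alpha}$. In the third term we can take the $\hat{\beta}$ term outside the summation because it is a constant. Thus
\[ \sum_{k=1}^{n}y_k - \sum_{k=1}^{n}\hat{\alpha} - \sum_{k=1}^{n}\hat{\beta}x_k = 0 \Rightarrow \sum_{k=1}^{n}y_k - n\hat{\alpha} - \hat{\beta}\sum_{k=1}^{n}x_k = 0. \]
Notice that although $\hat{\alpha}$ and $\hat{\beta}$ are both constants we do not take both of them outside the summation as we would then have
\[ \sum_{k=1}^{n}y_k - \sum_{k=1}^{n}\hat{\alpha} - \sum_{k=1}^{n}\hat{\beta}x_k = 0 \Rightarrow \sum_{k=1}^{n}y_k - \hat{\alpha}\sum_{k=1}^{n}1 - \hat{\beta}\sum_{k=1}^{n}x_k = 0 \]
which is correct, since the middle term in the right-hand equation sums “1” $n$ times and so equals $n$, but it looks ugly. Mathematics should be aesthetically pleasing!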
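The two equations obtained by setting the derivatives to zero can be checked numerically. In the sketch below the data are hypothetical, and the closed-form estimates used (slope as the sum of cross-products over the sum of squares of x about its mean, intercept as the mean of y minus slope times the mean of x) are the standard solution of those two equations, stated here as an assumption rather than quoted from the lecture notes.

```r
# Hypothetical data, for illustration only.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8)
n <- length(x)

# Standard least squares estimates.
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# First equation: sum(y) - n * alpha_hat - beta_hat * sum(x) should be 0.
eq1 <- sum(y) - n * alpha_hat - beta_hat * sum(x)

# Second equation: sum of x_k * (y_k - alpha_hat - beta_hat * x_k) should be 0.
eq2 <- sum(x * (y - alpha_hat - beta_hat * x))

# R's lm() finds the same estimates.
fit <- unname(coef(lm(y ~ x)))

print(c(alpha_hat = alpha_hat, beta_hat = beta_hat, eq1 = eq1, eq2 = eq2))
```

Both `eq1` and `eq2` come out as zero up to rounding error, which is exactly what "equating the derivatives to zero" means for the hatted values.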


I don’t understand in the chi-squared test when to use the continuity correction in the

formula.

Response: Use the continuity correction for a $\chi^2$-test if you have one degree of freedom (1 df).



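I don’t know how to differentiate
\[ S = \sum_{k=1}^{n}(y_k - \alpha - \beta x_k)^2 \]
for the least squares method.

Response: Recall what you know about differentiation. Thus
\[ \frac{\partial}{\partial\alpha}\left\{\sum_{k=1}^{n}(y_k - \alpha - \beta x_k)^2\right\} = \sum_{k=1}^{n}\frac{\partial}{\partial\alpha}\left\{(y_k - \alpha - \beta x_k)^2\right\}. \]
Also
\[ \frac{\partial}{\partial\alpha}\left\{(y_k - \alpha - \beta x_k)^2\right\} = 2(y_k - \alpha - \beta x_k)\frac{\partial}{\partial\alpha}\left\{(y_k - \alpha - \beta x_k)\right\} \]
using the “chain-rule”. Since $\frac{\partial}{\partial\alpha}(y_k - \alpha - \beta x_k) = -1$, this gives $\frac{\partial S}{\partial\alpha} = -2\sum_{k=1}^{n}(y_k - \alpha - \beta x_k)$.

Similarly you can differentiate with respect to $\beta$.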


2013 Examination paper

2013 exam, question A14: 10/5/2015

Can I answer this question as “when P-value is large, then the null hypothesis $H_0$ is accepted”?

Response: Yes.

2013 exam, question A15: 18/5/2015

I kindly request you to help me with June 2013 A15 MATH1725 please?

Response: Look at the handout for lecture 10 which introduced residuals.

2013 exam, question B1(d): 18/5/2015

In question B1(d), would the number of degrees of freedom increase as we are no longer estimating the mean? And does the fact that $s^2 > \bar{x}$ in part (a) support the conclusion that the Poisson distribution provides a good fit?

Response: Yes, the degrees of freedom would increase if we did not have to estimate the mean $\mu$. With four groups (0, 1, 2, ≥ 3, so all expected frequencies are at least five), the degrees of freedom are:

$k = 4 - 1 - 1 = 2$ if we test whether the data is fitted by a Poisson distribution with mean $\mu$ which has to be estimated,

$k = 4 - 1 - 0 = 3$ if we test whether the data is fitted by a Poisson distribution with mean $\mu = 1$.

In part (a) I obtain $\bar{x} = 0.70$ and $s^2 = 0.763$. I think these are quite close, so $\bar{x} \approx s^2$, as we might expect for a Poisson distribution. The ratio $\bar{x}/s^2 = 0.918$. I think this is quite close to unity.

The R code below shows the result of 1000 simulations taking 280 values from a Poisson($\mu = 0.7$) distribution and calculating the ratio $\bar{x}/s^2$.

ratio=numeric(1000) # Initialise 1000 values of ratio as zero.
for (k in 1:1000){
  x=rpois(280,0.7) # Sample 280 values from Poisson(0.7) distribution.
  ratio[k]=mean(x)/var(x) # mean/variance ratio stored in ratio[k]
} # end for loop.
hist(ratio,100,main="mean/variance ratio") # Histogram with about 100 bars.
points(0.918,0,pch=20,cex=2) # Mark the observed ratio 0.918 on the horizontal axis.

You can see that the observed ratio here is within the bounds of possibility.

2013 exam, question B3(e): 18/5/2015

For question B3(e) are they two independent samples so that the confidence interval is
\[ \mu_x - \mu_y = \bar{x} - \bar{y} \pm t\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{n}}? \]

Response: This appeared on the handout to the example in lecture 15.

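With two independent samples, assuming a common variance $\sigma^2$, you have the result
\[ \bar{x} - \bar{y} \sim N\left(\mu_x - \mu_y,\ \sigma^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)\right) \quad\text{so that}\quad \frac{\bar{x} - \bar{y} - (\mu_x - \mu_y)}{\sqrt{\sigma^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \sim N(0, 1). \]
Since you estimate $\sigma^2$ using the pooled sample variance $s^2$ you have the result
\[ \frac{\bar{x} - \bar{y} - (\mu_x - \mu_y)}{\sqrt{s^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \sim t_{n_x+n_y-2}. \]
This leads to the confidence interval of the form
\[ \mu_x - \mu_y = \bar{x} - \bar{y} \pm t_{n_x+n_y-2}(2.5\%)\sqrt{s^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}. \]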



2012 exam, question A8: 17/5/2015

I’m really struggling on question A8 from the June 2012 paper. I’m probably just missing

something but I don’t know how to find the covariance between X + 2Y and X − Y and

can’t get the answer of −12 so could you please explain to me how to do it?

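Response: You know that $(x + 2y)(x - y) = x^2 + xy - 2y^2$ so that $\mathrm{cov}(X + 2Y, X - Y) = \mathrm{Var}[X] + \mathrm{cov}(X, Y) - 2\mathrm{Var}[Y]$. More formally you know that
\[ \mathrm{cov}\left(\sum_{i=1}^{2}a_iX_i, \sum_{j=1}^{2}b_jX_j\right) = \sum_{i=1}^{2}\sum_{j=1}^{2}a_ib_j\,\mathrm{cov}(X_i, X_j) \]
and put $a_1 = 1$, $a_2 = 2$, $b_1 = 1$, $b_2 = -1$, with $X = X_1$ and $Y = X_2$.

You are told that $\mathrm{Var}[X] = 4$, $\mathrm{Var}[Y] = 9$ and $\mathrm{corr}(X, Y) = \frac{1}{3}$ so that $\mathrm{cov}(X, Y) = \mathrm{corr}(X, Y)\sqrt{\mathrm{Var}[X]\mathrm{Var}[Y]} = 2$. Hence
\[ \mathrm{cov}(X + 2Y, X - Y) = \mathrm{Var}[X] + \mathrm{cov}(X, Y) - 2\mathrm{Var}[Y] = 4 + 2 - 18 = -12. \]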
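The double-sum formula is just a matrix product, which makes it easy to check in R. In this sketch the coefficient vectors `a` and `b` hold the multipliers of X and Y in the two linear combinations, and `Sigma` is the variance-covariance matrix built from the values given in the question.

```r
# Values given in the question.
var_X <- 4
var_Y <- 9
corr_XY <- 1 / 3

# cov(X, Y) = corr(X, Y) * sqrt(Var[X] * Var[Y]).
cov_XY <- corr_XY * sqrt(var_X * var_Y)

# Variance-covariance matrix of (X, Y).
Sigma <- matrix(c(var_X, cov_XY,
                  cov_XY, var_Y), nrow = 2, byrow = TRUE)

a <- c(1, 2)     # X + 2Y
b <- c(1, -1)    # X - Y

# Sum over i and j of a_i * b_j * cov(X_i, X_j).
cov_ab <- as.numeric(t(a) %*% Sigma %*% b)
print(cov_ab)
```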

2012 exam, question B1(b): 11/5/2013

Could you please help me with question B1(b) on the 2012 exam paper. I can’t work out

how to obtain any of the values.

Response: Suppose that $X \sim N(\mu = 48, \sigma^2 = 144)$. Then
\[ \mathrm{pr}\{X \le 49.5\} = \mathrm{pr}\left\{\frac{X - \mu}{\sigma} \le \frac{49.5 - 48}{12}\right\} = \mathrm{pr}\{Z \le 0.125\} = \Phi(0.125) \]
where $Z \sim N(0, 1)$ and $\Phi(z) = \mathrm{pr}\{Z \le z\}$. From tables $\Phi(0.10) = 0.5398$ and $\Phi(0.15) = 0.5596$ so $\Phi(0.125) = 0.5497$.

For 40-49 class the fitted class probability is
\[ \mathrm{pr}\{39.5 < X \le 49.5\} = \mathrm{pr}\{X \le 49.5\} - \mathrm{pr}\{X \le 39.5\} = 0.5497 - 0.2394 = 0.3103. \]
The corresponding fitted class frequency is $0.3103n = 31.0$ where $n = 100$ is the sample size.

The other values are worked out similarly except noting that the upper boundary for the 70-79 class is $+\infty$ and the lower boundary for the 20-29 class is $-\infty$.

2012 exam, questions B1(b): 4/5/2013

How do you work out the fitted class probabilities? I managed to calculate $m_4$ by $\mathrm{pr}\{x < 29.5\}$ and obtaining 0.0616 but then for $m_6$, I tried $\mathrm{pr}\{x < 49.5\}$ and did not obtain the right answer?

Response: If $X \sim N(\mu = 48, \sigma^2 = 144)$, then
\[ \mathrm{pr}\{\text{Value in 20-29 class}\} = \mathrm{pr}\{X < 29.5\} = \Phi\left(\frac{29.5 - \mu}{\sigma}\right) = \Phi\left(\frac{29.5 - 48}{12}\right) = 0.0616 \]
where $\Phi(z) = \mathrm{pr}\{Z < z\}$ for $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$. Also
\[ \mathrm{pr}\{\text{Value in 40-49 class}\} = \mathrm{pr}\{39.5 < X < 49.5\} = \Phi\left(\frac{49.5 - 48}{12}\right) - \Phi\left(\frac{39.5 - 48}{12}\right) = \Phi(0.125) - 0.2394 = 0.5497 - 0.2394 = 0.3103. \]
Here $\Phi(0.125) = \frac{1}{2}(\Phi(0.10) + \Phi(0.15)) = \frac{1}{2}(0.5398 + 0.5596) = 0.5497$ by interpolation.


On question B1(b) from the June 2012 Math1725 paper, I am not sure how you have got all the $m$ values to put in the table. Could you please explain this to me.

Response: Suppose $X \sim N(\mu = 48, \sigma^2 = 144)$. Consider the “40-49” class.

Here $m_2 = \mathrm{pr}\{X < 49.5\} = \Phi\left(\frac{49.5 - 48}{12}\right) = \Phi(0.125) = 0.5497$, where, by interpolation, $\Phi(0.125) = \frac{1}{2}(\Phi(0.10) + \Phi(0.15)) = \frac{1}{2}(0.5398 + 0.5596) = 0.5497$.

Then
\[ m_5 = \mathrm{pr}\{\text{Value in 40-49 class}\} = \mathrm{pr}\{39.5 < X < 49.5\} = \mathrm{pr}\{X < 49.5\} - \mathrm{pr}\{X < 39.5\} = 0.5497 - 0.2394 = 0.3103. \]
The expected (or fitted) class frequency is $100 \times 0.3103 = 31.0$, as given in the table.

The values $m_1$, $m_4$, $m_6$, $m_8$, $m_9$, $m_{10}$ and $m_{11}$ are found similarly.

For the last row note that $\mathrm{pr}\{\text{Value in 70-79 class}\} = \mathrm{pr}\{X > 69.5\}$, so that the class upper boundary here is really $+\infty$ and $m_1 = 1.0000$.

Thus $m_3 = \mathrm{pr}\{\text{Value in 70-79 class}\} = \mathrm{pr}\{X > 69.5\} = 1 - \mathrm{pr}\{X < 69.5\} = 1 - 0.9634 = 0.0366$. You can alternatively think of this as $m_3 = \mathrm{pr}\{\text{Value in 70-79 class}\} = \mathrm{pr}\{69.5 < X < +\infty\} = 1.0000 - 0.9634 = 0.0366$.

We make the upper boundary of this last class equal to $+\infty$ and the lower boundary of the first class we make equal $-\infty$ so that the fitted class probabilities sum to unity, and the fitted class frequencies sum to 100.
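The whole column of fitted probabilities can be checked with `pnorm()`. This sketch assumes the six classes are 20-29, 30-39, 40-49, 50-59, 60-69 and 70-79; only the 20-29, 40-49 and 70-79 classes are named explicitly above, so the intermediate class boundaries are an assumption. R evaluates the normal probabilities exactly rather than interpolating in tables, so the last decimal place can differ slightly from the hand calculation.

```r
mu <- 48
sigma <- 12
n <- 100

# Assumed class labels and upper class boundaries (last one is +infinity).
classes <- c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79")
upper <- c(29.5, 39.5, 49.5, 59.5, 69.5, Inf)

# Cumulative probabilities pr{X < upper boundary}.
cum_prob <- pnorm(upper, mean = mu, sd = sigma)

# Class probabilities are successive differences, starting from pr = 0 at -infinity.
class_prob <- diff(c(0, cum_prob))
fitted_freq <- n * class_prob

print(data.frame(class = classes,
                 cum_prob = round(cum_prob, 4),
                 class_prob = round(class_prob, 4),
                 fitted = round(fitted_freq, 1)))
```

Because the first and last classes extend to minus and plus infinity, the class probabilities sum to one and the fitted frequencies sum to 100.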

2012 exam, question B3(c): 14/5/2015

I made a table for the difference between two types of fed cows, then used hypothesis testing

H0: µ = 0. But in the online solutions it does something very different. So I got really

confused.

Response: In question B3(c) we are testing whether the mean methane emission of grass silage fed cows is the same as for concentrate fed cows. There is no suggestion in the question that there is any pairing of cows. The twelve cows are randomly allocated to the two feed methods. Thus the data analysis proceeds as two independent samples with test statistic
\[ t = \frac{\bar{x} - \bar{y}}{\sqrt{s^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \quad\text{where}\quad s^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} \]
and where $t \sim t_{n_x+n_y-2}$ if $H_0$: $\mu_x = \mu_y$ is true.

If the cows were matched (so for example each cow received grass silage for a year and then concentrate for a year say, or the cows were paired up on basis of size or other characteristic), then we would take differences $d_i = x_i - y_i$ and test whether the mean difference is zero using
\[ t = \frac{\bar{d}}{\sqrt{\dfrac{s_d^2}{n}}} \]
where $t \sim t_{n-1}$ if $H_0$: $\mu_d = 0$ is true, where $n = 6$ is the number of pairs. The question does not suggest that this matching (pairing) is relevant here.

2012 exam, question B4(a,d): 10/5/2015

Would you give answers of (a) and (d)?

Response: For part (a) look at lecture 20, sections 20.1 and 20.2.

For part (d), a proportion p is approximately a normal distribution with mean π and

variance π(1 − π)/n. You can estimate π using p = 125/1000. So what does the variance

approximately equal?



2011 exam, questions A8-A10: 14/5/2014

I’m having some trouble with some questions on the May/June 2011 MATH1725 past paper.

In particular:

A8. If a random variable X has a chi-squared distribution with 5 degrees of freedom, what

is the value of x such that pr{X > x} = 0.10?

A9. Suppose that X and Y are independent random variables and have a common binomial

distribution with index n and parameter π. What is the variance of X − Y ?

A10. If X1, X2, . . . , Xn are independent observations from a Bin(n, π) distribution, what is

the variance of their sum, S = X1 + X2 + · · · + Xn?

Response: In A8 you use χ2-tables to look up pr{X > χ25(10%)} = 0.10; see lecture 18.

In A9 you know X and Y are independent so Var[X − Y ] = Var[X] + Var[Y ]. If

X ∼ Bin(n, π), you know from MATH1715 (and lectures 16 and 17) that Var[X] = nπ(1−π).

As Y has the same distribution you know the variance of Y so you can deduce Var[X − Y ].

In A10 if Var[Xi] = σ2 for all i, you know from lecture 14 that Var[X1 +X2 + · · ·+Xn] =

nσ2 as the Xi are mutually independent. Since Xi ∼ Bin(n, π) then Var[Xi] = nπ(1 − π)

and you can deduce the variance of S.
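If you want to check the deductions for A9 and A10 numerically, one rough approach is to compute the binomial variance exactly from its probability function and then apply the two rules above. The values n = 10 and π = 0.3 are arbitrary choices for illustration, and the helper names are hypothetical.

```python
from math import comb

def binomial_pmf(n, pi):
    """Probabilities pr{X = k}, k = 0..n, for X ~ Bin(n, pi)."""
    return [comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(n + 1)]

def variance_from_pmf(pmf):
    """Variance of a variable taking values 0, 1, ..., len(pmf) - 1."""
    mu = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mu) ** 2 * p for k, p in enumerate(pmf))

n, pi = 10, 0.3                                   # arbitrary illustrative values
var_x = variance_from_pmf(binomial_pmf(n, pi))    # agrees with n*pi*(1 - pi)
var_x_minus_y = var_x + var_x                     # A9: X, Y independent
var_s = n * var_x                                 # A10: sum of n independent X_i
print(var_x, var_x_minus_y, var_s)
```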

2011 exam, question B1(c-iii): 11/5/2012

Would it be possible to get solutions from 2011 past paper on how to find the test statistic

for B1 (c-iii).

Response: To test H0: β = 0 see lecture 10. The test statistic is

$$ t = \frac{\hat{\beta}}{\sqrt{\hat{\sigma}^2 \Big/ \sum_{i=1}^{n} (x_i - \bar{x})^2}} $$

where $t \sim t_{n-1}$ if $H_0$ is true, and $\hat{\sigma}^2$ is given in the question.

2011 exam, question B2(b): 11/5/2013

Do we use the normal or t-distribution here? I have used the normal distribution as the

standard deviation was given.

Response: The question states “he takes a random sample of 26 such cases from court

files in late 2009 and finds a mean sentence length of 6.87 months, with standard deviation

2.2 months”. The implication is surely that the sample mean is $\bar{x} = 6.87$ months and the

sample standard deviation is s = 2.2 months. You need to use the t-distribution with 25

degrees of freedom.


2011 exam, question B2(e): 16/5/2015

In the revision answers you have said that $s_y = 1.1$ months. Should it not be 1.21 months

as $2.2^2 \times 0.5^2 = 1.21$?

Response: Standard deviation of sentence length X is $s_x = 2.2$. Variance of sentence length is $s_x^2 = 2.2^2$.

In part (d) we are told to suppose that people spend 50% of their sentence in prison. If Y is the time spent in prison and X is the sentence length, then Y = 0.50X. Thus $\mathrm{Var}[Y] = \mathrm{Var}[0.50X] = 0.5^2\,\mathrm{Var}[X] = 1.21$, which is the variance of Y and not its standard deviation. Thus $\mathrm{Stdev}(Y) = \sqrt{1.21} = 1.1$.

2011 exam, question B2(e): 11/5/2013

Do we use the normal or t-distribution here? I have used the normal distribution as the

standard deviation was given.

Response: The question states “he takes a random sample of 26 such cases from court

files in late 2009 and finds a mean sentence length of 6.87 months, with standard deviation

2.2 months”. The time in prison y is 50% of the sentence length x so in this case we have

sample mean $\bar{y} = 3.435$ months and the sample standard deviation is $s_y = 1.1$ months. You

again need to use the t-distribution with 25 degrees of freedom.

2011 exam, question B2(e): 11/5/2012

Would it be possible to get solutions from 2011 past paper on how to find the test statistic

for B2 (e).

Response: We test H0: µ = 5 against H1: µ < 5 using a t-statistic as in lecture 5.

2011 exam, question B3(c): 21/5/2014

I am not sure about the answer in B3(c).

Response: In part (a) you have to show that

$$ \sigma_D^2 = \mathrm{Var}[D] = \mathrm{Var}[X - Y] = \sigma_X^2 - 2\rho\sigma_X\sigma_Y + \sigma_Y^2. $$

Clearly large ρ values make this variance small. (Compare what happens if ρ = 1 and if

ρ = 0.)

In part (b) you show that if $\mu_D = 0$, then

$$ \bar{D} \sim N\left(0, \frac{\sigma_D^2}{n}\right). $$

If we know that $\bar{D}$ has very small spread about zero, then it will be easy to decide whether our observed $\bar{D}$ is consistent with coming from this distribution.

You could also think about the confidence interval for $\mu_D$. A 95% confidence interval for $\mu_D$ (if the variance $\sigma_D^2$ were known say) is

$$ \bar{D} \pm 1.96\,\frac{\sigma_D}{\sqrt{n}}. $$

If $\sigma_D^2$ is small, then our confidence interval for $\mu_D$ is narrow. We can be confident where the true value of $\mu_D$ is located.
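To see the effect of ρ numerically, here is a small sketch. The standard deviations, the sample size and the function names are hypothetical choices for illustration; only the two formulas come from the argument above.

```python
from math import sqrt

def var_difference(sigma_x, sigma_y, rho):
    """Var[X - Y] = sigma_X^2 - 2*rho*sigma_X*sigma_Y + sigma_Y^2."""
    return sigma_x**2 - 2 * rho * sigma_x * sigma_y + sigma_y**2

def ci_half_width(sigma_d, n, z=1.96):
    """Half-width of the 95% interval for mu_D when sigma_D is known."""
    return z * sigma_d / sqrt(n)

sigma_x, sigma_y, n = 2.0, 2.0, 25     # hypothetical values
for rho in (0.0, 0.5, 1.0):
    var_d = var_difference(sigma_x, sigma_y, rho)
    print(rho, var_d, ci_half_width(sqrt(var_d), n))
```

The larger the correlation within pairs, the smaller the variance of the differences and the narrower the interval.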



What is the correct method in this case for working out expected frequencies? Do I work

out the expected values for “Yes” in both groups and then subtract this from 2500 for the

“No” expected? This would lead to two estimated parameters right?

Response: Here you have fixed row marginal totals so this is a test of homogeneity; see

lecture 20. You would want to test a null hypothesis

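$$ H_0:\ \pi^{\text{cancer}}_{\text{yes}} = \pi^{\text{control}}_{\text{yes}}, \qquad \pi^{\text{cancer}}_{\text{no}} = \pi^{\text{control}}_{\text{no}}. $$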

Thus, is the proportion exposed to pesticide the same in the two groups and is the proportion

not exposed to pesticide the same in the two groups? If the proportions are the same, then

they would be equal to a common value πyes and πno respectively. Since πyes + πno = 1, you

only need estimate πyes. You thus have four cells (groups), one estimated parameter, and

two constraints (your row expected frequencies must both sum to 2500). You thus have one

degree of freedom.

The four expected frequencies are determined as

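$$ \begin{array}{cc}
\dfrac{49 \times 2500}{5000} = 24.5 & \dfrac{4951 \times 2500}{5000} = 2475.5 \\[2ex]
\dfrac{49 \times 2500}{5000} = 24.5 & \dfrac{4951 \times 2500}{5000} = 2475.5
\end{array} $$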
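The same expected frequencies can be obtained with the usual rule (row total × column total) / grand total. In this sketch the function name is hypothetical, and I am assuming the rows are the cancer and control groups and the columns are "Yes" and "No".

```python
def expected_frequencies(row_totals, col_totals):
    """Expected cell counts under H0: row total * column total / grand total."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Row totals are fixed at 2500 each; column totals are 49 and 4951
expected = expected_frequencies([2500, 2500], [49, 4951])
for row in expected:
    print(row)
```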

2011 exam, questions B4(b-ii): 11/5/2012

Would it be possible to get solutions from 2011 past paper on how to work out the standard

deviation for the confidence interval in B4 (b-ii).

Response: See lecture 17 for approximate confidence intervals for proportions. It is of the form

$$ p \pm 1.96\sqrt{\frac{p(1 - p)}{n}}. $$

2011 exam, questions B4(b-iii,iv): 10/5/2015

Would you give answers of (iii) and (iv)?

Response: See the last part of lecture 20; looking at contingency tables with fixed margins.



2010 exam, question A8: 29/4/2013

I’m currently doing June 2010’s exam paper for Math 1725, and am stuck on question A8.

Any help would be appreciated.

Response: You are told that Var[X + Y ] = 96, Var[X − Y ] = 64, and that Var[X] and

Var[Y ] are equal. Notice that

Var[X + Y ] = Var[X] + 2cov(X, Y ) + Var[Y ] = 96

and

Var[X − Y ] = Var[X] − 2cov(X, Y ) + Var[Y ] = 64.

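Let $\mathrm{Var}[X] = \mathrm{Var}[Y] = \sigma^2$. Also you know that $\mathrm{cov}(X, Y) = \rho\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}$ where $\rho = \mathrm{corr}(X, Y)$. You thus have two equations, namely

$$ 2\sigma^2 + 2\rho\sigma^2 = 96, \qquad 2\sigma^2 - 2\rho\sigma^2 = 64. $$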

Solving these equations gives σ2 and ρ.
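One way to solve the pair of equations is to add them (which eliminates ρ) and then subtract them (which isolates ρσ²). A minimal sketch, with a hypothetical function name:

```python
def solve_common_variance(var_sum, var_diff):
    """Solve 2s + 2*rho*s = var_sum and 2s - 2*rho*s = var_diff for (s, rho),
    where s = sigma^2 is the common variance of X and Y."""
    sigma2 = (var_sum + var_diff) / 4           # adding the equations: 4s
    rho = (var_sum - var_diff) / (4 * sigma2)   # subtracting: 4*rho*s
    return sigma2, rho

sigma2, rho = solve_common_variance(96, 64)
print(sigma2, rho)
```

You can check the result by substituting back into both equations.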

2010 exam, question A9: 11/5/2013

Please could you explain how to work out question A9 of the 2010 paper?

Response: Recall from lecture 4 and the definition of the t-distribution¹² that if $T \sim t_\nu$, then $\mathrm{pr}\{T > t_\nu(100P\%)\} = P$.

In the question we are told that $X \sim t_{10}$ with pr{X ≤ x} = 0.95. Thus pr{X > x} = 1 − pr{X ≤ x} = 0.05. We therefore want the value $x = t_{10}(5\%)$. From t-tables¹³, $t_{10}(5\%) = 1.812$. The answer is B.

2010 exam, questions A14-A16: 9/5/2012

I’m having trouble with questions A14-A16 because they’re all sort of related. I don’t

understand the answer given in the solutions for it, I was wondering if you could give a brief

description saying why the answers are the right answer please.

Response: A14 is about the definition of a joint probability density function fXY (x, y) for

variables (X, Y ); done in lecture 11. A15 asks how you would obtain the marginal probability

density function fX(x) of a random variable X given the joint probability density function

of (X, Y ); again done in lecture 11. A16 asks how you would obtain E[XY ] given fXY (x, y);

this was covered in lecture 12.

¹² See http://www1.maths.leeds.ac.uk/~sta6ajb/math1725/1725topics.pdf

¹³ See page 2 of http://www1.maths.leeds.ac.uk/~sta6ajb/math1725/tables.pdf


2010 exam, question A17: 9/5/2015

Would you explain the answer of question A17 in 2010?

Response: Random variables X and Y independent implies that corr(X, Y ) = 0.

However the converse is not always true. Thus having corr(X, Y ) = 0 does NOT imply

that X and Y are independent. See for example question 4 on exercises IV.

However it can be shown that if X and Y have a bivariate normal distribution with

corr(X, Y ) = 0, then X and Y ARE independent.
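A standard illustration that zero correlation does not imply independence (not necessarily the one used in exercises IV) is to take X uniform on {−1, 0, 1} and Y = X². The sketch below checks this with exact fractions; the helper `expect` is a hypothetical name.

```python
from fractions import Fraction

# Joint distribution of (X, Y): X uniform on {-1, 0, 1} and Y = X^2
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

# cov(X, Y) = E[XY] - E[X]E[Y]
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would need pr{X=1, Y=1} = pr{X=1} * pr{Y=1}
p_x1 = sum(p for (x, y), p in joint.items() if x == 1)
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
p_joint = joint[(1, 1)]
print(cov, p_joint, p_x1 * p_y1)
```

Here the covariance is zero, yet the joint probability does not equal the product of the marginal probabilities, so X and Y are not independent.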

2010 exam, question A17: 10/5/2015

My question should be why if X and Y both have normal distributions does this make

uncorrelated X and Y become independent?

Response: Essentially because if X and Y have a bivariate normal distribution with

corr(X, Y ) = 0, then it can be shown that the joint probability density function of (X, Y )

equals the product of the marginal probability density functions, thus fXY (x, y) = fX(x)fY (y).

See lectures 18-20 for MATH2715!

2010 exam, question B1(a,b): 14/5/2015

I was wondering if you could help me a little bit with question B1 on 2010 statistics paper.

I can see that throughout the question the regression line is calculated in a slightly different

way to how it normally is; the mean for y and x don’t seem to be involved in it and instead

of using $s_{xy}/s_x^2$ for the slope it used a different sum. Please could you explain why this is?

Response: In lecture 9 we fitted a line of the form y = α + βx to data $(x_i, y_i)$, i = 1, 2, . . . , n, by minimising $S = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$. The fitted slope is here

$$ \hat{\beta} = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. $$

In this question you are told to fit a line of the form y = βx (a straight line with intercept at zero) by minimising $S = \sum_{i=1}^{n} (y_i - \beta x_i)^2$. See also exercises IV, questions 1, 2. Why should the fitted slope for this model be the same as for the previous model? The fitted slope for this model can be shown to be

$$ \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \frac{\sum_{i=1}^{n} (x_i - 0)(y_i - 0)}{\sum_{i=1}^{n} (x_i - 0)^2}. $$

Notice this line passes through (0, 0) while the above line passes through $(\bar{x}, \bar{y})$.


You can imagine that a curve $y = \alpha + \beta x + \gamma x^2$ is fitted by least squares by minimising $S = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i - \gamma x_i^2)^2$, and this does not give $\hat{\beta} = s_{xy}/s_x^2$!
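To convince yourself that the two straight-line models give different slopes, you can compute both on a small data set. The three data points and the function names below are made up for illustration.

```python
from statistics import mean

def slope_with_intercept(x, y):
    """Least squares slope for y = alpha + beta*x, i.e. s_xy / s_x^2."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def slope_through_origin(x, y):
    """Least squares slope for y = beta*x (intercept fixed at zero)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)

# Hypothetical data
x = [1.0, 2.0, 3.0]
y = [2.0, 2.0, 4.0]
b_intercept = slope_with_intercept(x, y)
b_origin = slope_through_origin(x, y)
print(b_intercept, b_origin)
```

The two slopes differ because the first line is forced through the point of means while the second is forced through the origin.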

2010 exam, question B1(d): 9/5/2015

Would you give an answer for B1(d)?

Response: No. The question asks YOU to draw a scatter plot (surely you can do that!)

and add the fitted line. Once you have done that, cannot you determine whether the fitted

line gives a good fit to the data?

2010 exam, question B2: 9/5/2015

Would you give an answer for B2(b) and B2(d)?

Response: Meaning of “95% confidence interval” was given in lecture 4. Meaning of “5%

significance level” was covered in lectures 5 and 6; the significance level is the same as the

probability of type I error. Look in your notes.

2010 exam, question B3(e): 7/4/2011

I know I need to treat this as two independent variables. But the question does not give

the true variance of males and females or give the information that variances of male and

female are equal.

I am confused about this. Shall we just automatically treat this kind of problem as

variances are equal?

Response: Yes. Either you will be told the variances, so they are known, or you need to

make the assumption that they are equal and conduct the two sample test on that basis. I

will NOT ask you to test equality of means with unknown and unequal variances.

2010 exam, question B4(c,d): 13/5/2012

Could you confirm the hypothesis we should test and the justification?

Response: Does B4 hint that 45 females were chosen at random and 46 males chosen at

random? If you answer yes to this question, then you do a test of homogeneity (proportions

starting at front are the same for males and females).

Or does B4 suggest 91 students were chosen at random (and we happen to have 45 females

and 46 males)? If you answer yes to this question, then you do a test of independence (the

place you start reading and your sex are independent).

The test statistic is the same in each case. The only thing that changes is the test

hypothesis.



2009 exam, question A8: 23/4/2011

This asks for the variance of X − 2Y . How do we know whether to include the covariance

term?

Response: Always $\mathrm{Var}[aX + bY] = a^2\,\mathrm{Var}[X] + b^2\,\mathrm{Var}[Y] + 2ab\,\mathrm{cov}(X, Y)$.

You can ignore the covariance term if you know corr(X, Y ) = 0. The question may tell

you that X and Y are uncorrelated so you know corr(X, Y ) = 0, or the question may tell

you that X and Y are independent so you again know that corr(X, Y ) = 0. In this question,

of course, corr(X, Y ) ≠ 0!

2009 exam, question A9: 12/4/2011

Could you help me with this question from the 2009 exam paper.

Response: You are told that Var[X] = Var[Y ] = 3 and Var[X + Y ] = 8. Since

$$ \mathrm{Var}[aX + bY] = a^2\,\mathrm{Var}[X] + 2ab\,\mathrm{cov}(X, Y) + b^2\,\mathrm{Var}[Y], $$

on substituting a = b = 1 you can deduce the value of cov(X, Y ).
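Rearranging the variance formula for the covariance gives a one-line calculation. The function name is hypothetical; the numbers are those quoted above.

```python
def cov_from_var_of_combination(var_x, var_y, var_comb, a=1, b=1):
    """Rearrange Var[aX + bY] = a^2 Var[X] + 2ab cov(X, Y) + b^2 Var[Y]
    to give cov(X, Y)."""
    return (var_comb - a**2 * var_x - b**2 * var_y) / (2 * a * b)

# Var[X] = Var[Y] = 3 and Var[X + Y] = 8, so a = b = 1
cov_xy = cov_from_var_of_combination(3, 3, 8)
print(cov_xy)
```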

2009 exam, question A13: 28/4/2011

Please can you help me with this question.

Response: You are given the values of n, $\bar{x}$ and $s^2$. You are asked whether the sample mean is significantly different from zero.

You are being asked to test the hypothesis $H_0$: µ = 0 vs. $H_1$: µ ≠ 0 with unknown variance. See lecture 5.

2009 exam, question A14: 28/4/2011

In A14 values xi and yi, i = 1, 2, . . . , n, lie on a horizontal line y = c where c is a constant.

What does the sample covariance sXY equal? Please can you help me with this question.

Response: You are given observational pairs (x1, y1 = c), (x2, y2 = c), . . . , (xn, yn = c).

What does $\bar{y}$ equal? Now determine $s_{XY}$ using the formula

$$ s_{XY} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}). $$
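A quick numerical check of the formula when every y value equals the same constant c. The x values, the value of c and the function name are hypothetical.

```python
from statistics import mean

def sample_covariance(x, y):
    """s_XY = (1 / (n - 1)) * sum of (x_i - xbar)(y_i - ybar)."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)

x = [1.0, 2.0, 4.0, 7.0]   # hypothetical x values
c = 3.0                    # hypothetical constant
y = [c] * len(x)           # all points lie on the horizontal line y = c
s_xy = sample_covariance(x, y)
print(s_xy)
```

Every deviation of y from its mean is zero here, so every term in the sum vanishes.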

2009 exam, question A19: 13/5/2012

Question A19 is asking about a confidence interval. I don’t know how to do this without a

variance, so I’m not sure what to do!

Response: You are given a proportion p. Here p = 21/161. Recall lecture 17 where an

approximate 95% confidence interval for p was derived. Recall that

$$ p \approx N\left(\pi, \frac{\pi(1 - \pi)}{n}\right). $$


Thus tests on π and confidence intervals for π can be based on the fact that

$$ \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}} \approx N(0, 1). $$

Approximate tests and confidence intervals can be obtained using the approximation

$$ \frac{p - \pi}{\sqrt{p(1 - p)/n}} \approx N(0, 1). $$

Notice how the variance is now “known”. The required approximate 95% confidence interval is thus

$$ p \pm 1.96\sqrt{\frac{p(1 - p)}{n}}. $$
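As a sketch of the arithmetic for p = 21/161, the interval can be evaluated as follows. The function name is hypothetical; the formula is the approximate interval above.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval p +/- z*sqrt(p(1 - p)/n)."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lower, upper = proportion_ci(21, 161)
print(round(lower, 4), round(upper, 4))
```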

2009 exam, question A20: 21/5/2014

What would extremely small values of $\chi^2_{\text{obs}}$ suggest about the experimental data?

Response: We usually do a one-sided $\chi^2$-test, checking to see if the observed frequencies are too different from the expected frequencies. But if the observed and expected frequencies are too similar, then $\chi^2_{\text{obs}}$ is very small. Such close agreement is unlikely to occur if the null hypothesis $H_0$ is true. One explanation is that the observed data were made up by the experimenter: someone cheated!

2009 exam, question B1(b): 23/5/2010

The question asks me to calculate expected frequencies using the normal distribution and I

can’t figure out how to do this.

Response: Recall lecture 19.

You have already found that $\bar{x} = 71.50''$ and $s^2 = 7.364$. If X represents the height of a male student, then we can suppose that $X \sim N(\mu = 71.5, \sigma^2 = 7.364)$. Then the expected probability a male student lies in the “69 to 71” class is, remembering the class boundaries,

$$ \begin{aligned}
\mathrm{pr}\{68.5 < X \le 71.5\} &= \mathrm{pr}\left\{ \frac{68.5 - \mu}{\sigma} < Z \le \frac{71.5 - \mu}{\sigma} \right\} = \mathrm{pr}\{-1.106 < Z \le 0\} \\
&= \mathrm{pr}\{Z \le 0\} - \mathrm{pr}\{Z \le -1.106\} = 0.5000 - 0.1345 = 0.3655.
\end{aligned} $$

The expected frequency in the “69 to 71” class is thus 100 × 0.3655 = 36.55.

Similarly for the next class. For the “75 to 77” class, make this class open-ended so the

probability a male student lies in this class is pr{X > 74.5}. By doing this we ensure that

the sum of the class probabilities adds to one.
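If you want to check your table look-ups, the class probabilities can be sketched in Python using the standard library's `NormalDist`. The helper name `expected_frequency` is hypothetical; the mean, variance, class boundaries and the total of 100 students are those used above.

```python
from math import sqrt
from statistics import NormalDist

heights = NormalDist(mu=71.5, sigma=sqrt(7.364))   # fitted normal distribution
n_students = 100

def expected_frequency(lower, upper):
    """Expected number of students with lower < X <= upper."""
    return n_students * (heights.cdf(upper) - heights.cdf(lower))

e_69_to_71 = expected_frequency(68.5, 71.5)
# Top class made open-ended: pr{X > 74.5}
e_75_to_77 = n_students * (1 - heights.cdf(74.5))
print(round(e_69_to_71, 2), round(e_75_to_77, 2))
```

Small differences from the hand calculation arise only because tables force you to round z.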


2009 exam, question B3(b): 16/5/2015

For question B3 on the 2009 paper, is it asking for another matched pairs test to also be

computed for part (b), before going onto conclude that a matched pairs test is unsuitable

for this data set?

Response: For (b) clearly a matched pairs test is unsuitable. (The data values cannot be paired; the cuckoos are not laying eggs in both types of nest.) It therefore makes no sense to test the data for differences using a matched pairs test if you realise that a matched pairs test is unsuitable. So use a matched pairs test in part (a) and a two independent samples test in part (b).

2009 exam, questions B4(b,c,d): 17/5/2015

Can you please help me with 2009 MATH1725 B4 b,c,d.

Response: If you roll a fair die n times, then the number $X_1$ of ones will follow a $\mathrm{Bin}(n, \pi = \tfrac{1}{6})$ distribution. (Think of tossing a coin n times with probability of heads π.)

If $X_1 \sim \mathrm{Bin}(n, \pi)$, then $E[X_1] = n\pi$ and $\mathrm{Var}[X_1] = n\pi(1 - \pi)$.

$Y = X_1 + X_2$ is the number of ones or twos observed. Clearly $Y \sim \mathrm{Bin}(n, \pi = \tfrac{1}{3})$; you have again n Bernoulli trials (rolls of the die) and a one or a two occurs with probability $\tfrac{1}{3}$. Thus $\mathrm{Var}[Y] = \mathrm{Var}[X_1 + X_2] = n \times \tfrac{1}{3} \times (1 - \tfrac{1}{3}) = \tfrac{2}{9}n$. But you also know that

Var[X1 + X2] = Var[X1] + 2cov(X1, X2) + Var[X2].

You can thus deduce the value of cov(X1, X2) and hence corr(X1, X2).
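The deduction can be checked with exact fractions. This sketch assumes, as the fair die implies, that the number of twos $X_2$ has the same Bin(n, 1/6) distribution as $X_1$; the function name and the choice n = 36 are hypothetical.

```python
from fractions import Fraction

def die_cov_corr(n):
    """Return (cov(X1, X2), corr(X1, X2)) for n rolls of a fair die,
    where X1 counts ones and X2 counts twos."""
    var_x1 = n * Fraction(1, 6) * Fraction(5, 6)   # Var[X1], Bin(n, 1/6)
    var_x2 = var_x1                                # X2 has the same distribution
    var_y = n * Fraction(1, 3) * Fraction(2, 3)    # Var[X1 + X2], Bin(n, 1/3)
    cov = (var_y - var_x1 - var_x2) / 2
    corr = cov / var_x1   # sqrt(var_x1 * var_x2) equals var_x1 here
    return cov, corr

cov_12, corr_12 = die_cov_corr(36)
print(cov_12, corr_12)
```

The negative sign makes sense: the more ones you roll, the fewer rolls are left that could be twos.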
