Research Master Diplomas

8/7/2019 Research Master Diplomas


1 Introduction

To investigate the further development of the number of Master's diplomas, we need a model that fits the data we already have. We will first look at the linear trend model, then see whether the loglinear model does better. Seasonal effects also contribute to the prediction, and smoothing will be included as well.

First we introduce some notation:

$a$ is the base level: the starting quantity.

$b$ is the trend: the quantity added (or subtracted) per time unit.

$F$ is the seasonal variation.

$\varepsilon$ is the random variation; it is assumed to be independent and identically distributed, with mean zero.

Some remarks on the notation: capital letters denote uncertain quantities. An underlined variable denotes an estimate or forecast of such an uncertain quantity. Lowercase letters denote realizations of the uncertain quantities.

2 Linear trend model

2.1 The theory about Linear trend model

The linear model has a general formula:

$$D_t = a + b\,t + \varepsilon_t$$

If $a$ and $b$ are estimated, we get the forecast:

$$\underline{D}_t = \underline{a} + \underline{b}\,t$$

The sum of squared errors is:

$$S = \sum_{t=1}^{n} \left(d_t - a - b\,t\right)^2$$

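After some calculus we find that the estimates $\underline{a}$ and $\underline{b}$ which minimize $S$ are:

$$\underline{a} = \frac{1}{n}\sum_{t=1}^{n} d_t - \frac{1}{2}(n+1)\,\underline{b}$$

$$\underline{b} = \frac{\sum_{t=1}^{n} t\,d_t - \frac{1}{2}(n+1)\sum_{t=1}^{n} d_t}{\sum_{t=1}^{n} t^2 - \frac{1}{n}\left(\sum_{t=1}^{n} t\right)^2}$$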
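The closed-form estimates translate directly into code. The following is a minimal sketch, assuming the observations $d_1, \dots, d_n$ are available as a Python list; the function name `fit_linear_trend` is illustrative and does not come from the original text.

```python
def fit_linear_trend(d):
    """Least-squares estimates (a, b) for d_t = a + b*t with t = 1..n.

    Uses the closed-form expressions for a and b given in the text.
    """
    n = len(d)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_d = sum(d)                                         # sum of d_t
    sum_td = sum(t * dt for t, dt in enumerate(d, start=1))  # sum of t * d_t
    sum_t = n * (n + 1) / 2                                # sum of t
    sum_t2 = n * (n + 1) * (2 * n + 1) / 6                 # sum of t^2
    b = (sum_td - 0.5 * (n + 1) * sum_d) / (sum_t2 - sum_t ** 2 / n)
    a = sum_d / n - 0.5 * (n + 1) * b
    return a, b
```

For data lying exactly on a line, for example `[3, 5, 7, 9]` (that is, $d_t = 1 + 2t$), the function recovers the level 1 and the trend 2.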


2.2 Application of the theory

To apply this theory to our specific problem we need data; see the appendix. First we calculate $\underline{b}$:

$$\sum_{t=1}^{41} t\,d_t = 1\cdot 5 + 2\cdot 30 + 3\cdot 45 + \dots + 40\cdot 28 + 41\cdot 6 = 28582$$

$$\frac{1}{2}(n+1)\sum_{t=1}^{41} d_t = 21\cdot(5 + 30 + 45 + \dots + 28 + 6) = 21\cdot 1511 = 31731$$

$$\sum_{t=1}^{41} t^2 = \frac{1}{6}\,n\,(n+1)\,(2n+1) = 23821$$

$$\frac{1}{n}\left(\sum_{t=1}^{41} t\right)^2 = \frac{1}{41}\left(\frac{1}{2}\cdot 41\cdot 42\right)^2 = 18081$$

Filling in the values gives $\underline{b} = -0.5486$. This choice minimizes the sum of squared errors, whose value is:

$$S = \sum_{t=1}^{41}\left(d_t - 48.3744 + 0.5486\,t\right)^2 = 68057.3$$

Calculating $\underline{a}$ is then straightforward:

$$\underline{a} = \frac{1}{41}\sum_{t=1}^{41} d_t - \frac{1}{2}\cdot 42\cdot(-0.5486) = 48.3744$$

We now have a model to forecast the diplomas:

$$\underline{D}_t = 48.3744 - 0.5486\,t$$
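The arithmetic of this section can be checked from the sums quoted above, without the full appendix data. The sketch below uses only $n = 41$, $\sum t\,d_t = 28582$ and $\sum d_t = 1511$ from the text; the variable names are illustrative.

```python
n = 41
sum_td = 28582                              # sum of t * d_t, quoted in the text
sum_d = 1511                                # sum of d_t, quoted in the text
sum_t = n * (n + 1) // 2                    # 861
sum_t2 = n * (n + 1) * (2 * n + 1) // 6     # 23821

# Closed-form least-squares estimates of the trend b and the level a.
b = (sum_td - 0.5 * (n + 1) * sum_d) / (sum_t2 - sum_t ** 2 / n)
a = sum_d / n - 0.5 * (n + 1) * b

print(round(a, 4), round(b, 4))  # prints: 48.3744 -0.5486
```

The negative sign of the trend is what makes the fitted line decrease over time.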

To measure the error of this model, we use the Mean Absolute Deviation (MAD), defined as:

$$\text{MAD} = \frac{1}{n}\sum_{t=1}^{n}\left|d_t - \underline{D}_t\right|$$

For our situation:

$$\text{MAD} = \frac{1}{41}\sum_{t=1}^{41}\left|d_t - 48.3744 + 0.5486\,t\right| = 18.391$$
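The MAD is easy to compute for any forecast function. This is a minimal sketch, assuming the data are a list indexed from $t = 1$; the names `mean_absolute_deviation` and `linear_model` are illustrative, and the appendix data are not reproduced here, so the example uses made-up numbers.

```python
def mean_absolute_deviation(d, forecast):
    """Average of |d_t - forecast(t)| over t = 1..n."""
    if not d:
        raise ValueError("need at least one observation")
    return sum(abs(dt - forecast(t)) for t, dt in enumerate(d, start=1)) / len(d)


def linear_model(t):
    """Fitted linear trend model from the text."""
    return 48.3744 - 0.5486 * t


# Hypothetical data, only to show the call; not the diploma data.
example = mean_absolute_deviation([2, 4, 9], lambda t: 2 * t)
```

Passing `linear_model` together with the 41 observations from the appendix would give the MAD of the linear model.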

3 Loglinear trend model

3.1 Theory of Loglinear model

The loglinear model is almost the same as the linear one, except for one important difference: we replace $D_t$ by $\ln(D_t)$. This gives us:

$$\ln(D_t) = a + b\,t + \varepsilon_t \quad\Longrightarrow\quad D_t = e^{a}\,\left(e^{b}\right)^{t}\,e^{\varepsilon_t}$$
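Fitting the loglinear model amounts to running the linear least-squares fit on the logarithm of the data. The sketch below assumes strictly positive observations; the function names are illustrative and not taken from the original text.

```python
import math


def fit_loglinear_trend(d):
    """Least-squares estimates (a, b) for ln(d_t) = a + b*t with t = 1..n."""
    if len(d) < 2:
        raise ValueError("need at least two observations")
    if any(dt <= 0 for dt in d):
        raise ValueError("the logarithm requires strictly positive data")
    y = [math.log(dt) for dt in d]
    n = len(y)
    sum_y = sum(y)
    sum_ty = sum(t * yt for t, yt in enumerate(y, start=1))
    sum_t = n * (n + 1) / 2
    sum_t2 = n * (n + 1) * (2 * n + 1) / 6
    b = (sum_ty - 0.5 * (n + 1) * sum_y) / (sum_t2 - sum_t ** 2 / n)
    a = sum_y / n - 0.5 * (n + 1) * b
    return a, b


def loglinear_forecast(a, b, t):
    """Forecast on the original scale: e^a * (e^b)^t."""
    return math.exp(a) * math.exp(b) ** t
```

Note that the fit minimizes squared errors of $\ln(d_t)$, not of $d_t$ itself, so the error on the original scale has to be measured separately.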


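To use the same trick to determine $a$ and $b$ we take the natural logarithm of the data; see the appendix.

$$\sum_{t=1}^{41} t\,\ln(d_t) = 2933.07$$

$$\frac{1}{2}(n+1)\sum_{t=1}^{41} \ln(d_t) = 3007.45$$

$$\sum_{t=1}^{41} t^2 = 23821$$

$$\frac{1}{n}\left(\sum_{t=1}^{41} t\right)^2 = 18081$$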

Filling in gives $\underline{b} = -0.0130$.

Doing the same for $a$ gives $\underline{a} = 3.7651$.

Our loglinear model is now given by:

$$\underline{D}_t = e^{3.7651}\,e^{-0.0130\,t}$$

To see which model fits the provided data better, we calculate the MAD of our loglinear model:

$$\text{MAD} = \frac{1}{41}\sum_{t=1}^{41}\left|d_t - e^{3.7651}\,e^{-0.0130\,t}\right| = 10.982$$

With a MAD of 10.982 against 18.391, the loglinear model is more accurate than the linear model in this particular case.

4 Smoothing and Seasoning

4.1 Smoothing

For short-term forecasting it is useful to apply smoothing. Smoothing gives more weight to recent data, so that old data matter less. In our model we use exponential smoothing: the weight of older data decreases exponentially. This also affects the sum of squared errors; the new error is given by:

$$S = \sum_{j=1}^{\infty} \beta^{j}\left(d_{t-j} - a - b\,(t-j)\right)^2$$

We call $\beta$ the discount factor; it satisfies $0 < \beta < 1$.
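To make the effect of the discount factor concrete, the weighted error can be evaluated for a given history. This is a sketch under the assumption that the sum is truncated to the observations that are actually available ($d_1, \dots, d_{t-1}$); the function name `discounted_sse` is illustrative.

```python
def discounted_sse(history, a, b, beta):
    """Discounted sum of squared errors seen from time t = len(history) + 1.

    history holds d_1, ..., d_{t-1}; the observation j steps back gets
    weight beta**j, so older data count less.
    """
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    t = len(history) + 1
    total = 0.0
    for j in range(1, t):
        d_back = history[t - j - 1]          # d_{t-j}, stored 0-indexed
        residual = d_back - a - b * (t - j)
        total += beta ** j * residual ** 2
    return total
```

With `beta` close to 1 the result approaches the ordinary sum of squared errors; with a small `beta` only the most recent observations matter.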

Recall the linear model:

$$D_t = a + b\,t + \varepsilon_t$$


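Brown (1963) showed that the following expressions for $a_t$ and $b_t$ minimize the weighted sum of squared errors:

$$a_t = \left(1 - (1-\alpha)^2\right) d_t + (1-\alpha)^2\left(a_{t-1} + b_{t-1}\right)$$

$$b_t = \frac{\alpha^2}{1 - (1-\alpha)^2}\left(a_t - a_{t-1}\right) + \left(1 - \frac{\alpha^2}{1 - (1-\alpha)^2}\right) b_{t-1}$$

The new symbol $\alpha$ is defined as $\alpha = 1 - \beta$. Its value has to be estimated; we chose $\alpha = 0.05$ (thus $\beta = 0.95$).

So we see that $a_t$ and $b_t$ are a weighted mix of $d_t$ and the estimate made at time $t-1$ for time $t$.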

To apply this new theory we need initial estimates $a_0$ and $b_0$. Luckily we already have estimates that minimize the error: see section 2.2 to find that $a_0 = 48.37$ and $b_0 = -0.55$.

Filling in these values, together with $d_1 = 5$, gives us $a_1$ and $b_1$ at time $t = 1$:

$$a_1 = \left(1 - 0.95^2\right)\cdot 5 + 0.95^2\,(48.37 - 0.55) = 43.645$$

$$b_1 = \frac{0.05^2}{1 - 0.95^2}\,(43.645 - 48.37) + \left(1 - \frac{0.05^2}{1 - 0.95^2}\right)(-0.55) = -0.657$$

The same formulas are then applied for $t = 2, \dots, 41$; see the appendix for the complete data.
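The recursion can be sketched in a few lines. The code below assumes the update formulas as reconstructed from the worked example ($\alpha = 0.05$, $a_0 = 48.37$, $b_0 = -0.55$, $d_1 = 5$); the function names `brown_update` and `brown_smooth` are illustrative.

```python
def brown_update(d_t, a_prev, b_prev, alpha):
    """One smoothing step: returns the new level a_t and trend b_t."""
    beta = 1.0 - alpha                      # discount factor
    weight = 1.0 - beta ** 2                # weight given to the new observation
    a_t = weight * d_t + beta ** 2 * (a_prev + b_prev)
    gain = alpha ** 2 / weight
    b_t = gain * (a_t - a_prev) + (1.0 - gain) * b_prev
    return a_t, b_t


def brown_smooth(d, a0, b0, alpha):
    """Apply brown_update to d_1, d_2, ... and return all (a_t, b_t) pairs."""
    a, b = a0, b0
    path = []
    for d_t in d:
        a, b = brown_update(d_t, a, b, alpha)
        path.append((a, b))
    return path


a1, b1 = brown_update(5, 48.37, -0.55, 0.05)
print(round(a1, 3), round(b1, 3))  # prints: 43.645 -0.657
```

The first step reproduces the values of $a_1$ and $b_1$ computed by hand; feeding the full series to `brown_smooth` would give the remaining steps.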

4.2 Seasoning

Sometimes you encounter a regular fluctuation in your data. To model this fluctuation we need another concept, called seasoning. Seasoning is a fluctuation in the data that repeats itself every $P$ periods. A linear model with a seasonal coefficient could look like this:

$$D_t = (a + b\,t)\,F_t + \varepsilon_t$$

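The new update formulas would be:

$$a_t = \alpha_{WH}\,\frac{d_t}{F_{t-P}} + (1 - \alpha_{WH})\left(a_{t-1} + b_{t-1}\right)$$

$$b_t = \beta_{WH}\left(a_t - a_{t-1}\right) + (1 - \beta_{WH})\,b_{t-1}$$

$$F_t = \gamma_{WH}\,\frac{d_t}{a_t} + (1 - \gamma_{WH})\,F_{t-P}$$

$P$ is the number of seasons in a full cycle. $\alpha_{WH}$, $\beta_{WH}$ and $\gamma_{WH}$ are smoothing constants.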
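One step of these seasonal updates can be sketched as follows. The three smoothing constants are passed as `alpha`, `beta` and `gamma`; these parameter names and the function name are assumptions made for illustration, and the sketch assumes the multiplicative form in which the observation is divided by the seasonal factor of one cycle earlier.

```python
def seasonal_update(d_t, a_prev, b_prev, f_prev_cycle, alpha, beta, gamma):
    """One update of level a_t, trend b_t and seasonal factor F_t.

    f_prev_cycle is the seasonal factor from P periods ago (F_{t-P}).
    """
    if f_prev_cycle == 0:
        raise ValueError("seasonal factor must be non-zero")
    # Level: deseasonalized observation mixed with the previous forecast.
    a_t = alpha * d_t / f_prev_cycle + (1 - alpha) * (a_prev + b_prev)
    # Trend: change in level mixed with the previous trend.
    b_t = beta * (a_t - a_prev) + (1 - beta) * b_prev
    # Seasonal factor: observed ratio mixed with the previous factor.
    f_t = gamma * d_t / a_t + (1 - gamma) * f_prev_cycle
    return a_t, b_t, f_t
```

For example, with all three constants at 0.5, an observation of 12, a previous level of 8, a previous trend of 1 and a previous factor of 1.2, the new level is 9.5 and the new trend is 1.25.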

We are going to apply this new theory step by step.

To begin, we have to know what our $P$ is. Because we have data per quarter, it is logical to use $P = 4$. A whole cycle is then equivalent to a year.


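Estimate the trend points using a moving average over one year ($P = 4$), centered at $t \pm \tfrac{1}{2}$. To estimate these trend points, calculate

$$\frac{1}{2}\left[\frac{1}{4}\left(d_{t-2} + d_{t-1} + d_t + d_{t+1}\right) + \frac{1}{4}\left(d_{t-1} + d_t + d_{t+1} + d_{t+2}\right)\right] \qquad \text{for } 3 \le t \le 39$$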

In our situation, at $t = 3, 4, \dots$:

$$\frac{1}{2}\left[\frac{1}{4}(5 + 30 + 45 + 63) + \frac{1}{4}(30 + 45 + 63 + 86)\right] = 45.875$$

$$\frac{1}{2}\left[\frac{1}{4}(30 + 45 + 63 + 86) + \frac{1}{4}(45 + 63 + 86 + d_6)\right] = 55.250$$
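The centered moving average can be sketched as a small function. It assumes quarterly data ($P = 4$) stored in a list whose first element is $d_1$; the function name is illustrative. Only the first five observations quoted in the text are used in the example.

```python
def centered_moving_average(d, t):
    """Trend point at time t (1-indexed) for quarterly data, P = 4."""
    if t < 3 or t > len(d) - 2:
        raise ValueError("t must satisfy 3 <= t <= n - 2")
    first = sum(d[t - 3:t + 1]) / 4    # d_{t-2}, d_{t-1}, d_t, d_{t+1}
    second = sum(d[t - 2:t + 2]) / 4   # d_{t-1}, d_t, d_{t+1}, d_{t+2}
    return 0.5 * (first + second)


print(centered_moving_average([5, 30, 45, 63, 86], 3))  # prints: 45.875
```

Averaging two four-quarter windows is what centers the result on a whole period $t$ rather than between two periods.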

To determine the seasonal factors, divide $d_t$ by the trend point at $t$. To dampen the fluctuations, average all $F_{t \bmod P}$, i.e. all factors that belong to the same period. Normalize these averages so that they add up to $P$ and therefore average to 1. To do this, calculate

$$F^{\text{norm}}_{(t+2) \bmod P} = \frac{\bar{F}_{(t+2) \bmod P}}{\frac{1}{P}\sum_{k=0}^{P-1} \bar{F}_k}$$

with $\bar{F}_{(t+2) \bmod P}$ the averaged $F$. In our concrete situation:

$$\frac{45}{45.875} = 0.981, \qquad \frac{63}{55.250} = 1.140, \qquad \text{and so on.}$$

Calculate the averaged estimates $\bar{F}_{(t+2) \bmod P}$:

$$\frac{0.981 + 0.786 + \dots + 0.792}{9} = 0.815 \tag{1}$$

$$\frac{1.140 + 0.748 + \dots + 1.271}{9} = 0.978 \tag{2}$$

$$\frac{1.619 + 1.705 + \dots + 1.125}{9} = 1.359 \tag{3}$$

$$\frac{0.981 + 0.786 + \dots + 1.045}{9} = 0.882 \tag{4}$$
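Normalizing the four averaged factors so that they sum to $P$ can be sketched as below. The function name is illustrative. Because the averages in (1) to (4) are already rounded to three decimals, the result agrees with the normalized factors used later in the text (0.808; 0.970; 1.347; 0.875) only up to about the third decimal.

```python
def normalize_factors(averaged):
    """Rescale seasonal factors so they sum to P and average to 1."""
    p = len(averaged)
    mean = sum(averaged) / p
    if mean == 0:
        raise ValueError("factors must not average to zero")
    return [f / mean for f in averaged]


averaged = [0.815, 0.978, 1.359, 0.882]    # results (1) to (4) from the text
normalized = normalize_factors(averaged)
```

Dividing by the mean rather than by a fixed constant keeps the ratios between the seasons unchanged.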

We must deseasonalize the seasonal demand: divide the data by the normalized seasonal estimate. We are then back to setting up a regular linear model. For our data:

$$\frac{5}{F^{\text{norm}}_{(1+2) \bmod 4}} = \frac{5}{1.359} \approx 3.68, \qquad \frac{30}{0.882} \approx 34.0, \qquad \frac{45}{0.815} \approx 55.2, \qquad \frac{63}{0.978} \approx 64.4$$

The others are found analogously, by dividing by the normalized factor that belongs to the period $t$ is in. We then repeat sections 2 and 3 to find the linear and loglinear trends:
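Deseasonalizing is a single division per observation. The sketch below assumes that the factor for period $t$ is selected by the index $(t+2) \bmod 4$, with result (4) stored under index 0; this is a reading of the worked example, in which $d_1$ is paired with result (3) and $d_3$ with result (1). It uses the normalized factors quoted further on in the text, and all names are illustrative.

```python
P = 4
# Normalized seasonal factors keyed by (t + 2) mod P (assumed indexing).
FACTORS = {1: 0.808, 2: 0.970, 3: 1.347, 0: 0.875}


def deseasonalize(d):
    """Divide each d_t (t = 1..n) by the seasonal factor of its period."""
    return [dt / FACTORS[(t + 2) % P] for t, dt in enumerate(d, start=1)]


first_year = deseasonalize([5, 30, 45, 63])   # first four observations
```

The deseasonalized series can then be passed to an ordinary linear or loglinear trend fit.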


For the linear trend we find the following, if $t > 41$:

$$\underline{D}_{t,41} = \left(\underline{a}_{1,41} + \underline{b}_{1,41}\,t\right) F_t = (48.09 - 0.53\,t)\,F_t$$

with $F_t \in \{0.808;\ 0.970;\ 1.347;\ 0.875\}$. We choose $F_t$ so that it matches the period $t$ is in. To calculate earlier values of the model, you do not have to take the seasoning effect into account:

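$$\underline{D}_1 = \underline{a} + \underline{b}\cdot 1 = 48.09 - 0.53\cdot 1 = 47.6$$

$$\underline{D}_2 = 48.09 - 0.53\cdot 2 = 47.0$$

$$\underline{D}_3 = 48.09 - 0.53\cdot 3 = 46.5$$

$$\underline{D}_4 = 48.09 - 0.53\cdot 4 = 46.0$$

And so on.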
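A forecast beyond the data combines the fitted trend with the seasonal factor of the target period. The sketch below again assumes that the factor is selected by the index $(t+2) \bmod 4$, with the fourth factor stored under index 0; that indexing and the function name are assumptions made for illustration.

```python
P = 4
# Normalized seasonal factors keyed by (t + 2) mod P (assumed indexing).
FACTORS = {1: 0.808, 2: 0.970, 3: 1.347, 0: 0.875}


def seasonal_linear_forecast(t):
    """Forecast for a period t > 41: linear trend times seasonal factor."""
    if t <= 41:
        raise ValueError("this forecast is meant for t > 41")
    trend = 48.09 - 0.53 * t
    return trend * FACTORS[(t + 2) % P]


next_quarter = seasonal_linear_forecast(42)   # (48.09 - 22.26) * 0.875
```

Since the trend is negative, the trend part of the forecast drops below zero for sufficiently large `t`, so the model is only meaningful for a limited horizon.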

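Now for the loglinear model, for $t > 41$:

$$\ln\left(\underline{D}_{t,41}\right) = \left(\underline{a}_{1,41} + \underline{b}_{1,41}\,t\right) F_t = (3.77 - 0.013\,t)\,F_t$$

with $F_t \in \{0.808;\ 0.970;\ 1.347;\ 0.875\}$. We choose $F_t$ so that it matches the period $t$ is in.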

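For the data before $t = 41$ we get the same calculation as in section 3:

$$\ln(\underline{D}_t) = a_0 + b_0\,t = 3.77 - 0.013\,t$$

$$\underline{D}_1 = e^{3.77 - 0.013\cdot 1} = 42.735$$

$$\underline{D}_2 = e^{3.77 - 0.013\cdot 2} = 42.178$$

$$\underline{D}_3 = e^{3.77 - 0.013\cdot 3} = 41.629$$

$$\underline{D}_4 = e^{3.77 - 0.013\cdot 4} = 41.078$$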

...
