Research Master Diplomas


  • 8/7/2019 Research Master Diplomas


    1 Introduction

    To investigate the further development of the number of Master's diplomas, we need a model that fits the data we already have. We first look at the Linear trend model, then check whether the Loglinear model does better. Seasonal effects also contribute to the prediction, and smoothing is included as well.

    First we introduce some notation:

    $a$ is the base level: the starting quantity.

    $b$ is the trend: the quantity added (or subtracted) per time unit.

    $F_t$ is the seasonal variation.

    $\epsilon_t$ is the random variation; it is assumed to be independent and identically distributed with mean zero.

    Some remarks on the notation: capital letters denote uncertain quantities. An underlined variable denotes an estimate or forecast of such an uncertain quantity. Lowercase letters denote realizations of the uncertain quantities.

    2 Linear trend model

    2.1 Theory of the Linear trend model

    The linear model has the general form:

    $$D_t = a + b\,t + \epsilon_t$$

    If $a$ and $b$ are estimated, we get the forecast:

    $$\underline{D}_t = \underline{a} + \underline{b}\,t$$

    The sum of squared errors is:

    $$S = \sum_{t=1}^{n}\left(d_t - a - b\,t\right)^2$$

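    Setting the partial derivatives of $S$ with respect to $a$ and $b$ equal to zero gives the estimates $\underline{a}$ and $\underline{b}$ that minimize $S$:

    $$\underline{a} = \frac{1}{n}\sum_{t=1}^{n} d_t - \frac{1}{2}(n+1)\,\underline{b}$$

    $$\underline{b} = \frac{\sum_{t=1}^{n} t\,d_t - (n+1)\sum_{t=1}^{n}\frac{1}{2}d_t}{\sum_{t=1}^{n} t^2 - \frac{1}{n}\left(\sum_{t=1}^{n} t\right)^2}$$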
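The estimator formulas can be checked with a short Python sketch. The function name `fit_linear_trend` is our own choice. It evaluates the two formulas above for periods numbered $t = 1, \dots, n$, using the closed forms for $\sum t$ and $\sum t^2$.

```python
def fit_linear_trend(d):
    """Least-squares estimates (a, b) of D_t = a + b*t for observations d_1..d_n."""
    n = len(d)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_d = sum(d)
    sum_td = sum(t * x for t, x in enumerate(d, start=1))
    sum_t = n * (n + 1) / 2                    # sum of t for t = 1..n
    sum_t2 = n * (n + 1) * (2 * n + 1) / 6     # sum of t^2 for t = 1..n
    b = (sum_td - (n + 1) * sum_d / 2) / (sum_t2 - sum_t ** 2 / n)
    a = sum_d / n - (n + 1) * b / 2
    return a, b


# Data lying exactly on the line d_t = 10 + 2t are recovered exactly.
print(fit_linear_trend([12, 14, 16, 18, 20]))  # (10.0, 2.0)
```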


    2.2 Application of the theory

    To apply this theory to our specific problem we need the data, which are given in the appendix. First we calculate $b$:

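    $$\sum_{t=1}^{41} t\,d_t = 1\cdot 5 + 2\cdot 30 + 3\cdot 45 + \dots + 40\cdot 28 + 41\cdot 6 = 28582$$

    $$(n+1)\sum_{t=1}^{41}\frac{1}{2}d_t = 21\,(5 + 30 + 45 + \dots + 28 + 6) = 21\cdot 1511 = 31731$$

    $$\sum_{t=1}^{41} t^2 = \frac{1}{6}\,n(n+1)(2n+1) = 23821$$

    $$\frac{1}{n}\left(\sum_{t=1}^{41} t\right)^2 = \frac{1}{41}\left(\frac{1}{2}\cdot 41\cdot 42\right)^2 = 18081$$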

    Filling in these values gives $b = (28582 - 31731)/(23821 - 18081) = -3149/5740 = -0.5486$. These estimates minimize the sum of squared errors, which equals:

    $$S = \sum_{t=1}^{41}\left(d_t - 48.3744 + 0.5486\,t\right)^2 = 68057.3$$

    Calculating $a$ is then straightforward:

    $$a = \frac{1}{41}\sum_{t=1}^{41} d_t - \frac{1}{2}(42)(-0.5486) = 48.3744$$

    We now have a model to forecast the number of diplomas:

    $$\underline{D}_t = 48.3744 - 0.5486\,t$$
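As an arithmetic check, the sketch below plugs the summary statistics of Section 2.2 ($n = 41$, $\sum t\,d_t = 28582$, $\sum d_t = 1511$) into the same formulas. It reproduces $b \approx -0.5486$ and $a \approx 48.3744$. The variable names are only for illustration.

```python
n = 41
sum_td = 28582                                 # sum of t * d_t (Section 2.2)
sum_d = 1511                                   # sum of d_t (Section 2.2)
sum_t2 = n * (n + 1) * (2 * n + 1) / 6         # 23821
sum_t_sq_over_n = (n * (n + 1) / 2) ** 2 / n   # 18081

b = (sum_td - (n + 1) / 2 * sum_d) / (sum_t2 - sum_t_sq_over_n)
a = sum_d / n - (n + 1) / 2 * b
print(round(b, 4), round(a, 4))  # -0.5486 48.3744
```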

    To measure the error of this model, we use the Mean Absolute Deviation (MAD), defined as:

    $$\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n}\left|d_t - \underline{D}_t\right|$$

    For our situation:

    $$\mathrm{MAD} = \frac{1}{41}\sum_{t=1}^{41}\left|d_t - 48.3744 + 0.5486\,t\right| = 18.391$$
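The MAD criterion fits in a one-line function. In this sketch the example scores the linear model of Section 2.2 on the first four observations only ($d_1, \dots, d_4 = 5, 30, 45, 63$). The printed number is therefore not the full-sample MAD of 18.391. The function name `mad` is our own.

```python
def mad(d, forecast):
    """Mean Absolute Deviation between observations d_t and forecasts D_t."""
    if len(d) != len(forecast) or not d:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(x - f) for x, f in zip(d, forecast)) / len(d)


a, b = 48.3744, -0.5486
d = [5, 30, 45, 63]                               # first four observations only
forecast = [a + b * t for t in range(1, len(d) + 1)]
print(round(mad(d, forecast), 4))  # 19.6629
```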

    3 Loglinear trend model

    3.1 Theory of the Loglinear model

    The Loglinear model is almost the same as the Linear one, with one important difference: we replace $D_t$ by $\ln(D_t)$. This gives:

    $$\ln(D_t) = a + b\,t + \epsilon_t \quad\Longleftrightarrow\quad D_t = e^{a}\,(e^{b})^{t}\,e^{\epsilon_t}$$


    To determine $a$ and $b$ with the same least-squares formulas, we first take the natural logarithm of the data (see appendix):

    $$\sum_{t=1}^{41} t\,\ln(d_t) = 2933.07$$

    $$(n+1)\sum_{t=1}^{41}\frac{1}{2}\ln(d_t) = 3007.45$$

    $$\sum_{t=1}^{41} t^2 = 23821$$

    $$\frac{1}{n}\left(\sum_{t=1}^{41} t\right)^2 = 18081$$

    Filling in gives $b = -0.0130$.

    In the same way, $a = 3.7651$.

    Our Loglinear model is now given by:

    $$\underline{D}_t = e^{3.7651}\,e^{-0.0130\,t}$$

    To see which model fits the data better, we calculate the MAD of our Loglinear model:

    $$\mathrm{MAD} = \frac{1}{41}\sum_{t=1}^{41}\left|d_t - e^{3.7651}\,e^{-0.0130\,t}\right| = 10.982$$

    By this measure, the Loglinear model (MAD = 10.982) fits this particular data set better than the Linear model (MAD = 18.391).
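The Loglinear fit is the linear least-squares fit applied to $\ln(d_t)$. The sketch below shows this; the function names are our own. It is tested on synthetic data lying exactly on $d_t = e^{3.7}e^{-0.01t}$, not on the diploma data.

```python
import math


def fit_loglinear_trend(d):
    """Fit ln(D_t) = a + b*t by least squares on ln(d_t), t = 1..n; all d_t must be > 0."""
    n = len(d)
    y = [math.log(x) for x in d]               # take the ln of the data first
    sum_y = sum(y)
    sum_ty = sum(t * v for t, v in enumerate(y, start=1))
    sum_t = n * (n + 1) / 2
    sum_t2 = n * (n + 1) * (2 * n + 1) / 6
    b = (sum_ty - (n + 1) * sum_y / 2) / (sum_t2 - sum_t ** 2 / n)
    a = sum_y / n - (n + 1) * b / 2
    return a, b


def loglinear_forecast(a, b, t):
    """Forecast D_t = e^a * (e^b)^t."""
    return math.exp(a) * math.exp(b) ** t


d = [math.exp(3.7 - 0.01 * t) for t in range(1, 11)]
a, b = fit_loglinear_trend(d)
print(round(a, 6), round(b, 6))  # 3.7 -0.01
```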

    4 Smoothing and Seasonality

    4.1 Smoothing

    For short-term forecasting, smoothing is useful. Smoothing gives more weight to recent data, so that old data become less important. In our model we use exponential smoothing: the weight of older data decreases exponentially. This also affects the sum of squared errors; the new, weighted error is:

    $$S = \sum_{j=0}^{\infty}\beta^{j}\left(d_{t-j} - a - b\,(t-j)\right)^2$$

    We call $\beta$ the discount factor; it satisfies $0 < \beta < 1$.
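The sketch below evaluates the weighted error for a finite history $d_1, \dots, d_t$. No data exist before $d_1$, so the infinite sum is cut off at $j = t - 1$. The sum starts at $j = 0$, so the current observation $d_t$ gets full weight, as in Brown's update formulas. The function name and the truncation are our own assumptions.

```python
def discounted_sse(history, a, b, beta):
    """Sum over j = 0..t-1 of beta^j * (d_{t-j} - a - b*(t-j))^2.

    history holds d_1..d_t, oldest first; j = 0 is the most recent observation.
    """
    if not 0 < beta < 1:
        raise ValueError("the discount factor must satisfy 0 < beta < 1")
    t = len(history)
    return sum(
        beta ** j * (history[t - 1 - j] - a - b * (t - j)) ** 2
        for j in range(t)
    )


# Residuals are 2 (at t = 2, weight 1) and 1 (at t = 1, weight 0.5): 4 + 0.5.
print(discounted_sse([2, 4], a=0, b=1, beta=0.5))  # 4.5
```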

    Recall the Linear model: $D_t = a + b\,t + \epsilon_t$.


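    Brown (1963) showed that the following expressions for $a_t$ and $b_t$ minimize the weighted sum of squared errors:

    $$a_t = \left(1 - (1-\alpha)^2\right) d_t + (1-\alpha)^2\left(a_{t-1} + b_{t-1}\right)$$

    $$b_t = \frac{\alpha^2}{1 - (1-\alpha)^2}\left(a_t - a_{t-1}\right) + \left(1 - \frac{\alpha^2}{1 - (1-\alpha)^2}\right) b_{t-1}$$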

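    The new symbol $\alpha$ is defined as $\alpha = 1 - \beta$. Its value has to be chosen; we take $\alpha = 0.05$ (thus $\beta = 0.95$).

    So $a_t$ is a weighted mix of the observation $d_t$ and the forecast $a_{t-1} + b_{t-1}$ made at time $t-1$ for time $t$. Likewise, $b_t$ mixes the latest change in level, $a_t - a_{t-1}$, with the previous trend $b_{t-1}$.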

    To apply this theory we need starting estimates $a_0$ and $b_0$. Conveniently, the least-squares estimates from Section 2.2 can serve as starting values: $a_0 = 48.37$ and $b_0 = -0.55$.

    Filling in these values together with $d_1 = 5$ gives $a_1$ and $b_1$:

    $$a_1 = \left(1 - 0.95^2\right)\cdot 5 + 0.95^2\,(48.37 - 0.55) = 43.645$$

    $$b_1 = \frac{0.05^2}{1 - 0.95^2}\,(43.645 - 48.37) + \left(1 - \frac{0.05^2}{1 - 0.95^2}\right)(-0.55) = -0.657$$

    The same formulas are then applied for $t = 2, \dots, 41$; see the appendix for the complete results.
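The two update formulas translate directly into code. This sketch reproduces the first step of the worked example, starting from $a_0 = 48.37$ and $b_0 = -0.55$ with $d_1 = 5$ and $\alpha = 0.05$. The function names `brown_update` and `brown_smooth` are our own. `brown_smooth` repeats the update over a whole series, as is done for $t = 2, \dots, 41$.

```python
def brown_update(a_prev, b_prev, d, alpha):
    """One step of Brown's double exponential smoothing for the linear trend model."""
    beta = 1 - alpha
    a = (1 - beta ** 2) * d + beta ** 2 * (a_prev + b_prev)
    w = alpha ** 2 / (1 - beta ** 2)           # weight on the latest level change
    b = w * (a - a_prev) + (1 - w) * b_prev
    return a, b


def brown_smooth(d, a0, b0, alpha):
    """Apply the update for t = 1..n and return the list of (a_t, b_t)."""
    a, b = a0, b0
    path = []
    for x in d:
        a, b = brown_update(a, b, x, alpha)
        path.append((a, b))
    return path


a1, b1 = brown_update(48.37, -0.55, 5, 0.05)
print(round(a1, 3), round(b1, 3))  # 43.645 -0.657
```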

    4.2 Seasonality

    Sometimes the data show a regular fluctuation. To model it we need another concept, seasonality: a fluctuation in the data that repeats itself every $P$ periods. A linear model with a seasonal coefficient could look like this:

    $$D_t = (a + b\,t)\,F_t + \epsilon_t$$

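    The corresponding update formulas are:

    $$a_t = \alpha_{WH}\,\frac{d_t}{F_{t-P}} + (1 - \alpha_{WH})\left(a_{t-1} + b_{t-1}\right)$$

    $$b_t = \beta_{WH}\left(a_t - a_{t-1}\right) + (1 - \beta_{WH})\,b_{t-1}$$

    $$F_t = \gamma_{WH}\,\frac{d_t}{a_t} + (1 - \gamma_{WH})\,F_{t-P}$$

    $P$ is the number of seasons in a full cycle; $\alpha_{WH}$, $\beta_{WH}$ and $\gamma_{WH}$ are smoothing constants.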
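One Winters step can be sketched as follows. The parameter names `alpha_wh`, `beta_wh` and `gamma_wh` stand for the three smoothing constants and are our own. The caller must supply $F_{t-P}$, the factor of the same season one full cycle earlier.

```python
def winters_update(a_prev, b_prev, f_same_season_prev, d, alpha_wh, beta_wh, gamma_wh):
    """One update of D_t = (a + b t) F_t + eps_t; returns (a_t, b_t, F_t).

    f_same_season_prev is F_{t-P}, the seasonal factor one full cycle earlier.
    """
    a = alpha_wh * d / f_same_season_prev + (1 - alpha_wh) * (a_prev + b_prev)
    b = beta_wh * (a - a_prev) + (1 - beta_wh) * b_prev
    f = gamma_wh * d / a + (1 - gamma_wh) * f_same_season_prev
    return a, b, f


# If d_t equals the forecast (a_{t-1} + b_{t-1}) * F_{t-P}, the level simply
# advances by the trend, and the trend and seasonal factor stay the same.
print(winters_update(100, 4, 1.25, 130, 0.5, 0.5, 0.5))  # (104.0, 4.0, 1.25)
```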

    We are going to use this new theory, step by step.

    To begin, we have to choose $P$. Because our data are quarterly, it is logical to use $P = 4$; a whole cycle then corresponds to one year.

    $$P = 4$$


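    Estimate the trend points with a moving average over one year ($P = 4$). A four-quarter moving average such as $\frac{1}{4}(d_{t-2} + d_{t-1} + d_t + d_{t+1})$ is centered at $t - \frac{1}{2}$ rather than at a period. To obtain a trend point centered at $t$, we therefore average two consecutive moving averages:

    $$\frac{1}{2}\left[\frac{1}{4}\left(d_{t-2} + d_{t-1} + d_t + d_{t+1}\right) + \frac{1}{4}\left(d_{t-1} + d_t + d_{t+1} + d_{t+2}\right)\right] \qquad \text{for } 3 \le t \le 39$$

    In our situation, for $t = 3, 4, \dots$:

    $$\frac{1}{2}\left[\frac{1}{4}(5 + 30 + 45 + 63) + \frac{1}{4}(30 + 45 + 63 + 86)\right] = 45.875$$

    $$\frac{1}{2}\left[\frac{1}{4}(30 + 45 + 63 + 86) + \frac{1}{4}(45 + 63 + 86 + \dots)\right] = 55.250$$

    and so on.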
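The centered moving average can be sketched for the quarterly case ($P = 4$) as below. The function name is our own. With the first five observations it reproduces the first trend point, 45.875.

```python
def centered_trend_point(d, t):
    """Centered one-year (four-quarter) moving average at period t, 1-indexed.

    Averages the moving averages over t-2..t+1 and t-1..t+2, so the result is
    centered at t itself; defined for 3 <= t <= n - 2.
    """
    n = len(d)
    if not 3 <= t <= n - 2:
        raise ValueError("t must satisfy 3 <= t <= n - 2")

    def x(k):                                   # d_k with 1-based index k
        return d[k - 1]

    first = (x(t - 2) + x(t - 1) + x(t) + x(t + 1)) / 4
    second = (x(t - 1) + x(t) + x(t + 1) + x(t + 2)) / 4
    return (first + second) / 2


d = [5, 30, 45, 63, 86]                         # d_1..d_5
print(centered_trend_point(d, 3))  # 45.875
```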

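    To determine the seasonal factors, divide $d_t$ by the trend point at $t$. To reduce the fluctuations, average all factors that belong to the same season, i.e. the same value of $(t+2) \bmod P$ (a remainder of 0 corresponds to season $P$); call these averages $\bar{F}_{(t+2) \bmod P}$. Then normalize them so that they add up to $P$ and hence average to 1:

    $$F^{\text{norm}}_{i} = \frac{P\,\bar{F}_{i}}{\sum_{j=1}^{P}\bar{F}_{j}}, \qquad i = 1, \dots, P$$

    In our concrete situation:

    $$\frac{45}{45.875} = 0.981, \qquad \frac{63}{55.250} = 1.140, \quad \text{and so on.}$$

    The averaged factors are:

    $$\bar{F}_1 = \frac{0.981 + 0.786 + \dots + 0.792}{9} = 0.815 \qquad (1)$$

    $$\bar{F}_2 = \frac{1.140 + 0.748 + \dots + 1.271}{9} = 0.978 \qquad (2)$$

    $$\bar{F}_3 = \frac{1.619 + 1.705 + \dots + 1.125}{9} = 1.359 \qquad (3)$$

    $$\bar{F}_4 = \frac{0.981 + 0.786 + \dots + 1.045}{9} = 0.882 \qquad (4)$$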
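The normalization step can be sketched as follows; the function name is our own. Applied to the averaged factors (1) to (4), it gives values that sum to 4. They agree to within 0.001 with the factors (0.808; 0.970; 1.347; 0.875) used for the forecasts at the end of this section.

```python
def normalize_factors(avg_factors):
    """Rescale averaged seasonal factors so that they sum to P (and average to 1)."""
    P = len(avg_factors)
    total = sum(avg_factors)
    return [P * f / total for f in avg_factors]


norm = normalize_factors([0.815, 0.978, 1.359, 0.882])
print([round(f, 4) for f in norm])
```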

    Next we deseasonalize the demand: divide the data by the normalized seasonal estimate of the corresponding season. This brings us back to setting up a regular linear model. For our data:

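    $$\frac{5}{F^{\text{norm}}_{(1+2) \bmod 4}} = \frac{5}{1.359} \approx 3.68, \qquad \frac{30}{0.882} \approx 34.01, \qquad \frac{45}{0.815} \approx 55.21, \qquad \frac{63}{0.978} \approx 64.42$$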
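Deseasonalizing can be sketched like this. The helper `season_index` encodes the labelling used in this section: season $(t+2) \bmod P$, with a remainder of 0 read as season $P$. That reading and the function names are our own assumptions. The example divides by the averaged factors (1) to (4), as in the calculation above.

```python
def season_index(t, P=4):
    """Season label 1..P of period t: (t + 2) mod P, with remainder 0 read as P."""
    s = (t + 2) % P
    return P if s == 0 else s


def deseasonalize(d, factors):
    """Divide d_t by the factor of its season; factors[s - 1] belongs to season s."""
    P = len(factors)
    return [x / factors[season_index(t, P) - 1] for t, x in enumerate(d, start=1)]


factors = [0.815, 0.978, 1.359, 0.882]          # averaged factors (1) to (4)
print([round(x, 2) for x in deseasonalize([5, 30, 45, 63], factors)])
# [3.68, 34.01, 55.21, 64.42]
```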

    The other values are found analogously, by dividing each observation by the normalized factor that belongs to the season of period $t$. We then repeat Sections 2 and 3 on the deseasonalized data to find the linear and loglinear trends:


    For the linear trend we find, for $t > 41$:

    $$\underline{D}_{t,41} = \left(a_{1,41} + b_{1,41}\,t\right) F_t = (48.09 - 0.53\,t)\,F_t$$

    with $F_t \in (0.808;\ 0.970;\ 1.347;\ 0.875)$, where $F_t$ is chosen to match the season that period $t$ belongs to. To calculate the model values for earlier periods, the seasonal effect does not have to be taken into account:

    $$\underline{D}_1 = a + b\cdot 1 = 48.09 - 0.53\cdot 1 = 47.6$$

    $$\underline{D}_2 = 48.09 - 0.53\cdot 2 = 47.0$$

    $$\underline{D}_3 = 48.09 - 0.53\cdot 3 = 46.5$$

    $$\underline{D}_4 = 48.09 - 0.53\cdot 4 = 46.0$$

    And so on.
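The linear forecast for $t > 41$ can be sketched as below. The fitted values $48.09$ and $-0.53$ and the factors $(0.808; 0.970; 1.347; 0.875)$ are used as defaults. We map a period to its season via $(t+2) \bmod 4$, with 0 read as season 4, to match the labelling of the seasonal factors; this mapping is our assumption, and the function names are hypothetical.

```python
FACTORS = (0.808, 0.970, 1.347, 0.875)          # seasons 1..4


def season_index(t, P=4):
    """Season label 1..P of period t: (t + 2) mod P, with remainder 0 read as P."""
    s = (t + 2) % P
    return P if s == 0 else s


def linear_seasonal_forecast(t, a=48.09, b=-0.53, factors=FACTORS):
    """Forecast made at t = 41 for a period t > 41: (a + b t) * F_t."""
    if t <= 41:
        raise ValueError("seasonal forecasts are for t > 41")
    return (a + b * t) * factors[season_index(t, len(factors)) - 1]


for t in (42, 43, 44, 45):
    print(t, round(linear_seasonal_forecast(t), 2))
```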

    Now for the Loglinear model, for $t > 41$:

    $$\underline{D}_{t,41} = e^{a_{1,41} + b_{1,41}\,t}\,F_t = e^{3.77 - 0.013\,t}\,F_t$$

    with $F_t \in (0.808;\ 0.970;\ 1.347;\ 0.875)$, where $F_t$ is again chosen to match the season that period $t$ belongs to.

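    For $t \le 41$ we get the same calculation as in Section 3:

    $$\ln(\underline{D}_t) = a_0 + b_0\,t = 3.77 - 0.013\,t$$

    $$\underline{D}_1 = e^{3.77 - 0.013\cdot 1} = 42.735$$

    $$\underline{D}_2 = e^{3.77 - 0.013\cdot 2} = 42.178$$

    $$\underline{D}_3 = e^{3.77 - 0.013\cdot 3} = 41.629$$

    $$\underline{D}_4 = e^{3.77 - 0.013\cdot 4} = 41.078$$

    $$\vdots$$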
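Here is a matching sketch for the Loglinear model. It assumes the seasonal factor multiplies the exponential trend, $D_t = e^{a + b t} F_t$, in the same way as in the linear case. As before, the season mapping and the function names are our own assumptions.

```python
import math

FACTORS = (0.808, 0.970, 1.347, 0.875)          # seasons 1..4


def season_index(t, P=4):
    """Season label 1..P of period t: (t + 2) mod P, with remainder 0 read as P."""
    s = (t + 2) % P
    return P if s == 0 else s


def loglinear_seasonal_forecast(t, a=3.77, b=-0.013, factors=FACTORS):
    """Forecast made at t = 41 for a period t > 41: e^(a + b t) * F_t (assumed form)."""
    if t <= 41:
        raise ValueError("seasonal forecasts are for t > 41")
    return math.exp(a + b * t) * factors[season_index(t, len(factors)) - 1]


for t in (42, 43, 44, 45):
    print(t, round(loglinear_seasonal_forecast(t), 2))
```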
