Looking Ahead of the Curve: an ARIMA Modeling Approach to
Enrollment Forecasting
John G. Zhang, Ph.D.
Harper College
47th AIR Annual Forum 2
Topics
• Why forecast
• How to forecast
• Why ARIMA
• What is ARIMA
• How to ARIMA
• How ARIMA did
• Discussion
Why Forecast
• Queries and Reports: what was
• Dashboard: what is
• Forecasts: what will be
• Forecast for enrollment: more valuable for resources planning
How to forecast
• Naïve forecast: random walk, moving average
• Exponential smoothing
• Markov chain
• Regression
• ARIMA
• Others
• Combining methods
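The simplest methods in the list above can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the function names, the window size, the smoothing weight `alpha`, and the enrollment numbers are all assumptions made for the example.

```python
from statistics import mean

def random_walk_forecast(series):
    """Naive forecast: the next value equals the last observed value."""
    return series[-1]

def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    return mean(series[-window:])

def exponential_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for value in series[1:]:
        # New level is a weighted mix of the latest value and the old level.
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical enrollment counts, for illustration only.
enrollment = [100, 110, 105, 120]
print(random_walk_forecast(enrollment))
print(moving_average_forecast(enrollment, window=3))
print(exponential_smoothing_forecast(enrollment, alpha=0.5))
```

All three produce a single one-step-ahead number, which is one reason they serve mainly as baselines for a multi-period forecast.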
Why ARIMA
• Naïve forecast: best guess if no patterns
• Exponential smoothing: usually designed for one-step-ahead forecasts
• Markov chain: see reference
• Regression: frequently violates the assumption of uncorrelated errors
• ARIMA: worked well, more later
• Others: see reference
• Combining methods: non-directional
What is ARIMA
• AutoRegressive Integrated Moving Average
• Generally, the model is given by
$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d (X_t - \theta_0) = \left(1 - \sum_{i=1}^{q} \theta_i B^i\right)\varepsilon_t$$
• where Xt is a time series value at time t, θ0 is a constant,
• B is a backshift or lag operator,
• i is a number of lags or spans, εt is an error term at time t, φi and θi are AR and MA parameters, and
• p, d, and q are the orders of AR, I, MA
• If p = 1, d = 0, q = 1, ARMA(1, 1):
(1 - φ1B)(Xt – θ0) = (1 - θ1B) εt
• If p = 1, d = 0, θ1 = 0, AR(1) model:
(1 - φ1B)(Xt – θ0) = εt
• If p = 1, φ1 = 1, d = 0, θ1 = 0, random walk:
(1 - B)(Xt – θ0) = εt
• If φ1 = 0, d = 0, θ1 = 0, constant:
(Xt – θ0) = εt
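To see how the special cases relate, the ARMA(1, 1) equation can be rewritten as a recursion, Xt = θ0 + φ1(Xt-1 – θ0) + εt – θ1εt-1, and simulated. The sketch below is illustrative only: the function name, the Gaussian errors, the starting value at θ0, and the seed are assumptions, not part of the presentation.

```python
import random

def simulate_arma11(n, phi1, theta1, theta0=0.0, sigma=1.0, seed=0):
    """Simulate n values of (1 - phi1*B)(X_t - theta0) = (1 - theta1*B) e_t.

    Assumes Gaussian errors and starts the recursion at X = theta0, e = 0.
    """
    rng = random.Random(seed)
    x_prev, e_prev = theta0, 0.0
    values = []
    for _ in range(n):
        e = rng.gauss(0.0, sigma)
        x = theta0 + phi1 * (x_prev - theta0) + e - theta1 * e_prev
        values.append(x)
        x_prev, e_prev = x, e
    return values

arma11 = simulate_arma11(50, phi1=0.6, theta1=0.3)       # ARMA(1, 1)
ar1 = simulate_arma11(50, phi1=0.6, theta1=0.0)          # AR(1)
random_walk = simulate_arma11(50, phi1=1.0, theta1=0.0)  # random walk
constant = simulate_arma11(50, phi1=0.0, theta1=0.0)     # constant plus noise
```

Setting `phi1` and `theta1` as in the bullets above reproduces each special case from the same recursion.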
How to ARIMA
• Box and Jenkins (1976) notation: (p, d, q)(P, D, Q)s
• Four stages:
Identification
Estimation
Validation
Forecasting
How to ARIMA
• SPSS Trends module:
version 12 worked well
versions 13 and 14: algorithms changed; same data, same program, different forecast
• SAS ETS module:
ARIMA procedure more flexible
forecasts consistent
automation possible thanks to macros
Identification
• Series Plot
• Autocorrelation plot
• Dickey-Fuller test of unit root hypothesis
• AR models to compare the log likelihood values for a series and its transformed series
Identification
• Degree of differencing
• Order of AR
• Order of MA
• Seasonality if any
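Differencing is the "I" step: a regular difference removes trend and a seasonal difference at lag s removes a repeating seasonal pattern. The sketch below is a hedged illustration; the helper name `difference`, the season length of 3, and the toy series are assumptions chosen for the example, not values from the enrollment data.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series: a repeating three-term pattern plus a steady rise.
series = [10, 30, 25, 12, 32, 27, 14, 34, 29]

seasonal_diff = difference(series, lag=3)   # removes the seasonal pattern
both_diffs = difference(seasonal_diff)      # then removes the remaining trend
print(seasonal_diff)
print(both_diffs)
```

If the differenced series already looks flat, further differencing risks over-differencing, a point the Discussion slides return to.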
Estimation
• Q statistics
• Goodness-of-fit criteria:
variance estimate
Akaike information criterion
Schwarz Bayesian criterion
• Significance of parameters
• Residuals analysis
• Mean Absolute Percent Error
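The two information criteria listed above trade fit against the number of estimated parameters, and lower values are preferred. The sketch uses the standard textbook definitions, AIC = -2 ln L + 2k and SBC = -2 ln L + k ln n; the function names and the log-likelihood values are hypothetical and not taken from the presentation.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Two hypothetical candidate models fitted to the same 100 observations.
candidates = {
    "smaller model": {"log_likelihood": -102.0, "n_params": 2},
    "larger model": {"log_likelihood": -100.0, "n_params": 5},
}
for name, fit in candidates.items():
    print(name,
          aic(fit["log_likelihood"], fit["n_params"]),
          sbc(fit["log_likelihood"], fit["n_params"], n_obs=100))
```

SBC penalizes extra parameters more heavily than AIC once n exceeds about 7, so it tends to favor the more parsimonious model.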
Data
• Time series data
• Date variable: year, quarter, month, week, day, hour, minute, second
• Enrollment data: FTE, headcount, seatcount
• Data points
• Nature of the series determines the forecast
Patterns of Data
• Trend: steady increase or decrease in the values of a time series
• Cycle: long-term patterns of rising and falling data
• Seasonality: regular change in the data values that occurs at the same time in a given period
FTE
FTE Pattern
• Trend: FTE increased from 1998 to 2006, suggesting the series is non-stationary and differencing is necessary
• Seasonality: FTE is higher in the Fall and Spring and lower in the Summer every year, implying a seasonal factor should be part of the model building process
Autocorrelations and Partial Autocorrelations (ACF and PACF)
Lag      ACF        PACF
  0    1.00000
  1    0.64901    0.64901
  2    0.29267   -0.22210
  3   -0.06855   -0.28449
  4   -0.42111   -0.37073
  5   -0.42944    0.18006
  6   -0.43520   -0.26468
  7   -0.40880   -0.29117
  8   -0.38067   -0.45581
  9   -0.06784    0.72564
 10    0.25681    0.06626
 11    0.55983    0.26005
 12    0.85774    0.18460
 13    0.55625   -0.22575
 14    0.24975    0.14806
 15   -0.06186    0.10247
 16   -0.36715    0.16423
 17   -0.37708   -0.18254
 18   -0.38454    0.15059
 19   -0.36197   -0.04279
 20   -0.33780    0.11045
 21   -0.07144   -0.18268
 22    0.20576    0.08106
 23    0.46222   -0.06703
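For readers who want to see where ACF values of this kind come from, the sample autocorrelation at lag k is the lag-k autocovariance divided by the variance. The sketch below is a plain-Python illustration under that standard definition; the function name and the toy series are assumptions, and it does not reproduce the SAS output above.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 is always 1.0)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

# Hypothetical series that repeats every three periods.
toy = [1, 2, 3] * 4
acf = sample_acf(toy, max_lag=3)
print(acf)
```

A spike at the seasonal lag, like the one at lag 3 in this toy series, is the same kind of signal that points to a seasonal term during identification.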
Q Statistics
• Q statistics show that the autocorrelations are highly statistically significant at every group of lags tested
• Autocorrelations were very high
• Further actions needed
Autocorrelation Check of Residuals
To Lag   Chi-Square   DF   Pr > ChiSq   Autocorrelations
     6       385.69    6       <.0001   0.937  0.874  0.808  0.743  0.727  0.711
    12       777.02   12       <.0001   0.709  0.707  0.752  0.799  0.833  0.866
    18      1107.12   18       <.0001   0.811  0.755  0.697  0.640  0.624  0.608
    24      1436.47   24       <.0001   0.605  0.603  0.640  0.679  0.706  0.732
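A Q statistic of this kind pools the squared autocorrelations up to a given lag into one chi-square value. The sketch below assumes the Ljung-Box form, Q = n(n + 2) Σ r_k² / (n - k); the function name and the inputs are hypothetical, and the number of observations behind the table above is not stated, so the example does not try to reproduce those values.

```python
def ljung_box_q(autocorrelations, n_obs):
    """Ljung-Box Q for autocorrelations r_1..r_m from n_obs observations."""
    total = 0.0
    for k, r_k in enumerate(autocorrelations, start=1):
        total += r_k * r_k / (n_obs - k)
    return n_obs * (n_obs + 2) * total

# Hypothetical autocorrelations at lags 1 to 3 from 60 observations.
q = ljung_box_q([0.40, 0.25, 0.10], n_obs=60)
print(q)  # compare against a chi-square distribution
```

A large Q relative to the chi-square reference means the series is not white noise, which is why further modeling is needed.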
FTE Forecast
[Figure: line chart of FTE with lower and upper confidence limits (LCL, UCL) by term (Summer, Fall, Spring) for fiscal years 1999 to 2010; vertical axis: FTE, 0 to 12,000]
How ARIMA Did
• Accuracy: what matters most
• 2 periods ahead: 0.74% (FTE), 0.50% (HC)
• 6 periods ahead: 1.43% (FTE), 1.65% (HC)
• 10 periods ahead: 1.40% (FTE), 2.52% (HC)
• Forecast error tends to grow as forecasts reach further into the future
• Eleanor S. Fox (2005) 1.2% (4) 4.1% (8)
• NCES (2003) 1.9% (2) 3.6% (6)
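Percentage errors like those above can be computed once the actual values become available. The sketch below uses the Mean Absolute Percent Error named on the Estimation slide; whether the reported percentages are MAPE values is an assumption, and the function name and numbers are hypothetical.

```python
def mape(actual, forecast):
    """Mean Absolute Percent Error, in percent. Actual values must be nonzero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical actual and forecast FTE for two periods.
actual_fte = [8000, 8200]
forecast_fte = [8080, 8118]
print(mape(actual_fte, forecast_fte))
```

Computing the measure separately for each forecast horizon shows how accuracy changes as the forecast reaches further ahead.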
Discussion
• In theory, external factors can be included alongside the time series itself, as in regression
• Unemployment rate
• Consumer Price Index (CPI)
• High school student population
• District population
• Tuition
• Forecasts used for forecasting?
Discussion
• Stationarity and homogeneity
• Scarcity and spuriousness
• Seasonality and outliers
• Raw or cooked data
• Data mining and stepwise
• Fit and accuracy
• Additive or multiplicative (subset/factored)
Discussion
• Science and art
• Objective and Subjective
• Quantitative and qualitative
• Over-differencing and over-fitting
• Parsimony and uncertainty
• Simple or complex