Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths...

13
AMERICAN JOURNAL OF SCIENTIFIC AND INDUSTRIAL RESEARCH © 2017, Science Huβ, http://www.scihub.org/AJSIR ISSN: 2153-649X, doi:10.5251/ajsir.2017.8.3.34.46 Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso, Oyo-State, Nigeria. ABSTRACT This study aims at modelling Nigerian total livebirths data, and to select the appropriate model for the disaggregated livebirths series among the proposed univariate stochastic time series models, based on in-sample fitting. Forecast of demographic variables such as births has a great influence on the growth of a population with respect to its demands on various systems such as education, health, economy, and provision of social amenities for future population over a period of time. The result of the Dickey-Fuller test confirmed the stationarity of the livebirths series, after subjecting it to non-seasonal differencing and Box-Cox variance stabilizing transformation. Correlogram visual analysis method was employed to identify reasonable models for the total livebirths series; and the identified univariate models are ARIMA(1,1,0), and SARIMA(1,1,0)(0,0,1) 4 . The diagnostic checks to evaluate the adequacy of the fits of the models revealed that the residuals of the two univariate models were indeed serially uncorrelated. The results of other measures of adequacy verification and forecast accuracy suggest that the two univariate models provided adequate predictive models for the Nigerian total livebirths series, with the SARIMA model outperforming the ARIMA model. Thus the SARIMA model was chosen as the more appropriate model in fitting the livebirths data because of its overall fit as confirmed by its measures of forecast performance. Keywords: Autoregressive Integrated Moving Average (ARIMA) model, Seasonal Autoregressive Integrated Moving Average (SARIMA), Augmented Dickey Fuller (ADF) test, Autocorrelation function (acf), Partial autocorrelation function (pacf), Box-Cox variance stabilizing transformation. INTRODUCTION Fertility is a vital component of demographic change and for predicting the size and structure of a population. Four demographic terms namely fertility, livebirths, natality and reproduction have overlapping meanings, relating to total births, including live-births and still-births. Fertility which is the actual birth performance in a population is also defined as the frequency of child bearing among a population. Natality in a broader sense is the role of births in a population change and human reproduction, [36]. The demographic definition of a livebirth as recommended by the United Nations and the World Health Organisation, defined livebirth as the complete expulsion or extraction from its mother of a product of conception, irrespective of duration of pregnancy, which after such separation, breathes or shows any other evidence of life, such as beating of the heart, pulsation of the umbilical cord, or definite movement of voluntary muscles, whether or not the umbilical cord has been cut or the placenta is attached; each product of such a birth is considered livebirth, [29]. Fertility rate refers to the relative frequency with which births occur in a given population, while birth rates refer to the rate of incidence of births within the total population. Reproduction refers to the ability of a population to grow and replace itself; and this encompasses the processes of birth and death and also examines the extent to which fertility balances the force of mortality and permits population growth as well. Nigeria has a population close to 124 million in 2003; and is one of the 10 most populous countries in the world, and the most populous country in Africa. The crude birth rate which is the number of births per thousand of a population for Nigeria stood at 36.1 in 2010, [31]. The estimated total fertility rate per woman in Nigeria has also decline from 7.2 in 1960 to 4.0 in 2013, [31]. This is an indication that though the number of livebirths is on the increase in Nigeria, nevertheless there is a sharp reduction in the number of children a woman between ages 15-49 would have during her reproductive life, if for all of her childbearing years she were to experience the age-

Transcript of Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths...

Page 1: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

AMERICAN JOURNAL OF SCIENTIFIC AND INDUSTRIAL RESEARCH

© 2017, Science Huβ, http://www.scihub.org/AJSIR

ISSN: 2153-649X, doi:10.5251/ajsir.2017.8.3.34.46

Stochastic models of Nigerian total livebirths

Adekanmbi, D.B

Department of Statistics, Ladoke Akintola University of Technology,

Ogbomoso, Oyo-State, Nigeria.

ABSTRACT

This study aims at modelling Nigerian total livebirths data, and to select the appropriate model for the disaggregated livebirths series among the proposed univariate stochastic time series models, based on in-sample fitting. Forecast of demographic variables such as births has a great influence on the growth of a population with respect to its demands on various systems such as education, health, economy, and provision of social amenities for future population over a period of time. The result of the Dickey-Fuller test confirmed the stationarity of the livebirths series, after subjecting it to non-seasonal differencing and Box-Cox variance stabilizing transformation. Correlogram visual analysis method was employed to identify reasonable models for the total livebirths series; and the identified univariate models are ARIMA(1,1,0), and SARIMA(1,1,0)(0,0,1)4. The diagnostic checks to evaluate the adequacy of the fits of the models revealed that the residuals of the two univariate models were indeed serially uncorrelated. The results of other measures of adequacy verification and forecast accuracy suggest that the two univariate models provided adequate predictive models for the Nigerian total livebirths series, with the SARIMA model outperforming the ARIMA model. Thus the SARIMA model was chosen as the more appropriate model in fitting the livebirths data because of its overall fit as confirmed by its measures of forecast performance. Keywords: Autoregressive Integrated Moving Average (ARIMA) model, Seasonal Autoregressive Integrated Moving Average (SARIMA), Augmented Dickey Fuller (ADF) test, Autocorrelation function (acf), Partial autocorrelation function (pacf), Box-Cox variance stabilizing transformation.

INTRODUCTION

Fertility is a vital component of demographic change and for predicting the size and structure of a population. Four demographic terms namely fertility, livebirths, natality and reproduction have overlapping meanings, relating to total births, including live-births and still-births. Fertility which is the actual birth performance in a population is also defined as the frequency of child bearing among a population. Natality in a broader sense is the role of births in a population change and human reproduction, [36]. The demographic definition of a livebirth as recommended by the United Nations and the World Health Organisation, defined livebirth as the complete expulsion or extraction from its mother of a product of conception, irrespective of duration of pregnancy, which after such separation, breathes or shows any other evidence of life, such as beating of the heart, pulsation of the umbilical cord, or definite movement of voluntary muscles, whether or not the umbilical cord has been cut or the placenta is attached; each product of such a birth is considered livebirth, [29].

Fertility rate refers to the relative frequency with which births occur in a given population, while birth rates refer to the rate of incidence of births within the total population. Reproduction refers to the ability of a population to grow and replace itself; and this encompasses the processes of birth and death and also examines the extent to which fertility balances the force of mortality and permits population growth as well.

Nigeria has a population close to 124 million in 2003; and is one of the 10 most populous countries in the world, and the most populous country in Africa. The crude birth rate which is the number of births per thousand of a population for Nigeria stood at 36.1 in 2010, [31]. The estimated total fertility rate per woman in Nigeria has also decline from 7.2 in 1960 to 4.0 in 2013, [31]. This is an indication that though the number of livebirths is on the increase in Nigeria, nevertheless there is a sharp reduction in the number of children a woman between ages 15-49 would have during her reproductive life, if for all of her childbearing years she were to experience the age-

Page 2: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

35

specific birth rates for that given year. Between 1980 and 2003, it has been observed that the birthrate among Nigerian women aged 15-19 has declined by 27%. However, because Nigeria population increased rapidly, the annual number of births to teenage women increased by 50% over this period, [32]. Adolescent birthrate therefore contributes significantly to the total birth rate of women in Nigeria. In the recent, age-at-marriage of females has increased which consequently lead to a decrease in fertility rate in Nigeria. A decrease of total fertility rate may be due to postponement of births among the females of reproductive age in the country. The estimated crude birth rate of Nigeria which stood at 46.2 in 1970 has reduced to 41.5 as at 2012, [30].

The main sources of data on livebirths are vital statistics registration system, national censuses, and national sample surveys. Most developing nations in Africa now have vital registration system which provides birth statistics among other relevant vital statistics that can be collected through the registration system. In Nigeria, the National Population Commission is an agency with statutory mandate to establish and maintain a uniform system of vital registration for the nation with a view of providing vital statistics on a regular basis for the purpose of socio-economic planning, [24]. The National Population Commission reported a total of 9, 936,221 registered livebirths in Nigeria during the period 1994-2007.

Modelling of demographic phenomenon can contribute to understanding such demographic variable by revealing something about the process that builds persistence into the series. Time series modelling is a critical step in direct projection of demographic phenomenon such as total livebirths. Time series models are useful statistical instrument for short term forecasts of births, to indicate the number of children to be born if movements observed in the recent past continue in the near future, [21]. Univariate time series models use only past values of the variable under consideration to forecast future values. Two univariate stochastic time series models to be considered in this study are: Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA). ARIMA model, a forecasting instrument in time series is adequate in calculating projections for a particular variable based on observations of that same variable in the past, [5]. Several researchers in the past have proposed Autoregressive Integrated Moving Average (ARIMA) models in predicting future

demographic variables of different populations, [4, 21, 22, 25]. Seasonal ARIMA (SARIMA) model is a generalization of a non-seasonal ARIMA model, [18]. SARIMA model is capable of describing a wide range of series containing random changes in level and slope of their seasonal pattern. In seasonal data there are two time intervals of importance, [23]. A SARIMA model may be thought of as describing two effects simultaneously. When dealing with a quarterly series, the quarter-to-quarter behavior is assumed to be described by a non-seasonal ARIMA model with parameters (p, d, q), while the residuals from this model are assumed to be represented by a year-to-year ARIMA model with parameters (P, D, Q).

The focus of this study is to propose univariate stochastic time series models namely: ARIMA model and SARIMA model for projecting total livebirths of Nigeria and to conduct empirical evaluation of the forecasting power of the models based on measures of forecast performance. This article has been organised into eight sections. Description of the data employed and its disaggregation are discussed in section 2. Section 3 focuses on the theory of univariate time series models, which are ARIMA and SARIMA models. Overview of the theories of diagnostic tests, measures of adequacy of time series models and model selection are presented in sections 4, 5 and 6 respectively. Results of the analyses and an extensive discussion on the models are presented in section 7. Conclusions and recommendations based on the study are given in section 8.

Data: Data on livebirths were extracted from the reports on vital statistics registration of livebirths of the publication of the National Population Commission in Nigeria, [24]. The statistical report on livebirths by the Commission on birth registration coverage stood at 35% as at 2007. The data were recorded on annual basis from 1994 to 2007. Time series models are adequate for short term forecasting with fifty or more observations of data point in order to obtain good estimates of the parameters of a time series model, [5, 6, 8, 13, 25]. It is therefore necessary to disaggregate the annual observations on livebirths to quarterly figures so as to have series with high caseload that will not be affected by random fluctuations; and to uncover the possible pattern of total livebirths in Nigeria. Boot-Feibes-Lisman first difference (BFL-FD) [3], which is a non-model based method but suitable in disaggregating annual series into quarterly figures, was employed to disaggregate

Page 3: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

36

the annual data on registered livebirths into quarterly figures.

Univariate Time Series Models: Univariate stochastic time series models attempts to forecast a time series yt from its past history only. Forecast from a univariate time series can be written as:

............4ty4π2ty3π1ty2πty1πForecast

(1)

A univariate model must be developed so that the

weights i s can be expressed in terms of few

parameters, in order to achieve parsimony. A parsimonious model is the one which represents the data adequately with the minimum number of parameters. ARIMA and SARIMA models are referred to as univariate because only one variable depending on its past values is included. The models are useful for out-of-sample forecasting and for descriptive analyses. The two univariate time series models that will be considered in this study are: (i) ARIMA (ii) SARIMA. The univariate models use the total livebirths as variable for the reference period. The main objective of a time series model is forecasting. Forecasting is to estimate the future value of a series as accurately as possible from the current and past values of the observed time series, [5,7].

Arima Modelling: Autoregressive Integrated Moving Average (ARIMA) models which is the most general model of univariate time series models was first introduced by Box and Jenkins, [5]. ARIMA models may possibly include autoregressive terms, moving average terms, and differencing operation. ARIMA models are capable of describing a wide class of stationary and non-stationary time series containing stochastic trends; and have become popular time series models for forecasting demographic variables, [21, 22, 25]. They are capable of describing series whose slope, level and higher derivatives are being continuously modified by random shocks entering the system, [6]. ARIMA model could be used to forecast a value in a response series as a linear combination of its own past values and past errors. The models could be adopted for short term forecasts of livebirths of a population to determine the future livebirths of

such a population. ty can be projected based on the

observed number of livebirths,

tt

d εBθyBφ (2)

where

B : Backward shift operator such that jtt

j yyB

Bφ : the autoregressive (AR) operator, such that:

pBpφ....B

1φ1Bφ .

Bθ : the moving average (MA) operator, such that:

qBqθ....B

1θ1Bθ .

tε : a white noise sequence also referred to as

disturbance term such that

0tεE ; 2εσ

2tεE and 0tεE

kt .

ty : livebirths at time t.

d: order of non-seasonal differencing, and is a non-negative integer.

Equation (1) is referred to as an autoregressive integrated moving average (ARIMA) process. Simply written as ARIMA(p, d, q) model.

where

p: order of non-seasonal autoregressive (AR) term.

d: order of non-seasonal differencing.

q: order of non-seasonal moving average (MA) term.

If 1d , (1) now becomes

qt1tttp1tt θε......θεεyφ.........yφy p

(3)

The series yt will be stationary when the zeros of

Bp are all outside the unit circle, and will be

invertible when the zeros of Bpθ are all outside

the unit circle, [1, 15, 16, 28]. Stationarity of the series must first be checked by subjecting the series to Augumented Dickey-Fuller (ADF) unit root test, [12]. Unit root makes a series non-stationary. If the result of the ADF test shows that the original series is non-stationary, the degree of the difference d should be chosen to transform the series into a stationary series. Differencing d times removes polynomial trend of degree d, and also removes stochastic ‘unit root’ non-stationarity. In order to determine the order of integration, d, of a series one could proceed by testing the differenced series until the unit root hypothesis is rejected. A difference stationary series

Page 4: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

37

is said to be integrated, denoted as I(d), where d is the order of integration. The order of integration is the number of unit roots contained in a series, and determines the number of differencing operations required to make the series stationary. For a series with one unit root, it is regarded as I(1), and a stationary series is I(0), [14]. A stationary series have constant mean, variance and autocorrelations, so that the autocovariances of observations with a fixed interval are constant through time.

When a series is characterized by heteroscedasticity, it cannot be seen as the realization of an ARIMA process even after differencing, because of nonstationarity in variance, which can be removed by applying Box-Cox transformation, [7, 8, 11, 13]. The Box-Cox transformation is a variance stabilizing measure which deal with nonstationarity in variance.

0λift

ylogλtz

0λifλ

1λty

λtz

(4)

The Box-Cox transformation allows reaching a good level of symmetry of a series, independence of random effects and stable variance. Apart from the

case of 0λ , which corresponds to the logarithmic

transformation, other particular values are 0.5λ ,

(square root) and 3

1λ (cubic root). λ can be

determined through maximum likelihood estimates or by common visual analysis. A good value of λ to

treat heteroscedasticity also makes the process similar to a Gaussian and reduces the effects of possible outliers. Mean non-stationarity can therefore be removed by differencing the series to induce stationarity, while variance non-stationarity can be removed by subjecting the series to Box-Cox variance stabilizing transformation. Unless the trend is removed, no other components can be recognized from acf and pacf.

After achieving stationarity in a series, the first step in ARIMA modelling is to identify an adequate model, which involves specifying the appropriate structure and order of a model. Identification could be done by visual inspection of the time series plot of the data, acf and pacf; judging the appropriate model structure and order from the appearance of the plotted acf and pacf. Many popular statistical computing softwares now have facility for identifying possible models for a

series, [19]. The second step is to estimate the

parameters of the model. Parameters θφ, can be

estimated by least squares regression, and sometimes could require more complicated iteration procedure, [9].

The next step is diagnostic checking of the proposed model which involves verification of the model to ensure that the residuals of the model are random, and to ensure that the estimated parameters are statistically significant. If the ARIMA model is correctly specified, the residuals from the model should be nearly white noise. This implies that there should be no serial correlation left in the residuals. If the proposed model fails the diagnostic checks, one starts over with identification. Estimation process is usually guided by the principle of parsimony, by which the best model is the simplest possible model which adequately describes the series. The goal of ARIMA analysis is a parsimonious representation of the process governing residuals. Models that survive diagnostic checks could be used to forecast, using measures of accuracy to assess the performance of such model.

Seasonal ARIMA (SARIMA) Modelling: Many demographic variables are seasonal, and this variation would be present even if the factors are not casually related. When a series possess a seasonal component that repeats every s observations, SARIMA models could be formulated to deal with seasonality in such data. Seasonality in a series is a regular pattern of changes that repeats over s time periods, where s is the number of time periods until the pattern repeats again. The fundamental fact about seasonal series with period s therefore is that observations which are s intervals apart are similar. Seasonality usually causes a series to be non-stationary because the average values at some particular times within the seasonal span may be different from the average values at other times, [7]. Seasonal differencing removes seasonal trend, and can also get rid of a seasonal random walk type of nonstationarity. A series with both trend and seasonality could be subjected to a non-seasonal difference and a seasonal difference to induce stationarity. The seasonal ARIMA (SARIMA) model incorporates both non-seasonal and seasonal factors in a multiplicative fashion. A multiplicative model is one in which for example, for quarterly series, the quarter-to-quarter behavior can be separated from the year-to-year behavior, [18]. This implies that the autoregressive operator, which dictates the weight to be applied to the past history of the series, can be

Page 5: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

38

factorised into a product of a non-seasonal autoregressive operator and a number of seasonal autoregressive operators, one for each seasonal periods. Similarly, the moving average operator may be written as the product of non-seasonal and seasonal moving average operator. SARIMA models

are ARIMA(p,d,q) models whose residuals tε are

ARIMA(P,D,Q), so that

sQD,P,xqd,p,SARIMA~t

y . The general form of

the multiplicative SARIMA model which can be written as SARIMA(p, d, q)(P, D, Q)s model as suggested by reference [5] is:

Tt ,......2,1tε

sB

QΘBqθty

Ds

dsB

PΦBpφ

(5) where p: order of non-seasonal autoregressive (AR) term. q: order of non-seasonal moving average (MA) term. d: order of non-seasonal differencing. P: order of seasonal autoregressive (AR) term. Q: order of seasonal moving average (MA) term. D: order of seasonal differencing. s: seasonal length or seasonal order.

B: Backward shift operator, such that 1tytBy

and 1ttB .

: Differencing operator, such that 1t

yyty

t

and t

wy1d

tyd

t

B11

Bp

φ : AR polynomials of B of order p.

pBpφ...

2B2φB1φ1B

Bq

: MA polynomials of B of order q.

qB...

2B2B11B

qq

sBp : seasonal AR polynomial of

sB of order P.

PsBP......

2sB2

sB11

sBp

sBQ

: seasonal MA polynomial of s

B of order Q.

QsBP......

2sB2

sB11

sB Q

tε : is a sequence of identically and independently

distributed random variables, which are normally

distributed with mean zero and variance 2εσ . It is also

referred to as a white noise process. The assumption of normality of the model white noise process is usually made in order to construct forecast intervals for point forecasts. SARIMA modeling process is also carried out in three stages which are: Model identification, model estimation and order selection; and diagnostic checks and residual analysis which is also referred to as verification of the adequacy of the model. Plots of acf and pacf could be used to discover s and also the degree of differencing that is necessary to achieve stationarity. Spikes in the acf at low lags could indicate non-seasonal MA terms, while spikes in the pacf indicate possible non-seasonal AR terms. Repetition of patterns across the lags of the correlograms that are multiples of s could indicate seasonality in the series, [11]. A pure seasonal MA should have a significant value for the acf at the lag of the period and roughly zero otherwise. A pure seasonal AR should taper off exponentially at multiples of the lag of the period and be roughly zero. The pacf of a pure seasonal MA should taper off exponentially at multiples of the period and be zero otherwise. The pacf of a pure seasonal AR should cut off after the lag of one period, and should be zero for all other values [5, 9, 11]. Having identified the appropriate SARIMA model for a series, the parameters of the selected model are estimated at the estimation stage. Diagnostics measures should also be employed to evaluate the appropriateness of the fits. If the model is found inadequate, the three stages are repeated until satisfactory SARIMA model is selected for the series under consideration. Diagnostic Tests For Time Series Models: A

crucial aspect of the model building process is

diagnosis of the univariate models, which involves

analysing the model residuals to verify that the

residuals are random, [9]. Different diagnostic tests

can be performed to ascertain the adequacy of the

model. A large residual is an indication that the model

is not adequate, while the model is a good fit for a

series if the residuals should be random. The

residuals should be without pattern, so that there

should be no serial correlation in the residuals. One

of such tests is to verify if the residuals of a proposed

univariate model is a white noise process. The

significance of the autocorrelation of the residuals

could be checked if they are within the insignificance

bound, which is the two standard error bounds

Page 6: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

39

N2 , [20]. The desirable result is that the

correlation is 0 between residuals separated by any

given time span, so that residuals should be

unrelated to each other. The adequacy of the model

should be questioned if the autocorrelations of the

residuals of the first N/4 lags are close to the critical

bounds.

Another approach to evaluating the randomness of univariate time series models residuals is to examine

the acf ‘as a whole’ rather than at the individual s'rk

separately, [9]. The test is called the portmanteau lack-of-fit test or Ljung-Box statistic. It is also simply referred to as Q-statistic. Q-statistic for high-order serial correlation is another diagnostic test for univariate time series models, which is a function of the accumulated sample autocorrelations up to any specified time lag. Q-statistic test is therefore a more general test for serial correlation in the residuals. The most direct evidence of random residuals is the absence of significance values of the Q-statistic at lags of about one quarter of the sample size. The ideal acf of residuals is that all autocorrelations are zero. A significant Q for residuals indicates a possible problem with the model. If the autocorrelations of the residuals at a particular lag exceeds the confidence level, but the Q-statistic at that lag is non-significance, then the autocorrelation could be regarded as a chance occurrence. Q-statistic is

approximately a 2

pnχ distribution, where p is the

number of parameters in the model. The Q statistic is:

2a~

m

1k

1KρKn2nnQ

(6)

where

a~ρ : are the autocorrelations of estimation residuals.

K: a prefixed number of lags.

n: sample size.

Accuracy of Forecast Models: In most forecasting

situations, accuracy is treated as the overriding

criterion for selecting a forecasting model. The word

‘accuracy’ refers to goodness of fit, which in turn

refers to how well the forecasting model is able to

reproduce the observed series. Empirical evaluations

of the performance of forecasting models rely upon

measures of error criterion such as Forecast Mean

Square Error (FMSE), Forecast Mean Absolute Error

(FMAE) or Forecast Mean Absolute Percentage Error

(FMAPE), [20]. The measures are used in measuring

the accuracy of the projection of the models. Given a

particular model for a process,

ttt εμy

(8)

The error after ty is observed is ttt μye .

where

tμ : is a known function of past valuestεandty .

te : is a realisation of random variable tε and are

independent and identically distributed

with mean 0 and variance2σ .

2teE

2μtyEFMSE (9)

The Forecast Mean Absolute Error (FMAE) could be computed using the formula

tεEtμtyEFMAE (10)

The Forecast Mean Absolute Percentage Error (FMAPE):

ty

tμtyEFMAPE (11)

Given that yt takes only positive values.

Model Selection: The objective penalty function criteria such as Akaike Information criteria (AIC), Bayesian Information Criteria (BIC) or Schwarz-Bayesian Information Criteria (SBC) can be used to choose the most appropriate model among the proposed models for a series. These criteria try to find a trade-off between the goodness-of-fit and the parsimony of the models considered. Akaike, [2] proposed an information criterion of the form

2kL2InAIC (12)

where

K: number of parameters in the model to be estimated.

L: Likelihood function of the model.

Page 7: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

40

When there are two or more competing models, the model with the smallest AIC is deemed best in the sense of minimizing the forecast mean square error, [20]. It was however pointed out by Schwartz [27] that AIC is not a consistent criterion due to the fact that it does not select the true model with probability approaching 1 as n , [10, 17]. Schwartz [2]

therefore proposed the Bayesian Information criterion (BIC)

kIn(n)L2InBIC (13)

where

n: number of observed data points or sample size.

The BIC include fewer terms than the AIC since the penalty term is greater. If the process follows an ARIMA(p,d,q) model, then it is known that the orders specified by minimizing the BIC are consistent, that is they approach the true orders as the sample size increases. However if the true process is not a finite order ARIMA process, then minimizing AIC among an increasingly large class of ARIMA models will lead to an optimal ARIMA model that is closest to the true process among the class of models under study.

RESULTS AND DISCUSSION OF THE ANALYSES:

The time plot for the quarterly disaggregated livebirths series is shown in Figure 1. Examination of

the time plot revealed that there is no consistent trend in the series over the time period considered. The livebirths series did not show any marked seasonality pattern, and the curved trend shows that the series is not stationary in variance. A downward trend in the number of registered livebirths was noticed in 1998, which was then followed by an upward trend till 2003, after which there was a slight downward trend. A sudden upward peak was obvious in 2006, reaching its maximum in 2007, followed by a downward trend. A sharp peak is an indication that highest number of livebirths was recorded in this period. The disaggregated series was subjected to the Augumented Dickey-Fuller test to ascertain its stationarity status. The ADF value of -3.0655 with p-value of 0.1723 at 5% level of significance, showing that there is presence of unit root and the original disaggregated series is therefore not stationary. The autocorrelations of the series exhibits non-stationarity in mean and the variability of the data does not look constant, as shown in Figure 2(a). After subjecting the series to non-seasonal differencing and Box-Cox variance stabilizing transformation, the value of the ADF test yielded a value of -5.020 with a p-value 0.0105 at 5% level of significance, an evidence that stationarity has been achieved.

Fig. 1: Time plot of the disaggregated livebirths series (yt).

Figure 2 shows the acf and the pacf of the disaggregated livebirths series, yt, while figure 3 is the

acf and pacf of the differenced series ty with log

transformation. The acf of the livebirths data shown in Figure 2(a) has a sinusoidal pattern that tails off to zero reflecting the meandering shape of the series, which indicates that the series is non-stationary;

suggesting that at least one non-seasonal differencing will be appropriate. The livebirths series is expected to possess non-stationarity characteristics, since the series represent ‘uncontrolled’ behavior of certain process outputs. Spikes in the pacf could indicate possible non-seasonal AR terms. The pacf of the series shows a

Page 8: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

41

clear single positive spike at lag 1 and a negative spike at lag 2. After subjecting the series to variance

stabilizing transformation so that tylogtz and

non-seasonal differencing tz to remove trend

nonstationarity; the series becomes a stationary process, as shown in figure 3(a). The acf cuts off after lag 2, and the rest of the values randomly oscillate about zero, within the insignificance limits. The non-significance spikes show some patterns of groups of positive and negative values. The autocorrelations at lag 1 and 2 are positive, an indication that the AR is positive. The tapering pattern in the lags of the acf suggests that a non-seasonal AR(1) may be a useful part the model. The large spike in the acf at lag 1 might lead to a seasonal MA(1) interpretation. The identified univariate models are therefore ARIMA(1,1,0) and SARIMA(1,1,0)(0,0,1)4.

Fig. 2: acf and pacf of livebirths series (yt)

Fig. 3: acf and pacf of t

z .

The equation of the fitted ARIMA model is therefore:

tt εzφB1

(14)

ARIMA(1,1,0) can be written explicitly as

t2t1t1tt εφzφzzz

(15)

t2t1t1tt εzzφzz

(16)

The equation of the SARIMA(1,1,0)(0,0,1)4 model is

t

4

t εΘB1zφB1

The SARIMA model can also be written explicitly as

4tt2t1t1tt Θεεzzφzz

(17)

The disaggregated livebirths data are quarterly series, so that the period length is 4.

Page 9: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

42

Table 1. Model estimation for the livebirths models

Model Estimated Model MLEa SE

b t-

value p-value

Stationary R

2 value

R2

value

1,1,0ARIMA tεtz0.508B1 φ

=0.508

0.126 4.028 0.000 0.247 0.824

41,0,01,1,0SARIMA tεB421.01tz0.519B14

φ

=0.519

0.124 4.187 0.000 0.302 0.902

=0.421

0.138 3.056 0.004

a The maximum likelihood estimates.

b Standard error.

Table 2. Performance Comparison of the Univariate Models

The summary of the fitted model of the livebirths series is shown in Table 1. The estimates of the parameters of the proposed univariate models including the associated standard errors, t-values and their corresponding significant values are also reported in Table 1. The associated p-value of the AR parameter shows that its coefficient is statistically significant. The estimated coefficients for the

SARIMA model are also significant as revealed by their p-values. The R

2 value for the ARIMA model is

0.824, indicating that over 82% of the total variation in the series is accounted for by the model, and this clearly demonstrate the effectiveness of ARIMA(1,1,0) in modeling the livebirths series. The value of R

2 for the SARIMA model is 0.902 which

shows that more than 90% of the variation in the series is explained by the model, and the model is considered to be practically significant. This is an indicating that the inclusion of the seasonal MA terms

is desirable. The estimate of the residual standard

deviation 199975.0σ̂ for the ARIMA model

implies that the standard deviation of the residuals is approximately 20% of the level of the series. The criteria of selecting the better model is based on minimization of forecast errors, AIC, BIC and on variance. It is noticeable from Table 2 that the SARIMA model appears to have performed better than the ARIMA model, based on the values of their AIC, BIC and measures of forecasting accuracy. The result of performance evaluation of the two univariate models therefore show that SARIMA yields better forecast compared to the ARIMA model. The ARIMA model though not as good as the SARIMA model, but would be suitable for forecasting as well.

Measures of diagnostic verification to determine the adequacy of the models in representing the underlying process in the livebirths series were performed. The acf of the residuals and Ljung-Box test could be used to check for correlations between

Model AIC BIC FMSE FMAE FMAPE Residual Variance

1,1,0ARIMA -15.02 21.529 45521.864 23212.44 12.178 0.03999

41,0,01,1,0SARIMA -20.89 21.033 34179.619 19699.29 11.577 0.03236

Page 10: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

43

successive forecast errors. The acfs of the residuals of the two models show no significant correlation in the residuals for all the lags, except at lag 0, as shown in figures 4(a) and 5(a). The big spike at the beginning is the unimportant lag 0 correlation. Figures 4(b) and 5(b) give p-values for the Ljung-Box

statistics for each lag up to 10 which are all well above 0.05, indicating non-significance, which is a desirable result. The result of the Ljung-Box test on the residuals for the estimated univariate models therefore favour accepting the models as effective models for the persistence in the livebirths series.

Figure 4: (a) acf of residuals and (b) p-values for Q-statistic for ARIMA(1,1,0) model.

Fig. 5: (c) acf of residuals and (d) p-values for Q-statistic for SARIMA(1,1,0)(0,0,1) model.

Page 11: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

44

Table 3: Actual and Predicted number of Livebirths in Nigeria for period 2005-2007

Year Actual livebirths

Figures

Predicted cases from ARIMA

model

Lower limit Upper limit Predicted cases from

SARIMA model

Lower limit Upper limit

2005

2005

2005

2005

166882.28

168043.07

170364.64

173847.01

166734.03

169893.11

171396.95

174364.91

114222.13

116386.28

117416.50

119449.72

235608.16

240072.19

242197.23

246391.19

177932.06

163786.62

173171.50

177135.37

123271.17

113471.23

119973.06

122719.23

249083.37

229281.47

242419.17

247968.10

2006

2006

2006

2006

154566.01

185486.81

247328.40

340090.79

178518.51

147994.35

206814.54

290923.04

122295.16

101384.41

141679.52

199298.55

252260.55

209127.54

292245.04

411096.89

182097.46

145132.06

207146.65

292038.57

126156.96

100547.36

143511.01

202324.07

254914.43

203167.33

289980.26

408818.70

2007

2007

2007

2007

560020.28

516714.67

430103.45

300186.61

406333.50

733234.72

504146.80

398262.70

278361.17

502306.79

345368.75

272832.22

574180.86

1036117.7

712398.67

562776.19

433754.17

660201.45

464019.10

369996.60

300504.51

457386.98

321472.02

256333.32

607203.40

924202.20

649570.63

517950.51

Fig. 6: Forecasted livebirths for the ARIMA and the SARIMA models, with the actual Livebirths

The univariate models are used as predictive models for making forecasts for future values of livebirths in Nigeria. The results of forecasting for the period 2005-2007 with the univariate models are given in

Table 3. In order to assess the prediction accuracy, the forecast mean square error (FMSE), forecast mean absolute error (FMAE), and forecast mean percentage error (FMAPE) were employed. The

20

05

20

06

20

07

0

100000

200000

300000

400000

500000

600000

700000

800000

Live

bir

ths

Year

actual

forecastARIMA

forecastSARIMA

Page 12: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

45

FMSE for the SARIMA forecasts is lower than forecasts generated by the ARIMA model. Figure 6 is the plot of the actual livebirths and the forecasts for both the ARIMA and SARIMA models, for period 2005-2007. The SARIMA model provides better forecasts, tending to outperform the ARIMA model forecasts before the peak in the registered livebirths of the 1

st quarter of 2007. For the remaining part of

2007, the ARIMA model forecasts seem superior. Generally, the forecasts at the turning point are poor for the two models. The analysis of the two univariate time series models suggests that both models are satisfactory, but SARIMA model outperformed the ARIMA model.

CONCLUSION: Modelling and prediction of Nigeria

total livebirths are the focus of this study. Univariate

stochastic time series models were developed and

fitted to the quarterly disaggregated total livebirths

series to predict future livebirths in Nigeria based on

past data. The proposed univariate models were

ARIMA(1,1,0) and SARIMA(1,1,0)(0,0,1)4. It was

discovered that the inclusion of the seasonal MA

improved the effectiveness of the model in predicting

future total livebirths in Nigeria. The values of the

measures of forecast performance considered

showed that the models fit the livebirths series well

and are statistically significant. Since the acf’s of the

residuals show that none of the autocorrelations for

lags 1-15 exceed the significance bounds, and the p-

values for the Ljung-Box test statistic are all well

above 0.05, it can be concluded that there is very

little evidence for non-zero autocorrelations in the

errors at lags 1-15. The SARIMA model was chosen

as the more adequate model in predicting future

livebirths because of its good prediction performance

and lower values of AIC and BIC, compared with the

ARIMA model. It is necessary to forecast future

livebirths of a population like Nigeria in order to

determine forecast demands of this demographic

phenomenon on our various systems such as

education, health, economy, and provision of social

amenities for future population over a period of time.

The Nigeria birth rate though has declined, but the

total livebirths is on the increase as revealed by the

result of this study.

Data from vital registration on live births are likely to be inadequate in a developing nation like Nigeria, due to poor record keeping culture. A critical

assessment of the data suggests an under-reporting of live births in 1994 in most Nigerian states. The data on livebirths for 1994 was therefore discarded, so that the study is based on total live births data from 1995-2007; which are more reliable. Despite the possibility of under-reporting of total livebirths in Nigeria, it is believed that under-reporting will not substantially alter the proposed time series models and the forecast of future livebirths in the country.

There is need to exercise caution in interpreting the result of the analysis due to the fact that recent changes in fertility behaviour of Nigerians are not accounted for in this study; and this could make a difference when considered. An empirical question to be investigated in future research is the possiblity to obtain improved forecasts of total births by considering stochastic model that incorporates other factors that have implications on births in Nigeria.

REFERENCES

[1] Anderson, T.W. The statistical analysis of Time Series. New-York: John Wiley, 1971.

[2] Akaike, H. Markovian representation of stochastic processes and its application to the analysis of autoregressive moving average processes. Ann. Inst. Statist. Math. 1974; 26: 363-387.

[3] Boot, J.C.G., Feibes, W., and Lisman, J.H.C. Further methods of derivation of quarterly figuresfrom annual data. Journal of Royal Statistical Society 1967; Series C. 16(1): 65-75.

[4] Booth, H. And Tickle, L. Mortality modelling and forecasting: a review of methods. ADSRI working Paper, 2008; no. 3.

[5] Box, G.E.P., and Jenkins, G.M. Time series Analysis: Forecast and Control 3

rd Ed. San-

Francisco: Holden-Day, 1994.

[6] Brockwell, P. and Davis, R. Time Series: Theory and Methods, 2

nd Ed. Spinger-Verlag, 1991.

[7] Chatfield, C. Time series forecasting. Chapman and Hall, Inc., 2001.

[8] Chatfield, C. The analysis of time series, an introduction, 4

th Ed. London: Chapman and Hall,

1989.

[9] Chatfield, C. The analysis of time series, an introduction, 6

th ed. New-York, Chapman &

Hall/CRC, 2004.

[10] Chik, Z. Performance of order selection criteria for short time series. Pakistan Journal of AppliedSciences 2002; 2(7): 783-788.

[11] Cryer, JD. Time series analysis. Boston: Duxbury Press, 1985.

Page 13: Stochastic models of Nigerian total livebirths...Stochastic models of Nigerian total livebirths Adekanmbi, D.B Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso,

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

46

[12] Dickey, D.A., and Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 1979; 74: 427-431.

[13] Diggle, P. J. Time series, a biostatistical introduction. Oxford: Clarendon Press, 1990.

[14] Fuller WA. Introduction to statistical time series. New-York: John Wiley, 1976.

[15] Granger, C.W.J. and Newbold, P. Forecasting Economic Time Series. New-York: Academic Press, 1977.

[16] Hannan, E.J. Multiple Time Series. New-York: John Wiley, 1970.

[17] Hurvich, C.M, and Tsai, C.L. Regression and Time Series model selection in small samples, Biometrika 1989; 76(2): 297-307.

[18] Jenkins, G. M. Practical experiences with modeling and forecasting time series. Kendal:Titus Wilson Ltd, 1979.

[19] Ljung, L., System identification toolbox for use with MATLAB

®, User’s guide. The MathWorks

Inc., 24 Prime Park Way, Natick, Mass. 01760, 1995.

[20] Kendall, M. and Ord, J.K.. Time series (3rd

ed.). New-York : Edward Arnold, 1993.

[21] McDonald, J. A time series approach to forecast Australian total livebirths. Demography 1979; 16: 575-602.

[22] -------------. Modelling demographic relationships: an analysis of forecast functions for Australian births. Journal of the American Statistical Association 1981; 76: 782-792.

[23] Mills, T.C. Time series techniques for Economists. Cambridge: Cambridge University Press, 1990.

[24] National Population Commission [Nigeria]. Report on livebirths, deaths and stillbirths registration in Nigeria, (1994-2007). Claverton, Maryland: National Population Commission and ORC/Macro, 2008.

[25] Saboia, J.L.M. Autoreggressive integrated moving average (ARIMA) models for birth forecasting. Journal of the American Statistical Association 1977; 72: 264-270.

[26] Shryock, HS., Siegel, JS., and Associates. The methods and materials of demography, (Condensed ed.). London: Academic Press Inc. Ltd, 1976.

[27] Schwartz, G. Estimating the dimensions of a model. Ann. Statist. 1978; 6: 461-464.

[28] Tiao, G.C and Box, G.E.P. Modelling multiple time series with applications. Journal of the American Statistical Association 1981; 76 (376): 802-16.

[29] U.S. National Centre for Health Statistics, Physician Handbook on Medical certification: death, fetal death, birth. Public Health Service Publication 1967; Series B (593).

[30] UNICEF.Frequency of Births in Africa. www.unicef.org/infobycountry/nigeria_statistics.htl, 2014.

[31] USAID. USAID country health statistical report; Nigeria, 2011. (Website pls)

[32] Research in view. Early childbearing in Nigeria: a continuing challenge. www.guttmacher.org/ pubs/ribs/ 2004/12/10/rib2-04.pdf. New-York: The Allan Guttmacher Institute, 2004; Series 2.