Investment in “Real Time” and “High Definition”: A Big Data Approach
November 2020
Modelling with Big Data & Machine Learning: Measuring Economic Instability
The Bank of England, The Federal Reserve Board and King's College London
Ali B. Barlas, Seda Güler, Alvaro Ortiz & Tomasa Rodrigo
Why this paper is relevant
- The first “Real Time” Investment indicator from a Bank’s Big Data
- Improves the properties of a Standard Nowcasting Model
- It’s not only “Real Time” but also “High Definition” (Sector & Geography)
- Validation with national accounts and proxies is successful
- Validation results are promising for other Countries
The Covid-19 crisis has reinforced the role of Big Data tools for Economic Analysis
The high uncertainty triggered by the Covid-19 crisis has stressed the need to monitor the evolution of the economy in “real time”. These efforts have materialized in several ways:
- Focusing on timely, alternative indicators: soft data surveys (particularly the Purchasing Managers' Indexes, PMIs) and other high-frequency indicators, such as electricity production or chain store sales, released on a daily or weekly basis.
- Developing higher-frequency models: some central banks have relied on these high-frequency indicators to develop weekly activity tracker models, such as the Fed's WEI (Lewis, 2020) and the Bundesbank WAI (Eraslan, S. and T. Götz, 2020).
- Developing new Big Data indicators*: focusing on daily aggregate information from banking transactions to track consumption, employment, turnover, mobility…
* Some of the recent literature on Big Data analysis: Andersen, Hansen, Johannesen, & Sheridan (2020a), Andersen, Hansen, Johannesen, & Sheridan (2020b), Alexander & Karger (2020), Baker, Farrokhnia, Meyer, Pagel, & Yannelis (2020a), Baker, Farrokhnia, Meyer, Pagel, & Yannelis (2020b), Bounie, Camara, & Galbraith (2020), Chetty, Friedman, Hendren, & Stepner (2020), Chronopoulos, Lukas, & Wilson (2020), Cox, Ganong, Noel, Vavra, Wong, Farrell, & Greig (2020), Surico, Kanzig, & Hacioglu (2020).
Efforts on Big Data information have accelerated because of the importance of knowing the current state of the economy…
We continue to enhance our “Real Time & High Definition” indicators by extending our Big Data indexes to Investment (GFCF)
Investment spending is done mostly by companies and, to a lesser extent, by individuals
We track investment payments through:
- firm-to-firm transactions
- individual-to-firm transactions
Firms are classified by their NACE codes to identify their business activity (in line with the European statistical classification of sectors)
We approximate investment demand in one type of asset by taking into account the aggregate flows or transactions from any firm or individual to the sector that produces the fixed assets
Total Investment = Construction Investment + Machinery Investment*
* Machinery & Equipment, Media & ICT, Agriculture & Animals, Forestry, Durable Goods, Retail Trade, Textile & Clothing, Transportation and Shipping.
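The aggregation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout, the sector labels in `ASSET_TYPE` and the function `daily_investment` are all hypothetical, and a real implementation would key on NACE codes rather than on readable sector names.

```python
from collections import defaultdict
from datetime import date

# Hypothetical mapping from the payee's sector to an investment asset type.
# The labels are illustrative only; a real pipeline would use NACE codes.
ASSET_TYPE = {
    "construction": "construction",
    "machinery_equipment": "machinery",
    "media_ict": "machinery",
    "transportation_shipping": "machinery",
}

def daily_investment(transactions):
    """Sum payments per (day, asset type), keeping only payees that produce fixed assets."""
    totals = defaultdict(float)
    for day, payer_type, payee_sector, amount in transactions:
        asset = ASSET_TYPE.get(payee_sector)
        if asset is None:          # payee sector does not produce fixed assets
            continue
        totals[(day, asset)] += amount
        totals[(day, "total")] += amount
    return dict(totals)

# Toy records: (day, payer type, payee sector, amount). All values are made up.
toy = [
    (date(2020, 3, 2), "firm", "construction", 100.0),
    (date(2020, 3, 2), "individual", "machinery_equipment", 40.0),
    (date(2020, 3, 2), "firm", "restaurants", 15.0),
    (date(2020, 3, 3), "firm", "media_ict", 25.0),
]
index = daily_investment(toy)
```

Both firm and individual payers are counted, so the classification that matters is the sector of the firm receiving the payment.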
The data: representativeness of the Big Data investment data
BBVA Big Data
Transactions(000s)
Firms(000s)
INVESTMENT DATA 2019: BBVA vs COMPANY ACCOUNTS (CENTRAL BANK & TURKSTAT)
Source: CBRT & Turkstat and own calculations
Turkey CBRT
179.7
Total Construction Machinery
23.2 156.5
24.6 2.3 22.3
308
Amount (US$ bn)
Total Construction Machinery
440 257 183
730.2 614.4
280 28
115.8
Firms(% CBRT) 24.6% 25.5% 19.8%
A caveat: we develop Real Time & High Definition indicators … we do not fully replicate the national accounts
[Diagram: investment data sources, from lowest to highest frequency, linked by cross-validation]
- Gross Fixed Capital Investments (ESA, Yearly)
- Gross Fixed Capital Investments (QNA, Quarterly)
- Investment Proxies (Proxies, Monthly)
- Big Data Index (Proxies, Daily)
Diagram labels: National Accounts Indicators; Balanced (P=E=I, One GDP figure, Real vs N, SA and NSA); Net Disposal of Fixed Assets; Sample Representativity; Estimations (Supply, Demand, Surveys…); Seasonality; Frequent Revisions (short samples); Stability (short samples); Cross Validation
Big Data Investment in Real Time and High Definition: The case of Turkey
Big Data Investment in detail: The case of Turkey
- The volatility of investment in Turkey and the delay in the publication of official data (standard in EMs) make it an ideal case for analysis
- We compute investment indicators in real time (daily) for 22 sectors and 81 provinces
- Validation results are positive for the main aggregates (Machinery & Transport and Construction) as well as for aggregate investment
- A deeper analysis shows how the Big Data investment index can improve the properties of a standard Nowcasting Model in terms of Accuracy, Anticipation and News contribution
- The high-definition analysis shows the characteristics of the latest investment shocks and the geographic response of investment to the Covid-19 crisis
Validation I: The Big Data investment index shows a high correlation and co-movement with the official data
TURKEY: GARANTI BBVA BIG DATA INVESTMENT INDICES (28-day cum. YoY real)
[Figure: three panels, each plotting the official GFCF series against the corresponding Big Data index, Jun-15 to Sep-20]
- Total Investment: Official GFCF Investment vs Big Data Total Investment. Correlation coefficient: 0.88
- Machinery & Equipment: Official GFCF Machinery Investment vs Big Data Machinery Investment. Correlation coefficient: 0.84
- Construction: Official GFCF Construction Investment vs Big Data Construction Index. Correlation coefficient: 0.77
Source: Own elaboration
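The transformation named in the chart title, a 28-day cumulative sum expressed as year-on-year growth, and the correlation used for validation can be sketched as follows. This is a hedged illustration: the 365-day lag, the function names and the synthetic series are assumptions, and the deflation step behind "real" is omitted.

```python
import numpy as np

def rolling_sum(x, window=28):
    """Trailing sum over `window` days; the first window-1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out = np.full(x.shape, np.nan)
    out[window - 1:] = csum[window:] - csum[:-window]
    return out

def yoy_28d(daily, window=28, lag=365):
    """Year-on-year growth of the trailing 28-day sum (lag of 365 days is an assumption)."""
    cum = rolling_sum(daily, window)
    out = np.full(cum.shape, np.nan)
    out[lag:] = cum[lag:] / cum[:-lag] - 1.0
    return out

def correlation(a, b):
    """Pearson correlation over the dates where both series are available."""
    ok = ~np.isnan(a) & ~np.isnan(b)
    return np.corrcoef(a[ok], b[ok])[0, 1]

# Synthetic example: a smooth "official-like" series and a noisy daily index around it.
rng = np.random.default_rng(0)
days = 3 * 365
base = 100.0 + 10.0 * np.sin(np.arange(days) / 50.0)
noisy_index = base * (1.0 + 0.05 * rng.standard_normal(days))
rho = correlation(yoy_28d(base), yoy_28d(noisy_index))
```

Summing over 28 days before taking growth rates smooths day-of-week and payment-date noise, which is why a noisy daily index can still co-move closely with a smoother reference series.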
Validation II: The synchrony of Big Data Investment with the investment cycle is validated by high correlation coefficients with HF proxies
BBVA BIG DATA INVESTMENT & HIGH FREQUENCY PROXIES (28-day cum. YoY real)
[Figure: two panels, Mar-17 to Dec-20]
- Construction: Garanti BBVA Big Data Construction Index (3m) vs Employment in Construction, Non-Metallic Mineral (Cement) and Turnover Building (rhs)
- Machinery & Equipment: Garanti BBVA Big Data Total M&T Investment vs Industrial Production Machinery and Capacity Utilization Rate Investment Goods (yoy, rhs)
Correlation coefficients shown on the panels: 0.75, 0.79, 0.85, 0.72, 0.90
Source: Own elaboration and Turkstat
Big Data & Nowcasting Model (DFM): The framework
ECB Working Paper Series No 1189, May 2010
the forecast accuracy of these specifications with that of univariate benchmarks as well as of the model of Banbura and Runstler (2010), who adopt the methodology of Giannone, Reichlin, and Small (2008) to the case of euro area.

Giannone, Reichlin, and Small (2008) have proposed a factor model framework, which allows to deal with “ragged edge” and exploit information from large data sets in a timely manner. They have applied it to nowcasting of US GDP from a large number of monthly indicators. While Giannone, Reichlin, and Small (2008) can handle the “ragged edge” problem, it is not straightforward to apply their methodology to mixed frequency panels with series of different lengths or, in general, to any pattern of missing data.[8] In addition, as the estimation is based on principal components, it could be inefficient for small samples.

Other papers related to ours include Camacho and Perez-Quiros (2008), who obtain real-time estimates of the euro area GDP from monthly indicators from a small scale model applying the mixed frequency factor model approach of Mariano and Murasawa (2003). Schumacher and Breitung (2008) forecast German GDP from a large number of monthly indicators using the EM approach proposed by Stock and Watson (2002b). Proietti (2008) estimates a factor model for interpolation of GDP and its main components and shows how to incorporate relevant accounting and temporary constraints. Angelini, Henry, and Marcellino (2006) propose a methodology for backdating and interpolation based on large cross-sections. In contrast to theirs, our method exploits the dynamics of the data and is based on maximum likelihood, which allows for imposing restrictions and is more efficient for smaller cross-sections.

The paper is organized as follows. Section 2 presents the model, discusses the estimation and explains how the news content can be extracted. Section 3 provides the results of the Monte Carlo experiment. Section 4 describes the empirical application. Section 5 concludes. The technical details and data description are provided in the Appendix.

2 Econometric framework

Let $y_t = [y_{1,t}, y_{2,t}, \dots, y_{n,t}]'$, $t = 1, \dots, T$, denote a stationary $n$-dimensional vector process standardised to mean 0 and unit variance. We assume that $y_t$ admits the following factor model representation:

$$y_t = \Lambda f_t + \varepsilon_t , \tag{1}$$

where $f_t$ is an $r \times 1$ vector of (unobserved) common factors[9] and $\varepsilon_t = [\varepsilon_{1,t}, \varepsilon_{2,t}, \dots, \varepsilon_{n,t}]'$ is the idiosyncratic component, uncorrelated with $f_t$ at all leads and lags. The $n \times r$ matrix $\Lambda$ contains factor loadings. $\chi_t = \Lambda f_t$ is referred to as the common component. It is assumed that $\varepsilon_t$ is normally distributed and cross-sectionally uncorrelated, i.e. $y_t$ follows an exact factor model. We also briefly discuss the validity of the approach in the case of an approximate factor model, see below. Regarding the dynamics of the idiosyncratic component, we consider two cases: $\varepsilon_t$ is serially uncorrelated or it follows an AR(1) process.

Further, it is assumed that the common factors $f_t$ follow a stationary VAR process of order $p$:

$$f_t = A_1 f_{t-1} + A_2 f_{t-2} + \dots + A_p f_{t-p} + u_t , \qquad u_t \sim \text{i.i.d. } N(0, Q) , \tag{2}$$

where $A_1, \dots, A_p$ are $r \times r$ matrices of autoregressive coefficients. We collect the latter into $A = [A_1, \dots, A_p]$.

[8] Their estimation approach consists of two steps. First, the parameters of the state space representation of the factor model are obtained using a principal components based procedure applied to a truncated data set (without missing data). Second, Kalman filter is applied on the full data set in order to obtain factor estimates and forecasts using all available information.
[9] For identification it is required that $2r + 1 \le n$, see e.g. Geweke and Singleton (1980).
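To make the factor model (1)-(2) concrete, the following sketch simulates a small panel from it with $p = 1$. It is only an illustration: the parameter values, the function name `simulate_dfm` and the choice of one factor and five series are assumptions, and the simulated data are not standardised as the text requires for estimation.

```python
import numpy as np

def simulate_dfm(Lambda, A, Q, R_diag, T, seed=0):
    """Draw T observations from y_t = Lambda f_t + eps_t, f_t = A f_{t-1} + u_t."""
    rng = np.random.default_rng(seed)
    n, r = Lambda.shape
    f = np.zeros((T, r))
    y = np.zeros((T, n))
    chol_Q = np.linalg.cholesky(Q)
    prev = np.zeros(r)                       # start the factor at its unconditional mean
    for t in range(T):
        f[t] = A @ prev + chol_Q @ rng.standard_normal(r)                 # equation (2), p = 1
        y[t] = Lambda @ f[t] + np.sqrt(R_diag) * rng.standard_normal(n)   # equation (1)
        prev = f[t]
    return y, f

# Illustrative parameters: one common factor driving five series.
Lambda = np.array([[1.0], [0.8], [0.6], [0.4], [0.9]])
A = np.array([[0.7]])
Q = np.array([[1.0]])
R_diag = np.full(5, 0.5)                     # diagonal R: idiosyncratic terms are uncorrelated

n, r = Lambda.shape
identified = 2 * r + 1 <= n                  # identification condition from footnote 9
stationary = np.max(np.abs(np.linalg.eigvals(A))) < 1.0
y, f = simulate_dfm(Lambda, A, Q, R_diag, T=2000)
```

All co-movement between the five series comes from the single factor, which is the property a nowcasting model exploits when some series are published later than others.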
2.1 Estimation

As $f_t$ are unobserved, the maximum likelihood estimators of the parameters of model (1)-(2), which we collect in $\theta$, are in general not available in closed form. On the other hand, a direct numerical maximisation of the likelihood is computationally demanding, in particular for large $n$ due to the large number of parameters.[10]

In this paper we adopt an approach based on the Expectation-Maximisation (EM) algorithm, which was proposed by Dempster, Laird, and Rubin (1977) as a general solution to problems for which incomplete or latent data render the likelihood intractable or difficult to deal with. The essential idea of the algorithm is to write the likelihood as if the data were complete and to iterate between two steps: in the Expectation step we “fill in” the missing data in the likelihood, while in the Maximisation step we re-optimise this expectation. Under some regularity conditions, the EM algorithm converges towards a local maximum of the likelihood (or a point in its ridge, see also below).

To derive the EM steps for the model described above, let us denote the joint log-likelihood of $y_t$ and $f_t$, $t = 1, \dots, T$, by $l(Y, F; \theta)$, where $Y = [y_1, \dots, y_T]$ and $F = [f_1, \dots, f_T]$. Given the available data $\Omega_T \subseteq Y$,[11] the EM algorithm proceeds in a sequence of two alternating steps:

1. E-step - the expectation of the log-likelihood conditional on the data is calculated using the estimates from the previous iteration, $\theta(j)$:

$$L(\theta, \theta(j)) = E_{\theta(j)}\big[\, l(Y, F; \theta) \mid \Omega_T \big] ;$$

2. M-step - the parameters are re-estimated through the maximisation of the expected log-likelihood with respect to $\theta$:

$$\theta(j+1) = \arg\max_{\theta} L(\theta, \theta(j)) . \tag{3}$$

Watson and Engle (1983) and Shumway and Stoffer (1982) show how to derive the maximisation step (3) for models similar to the one given by (1)-(2). As a result the estimation problem is reduced to a sequence of simple steps, each of which essentially involves a pass of the Kalman smoother and two multivariate regressions. Doz, Giannone, and Reichlin (2006) show that the EM algorithm is a valid approach for the maximum likelihood estimation of factor models for large cross-sections as it is robust, easy to implement and computationally inexpensive. Watson and Engle (1983) assume that all the observations in $y_t$ are available ($\Omega_T = Y$). Shumway and Stoffer (1982) derive the modifications for the missing data case but only with known $\Lambda$. We provide the EM steps for the general case with missing data.

In the main text, we set for simplicity $p = 1$ ($A = A_1$); the case of $p > 1$ is discussed in the Appendix. We first consider the case of serially uncorrelated $\varepsilon_t$:

$$\varepsilon_t \sim \text{i.i.d. } N(0, R) , \tag{4}$$

where $R$ is a diagonal matrix. In that case $\theta = \{\Lambda, A, R, Q\}$ and the maximisation of (3) results in the

[10] Recently, Jungbacker and Koopman (2008) have shown how to reduce the computational complexity related to estimation and smoothing if the number of observables is much larger than the number of factors.
[11] $\Omega_T \subseteq Y$ because some observations in $y_t$ can be missing.
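The alternation between the E-step and the M-step can be illustrated on a deliberately simplified case. The sketch below assumes a static factor model ($A = 0$, $Q = I$), so that the conditional moments of the factors have a closed form and no Kalman smoother is needed; it is not the estimator of the paper. The function names and the synthetic data are hypothetical. The point it shows is that the log-likelihood never decreases across iterations.

```python
import numpy as np

def log_likelihood(Y, Lam, R_diag):
    """Gaussian log-likelihood of y_t ~ N(0, Lam Lam' + R) for the static model."""
    T, n = Y.shape
    Sigma = Lam @ Lam.T + np.diag(R_diag)
    S = Y.T @ Y / T
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * T * (n * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(Sigma, S)))

def em_static_factor(Y, r=1, n_iter=100, seed=0):
    T, n = Y.shape
    rng = np.random.default_rng(seed)
    Lam = rng.standard_normal((n, r))        # arbitrary starting values
    R_diag = np.ones(n)
    path = [log_likelihood(Y, Lam, R_diag)]
    for _ in range(n_iter):
        # E-step: moments of f_t given y_t under the current parameters
        Sigma = Lam @ Lam.T + np.diag(R_diag)
        beta = np.linalg.solve(Sigma, Lam).T             # r x n, equals Lam' Sigma^{-1}
        Ef = Y @ beta.T                                  # rows are E[f_t | y_t]
        Eff = T * (np.eye(r) - beta @ Lam) + Ef.T @ Ef   # sum_t E[f_t f_t' | y_t]
        # M-step: regression-type updates with expected sufficient statistics
        Lam = (Y.T @ Ef) @ np.linalg.inv(Eff)
        R_diag = np.diag(Y.T @ Y - Lam @ (Ef.T @ Y)) / T
        path.append(log_likelihood(Y, Lam, R_diag))
    return Lam, R_diag, np.array(path)

# Synthetic data from a one-factor model (all values made up).
rng = np.random.default_rng(1)
true_Lam = np.array([[1.0], [0.8], [0.6], [0.4], [0.9]])
F = rng.standard_normal((500, 1))
Y = F @ true_Lam.T + np.sqrt(0.5) * rng.standard_normal((500, 5))
Lam_hat, R_hat, ll_path = em_static_factor(Y, r=1, n_iter=100)
```

In the dynamic model the only change to the E-step is that the factor moments come from a Kalman smoother pass instead of this closed-form projection.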
Expectation Maximization (EM) Algorithm
Nowcasting Accuracy
Nowcasting Anticipation
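following expressions for $\Lambda(j+1)$ and $A(j+1)$:[12]

$$\Lambda(j+1) = \left( \sum_{t=1}^{T} E_{\theta(j)}\big[ y_t f_t' \mid \Omega_T \big] \right) \left( \sum_{t=1}^{T} E_{\theta(j)}\big[ f_t f_t' \mid \Omega_T \big] \right)^{-1} , \tag{5}$$

$$A(j+1) = \left( \sum_{t=1}^{T} E_{\theta(j)}\big[ f_t f_{t-1}' \mid \Omega_T \big] \right) \left( \sum_{t=1}^{T} E_{\theta(j)}\big[ f_{t-1} f_{t-1}' \mid \Omega_T \big] \right)^{-1} . \tag{6}$$

Note that these expressions resemble the ordinary least squares solution to the maximum likelihood estimation for (auto-) regressions with complete data with the difference that the sufficient statistics are replaced by their expectations.

The $(j+1)$-iteration covariance matrices are computed as the expectations of sums of squared residuals conditional on the updated estimates of $\Lambda$ and $A$:[13]

$$R(j+1) = \mathrm{diag}\left( \frac{1}{T} \sum_{t=1}^{T} E_{\theta(j)}\Big[ \big( y_t - \Lambda(j+1) f_t \big) \big( y_t - \Lambda(j+1) f_t \big)' \mid \Omega_T \Big] \right) \tag{7}$$

$$\phantom{R(j+1)} = \mathrm{diag}\left( \frac{1}{T} \left( \sum_{t=1}^{T} E_{\theta(j)}\big[ y_t y_t' \mid \Omega_T \big] - \Lambda(j+1) \sum_{t=1}^{T} E_{\theta(j)}\big[ f_t y_t' \mid \Omega_T \big] \right) \right)$$

and

$$Q(j+1) = \frac{1}{T} \left( \sum_{t=1}^{T} E_{\theta(j)}\big[ f_t f_t' \mid \Omega_T \big] - A(j+1) \sum_{t=1}^{T} E_{\theta(j)}\big[ f_{t-1} f_t' \mid \Omega_T \big] \right) . \tag{8}$$

When $y_t$ does not contain missing data, we have that

$$E_{\theta(j)}\big[ y_t y_t' \mid \Omega_T \big] = y_t y_t' \quad \text{and} \quad E_{\theta(j)}\big[ y_t f_t' \mid \Omega_T \big] = y_t E_{\theta(j)}\big[ f_t' \mid \Omega_T \big] . \tag{9}$$

Finally, the conditional moments of the latent factors, $E_{\theta(j)}[f_t \mid \Omega_T]$, $E_{\theta(j)}[f_t f_t' \mid \Omega_T]$, $E_{\theta(j)}[f_{t-1} f_{t-1}' \mid \Omega_T]$ and $E_{\theta(j)}[f_t f_{t-1}' \mid \Omega_T]$, can be obtained through the Kalman smoother for the state space representation:

$$\begin{aligned} y_t &= \Lambda(j) f_t + \varepsilon_t , & \varepsilon_t &\sim \text{i.i.d. } N(0, R(j)) , \\ f_t &= A(j) f_{t-1} + u_t , & u_t &\sim \text{i.i.d. } N(0, Q(j)) , \end{aligned} \tag{10}$$

see Watson and Engle (1983).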
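The updates (5)-(8) for the complete-data case can be written as a short function once the smoothed moments are available. The sketch below is an illustration under assumptions: the function name `m_step`, the array layout and the toy data are hypothetical, and the Kalman smoother that would normally supply the moments is not implemented. As a sanity check, the example passes the true factors with zero smoothing uncertainty, in which case (5) and (6) collapse to ordinary least squares.

```python
import numpy as np

def m_step(Y, Ef, P, C):
    """
    Complete-data M-step, equations (5)-(8).
    Y  : (T, n)       observations y_1..y_T
    Ef : (T+1, r)     smoothed means E[f_t | Omega_T] for t = 0..T
    P  : (T+1, r, r)  smoothed covariances Var(f_t | Omega_T) for t = 0..T
    C  : (T, r, r)    lag-one covariances Cov(f_t, f_{t-1} | Omega_T) for t = 1..T
    """
    T = Y.shape[0]
    S_ff = Ef[1:].T @ Ef[1:] + P[1:].sum(axis=0)       # sum_t E[f_t f_t']
    S_ll = Ef[:-1].T @ Ef[:-1] + P[:-1].sum(axis=0)    # sum_t E[f_{t-1} f_{t-1}']
    S_fl = Ef[1:].T @ Ef[:-1] + C.sum(axis=0)          # sum_t E[f_t f_{t-1}']
    S_yf = Y.T @ Ef[1:]                                # sum_t y_t E[f_t]'
    Lam = S_yf @ np.linalg.inv(S_ff)                   # equation (5)
    A = S_fl @ np.linalg.inv(S_ll)                     # equation (6)
    R = np.diag(np.diag(Y.T @ Y - Lam @ S_yf.T)) / T   # equation (7)
    Q = (S_ff - A @ S_fl.T) / T                        # equation (8)
    return Lam, A, R, Q

# Toy check: treat simulated factors as if they were observed (P = 0, C = 0).
rng = np.random.default_rng(0)
T, n, r = 300, 4, 2
A_true = np.array([[0.6, 0.1], [0.0, 0.5]])
Lam_true = rng.standard_normal((n, r))
F = np.zeros((T + 1, r))
for t in range(1, T + 1):
    F[t] = A_true @ F[t - 1] + rng.standard_normal(r)
Y = F[1:] @ Lam_true.T + 0.5 * rng.standard_normal((T, n))
Lam_hat, A_hat, R_hat, Q_hat = m_step(Y, F, np.zeros((T + 1, r, r)), np.zeros((T, r, r)))
```

With genuine smoother output, `P` and `C` are non-zero and account for the uncertainty about the factors, which is the difference from plain regression that the text points out.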
However, when $y_t$ contains missing values we can no longer use (9) when developing the expressions (5) and (7). Let $W_t$ be a diagonal matrix of size $n$ with $i$th diagonal element equal to 0 if $y_{i,t}$ is missing and equal to 1 otherwise. As shown in the Appendix, $\Lambda(j+1)$ can be obtained as

$$\mathrm{vec}\big( \Lambda(j+1) \big) = \left( \sum_{t=1}^{T} E_{\theta(j)}\big[ f_t f_t' \mid \Omega_T \big] \otimes W_t \right)^{-1} \mathrm{vec}\left( \sum_{t=1}^{T} W_t y_t E_{\theta(j)}\big[ f_t' \mid \Omega_T \big] \right) . \tag{11}$$

Intuitively, $W_t$ works as a selection matrix, so that only the available data are used in the calculations. Analogously, the expression (7) becomes

$$\begin{aligned} R(j+1) = \mathrm{diag}\Bigg( \frac{1}{T} \sum_{t=1}^{T} \Big( & W_t y_t y_t' W_t' - W_t y_t E_{\theta(j)}\big[ f_t' \mid \Omega_T \big] \Lambda(j+1)' W_t - W_t \Lambda(j+1) E_{\theta(j)}\big[ f_t \mid \Omega_T \big] y_t' W_t \\ & + W_t \Lambda(j+1) E_{\theta(j)}\big[ f_t f_t' \mid \Omega_T \big] \Lambda(j+1)' W_t + (I - W_t) R(j) (I - W_t) \Big) \Bigg) . \end{aligned} \tag{12}$$

[12] A sketch of how these are derived is provided in the Appendix, see also e.g. Watson and Engle (1983) and Shumway and Stoffer (1982).
[13] Note that $L(\theta, \theta(j))$ does not have to be maximised simultaneously with respect to all the parameters. The procedure remains valid if M-step is performed sequentially, i.e. $L(\theta, \theta(j))$ is maximised over a subvector of $\theta$ with other parameters held fixed at their current values, see e.g. McLachlan and Krishnan (1996), Ch. 5.
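The role of $W_t$ as a selection matrix in (11) can be checked numerically. The sketch below builds the Kronecker system literally and solves it; the function name `loadings_missing`, the array layout, the 20% missing rate and the constant smoothing covariance are assumptions made for illustration. Because $W_t$ is diagonal, the solution coincides with estimating each series' loadings separately on the dates where that series is observed.

```python
import numpy as np

def loadings_missing(Y, Ef, P):
    """
    Loading update with missing data, equation (11).
    Y  : (T, n) observations, np.nan where y_{i,t} is missing
    Ef : (T, r) smoothed means E[f_t | Omega_T]
    P  : (T, r, r) smoothed covariances Var(f_t | Omega_T)
    """
    T, n = Y.shape
    r = Ef.shape[1]
    lhs = np.zeros((n * r, n * r))
    rhs = np.zeros((n, r))
    for t in range(T):
        w = (~np.isnan(Y[t])).astype(float)        # diagonal of W_t
        y = np.where(np.isnan(Y[t]), 0.0, Y[t])    # W_t y_t, missing entries zeroed
        Eff = np.outer(Ef[t], Ef[t]) + P[t]        # E[f_t f_t' | Omega_T]
        lhs += np.kron(Eff, np.diag(w))            # E[f_t f_t'] (Kronecker) W_t
        rhs += np.outer(y, Ef[t])                  # W_t y_t E[f_t']
    # vec() stacks columns, hence Fortran (column-major) order
    vec_lam = np.linalg.solve(lhs, rhs.reshape(-1, order="F"))
    return vec_lam.reshape((n, r), order="F")

# Toy data with roughly 20% of the observations removed at random (values made up).
rng = np.random.default_rng(0)
T, n, r = 200, 4, 2
Ef = rng.standard_normal((T, r))
P = np.tile(0.1 * np.eye(r), (T, 1, 1))
Lam_true = rng.standard_normal((n, r))
Y_miss = Ef @ Lam_true.T + 0.3 * rng.standard_normal((T, n))
Y_miss[rng.random((T, n)) < 0.2] = np.nan
Lam_hat_miss = loadings_missing(Y_miss, Ef, P)
```

Building the full $nr \times nr$ system is wasteful for large panels; the series-by-series form gives the same answer at lower cost, and the Kronecker version is kept here only because it mirrors the formula.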
A Dynamic Factor Model (DFM)
11ECB
Working Paper Series No 1189May 2010
following expressions for Λ(j + 1) and A(j + 1):12
Λ(j + 1) =
(T∑
t=1
Eθ(j)
[ytf
′t |ΩT
])(
T∑
t=1
Eθ(j)
[ftf
′t |ΩT
])−1
, (5)
A(j + 1) =
(T∑
t=1
Eθ(j)
[ftf
′t−1|ΩT
])(
T∑
t=1
Eθ(j)
[ft−1f
′t−1|ΩT
])−1
. (6)
Note that these expressions resemble the ordinary least squares solution to the maximum likelihood estima-tion for (auto-) regressions with complete data with the difference that the sufficient statistics are replacedby their expectations.
The (j + 1)-iteration covariance matrices are computed as the expectations of sums of squared residualsconditional on the updated estimates of Λ and A:13
R(j + 1) = diag
(1T
T∑
t=1
Eθ(j)
[(yt − Λ(j + 1)ft
)(yt − Λ(j + 1)ft
)′|ΩT
])(7)
= diag
(1T
(T∑
t=1
Eθ(j)
[yty
′t|ΩT
]− Λ(j + 1)
T∑
t=1
Eθ(j)
[fty
′t|ΩT
]))
and
Q(j + 1) =1T
(T∑
t=1
Eθ(j)
[ftf
′t |ΩT
]− A(j + 1)
T∑
t=1
Eθ(j)
[ft−1f
′t |ΩT
])
. (8)
When yt does not contain missing data, we have that
Eθ(j) [yty′t|ΩT ] = yty
′t and Eθ(j) [ytf
′t |ΩT ] = ytEθ(j) [f ′
t |ΩT ] . (9)
Finally, the conditional moments of the latent factors, Eθ(j) [ft|ΩT ], Eθ(j) [ftf ′t |ΩT ], Eθ(j)
[ft−1f ′
t−1|ΩT
]
and Eθ(j)
[ftf ′
t−1|ΩT
], can be obtained through the Kalman smoother for the state space representation:
yt = Λ(j)ft + εt , εt ∼ i.i.d. N (0, R(j)) ,
ft = A(j)ft−1 + ut , ut ∼ i.i.d. N (0, Q(j)) , (10)
see Watson and Engle (1983).
However, when yt contains missing values we can no longer use (9) when developing the expressions (5)and (7). Let Wt be a diagonal matrix of size n with ith diagonal element equal to 0 if yi,t is missing andequal to 1 otherwise. As shown in the Appendix, Λ(j + 1) can be obtained as
vec(Λ(j + 1)
)=
(T∑
t=1
Eθ(j)
[ftf
′t |ΩT
]⊗ Wt
)−1
vec
(T∑
t=1
WtytEθ(j)
[f ′
t |ΩT
])
. (11)
Intuitively, Wt works as a selection matrix, so that only the available data are used in the calculations.Analogously, the expression (7) becomes
R(j + 1) = diag
(1T
T∑
t=1
(Wtyty
′tW
′t − WtytEθ(j)
[f ′
t |ΩT
]Λ(j + 1)′Wt − WtΛ(j + 1)Eθ(j)
[ft|ΩT
]y′
tWt
+ WtΛ(j + 1)Eθ(j)
[ftf
′t |ΩT
]Λ(j + 1)′Wt + (I − Wt)R(j)(I − Wt)
)). (12)
12A sketch of how these are derived is provided in the Appendix, see also e.g. Watson and Engle (1983) and Shumway andStoffer (1982).
13Note that L(θ, θ(j)) does not have to be maximised simultaneously with respect to all the parameters. The procedureremains valid if M-step is performed sequentially, i.e. L(θ, θ(j)) is maximised over a subvector of θ with other parametersheld fixed at their current values, see e.g. McLachlan and Krishnan (1996), Ch. 5.
11ECB
Working Paper Series No 1189May 2010
following expressions for Λ(j + 1) and A(j + 1):12
Λ(j + 1) =
(T∑
t=1
Eθ(j)
[ytf
′t |ΩT
])(
T∑
t=1
Eθ(j)
[ftf
′t |ΩT
])−1
, (5)
A(j + 1) =
(T∑
t=1
Eθ(j)
[ftf
′t−1|ΩT
])(
T∑
t=1
Eθ(j)
[ft−1f
′t−1|ΩT
])−1
. (6)
Note that these expressions resemble the ordinary least squares solution to the maximum likelihood estima-tion for (auto-) regressions with complete data with the difference that the sufficient statistics are replacedby their expectations.
The (j + 1)-iteration covariance matrices are computed as the expectations of sums of squared residualsconditional on the updated estimates of Λ and A:13
R(j + 1) = diag
(1T
T∑
t=1
Eθ(j)
[(yt − Λ(j + 1)ft
)(yt − Λ(j + 1)ft
)′|ΩT
])(7)
= diag
(1T
(T∑
t=1
Eθ(j)
[yty
′t|ΩT
]− Λ(j + 1)
T∑
t=1
Eθ(j)
[fty
′t|ΩT
]))
and
Q(j + 1) =1T
(T∑
t=1
Eθ(j)
[ftf
′t |ΩT
]− A(j + 1)
T∑
t=1
Eθ(j)
[ft−1f
′t |ΩT
])
. (8)
When yt does not contain missing data, we have that
Eθ(j) [yty′t|ΩT ] = yty
′t and Eθ(j) [ytf
′t |ΩT ] = ytEθ(j) [f ′
t |ΩT ] . (9)
Finally, the conditional moments of the latent factors, Eθ(j) [ft|ΩT ], Eθ(j) [ftf ′t |ΩT ], Eθ(j)
[ft−1f ′
t−1|ΩT
]
and Eθ(j)
[ftf ′
t−1|ΩT
], can be obtained through the Kalman smoother for the state space representation:
yt = Λ(j)ft + εt , εt ∼ i.i.d. N (0, R(j)) ,
ft = A(j)ft−1 + ut , ut ∼ i.i.d. N (0, Q(j)) , (10)
see Watson and Engle (1983).
However, when $y_t$ contains missing values we can no longer use (9) when developing the expressions (5) and (7). Let $W_t$ be a diagonal matrix of size $n$ with $i$th diagonal element equal to 0 if $y_{i,t}$ is missing and equal to 1 otherwise. As shown in the Appendix, $\Lambda(j+1)$ can be obtained as
$$\operatorname{vec}\left(\Lambda(j+1)\right) = \left(\sum_{t=1}^{T} E_{\theta(j)}\left[f_t f_t' \mid \Omega_T\right] \otimes W_t\right)^{-1} \operatorname{vec}\left(\sum_{t=1}^{T} W_t y_t E_{\theta(j)}\left[f_t' \mid \Omega_T\right]\right). \tag{11}$$
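Intuitively, $W_t$ works as a selection matrix, so that only the available data are used in the calculations. Analogously, the expression (7) becomes
$$R(j+1) = \operatorname{diag}\Bigg(\frac{1}{T}\sum_{t=1}^{T}\Big(W_t y_t y_t' W_t' - W_t y_t E_{\theta(j)}\left[f_t' \mid \Omega_T\right]\Lambda(j+1)' W_t - W_t \Lambda(j+1) E_{\theta(j)}\left[f_t \mid \Omega_T\right] y_t' W_t + W_t \Lambda(j+1) E_{\theta(j)}\left[f_t f_t' \mid \Omega_T\right]\Lambda(j+1)' W_t + (I - W_t) R(j) (I - W_t)\Big)\Bigg). \tag{12}$$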
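A small NumPy sketch may help to see how the selection matrix $W_t$ enters (11) and (12). Everything named below (function names, the NaN convention for missing entries, the toy data) is an assumption of this example. It also assumes every series is observed often enough for the linear system in (11) to be invertible.

```python
import numpy as np

def update_loadings_missing(y, f_sm, P_sm):
    """Eq. (11): solve for vec(Lambda) using only the observed entries of y."""
    T, n = y.shape
    r = f_sm.shape[1]
    mask = ~np.isnan(y)                  # True where y_{i,t} is observed
    y0 = np.where(mask, y, 0.0)          # missing entries are zeroed, W_t removes them
    lhs = np.zeros((n * r, n * r))
    rhs = np.zeros((n, r))
    for t in range(T):
        W = np.diag(mask[t].astype(float))
        E_ff = np.outer(f_sm[t], f_sm[t]) + P_sm[t]
        lhs += np.kron(E_ff, W)
        rhs += np.outer(W @ y0[t], f_sm[t])
    # vec() stacks columns, hence Fortran ("F") order when reshaping.
    vec_lam = np.linalg.solve(lhs, rhs.reshape(-1, order="F"))
    return vec_lam.reshape((n, r), order="F")

def update_noise_variance_missing(y, f_sm, P_sm, Lam_new, R_old):
    """Eq. (12): diagonal of R; missing entries contribute the previous variance."""
    mask = ~np.isnan(y)
    y0 = np.where(mask, y, 0.0)
    resid = y0 - f_sm @ Lam_new.T
    # lambda_i' P_t lambda_i for every period t and series i.
    quad = np.einsum("ir,trs,is->ti", Lam_new, P_sm, Lam_new)
    per_period = np.where(mask, resid ** 2 + quad, np.diag(R_old))
    return np.diag(per_period.mean(axis=0))

# Toy data (hypothetical): factors treated as known, about 20% of entries missing.
rng = np.random.default_rng(2)
T, n, r = 120, 4, 2
F = rng.normal(size=(T, r))
Lam_true = rng.normal(size=(n, r))
y = F @ Lam_true.T + 0.1 * rng.normal(size=(T, n))
y[rng.random((T, n)) < 0.2] = np.nan
P_zero = np.zeros((T, r, r))
R_old = 0.5 * np.eye(n)

Lam_new = update_loadings_missing(y, F, P_zero)
R_new = update_noise_variance_missing(y, F, P_zero, Lam_new, R_old)
```

With known factors, each row of `Lam_new` equals the OLS coefficients of that series on the factors over its available observations, which is the "selection" intuition stated above.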
[12] A sketch of how these are derived is provided in the Appendix, see also e.g. Watson and Engle (1983) and Shumway and Stoffer (1982).
[13] Note that $L(\theta, \theta(j))$ does not have to be maximised simultaneously with respect to all the parameters. The procedure remains valid if the M-step is performed sequentially, i.e. $L(\theta, \theta(j))$ is maximised over a subvector of $\theta$ with other parameters held fixed at their current values, see e.g. McLachlan and Krishnan (1996), Ch. 5.
ECB Working Paper Series No 1189, May 2010 (page 11)
Source: Banbura, M. and Modugno, M. (2010)
Outcomes
News Contribution
Big Data & Nowcasting Model (DFM): Variables & Releases
TURKEY: VARIABLES IN MONTHLY GDP DFM
[Table: release calendar by month (Jan to Sept 2020) for the variables below]
- Hard Data (M & D): GDP, Industrial Production, Non-metal Mineral Production, Auto Sales, Number of Employed, Number of Unemployed, Auto Imports, Auto Exports, Electricity Production
- Soft (M): Manufacturing PMI, Real Sector Confidence
- Fin (W): Total Loans growth 13-week
- Big Data (D): Big Data Total Consumption Indicator, Big Data Total Investment Indicator
Source: Own elaboration
TURKEY: VARIABLES IN MONTHLY INVESTMENT DFM
Big Data & Nowcasting Model (DFM): Out-of-Sample errors
[Figure: TURKEY: OUT-OF-SAMPLE ERROR GDP MODEL. Series: TURKSTAT GDP; Standard Dynamic Factor Model (DFM); DFM + Big Data Investment. Horizontal axis: Mar-16 to Mar-20, quarterly. Vertical axis: -4% to 14%.]
[Figure: TURKEY: OUT-OF-SAMPLE ERROR INVESTMENT MODEL. Series: TURKSTAT Investment Expenditures; Standard Dynamic Factor Model (DFM); DFM + BBVA Investment Index. Horizontal axis: Mar-16 to Mar-20, quarterly. Vertical axis: -25% to 25%.]
Source: Own Elaboration
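One common way to summarise out-of-sample errors such as these is the root mean squared error (RMSE) of each model's nowcasts against the official figure. The slides do not state which metric was used, so the sketch below is an assumption, and the numbers in it are placeholders made up for the example, not values read from the charts.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Placeholder growth rates in percent (hypothetical, NOT taken from the slides).
official = [5.0, 7.0, 1.0, -3.0]
standard_dfm = [4.0, 5.0, 3.0, 0.0]
dfm_with_big_data = [5.0, 6.0, 2.0, -2.0]

rmse_standard = rmse(official, standard_dfm)
rmse_big_data = rmse(official, dfm_with_big_data)
# A ratio below 1 means the augmented model has the smaller error.
relative_rmse = rmse_big_data / rmse_standard
```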
Big Data & Nowcasting Model (DFM): Anticipation
TURKEY: NOWCASTING FINANCIAL CRISIS (SEPT 2018) & COVID CRISIS (MAR 2020) (quasi real time nowcasting with and without Big Data Indexes vs Benchmark)**
- Financial Crisis (September 2018): the model with Big Data investment starts to signal the deceleration more than a month before
- COVID-19 (March 2020): the model with Big Data Consumption reacted immediately, signalling the deep fall and the sharp recovery
Source: Own Elaboration
Big Data & Nowcasting Model (DFM): News & Prevalence Bias
[Figure: DFM News Contributions (Unbalanced Sample). Bars by variable: Industrial Production (M), Big Data Investment (D), Unemployment, Car Imports (M), Car Sales (M), PMI Manufac (M), Electricity Production (D), Cement Prod (M), Credit Growth (W), Employment (M), Big Data Consumption (D), Business Confidence (M), Car Exports (M). Series: M1/2, M2/3, M3/4, M4/5, Average. Vertical axis: 0.0 to 5.0.]
[Figure: DFM News Contributions correcting Bias (Maintaining individually all the variables for the sample of Big data information)]
Source: Own Elaboration
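In the news framework associated with this type of DFM, a nowcast revision is a weighted sum of the "news" in each release, that is, the gap between the released value and what the model expected beforehand. The sketch below only illustrates that bookkeeping. The weights would in practice come from the estimated model, and every name and number here is a hypothetical placeholder.

```python
def news_contributions(releases, prior_forecasts, weights):
    """Contribution of each release: weight * (released value - prior model forecast)."""
    contributions = {}
    for name, released in releases.items():
        news = released - prior_forecasts[name]
        contributions[name] = weights[name] * news
    return contributions

# Hypothetical placeholder values, not taken from the slides.
releases = {"industrial_production": 2.0, "big_data_investment": -4.0}
prior_forecasts = {"industrial_production": 1.0, "big_data_investment": -2.0}
weights = {"industrial_production": 0.5, "big_data_investment": 0.125}

contributions = news_contributions(releases, prior_forecasts, weights)
# The nowcast revision is the sum of the individual contributions.
nowcast_revision = sum(contributions.values())
```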
One of the key results is that we can examine Investment in Real Time
TURKEY: BBVA BIG DATA CONSUMPTION & INVESTMENT (7-day cum. YoY nominal in Cons., 28-day cum. YoY nominal in Invest.)
[Figure: BIG DATA CONSUMPTION. Series: Goods, Services, Total Consumption. Horizontal axis: 8-Mar to 18-Oct. Vertical axis: -60% to 80%.]
[Figure: BIG DATA INVESTMENT. Series: Total Investment, Machinery, Construction. Horizontal axis: 22-Feb to 19-Oct. Vertical axis: -50% to 50%.]
Source: Own Elaboration
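One plausible reading of "28-day cum. YoY" is the sum of daily amounts over the last 28 days compared with the same rolling sum one year earlier. The sketch below follows that reading. The 364-day lag (52 weeks, which keeps weekdays aligned) is an assumption of this example rather than something stated in the slides, and the daily series is synthetic.

```python
def rolling_sum(values, window):
    """Trailing sum over `window` observations; None until the window is full."""
    out = [None] * len(values)
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        if i >= window - 1:
            out[i] = running
    return out

def cumulative_yoy(values, window, lag_days=364):
    """Growth of the trailing-window sum versus the same sum `lag_days` earlier."""
    sums = rolling_sum(values, window)
    out = [None] * len(values)
    for i in range(len(values)):
        j = i - lag_days
        if j >= 0 and sums[j] is not None and sums[j] != 0:
            out[i] = sums[i] / sums[j] - 1.0
    return out

# Synthetic daily amounts: flat at 100 for 364 days, then flat at 110.
daily = [100.0] * 364 + [110.0] * 364
investment_yoy = cumulative_yoy(daily, window=28)   # 28-day window, as for investment
consumption_yoy = cumulative_yoy(daily, window=7)   # 7-day window, as for consumption
```

A longer window smooths lumpy daily flows at the cost of reacting more slowly, which is one possible reason for using 28 days for investment and 7 days for consumption.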
The “High Definition” sectoral information can help us track the evolution of sectors in detail and distinguish between different shocks…
TURKEY: GB-BBVA BIG DATA INVESTMENT HEAT MAP (3mm avg YoY nominal)
Source: Own Elaboration
… while geo-localization of the Big Data investment will help us analyze shocks at the regional level to design targeted policies
TURKEY: GB-BBVA BIG DATA INVESTMENT GEO-MAPS: RECOVERY AFTER COVID-19 (Change in YoY investment before, during and after the Covid lockdowns)
Source: Own Elaboration
Investment in the rest of the countries: Validation
Validation for Spain: Still working on raising Transaction Data
SPAIN: BBVA BIG DATA INVESTMENT INDEX AND GROSS FIXED CAPITAL INVESTMENT COMPONENTS (YoY Real deflated by Producer Prices, 3 months moving average)
Correlation coefficients (one per panel): 0.78, 0.58, 0.73
Source: Own Elaboration and INE
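The transformation named in the chart subtitle (deflate by producer prices, take year-on-year growth, smooth with a 3-month moving average) and the reported correlation coefficient can be sketched as follows. The helper names and the synthetic monthly series are assumptions of this example and do not reproduce the BBVA or INE data.

```python
import math

def yoy(series, lag=12):
    """Year-on-year growth rate for a monthly series."""
    return [series[t] / series[t - lag] - 1.0 for t in range(lag, len(series))]

def moving_average(series, window=3):
    """Trailing moving average; the first window-1 points are dropped."""
    return [sum(series[t - window + 1:t + 1]) / window
            for t in range(window - 1, len(series))]

def real_yoy_3mma(nominal, price_index):
    """Deflate a nominal series, then take YoY growth and a 3-month moving average."""
    real = [n / p for n, p in zip(nominal, price_index)]
    return moving_average(yoy(real))

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Synthetic monthly data (hypothetical): 36 months, prices growing 1% per month.
ppi = [100.0 * 1.01 ** t for t in range(36)]
nominal_index = [p * (1.0 + 0.02 * math.sin(t / 3.0)) for t, p in enumerate(ppi)]
nominal_official = [p * (1.0 + 0.02 * math.sin(t / 3.0) + 0.005 * math.cos(t))
                    for t, p in enumerate(ppi)]

index_growth = real_yoy_3mma(nominal_index, ppi)
official_growth = real_yoy_3mma(nominal_official, ppi)
corr = pearson(index_growth, official_growth)
```

Twelve months are lost to the YoY comparison and two more to the moving average, so 36 monthly observations leave 22 points for the correlation.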
Validation for Mexico: Positive results in Correlation & Comovement
MEXICO: BBVA BIG DATA INVESTMENT INDEX AND GROSS FIXED CAPITAL INVESTMENT COMPONENTS (YoY Real deflated by Producer Prices, 3 months moving average)
Correlation coefficients (one per panel): 0.86, 0.78, 0.88
Source: Own Elaboration and INEGI
Validation for Mexico: Positive results with HF proxies
MEXICO: BBVA BIG DATA MACHINERY/CONSTRUCTION INVESTMENT INDEX AND PROXIES (YoY Real deflated by Producer Prices, 3 months moving average)
Correlation coefficients (one per panel): 0.78, 0.62, 0.88, 0.78
Source: Own Elaboration and INEGI
Validation for Colombia: Positive results in Correlation & Comovement
COLOMBIA: BBVA BIG DATA INVESTMENT INDEX AND GROSS FIXED CAPITAL INVESTMENT COMPONENTS (YoY Real deflated by Producer Prices, 3 months moving average)
Correlation coefficients (one per panel): 0.86, 0.76, 0.64
Source: Own Elaboration & DANE
Conclusions, Further Research, References
Conclusions and Further Research
- Big Data will become an increasingly important tool for economic analysis, as it provides “real time” and “high definition” advantages
- We provide a novel and validated “real time” and “high definition” assessment of the Investment cycle for Turkey. Preliminary validation results show that this analysis can be extended to other countries
- Given the characteristics of Big Data information (HF, granular but short history), we need to explore the integration of Big Data information in new models. In particular, regularization techniques that exploit the high and rich dimension of the information (large P) in shorter samples (T) could further enhance the potential of Big Data for economic analysis
- Big Data information can enhance the standard Nowcasting Models, providing analysts and policymakers with an effective tool to react faster to shocks and design targeted policies
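As an illustration of the "large P, short T" point, ridge regression is one regularization technique that remains well defined when there are more predictors than observations. The slides do not say which technique the authors have in mind, so the choice of ridge, the function name and the simulated data below are all assumptions of this sketch.

```python
import numpy as np

def ridge_dual(X, y, alpha):
    """Ridge coefficients via the dual form X' (X X' + alpha I_T)^{-1} y.

    The system solved is T x T rather than P x P, which is cheaper when P > T.
    """
    T = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(T), y)

# Simulated example (hypothetical): 24 observations, 60 candidate predictors.
rng = np.random.default_rng(3)
T, P = 24, 60
X = rng.normal(size=(T, P))
beta_true = np.zeros(P)
beta_true[:3] = [1.0, -0.5, 0.25]        # only three predictors matter
y = X @ beta_true + 0.1 * rng.normal(size=T)

beta_dual = ridge_dual(X, y, alpha=1.0)
# The usual primal formula (X'X + alpha I_P)^{-1} X'y gives the same coefficients.
beta_primal = np.linalg.solve(X.T @ X + 1.0 * np.eye(P), X.T @ y)
```

Ordinary least squares has no unique solution here because X'X is singular when P > T. The penalty `alpha` restores uniqueness and would normally be chosen by cross-validation.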
References
• Akcigit, U., Akgunduz, Y. E., Cilasun, S. M., Ozcan Tok, E., & Yilmaz, F. (2019). Facts on Business Dynamism in Turkey. Working Papers 1930, Research and Monetary Policy Department, Central Bank of the Republic of Turkey.
• Andersen, A., Hansen, E. T., Johannesen, N., & Sheridan, A. (2020a). Consumer Responses to the COVID-19 Crisis: Evidence from Bank Account Transaction Data. CEPR Discussion Paper 14809.
• Andersen, A. L., Hansen, E. T., Johannesen, N., & Sheridan, A. (2020b). Pandemic, Shutdown and Consumer Spending: Lessons from Scandinavian Policy Responses to COVID-19. arXiv:2005.04630 [econ, q-fin].
• Baker, S. R., Farrokhnia, R., Meyer, S., Pagel, M., & Yannelis, C. (2020a). How Does Household Spending Respond to an Epidemic? Consumption During the 2020 COVID-19 Pandemic. Working Paper 26949, National Bureau of Economic Research.
• Baker, S. R., Farrokhnia, R. A., Meyer, S., Pagel, M., & Yannelis, C. (2020b). Income, Liquidity, and the Consumption Response to the 2020 Economic Stimulus Payments. Working Paper 27097, National Bureau of Economic Research.
• Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160.
• Bodas, D., López, J. R. G., López, T. R., de Aguirre, P. R., Ulloa, C. A., Arias, J. M., de Dios Romero Palop, J., Lapaz, H. V., & Pacce, M. J. (2019). Measuring retail trade using card transactional data. Working Papers 1921, Banco de España.
• Bounie, D., Camara, Y., & Galbraith, J. W. (2020). Consumers’ Mobility, Expenditure and Online-Offline Substitution Response to COVID-19: Evidence from French Transaction Data. SSRN Scholarly Paper ID 3588373, Social Science Research Network, Rochester, NY.
• Carvalho, V., Hansen, S., Ortiz, A., Garcia, J. R., Rodrigo, T., Rodriguez Mora, S., & Ruiz, P. (2020). Tracking the Covid-19 crisis with high resolution transaction data. CEPR DP No 14642.
• Chetty, R., Friedman, J. N., Hendren, N., & Stepner, M. (2020). How Did COVID-19 and Stabilization Policies Affect Spending and Employment? A New Real-Time Economic Tracker Based on Private Sector Data. Working Paper.
• Chronopoulos, D. K., Lukas, M., & Wilson, J. O. S. (2020). Consumer Spending Responses to the COVID-19 Pandemic: An Assessment of Great Britain. Working Paper 20-012.
• Cox, N., Ganong, P., Noel, P., Vavra, J., Wong, A., Farrell, D., & Greig, F. (2020). Initial Impacts of the Pandemic on Consumer Behavior: Evidence from Linked Income, Spending, and Saving Data.
• Eurostat (2013). Handbook on quarterly national accounts.
• Gezici, F., Yazgı Walsh, B., & Metin Kacar, S. (2017). Regional and structural analysis of the manufacturing industry in Turkey. The Annals of Regional Science, Springer; Western Regional Science Association, 59(1), 209-230.
• Modugno, M., Soybilgen, B., & Yazgan, E. (2016). Nowcasting Turkish GDP and News Decomposition. International Journal of Forecasting, 32(4), 1369–1384.
• Surico, P., Kanzig, D., & Hacioglu, S. (2020). Consumption in the Time of COVID-19: Evidence from UK Transaction Data. CEPR Discussion Paper 14733.
• Watanabe, T., Omori, Y., et al. (2020). Online consumption during the Covid-19 crisis: Evidence from Japan. Tech. rep., University of Tokyo, Graduate School of Economics.