www.set.or.th/setresearch
Disclaimer: The views expressed in this working paper are those of the author(s) and do not necessarily represent the Capital Market Research Institute or the Stock Exchange of Thailand. Capital Market Research Institute Scholarship Papers are research in progress by the author(s) and are published to elicit comments and stimulate discussion.
Scholarship Project Paper 2019
Attention-based Deep Learning Model on Financial Big Data
Tanawat Chiewhawan Asst. Prof. Peerapon Vateekul, Ph.D.
Asst. Prof. Tanakorn Likitapitwat, Ph.D. Santhapon Sripilaopong
Chulalongkorn University
20 April 2020
Abstract
This study applies a deep learning model with an attention mechanism to historical stock data from the SET SMART portal. The main objectives of this approach are to handle the variety of the data and to draw insight from the model through its attention mechanism. We design the deep learning task to predict the next-day returns of 64 stocks within the SET100 and to rank them against each other in order to propose the best stock for investment each day. Experimental results show that our proposed model is competitive with the baseline models on the stock ranking and profit tasks. We also develop a proof-of-concept website that uses this model and its attention mechanism to give explainable insight into the input features.
JEL Classification: … Keywords: Deep learning, Stock prediction, Stock ranking, Attention mechanism E-Mail Address: [email protected], [email protected]
2019 Capital Market Research Institute, The Stock Exchange of Thailand
Content
Chapter 1 Introduction
Research objectives
Expected outcomes
Chapter 2 Background knowledge
Artificial neural network
Recurrent neural network and LSTM
Attention mechanism
Dual-stage attention
Chapter 3 Methodology
Deep learning task design
Proposed framework
Data pre-processing
Data scope
Target stock pre-selection
Chapter 4 Proposed model
Chapter 5 Performance evaluation
Evaluation metrics
Baselines
Hyper parameter tuning
Chapter 6 Experimental results
Model interpretability
Chapter 7 Deep-trade website
Chapter 8 Conclusion and discussion
Acknowledgement
References
Table
1. An intuitive explanation for the accuracy and profit discrepancy
2. List of fundamental and price information inputs
3. List of technical indicators generated from historical data
4. Data set records summary
5. List of 64 target stocks within this research
6. Experimental result for seven comparative models
Figure
1. Artificial Neural Network
2. Long short term memory diagram
3. Attention mechanism
4. Dual-stage attention recurrent neural network
5. Proposed framework
6. Illustration of the sliding-splitting periods
7. A simplified diagram of dual-stage-attention mechanism
8. Diagram of the proposed model DA-RANK
9. Performance chart for 2016 data
10. Performance chart for 2017 data
11. Performance chart for 2018 data
12. Attention mechanism in dual-stage attention
13. Website: Portfolio summary
14. Website: Daily prediction
15. Website: StockInsight
16. Website: Attention explainable
Appendix
1. Recommended future works
2. Tools and resources
Chapter 1 Introduction
Stock prediction is a notoriously challenging subject because of high volatility and the influence of dynamic external factors such as the global economy and investor behavior. Stock predictability has long been controversial. Early work on the efficient-market hypothesis (EMH) [1,2] suggests that prices reflect all information immediately and that price movements are random processes. Nevertheless, studies from many fields have attempted to take on this challenge. Recently, machine learning and deep learning have emerged among them with promising results.
Machine learning has become popular in stock prediction research because of its performance and its ability to handle growing amounts of information. For example, the works in [3,4] conduct comparative experiments using multiple algorithms such as Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN). Their results show that RF outperforms the other baseline models in accuracy on stock trend classification as well as in trading profit. More recent approaches use deep learning models. References [5-7] implement a Long Short-Term Memory recurrent neural network (LSTM) [8] with successful results. The LSTM is one of the most widespread algorithms for processing time-series data. The work in [7] uses an LSTM with numerous generated technical-indicator features to predict stock trends successfully, while [9] explores a modified LSTM to enhance the model's feature extraction.
However, with the wide selection of features available in the financial market, it becomes more challenging for the model to converge to a solution.
Hence, recent deep learning research aims to implement additional techniques that help models with these challenges. T. Hollis, S.E. Yi, and A. Viscardi [10] investigate an LSTM with an attention mechanism. Their results align with other research showing improvements in time-series forecasting [11,12]. This paper also uses the attention mechanism to boost model prediction on time series with numerous features.
In this work, we propose a framework for multiple-stock prediction that handles various stock features and captures stock relations. We modify the Dual-Stage Attention model (DA-RNN), the original work of Qin [12], to tackle the feature and temporal relations. Next, we transform a set of stock features into a fixed batch size and train on them with a shared-parameter model concept.
This setup allows us to infer stock relations with a combination of loss functions: regression loss and ranking loss. As first mentioned in [13], a model with high accuracy does not always lead to the optimum profit when trading; Table 1 demonstrates this discrepancy. Thus, our framework can focus more on profit by using the relational ranking. Our prediction target is the next-day return when the investment is made at the close price. Finally, we conduct experiments on 64 target stocks of the SET. The results show that our model could improve annualized returns over the baselines while maintaining regression accuracy.
Table 1: An intuitive explanation for the accuracy and profit discrepancy
Source: Feng, F. et al. Temporal Relational Ranking for Stock Prediction
                               Stock returns
                               A      B      C      MSE    Profit
Ground truth                   +30    +10    -50
1) Ranking-aware prediction    +50    -10    -50    266    +30
2) MSE optimized prediction    +20    +30    -40    200    +10
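The numbers in Table 1 can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names `mse` and `top1_profit` are ours, and "profit" is taken to mean the true return of the single stock with the highest predicted return.

```python
# Returns (in percent) taken from Table 1.
truth = {"A": 30, "B": 10, "C": -50}
ranking_aware = {"A": 50, "B": -10, "C": -50}
mse_optimized = {"A": 20, "B": 30, "C": -40}


def mse(pred, actual):
    """Mean squared error over all stocks."""
    return sum((pred[s] - actual[s]) ** 2 for s in actual) / len(actual)


def top1_profit(pred, actual):
    """True return earned when buying only the stock with the highest prediction."""
    pick = max(pred, key=pred.get)
    return actual[pick]


for name, pred in [("ranking-aware", ranking_aware), ("MSE-optimized", mse_optimized)]:
    print(name, "MSE:", round(mse(pred, truth), 1), "profit:", top1_profit(pred, truth))
```

The second prediction has the lower error but ranks stock B above stock A, so a top-1 strategy buys B and earns less.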
Research objectives:
1. Introduce deep learning as a leading indicator and prediction model for investors via an interactive website.
2. Research and optimize a deep learning predictive model for the Thai market.
3. Compare deep learning performance with baselines such as traditional machine learning and buy-and-hold trading of the market index.
Expected outcomes
1. Deliver an example use of deep learning for investors on a website, with an example predictive use case.
2. Propose a deep learning model suitable for ranking and predicting the returns of multiple Thai stocks.
3. Provide a comparative analysis of the deep learning model against other baseline models.
Chapter 2 Background Knowledge
Artificial Neural Network
An Artificial Neural Network (ANN) is one of the most popular methods within the machine learning area. The model consists of interconnected computational units called neurons, which simulate a simplified version of the structure of neurons in the human brain. The neurons are organized into layers: an input layer, an output layer, and one or more hidden layers in between. The neurons in each layer process their inputs before feeding their outputs to the next layer. Finally, the output layer provides the prediction, which can be, for example, a classification or a regression value. The computation in each neuron usually involves a matrix of weights and an activation function, which allows the model to capture the nonlinearity and complexity of the problem. A structure with no cycles between interconnected layers is called a Feed Forward Neural Network (FNN), whereas a Deep Neural Network (DNN) usually refers to an ANN with many hidden layers or a more complex neuron structure. The figure below illustrates an example of a feed-forward neural network model.
Figure 1: Artificial Neural Network
Source: An Insight to Soft Computing based Defect Prediction Techniques in Software
Recurrent Neural Network and LSTM
Long Short-Term Memory networks (LSTMs) are a particular type of Recurrent Neural Network (RNN) that is able to handle long-range dependencies in its inputs. These models work well on a wide variety of problems and have become increasingly widespread.
LSTMs are explicitly designed to deal with long-term dependencies. The structure shares its recurrent nature with the RNN, but the core computation unit contains four interacting gates. These gates guide the unit on what to memorize and what to forget. Figure 2 below illustrates the four-gate mechanism in the LSTM. First, the forget gate, with a sigmoid activation function, decides which previous information in the memory cell should be forgotten. Then the next two gates decide what to memorize and with what magnitude, using the sigmoid and tanh activation functions. Finally, the last gate combines the LSTM input with the cell memory to generate the output for the current unit. This output can be fed into the next LSTM step along with the memory cell to make the next prediction. The memory cell that passes through each LSTM step helps avoid the vanishing gradient problem that can be observed in a vanilla RNN model.
Figure 2: Long short term memory diagram
Source: https://towardsdatascience.com/understanding-lstm-networks-by-example-using-torch-c63dba7bbb3c
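The gate mechanism described above can be sketched as a single LSTM step on scalar values. This follows the standard LSTM formulation rather than any code from this study; the weight values and the `lstm_step` helper are purely illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar input, hidden state, and cell state.

    `w` maps each gate name to a (w_x, w_h, bias) tuple. The values used
    below are illustrative, not trained weights.
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old memory cell to keep
    i = sigmoid(pre("input"))         # how much of the candidate to write
    g = math.tanh(pre("candidate"))   # candidate value to memorize
    o = sigmoid(pre("output"))        # how much of the memory cell to expose

    c = f * c_prev + i * g            # updated memory cell
    h = o * math.tanh(c)              # output (hidden state) for this step
    return h, c


weights = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.4, -0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.7, 0.2, 0.0),
}

h, c = 0.0, 0.0
for x in [0.1, -0.3, 0.2]:            # a short made-up input sequence
    h, c = lstm_step(x, h, c, weights)
```

The memory cell `c` is updated additively (`f * c_prev + i * g`), which is the path that lets information and gradients survive over many steps.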
Attention mechanism
This deep learning technique has shown excellent performance in textual and visual tasks such as text translation and image caption generation. The attention mechanism helps a model focus on the relevant inputs through a softmax layer that produces attention weights. The work of [17] is shown in Figure 3 below: for each predicted output, the attention mechanism focuses on part of the input (visualized in black and white). We aim to apply the attention mechanism in order to see what the deep learning model is focusing on.
Figure 3: Attention mechanism
Source: Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention. I
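As a minimal sketch of how a softmax layer turns relevance scores into attention weights, consider the following. The scores and inputs are made up, and the `attend` helper is an illustration of the general idea, not the architecture of [17].

```python
import math


def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(inputs, scores):
    """Weight each input by its attention weight and sum the result."""
    weights = softmax(scores)
    context = sum(w * x for w, x in zip(weights, inputs))
    return weights, context


# Hypothetical relevance scores for three inputs.
weights, context = attend(inputs=[0.2, -1.0, 0.5], scores=[2.0, 0.1, 1.0])
```

Because the weights are non-negative and sum to one, they can be read directly as the share of focus given to each input.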
Dual-stage attention
Dual-stage attention [14] is an early adoption of the attention mechanism for financial market prediction tasks. The model predicts the future price of the NASDAQ 100 index. The left part of the figure below shows prediction results without attention, with one layer of attention, and with two layers of attention. On the right, the attention weights visualize the model's ability to differentiate actual inputs from random noise. This dual-stage attention model is our core focus in developing an attention-based predictive model for the Thai market.
Figure 4: Dual-stage attention recurrent neural network
Source: Qin, Y., et al. A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Chapter 3 Methodology
Deep learning task design:
As described in the proposal, the deep learning output is flexible to design, and we can define it freely depending on our objectives. As the project progressed, we designed the problem to be a practical prediction of the returns of multiple stocks over the following days, as shown in equation (1).
$y_t^i = (p_{t+2}^i - p_{t+1}^i) / p_{t+1}^i$ (1)
where $y_t^i$ is the return ratio for stock $i$ at time step $t$, $p_{t+1}^i$ is the close price of the next day, and $p_{t+2}^i$ is the close price of the day after that. This modification from [13] provides a more explicit expected profit for investors, since $p_t^i$ (today's close price) is already in the past and cannot be used as an actual trading (buying) price.
Proposed framework:
Our proposed framework aims to improve the performance of multiple stock returns prediction using various time-series inputs. The framework, as shown in Figure 5, starts with data pre-processing and normalization. Next, we apply the fixed-batch data transformation before moving on to the prediction model; we discuss the details of this transformation later.
Figure 5: Proposed framework
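The prediction target that this framework feeds on, the return label of equation (1), could be computed from a close-price series roughly as follows. The function name and the prices are illustrative assumptions.

```python
def next_day_returns(close):
    """Label y_t = (p_{t+2} - p_{t+1}) / p_{t+1} for one stock.

    `close` is a list of daily close prices ordered by date. The last two
    days get no label because p_{t+1} or p_{t+2} is not yet known.
    """
    return [
        (close[t + 2] - close[t + 1]) / close[t + 1]
        for t in range(len(close) - 2)
    ]


prices = [100.0, 102.0, 105.06, 99.807]   # made-up close prices
labels = next_day_returns(prices)          # one label for day 0 and one for day 1
```

Note that today's close price (index `t`) never enters the label; only the buy price at `t+1` and the sell price at `t+2` do.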
Data Pre-processing
1. Fundamentals and price data.
Fundamentals that are updated quarterly are forward-filled for each stock so that they are consistent with the other daily-frequency data. In addition, seven fundamental attributes are also represented in other forms, namely the percentage change from the previous quarter, the percentage change from the previous year, and the cumulative value since the beginning of the year. Table 2 below describes all 52 attributes of fundamental and price data used in this study.
Table 2: List of fundamental and price information inputs
Seven attributes, each presented in the 5 forms below (count: 35)
Attribute names: A/P Turnover, D/E Ratio, Fixed Asset, Shareholder Equity, Total Asset, Total Liability, Total Revenue
• Q - at the quarter data
• Cum. Q - cumulative quarter value since the first day of the year
• QoQ % - percent change from the previous quarter
• YoY % - percent change from the previous year, same quarter
• YoY Cum. - percent change from YE data (cumulative)
Quarter data (count: 6)
Attribute names: Cash Cycle Period, Net Profit Margin, Net Profit, Earnings per Share, Return of Asset, Return of Equity
Daily data (count: 11)
Attribute names: Close Price, Open Price, High Price, Low Price, Stock Trade Volume, Transaction Volume, P/E Ratio, P/BV Ratio, Book Value, Market Value, Market Capital
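The forward-filling of quarterly fundamentals onto the daily calendar can be sketched as below. The helper and the sample D/E values are hypothetical; a data-frame library would normally provide this operation directly.

```python
def forward_fill(values):
    """Replace each missing entry (None) with the latest known value.

    Entries before the first known value stay None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical D/E ratio, known only on the two days it was reported.
daily_de_ratio = [None, 1.20, None, None, 1.35, None]
filled = forward_fill(daily_de_ratio)
```

Forward-filling only ever copies values from the past, so it does not leak a future quarterly report into earlier trading days.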
2. Technical indicators
We adopt the list of indicators proposed in [9] and generate them over our proposed periods, from short to long term: (5, 7, 10, 14, 20, 30, 50, 75, 100) days. The total number of generated indicators is 9 periods multiplied by 17 time series from 15 indicators (MACD provides three series), which equals 153 features. Table 3 shows the full list of our technical indicator features. We apply this indicator generation to every one of our target stocks.
Table 3: List of technical indicators generated from historical data
RSI EMA TripleEMA MACD CMFI
William%R SMA CCI PPO DMI
WMA HMA CMO ROC PSI
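To make the period expansion concrete, here is a sketch that generates two of the indicators in Table 3 (SMA and ROC) for all nine periods. The textbook definitions of SMA and ROC are used; the feature naming scheme and the price series are assumptions made for illustration.

```python
PERIODS = (5, 7, 10, 14, 20, 30, 50, 75, 100)


def sma(close, period):
    """Simple moving average; None until `period` prices are available."""
    out = []
    for t in range(len(close)):
        if t + 1 < period:
            out.append(None)
        else:
            window = close[t + 1 - period : t + 1]
            out.append(sum(window) / period)
    return out


def roc(close, period):
    """Rate of change in percent relative to the price `period` days earlier."""
    return [
        None if t < period
        else 100.0 * (close[t] - close[t - period]) / close[t - period]
        for t in range(len(close))
    ]


def build_features(close, indicators):
    """Return {"NAME_period": series} for every indicator and every period."""
    return {
        f"{name}_{p}": fn(close, p)
        for name, fn in indicators.items()
        for p in PERIODS
    }


close = [100.0 + d for d in range(120)]            # made-up price series
features = build_features(close, {"SMA": sma, "ROC": roc})
```

Two indicator series over nine periods give 18 features; the full set of 17 series gives the 9 x 17 = 153 features stated above.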
Data Normalization
Standardization is applied to the input features because each of them has a different range of values. The z-score normalization formula is as follows.
z=(x-μ)/σ (2)
Where μ is the mean of the input x, and σ is the standard deviation of the input x.
Both μ and σ are calculated from the training and validation datasets only, so that the model never observes the distribution of the testing dataset.
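A minimal sketch of this leakage-free standardization follows. The use of the population standard deviation is an assumption, since the text does not say which estimator was used, and the feature values are made up.

```python
import statistics


def fit_scaler(values):
    """Mean and (population) standard deviation of the fitting data."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Apply z = (x - mean) / std with previously fitted statistics."""
    return [(x - mean) / std for x in values]


train_val = [10.0, 12.0, 14.0, 16.0]   # made-up feature values (training + validation)
test = [18.0, 11.0]                    # made-up test values

mean, std = fit_scaler(train_val)      # the test set is never used for fitting
test_z = transform(test, mean, std)
```

The test data are transformed with statistics they did not contribute to, which mirrors what would happen with genuinely unseen future data.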
Data scope
The stocks' end-of-day (EOD) data used in our research come from the Stock Exchange of Thailand (SET) market and cover the period from 12 February 2008 to 28 December 2018. We use daily-frequency data because of computational and data access limitations. The studied period contains 2655 trading days in total and is divided into three overlapping periods, each split into training, validating, and testing sets, as summarized in Table 4. With 64 target stocks in our scope, the total training, validating, and testing data for the model are approximately 92,000/31,500/15,000 records per period, respectively.
As suggested by [9,16], we split the data into training, validating, and testing periods, as shown in Fig. 6. This sliding split setting is used to verify the robustness of the model over time.
Table 4: Data set records summary
No.   Data period             Training         Validating       Testing
                              records/stock    records/stock    records/stock
1     Feb-2008 to Dec-2016    1437             509              222
2     Jan-2009 to Dec-2017    1464             487              243
3     Jan-2010 to Dec-2018    1464             488              244
Figure 6: Illustration of the sliding-splitting periods
All data were provided through the SET SMART DATA protocol (static data), with access granted in collaboration with the Financial Laboratory, Chulalongkorn University.
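The split for one period can be sketched as a purely chronological cut, here using the period 1 record counts from Table 4. The function name is ours and the integer list is only a stand-in for date-ordered daily rows.

```python
def chronological_split(records, n_train, n_val, n_test):
    """Split date-ordered records into consecutive train/validation/test parts."""
    if n_train + n_val + n_test > len(records):
        raise ValueError("not enough records for the requested split")
    train = records[:n_train]
    val = records[n_train : n_train + n_val]
    test = records[n_train + n_val : n_train + n_val + n_test]
    return train, val, test


# Record counts per stock for period 1 in Table 4.
days = list(range(1437 + 509 + 222))   # stand-in for date-ordered daily rows
train, val, test = chronological_split(days, 1437, 509, 222)
```

Keeping the three parts in time order means every validation and test record lies strictly after the data the model was trained on.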
Target stocks pre-selection
There are three considerations in our selection principles:
1. Stock information is available throughout the training to testing periods.
2. Liquidity is sufficient to assume that orders always get filled.
3. Volume and market capitalization are large enough to avoid price manipulation and to assume that the effect of our trading on the price can be neglected.
With the above criteria, we choose the 64 target stocks listed below out of the SET100 index.
Table 5: List of 64 target stocks within this research
BBL AP BH AMATA ERW TU QH CPALL
BJC HANA CENTEL RATCH KKP LPN BDMS AOT
SCC PTTEP INTUCH HMPRO KCE EGCO ADVANC WORK
TCAP PSL TVO PTT MINT CPN ROBINS TOP
KBANK SPALI TPIPL MAJOR UV IRPC BLAND BCH
SCB TRUE TASCO KTC LH DELTA GFPT SUPER
TMB BCP BTS THANI BANPU CK THAI GLOW
CPF ITD STA RS KTB SIRI STEC DTAC
The SET100 index includes the top stocks by market capitalization and trading volume. We take the SET100 list, by Thai market capitalization, as of 7 February 2018. These 64 stocks started trading before 2008 and were still active in 2018.
[Figure 6 layout: rows are periods No. 1 to 3, columns are the years 2008 to 2018, and each row is divided into Training, Validating, and Test segments.]
Chapter 4 Proposed model
Our proposed model, the Dual Attentional Ranking model (DA-RANK), aims to simultaneously predict the returns of a set of stocks while inferring the relations between them. The model structure consists of two parts: (i) a feature-relevance and temporal attentional recurrent neural network and (ii) a stock relation inference framework.
Features Relevance and Temporal Attention.
We select a state-of-the-art attention model for time-series prediction, the “Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN)”, to enhance feature and temporal relevance. Our core deep learning network is a modification of the original work of Qin [12]. We add a batch normalization [15] layer before the softmax layer in the input attention layer, as shown in Fig. 7, to enhance the calculation of the attention weights. We omit the full details of DA-RNN from this work because we introduce only minor changes to the original.
Figure 7: A simplified diagram of dual-stage-attention recurrent neural networks
Source: Qin, Y., et al. A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Stock Relation Inference
The second part of our model structure integrates stock relations during model training. We impose two methods: (i) fixed-batch training for a shared-parameter model and (ii) a pair-wise ranking loss.
1. Fixed-batch Training for a Shared-parameter Model.
Developing one model per stock becomes tedious for stock prediction tasks with a large number of target assets, because each model must be tuned individually for its optimal hyperparameters. Another method for multiple-stock prediction is to construct a single model that treats all stocks collectively as combined features. The model can then capture the relations between those features without the hyperparameters having to be calibrated one stock at a time. Nonetheless, this single complex model involves a much larger number of trained parameters and, in turn, becomes harder to converge.
Alternatively, to capture stock relations while keeping the number of parameters feasible, we adopt the arrangement from [13] to train a shared-parameter model and transform the input data into a fixed batch size equal to the number of target stocks. Fig. 8 shows the proposed fixed-batch transformation. A slice of a single stock's features has a dimension of T x k, where k is the number of time-series features for each stock (e.g., technical indicators, financial parameter series) and T is the sliding window for those features. We prepare this slice for each of our N target stocks. This collection of N feature slices is equal in size to the training batch and represents the information of multiple stocks during the same period. As a result of our fixed batch size setting, all N stocks share the same weights in the modified DA-RNN model, Fig. 8(b). These shared weights are updated when the model has observed all N slices of stock features in a batch during training.
There are three benefits to this design. First, it favors the ranking loss calculation, which we discuss in the next section. Second, the model becomes universal, with the ability to predict any particular stock independently. More specifically, the model treats each individual stock as one separate set of features within a training batch, so the trained model can still predict any stock without retraining the whole model when some stock ceases to trade in the market. Finally, it reduces the number of model weights per data record by a factor equal to the number of target stocks. Fewer model weights imply faster training and easier convergence to a solution.
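The fixed-batch transformation can be sketched as follows: one T x k slice per stock, all ending on the same day, stacked into a batch of size N. The data layout (a dictionary of daily feature rows per ticker) and the function name are assumptions made for this illustration.

```python
def make_fixed_batch(features_by_stock, t_end, window):
    """Build one batch with a T x k slice per stock, all ending at day `t_end`.

    features_by_stock: {ticker: list of daily rows, each row a list of k features}
    Returns (tickers, batch) where batch[n] is the slice for tickers[n].
    """
    tickers = sorted(features_by_stock)
    start = t_end - window + 1
    if start < 0:
        raise ValueError("window reaches before the first day")
    batch = [features_by_stock[s][start : t_end + 1] for s in tickers]
    return tickers, batch


# Toy data: N = 3 stocks, 10 days, k = 2 features per day (values are made up).
data = {
    s: [[float(day), float(day) * 0.5 + offset] for day in range(10)]
    for offset, s in enumerate(["AOT", "PTT", "SCB"])
}
tickers, batch = make_fixed_batch(data, t_end=9, window=5)   # shape N x T x k = 3 x 5 x 2
```

Because every batch holds exactly one slice per stock for the same window, a loss that compares stocks with each other can be computed inside a single batch.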
2. Pair-wise Ranking Loss
We use a combination of regression loss and ranking loss to optimize our model. For the regression part, the widely used mean square error (MSE) loss is selected so that the model focuses on return prediction accuracy. The MSE loss calculation for stock i is displayed in equation (3). Next, the pair-wise ranking loss is introduced to infer stock relations among all target stocks from their relative ranking scores. The formula in equation (4) calculates the relative ranking error for every pair of stocks. Finally, the combined loss of both functions in equation (5) is backpropagated to the model when it learns from a fixed-batch-size sample, Fig. 8(c).
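$\mathrm{MSEloss}^i = (\hat{y}_t^i - y_t^i)^2$ (3)
$\mathrm{PairwiseRankingLoss} = \sum_{i=1}^{N} \sum_{j=1}^{N} \max\left(0, -(\hat{y}_t^i - \hat{y}_t^j)(y_t^i - y_t^j)\right)$ (4)
$\mathrm{CombinedLoss} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{MSEloss}^i + \alpha \, (\mathrm{PairwiseRankingLoss})$ (5)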
where N is the number of target stocks to be predicted simultaneously, $\hat{y}_t^i$ is the predicted return for stock i at time step t, $y_t^i$ is the label described in equation (1), and α (alpha) is a weighting ratio that trades off regression accuracy against ranking accuracy and is one of the hyperparameters to be tuned.
Figure 8: Diagram of the proposed model DA-RANK: (a) input feature slices, (b) a modified DA-RNN unit, whose weights are shared among all stocks, (c) combination of loss functions
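The combined loss can be sketched in plain Python as below, following our reading of equations (3) to (5). The sample values reuse the returns from Table 1; the function names and the choice of alpha = 1 in the usage are illustrative only.

```python
def mse_loss(y_hat, y):
    """Mean of the per-stock squared errors."""
    return sum((p - a) ** 2 for p, a in zip(y_hat, y)) / len(y)


def pairwise_ranking_loss(y_hat, y):
    """Sum over all ordered pairs (i, j) of max(0, -(y_hat_i - y_hat_j)(y_i - y_j))."""
    n = len(y)
    return sum(
        max(0.0, -(y_hat[i] - y_hat[j]) * (y[i] - y[j]))
        for i in range(n)
        for j in range(n)
    )


def combined_loss(y_hat, y, alpha):
    """Regression loss plus alpha times the ranking loss."""
    return mse_loss(y_hat, y) + alpha * pairwise_ranking_loss(y_hat, y)


actual = [30.0, 10.0, -50.0]     # ground truth from Table 1
ordered = [50.0, -10.0, -50.0]   # keeps the true order of A, B, C
swapped = [20.0, 30.0, -40.0]    # ranks B above A

loss_ordered = combined_loss(ordered, actual, alpha=1.0)
loss_swapped = combined_loss(swapped, actual, alpha=1.0)
```

A pair contributes to the ranking term only when the predicted order disagrees with the true order, so the prediction that swaps A and B is penalized even though its MSE is lower.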
Chapter 5 Performance evaluation
Evaluation metrics
We compare the performance of each model with the following measures:
1. Root mean square error (RMSE): the standard evaluation of the predicted returns.
2. Mean reciprocal ranking (MRR): evaluates the model on the ranking performance of the top stock (the one with the highest predicted return). MRR is calculated for the top stock only in order to align with our trading simulation, which is described below.
3. Trading simulation:
The trading strategy we selected to evaluate model profitability is a daily buy-hold-sell strategy. Each day we invest only in the stock with the highest predicted return, buy it at the close price of the next day, and sell it at the close price of the following day. The amount of money invested is the same every day (e.g., 1000 dollars daily). The details of the trading simulation are as follows:
• At the end of day t, we run the model to predict returns for all target stocks; the output forecast implies the profit at t+2, as described in equation (1).
• Before the end of day t+1, we pick the stock with the highest predicted return from the day t prediction and set the buying condition to be the close price of day t+1. (The SET market allows investors to preset a trading condition at the at-the-open (ATO) price or the at-the-close (ATC) price for each trading day.)
• On day t+2, we sell the stock bought on the previous day at the ATC price.
We neglect the fee in the experiments. However, the percent profit after the fee can be recalculated with equation (6), where t is the number of trading days:
$\%\mathrm{Return}_{\mathrm{after\ fee}} = \frac{-2 \times t \times \mathrm{Fee} + (1 - \mathrm{Fee}) \times \sum_{i=1}^{t} \%\mathrm{Return}_i}{1 + \mathrm{Fee}}$ (6)
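The evaluation steps above can be sketched as follows. This is our reading of the procedure, not the study's code: MRR is taken as the average reciprocal of the true rank of the picked stock, returns and the fee are expressed as fractions rather than percentages, and all sample numbers are made up.

```python
def evaluate_top1(predictions, realized):
    """Evaluate a daily top-1 strategy.

    predictions[d][s]: return predicted on day d for stock s
    realized[d][s]:    return actually earned by the trade that prediction triggers
    Returns (mean reciprocal rank of the picked stock, list of daily returns).
    """
    reciprocal_ranks, daily_returns = [], []
    for pred, real in zip(predictions, realized):
        pick = max(pred, key=pred.get)
        # Rank of the picked stock when stocks are sorted by realized return.
        true_order = sorted(real, key=real.get, reverse=True)
        rank = true_order.index(pick) + 1
        reciprocal_ranks.append(1.0 / rank)
        daily_returns.append(real[pick])
    mrr = sum(reciprocal_ranks) / len(reciprocal_ranks)
    return mrr, daily_returns


def return_after_fee(daily_returns, fee):
    """Total return after paying `fee` on every buy and every sell."""
    t = len(daily_returns)
    return (-2 * t * fee + (1 - fee) * sum(daily_returns)) / (1 + fee)


predictions = [
    {"A": 0.02, "B": 0.01, "C": -0.01},
    {"A": 0.00, "B": 0.02, "C": 0.01},
]
realized = [
    {"A": 0.03, "B": 0.01, "C": -0.05},
    {"A": 0.02, "B": 0.01, "C": -0.01},
]
mrr, daily_returns = evaluate_top1(predictions, realized)
net = return_after_fee(daily_returns, fee=0.001)
```

On the first day the pick (A) is also the true best stock; on the second day the pick (B) is only the second best, which lowers the MRR while still yielding a positive return.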
Baselines
The comparative study includes the following forecasting and trading models:
1. Traditional trading
a. Buy/hold
i. SET Index: buy and hold the SET market index
ii. SET100 Index: buy and hold the SET100 index
iii. SET64 Index: buy and hold our 64 target stocks with equal investment (e.g., 10,000 dollars per stock)
2. Financial baseline
a. GARCH model, implemented based on reference [18]
i. Predicts the volatility of each stock and trades on the signal produced
3. Machine learning
a. Artificial neural network: a basic 2-layer dense network with linear output
b. Random Forest: represents traditional machine learning
4. Deep learning
a. LSTM: a 1-layer vanilla LSTM model
b. DA-RANK: this research
Hyper parameter tuning
We optimized our model with the Adaptive Moment Estimation (Adam) algorithm with an initial learning rate of 0.001. Next, a grid search was applied over the following hyperparameter ranges: hidden units (16, 32), window size T (5), and regression-ranking tradeoff alpha, α (0, 10, 100, 1000). We chose this alpha range because we observed that the average magnitude of the MSE loss in the first training epoch is around 95 times larger than that of the ranking-aware loss. Moreover, the batch size was fixed at 64 (equal to the number of target stocks) to enable the relation inference.
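The grid above is small enough to enumerate exhaustively. The sketch below shows one way to do so; `validation_profit` is a hypothetical caller-supplied function that would train a model for one configuration and return its validation profit, and `fake_profit` is only a stand-in so that the example runs without training anything.

```python
from itertools import product

GRID = {
    "hidden_units": (16, 32),
    "window_size": (5,),
    "alpha": (0, 10, 100, 1000),
}


def grid_search(validation_profit):
    """Return the configuration with the highest validation profit.

    `validation_profit` is a caller-supplied (hypothetical) function that
    trains a model for one configuration and returns its validation profit.
    """
    names = list(GRID)
    configs = [dict(zip(names, values)) for values in product(*GRID.values())]
    return max(configs, key=validation_profit), configs


# Stand-in scoring function so the sketch runs without training a model.
def fake_profit(cfg):
    return cfg["hidden_units"] - abs(cfg["alpha"] - 100) / 100


best, configs = grid_search(fake_profit)
```

With two hidden-unit settings, one window size, and four alpha values, the search covers eight configurations in total.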
Chapter 6 Experimental results
We select the model with the best trading simulation profit on the validation dataset to be evaluated on the test dataset. Table 6 shows the test results of the models with the best validation performance. On the RMSE metric for return regression, RF is consistently the top performer over the three years, while our model ranks second on average. On the mean reciprocal ranking (MRR) score, our model ranks best in 2016 and 2017; however, the RF model outperforms it on the 2018 data. The table below compares the results with the baseline models.
Table 6: Experimental result for seven comparative models
Profit %
Model             2016       2017       2018       Avg.
SET               20.00%     12.20%     -12.10%    6.70%
SET100            20.20%     14.90%     -11.40%    7.90%
SET64 (target)    25.80%     19.10%     -16.00%    9.63%
GARCH             -10.61%    0.92%      2.35%      -2.44%
ANN               21.46%     -14.35%    44.58%     17.23%
RF                62.20%     49.20%     85.80%     65.73%
LSTM              53.20%     -50.19%    38.39%     13.80%
DA-RANK           112.28%    60.72%     41.01%     71.34%
MRR top stock
Model             2016       2017       2018       Avg.
ANN               0.083      0.07       0.124      0.092
RF                0.102      0.103      0.148      0.118
LSTM              0.088      0.068      0.097      0.084
DA-RANK           0.137      0.105      0.132      0.125
RMSE
Model             2016       2017       2018       Avg.
ANN               0.0346     0.0992     0.0221     0.05
RF                0.02       0.017      0.019      0.02
LSTM              0.629      0.464      0.0692     0.39
DA-RANK           0.043      0.023      0.022      0.03
Performance charts
Figure 9: Performance chart for 2016 data
Figure 10: Performance chart for 2017 data
Figure 11: Performance chart for 2018 data
Model Interpretability
One of the main reasons we use the dual-stage attention model as our base deep learning model is its ability to extract relevance weights for each feature and each time step through the softmax layers of the model. At the input attention layer, highlighted in the red box in Figure 12 (a), all inputs from the time-series features pass through and are multiplied by the attention weights from the softmax function. We can inspect these weights at every time step to observe which features are most relevant for producing the prediction. Next, at the temporal attention layer, highlighted in the red box in Figure 12 (b), the feature representations from the input attention layer are fed into the temporal layer one time step at a time. Likewise, the temporal attention weights are applied to this encoded information before it passes through to the LSTM prediction layer. With the input attention and temporal attention weights, we can visualize part of the model's reasoning and gain insight into how it produces its predictions.
Figure 12: Attention mechanism in dual-stage attention
The results of our research, including the attention weights from our deep learning model, are hosted on our proof-of-concept website.
Chapter 7 Deep-trade website
URL: https://deeptrade.cu-set.com/
User: investor
Password: cu-set
Website features
1. Portfolio summary: an interactive module that illustrates the performance of each model.
2. Daily prediction: a daily return prediction from our DA-RANK model for all 64 target stocks. The attention weights can also be viewed as relevance bar charts to find insight for each stock.
3. Stock Insight: candlestick charts that provide a higher-level view of each stock and let the user explore the deep learning insight for any time step on the price chart.
Portfolio summary module
This module provides an interactive chart for studying the performance of each model, as well as a customizable portfolio tester with a configurable starting investment amount and trading fee.
Figure 13: Website: Portfolio summary
Daily prediction module
The daily prediction module, shown below, displays 64 boxes of return predictions ranked by expected return. When a box is clicked, the temporal and feature relevance are shown at the top for the user's consideration. For example, PTTEP is predicted to provide the highest return, and the bar chart ranks the WILLR indicator with a 75-day period as contributing the most to today's prediction.
Figure 14: Website: Daily prediction
Stock InSight
This module provides a familiar candlestick plot for the selected stock at the top right. We convert the return predictions to price predictions and overlay them on the actual candlestick plot.
Figure 15: Website: StockInsight
When a candlestick bar is clicked, the panel below shows the attention weights for the selected day. For example, after clicking on Apr 16, 2018, the model shows that the close price features contribute the most at this time step.
Figure 16: Website: Attention explainable
Chapter 8 Conclusion and discussion
On the topic of model performance
We found that our tuned DA-RANK model provides the highest average returns compared with the other methods, with a better ability to handle rich feature inputs as well as a better stock ranking ability. We also found that the RF model consistently provides satisfying results with less effort and fewer resources to train. Interestingly, the ANN and LSTM, which are both simpler versions of the neural network model, made weak predictions, especially in 2017. We suspect that these models overfit the training data, since the hyperparameters were fixed to the same values for the ANN, LSTM, and DA-RANK models. For the GARCH model, with limited time, we followed reference [18] to produce variance predictions and turn them into trading signals for all 64 stocks. Unfortunately, it performs the poorest, even compared with buying and holding the indexes. We suggest re-implementing the GARCH model for the ranking purpose so that it can be compared fairly with the other models, which trade only one stock per day.
On the topic of model robustness
We found that the more complex the model is, the less robust it becomes. For example, for the DA-RANK model we found a high standard deviation in our results, up to ±48% annual profit. This leaves room for future work to develop a more robust framework.
On the topic of model interpretability
Only two of the models in this research can provide model interpretability: the RF model and our DA-RANK model. Model interpretability is an important aspect of gaining users' trust in such a model. The RF model can provide insight through the decision trees constructed during training. However, the trees might be hard to understand if the RF is configured to be too complex. On the other hand, our model provides a more intuitive comparison between features and time steps on bar charts.
Acknowledgement
We would like to express our appreciation to all the parties who supported us during this research. A big thank you to all involved:
• The Stock Exchange of Thailand (SET)
• Capital Market Research Institute (CMRI), all officers, and committees.
• Financial Laboratory, Department of Banking and Finance, Chulalongkorn Business School
o Asst. Prof. Tanakorn Likitapiwat, Ph.D.
• The Datamind Laboratory, Department of Computer Engineering, Chulalongkorn University
o Asst. Prof. Peerapon Vateekul, Ph.D.
o Tanawat Chiewhawan
o Sanathapon Sripilaopong
• GIPSIC Corporation Ltd. for website development
o Wisit Wongchaianukul
o Nutnicha Juntasri
Appendix Recommended future works
• Explore market microstructure and intraday possibilities
• Integrate the model and framework with live data for forward testing
• Study model robustness and avoid deep learning overfitting
• Explore dimensionality reduction methods to improve the model
• Explore the possibility of using textual features
• Evaluate model explainability
Tools and resources
• GPU servers (Nvidia RTX 2080) supported by the CMRI budget
• PyTorch deep learning framework
• Python-based development
• GIPSIC Corporation Ltd. for website development
References
1. Fama, E.F.: Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance 25(2), 383-417 (1970). doi:10.2307/2325486
2. Malkiel, B.G.: Reflections on the efficient market hypothesis: 30 years later. Financial Review 40(1), 1-9 (2005).
3. Ballings, M., Van den Poel, D., Hespeels, N., Gryp, R.: Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications 42(20), 7046-7056 (2015).
4. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications 42(1), 259-268 (2015).
5. Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research 270(2), 654-669 (2018). doi:10.1016/j.ejor.2017.11.054
6. Chen, K., Zhou, Y., Dai, F.: A LSTM-based method for stock returns prediction: A case study of China stock market. In: 2015 IEEE International Conference on Big Data (Big Data) 2015, pp. 2823-2824. IEEE
7. Nelson, D.M., Pereira, A.C., de Oliveira, R.A.: Stock market's price movement prediction with LSTM neural networks. In: 2017 International Joint Conference on Neural Networks (IJCNN) 2017, pp. 1419-1426. IEEE
8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput 9(8), 1735-1780 (1997). doi:10.1162/neco.1997.9.8.1735
9. Sezer, O.B., Ozbayoglu, A.M.: Algorithmic financial trading with deep convolutional neural networks: Time series to image conversion approach. Applied Soft Computing 70, 525-538 (2018). doi:10.1016/j.asoc.2018.04.024
10. Hollis, T., Viscardi, A., Yi, S.E.: A Comparison of LSTMs and Attention Mechanisms for Forecasting Financial Time Series. CoRR abs/1812.07699 (2018).
11. Guo, T., Lin, T.: Multi-variable LSTM neural network for autoregressive exogenous model. arXiv preprint arXiv:1806.06384 (2018).
12. Qin, Y., Song, D., Chen, H., Cheng, W., Jiang, G., Cottrell, G.: A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction. arXiv e-prints (2017).
13. Feng, F., He, X., Wang, X., Luo, C., Liu, Y., Chua, T.-S.: Temporal Relational Ranking for Stock Prediction. ACM Transactions on Information Systems 37, 1-30 (2019). doi:10.1145/3309547
14. Akita, R., Yoshihara, A., Matsubara, T., Uehara, K.: Deep learning for stock prediction using numerical and textual information. In: 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), 26-29 June 2016, pp. 1-6
15. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015).
16. Oncharoen, P., Vateekul, P.: Deep Learning Using Risk-Reward Function for Stock Market Prediction. Paper presented at the 2018 2nd International Conference on Computer Science and Artificial Intelligence, Shenzhen, China.
17. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention. In: International conference on machine learning 2015, pp. 2048-2057
18. GARCH model implementation: https://medium.com/auquan/time-series-analysis-for-finance-arch-garch-models-822f87f1d755 (accessed 20 Dec 2019)