
www.set.or.th/setresearch

Disclaimer: The views expressed in this working paper are those of the author(s) and do not necessarily represent the Capital Market Research Institute or the Stock Exchange of Thailand. Capital Market Research Institute Scholarship Papers are research in progress by the author(s) and are published to elicit comments and stimulate discussion.

Scholarship Project Paper 2019

Attention-based Deep Learning Model on Financial Big Data

Tanawat Chiewhawan Asst. Prof. Peerapon Vateekul, Ph.D.

Asst. Prof. Tanakorn Likitapitwat, Ph.D. Santhapon Sripilaopong

Chulalongkorn University

20 April 2020

Abstract

This study applies a deep learning model with an attention mechanism to historical stock data from the SET SMART portal. The main objectives of this approach are to handle the variety of the data and to draw insight from the model through the attention mechanism. We design the deep learning task to predict the next-day returns of 64 stocks within the SET100 and to rank them against one another in order to propose the best stock for investment each day. Experimental results show that our proposed model is able to compete with other baseline models on the stock ranking and profit tasks. We also develop a proof-of-concept website that uses this deep learning model with its attention mechanism to provide explainable insight into the input features.

JEL Classification: …
Keywords: Deep learning, Stock prediction, Stock ranking, Attention mechanism
E-Mail Address: [email protected], [email protected]

2019 Capital Market Research Institute, The Stock Exchange of Thailand

Content

Chapter 1 Introduction
Research objectives
Expected outcomes
Chapter 2 Background knowledge
Artificial neural network
Recurrent neural network and LSTM
Attention mechanism
Dual-stage attention
Chapter 3 Methodology
Deep learning task design
Proposed framework
Data pre-processing
Data scope
Target stock pre-selection
Chapter 4 Proposed model
Chapter 5 Performance evaluation
Evaluation metrics
Baselines
Hyper parameter tuning
Chapter 6 Experimental results
Model interpretability
Chapter 7 Deep-trade website
Chapter 8 Conclusion and discussion
Acknowledgement
References

Table

1. An intuitive explanation for the accuracy and profit discrepancy
2. List of fundamental and price information inputs
3. List of technical indicators generated from historical data
4. Data set records summary
5. List of 64 target stocks within this research
6. Experimental result for seven comparative models

Figure

1. Artificial Neural Network
2. Long short term memory diagram
3. Attention mechanism
4. Dual-stage attention recurrent neural network
5. Proposed framework
6. Illustration of the sliding-splitting periods
7. A simplified diagram of dual-stage-attention mechanism
8. Diagram of the proposed model DA-RANK
9. Performance chart for 2016 data
12. Attention mechanism in dual-stage attention
13. Website: Portfolio summary
14. Website: Daily prediction
15. Website: StockInsight
16. Website: Attention explainable

Appendix

1. Recommended future works
2. Tools and resources

Chapter 1 Introduction

Stock prediction is a notoriously challenging subject because of high volatility and the influence of dynamic external factors such as the global economy and investor behavior. Stock predictability has long been controversial. Early work on the efficient-market hypothesis (EMH) [1,2] suggests that prices reflect all available information immediately and that price movements follow a random process. Nevertheless, studies from many fields have attempted to take on this challenge. Recently, machine learning and deep learning have emerged among them with promising results.

Machine learning has become popular in stock prediction research due to its performance and its ability to handle growing volumes of information. For example, the works in [3,4] conduct comparative experiments using multiple algorithms such as Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN). Their results show that RF outperforms the other baseline models both in accuracy on stock trend classification and in trading profit. More recent approaches use deep learning models. References [5-7] implement a Long Short-Term Memory recurrent neural network (LSTM) [8] with successful results. The LSTM is one of the most widespread algorithms for processing time-series data. The work in [7] uses an LSTM with numerous generated technical-indicator features to predict stock trends successfully, while [9] explores a modified LSTM to enhance the model's feature extraction.

However, with the wide selection of features available in the financial market, it becomes more challenging for a model to converge to a solution. Hence, recent deep learning research aims to apply additional techniques that help models cope with this challenge. T. Hollis, S.E. Yi, and A. Viscardi [10] investigate an LSTM with an attention mechanism. Their results align with other research showing improvements in time-series forecasting [11,12]. This paper also uses the attention mechanism to boost model prediction on time series with numerous features.

In this work, we propose a framework for multiple-stock prediction that handles various stock features and captures stock relations. We modify the Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN), the original work of Qin [12], to tackle feature and temporal relations. Next, we transform a set of stock features into a fixed batch size and train on them with a shared-parameter model. This setup allows us to infer stock relations with a combination of two loss functions: a regression loss and a ranking loss. As first mentioned in [13], a model with high accuracy does not always lead to the optimum profit when trading; Table 1 demonstrates this discrepancy. Thus, our framework can focus more on profit by using the relational ranking. Our prediction target is the next-day return when the investment is made at the close price. Finally, we conduct experiments on 64 target stocks of the SET. The results show that our model could improve annualized returns over the baselines while maintaining regression accuracy.

Table 1: An intuitive explanation for the accuracy and profit discrepancy

Source: Feng, F. et al. Temporal Relational Ranking for Stock Predictio

Stock returns                  A      B      C      MSE    Profit
Ground truth                   +30    +10    -50    -      -
1) Ranking-aware prediction    +50    -10    -50    266    +30
2) MSE optimized prediction    +20    +30    -40    200    +10
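The numbers in Table 1 can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names `mse` and `top1_profit` are ours, and the profit rule (buy only the stock with the highest predicted return) is the one implied by the table.

```python
# Returns (in percent) taken from Table 1.
ground_truth = {"A": 30, "B": 10, "C": -50}
predictions = {
    "ranking-aware": {"A": 50, "B": -10, "C": -50},
    "MSE-optimized": {"A": 20, "B": 30, "C": -40},
}

def mse(pred, truth):
    """Mean squared error over all stocks."""
    return sum((pred[s] - truth[s]) ** 2 for s in truth) / len(truth)

def top1_profit(pred, truth):
    """Realized return when buying only the stock with the highest prediction."""
    best = max(pred, key=pred.get)
    return truth[best]

for name, pred in predictions.items():
    print(name, round(mse(pred, ground_truth), 1), top1_profit(pred, ground_truth))
```

The ranking-aware prediction has the larger error (about 266.7 versus 200) but picks stock A and earns +30, while the MSE-optimized prediction picks stock B and earns only +10.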

Research objectives:

1. Introduce deep learning as a leading indicator and prediction model for investors via an interactive website.

2. Optimize and research a deep learning predictive model for the Thai market.

3. Compare deep learning performance with baselines such as traditional machine learning and buy-and-hold of the market index.

Expected outcomes

1. Deliver an example use of deep learning for investors on a website with an example predictive use case.

2. Propose a deep learning model suitable for ranking and predicting the returns of multiple Thai stocks.

3. Provide a comparative analysis of the deep learning model against other baseline models.

Chapter 2 Background Knowledge

Artificial Neural Network

Artificial Neural Networks (ANNs) are one of the most popular methods in machine learning. The model consists of interconnected computational units called neurons, which simulate a simplified version of the neurons present in the human brain. The neurons are organized into layers: an input layer, an output layer, and one or more hidden layers in between. The neurons in each layer process their inputs and feed their outputs to the next layer. Finally, the output layer provides the prediction, which can be, for example, a class or a regression value. The computation in each neuron usually involves a matrix of weights and an activation function, which allows the model to capture the nonlinearity and complexity of the problem. A structure with no cyclic connections between layers is called a Feed-Forward Neural Network (FNN), whereas a Deep Neural Network (DNN) usually refers to an ANN with many hidden layers or a more complex neuron structure. Figure 1 illustrates an example feed-forward neural network.

Figure 1: Artificial Neural Network

Source: An Insight to Soft Computing based Defect Prediction Techniques in Software

Recurrent Neural Network and LSTM

Long Short-Term Memory networks (LSTMs) are a particular type of recurrent neural network (RNN) that is able to handle long-term dependencies in its inputs. These models work well on a wide variety of problems and have become widespread.

LSTMs are explicitly designed to deal with long-term dependencies. The structure shares its recurrent form with the RNN, but the core computation unit contains four interacting gates. These gates guide the unit on how to memorize and forget its inputs. Figure 2 below illustrates the four-gate mechanism in the LSTM. First, the forget gate, with a sigmoid activation function, decides which previous information in the memory cell should be forgotten. The next two gates decide what to memorize and with what magnitude, using both the sigmoid and the tanh activation functions. Finally, the last gate combines the LSTM input with the cell memory to generate the output of the current unit. This output can be fed into the next LSTM step along with the memory cell to make the next prediction. The memory cell that passes through each LSTM unit helps avoid the vanishing gradient problem observed in vanilla RNN models.

Figure 2: Long short-term memory diagram

Source: https://towardsdatascience.com/understanding-lstm-networks-by-example-using-torch-c63dba7bbb3c
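The four-gate update described above can be sketched for a single scalar unit. This is a simplified illustration, not the implementation used in this research: real LSTM layers use weight matrices, and the function name `lstm_step` and the gate keys in `w` are hypothetical names chosen here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step; w maps a gate name to (input weight, recurrent weight, bias)."""
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old memory cell to keep
    i = sigmoid(pre("input"))         # how much of the new candidate to store
    g = math.tanh(pre("candidate"))   # candidate value to memorize
    o = sigmoid(pre("output"))        # how much of the memory cell to expose
    c = f * c_prev + i * g            # updated memory cell
    h = o * math.tanh(c)              # output of the current step
    return h, c
```

Because the memory cell is updated additively (`f * c_prev + i * g`), information can be carried across many steps, which is the property the text credits with mitigating vanishing gradients.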

Attention mechanism

The attention mechanism is a deep learning technique that has shown excellent performance in textual and visual tasks such as text translation and image caption generation. It helps a model focus on the relevant parts of its inputs by means of a softmax layer that produces attention weights. Figure 3 below shows the work of [17], in which the attention mechanism focuses on a region of the input (shown in black/white) for each predicted output. We aim to apply the attention mechanism in order to see what the deep learning models are focusing on.

Figure 3: Attention mechanism

Source: Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention. I
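As a minimal sketch of how softmax attention weights work, the snippet below turns a list of relevance scores into weights that sum to one and uses them to form a weighted combination of the inputs. The function names `softmax` and `attend` are illustrative, and in a real model the scores would be produced by learned layers rather than supplied by hand.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)                         # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(values, scores):
    """Return the attention weights and the weighted sum of the values."""
    weights = softmax(scores)
    context = sum(w * v for w, v in zip(weights, values))
    return weights, context
```

Inspecting `weights` is what makes the mechanism explainable: a larger weight means the corresponding input contributed more to the output.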

Dual-stage attention

Dual-stage attention [14] is an early adoption of the attention mechanism for financial market prediction tasks. The model predicts the future price of the NASDAQ 100 index. The left panel of the figure below shows prediction results without attention, with one layer of attention, and with two layers of attention. On the right, the attention weights visualize the model's ability to differentiate actual inputs from random noise. This dual-stage attention model is our core focus in developing an attention-based predictive model for the Thai market.

Figure 4: Dual-stage attention recurrent neural network

Source: Qin, Y., et al. A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

Chapter 3 Methodology

Deep learning task design:

As described in the proposal, the output of a deep learning model is flexible to design, and we can define it freely depending on our objectives. As the project progressed, we designed our problem as a practical prediction of multiple stock returns for the following days, as shown in equation (1).

$$y_t^i = \frac{p_{t+2}^i - p_{t+1}^i}{p_{t+1}^i} \quad (1)$$

where $y_t^i$ is the return ratio for stock $i$ at time step $t$, $p_{t+1}^i$ is the close price of the next day, and $p_{t+2}^i$ is the close price of the following day. This modification from [13] provides a more explicit expected profit for investors, since $p_t^i$ (today's close price) is already in the past and cannot be used as an actual trading (buying) price.

Proposed framework:

Our proposed framework aims to improve the performance of multiple stock returns prediction using various time-series inputs. The framework, as shown in Figure 5, starts with data pre-processing and normalization. Next, we apply the fixed-batch data transformation before moving on to the prediction model; the details of this transformation are discussed later.

Figure 5: Proposed framework
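The return label of equation (1) can be sketched as follows for one stock. The function name `next_day_returns` is a hypothetical helper, and `close` is assumed to be a list of daily close prices in chronological order.

```python
def next_day_returns(close):
    """Label y_t = (p[t+2] - p[t+1]) / p[t+1] for every day t with two future closes."""
    return [
        (close[t + 2] - close[t + 1]) / close[t + 1]
        for t in range(len(close) - 2)
    ]

labels = next_day_returns([100.0, 110.0, 121.0, 108.9])
```

The last two days of a price series have no label, because the label for day t needs the close prices of days t+1 and t+2.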



Data Pre-processing

1. Fundamentals and price data.

Fundamental data that are updated quarterly are forward-filled for each stock so that they are consistent with the daily-frequency data. In addition, seven fundamental attributes are each represented in five forms: the value at the quarter, the cumulative value since the beginning of the year, the percentage change from the previous quarter, the percentage change from the same quarter of the previous year, and the cumulative year-over-year percentage change. Table 2 below describes all 52 fundamental and price attributes used in this study.

Table 2: List of fundamental and price information inputs

Seven attributes presented in 5 forms (Count: 35)

Attribute Name: A/P Turnover, D/E Ratio, Fixed Asset, Shareholder Equity, Total Asset, Total Liability, Total Revenue

• Q - at the quarter data

• Cum. Q - cumulative quarter value since the first day of the year

• QoQ % - percent change from the previous quarter

• YoY % - percent change from the previous year, same quarter

• YoY Cum. - percent change from YE data (cumulative)

Quarter data (Count: 6)

Attribute Name: Cash Cycle Period, Net Profit Margin, Net Profit, Earnings per Share, Return of Asset, Return of Equity

Daily data (Count: 11)

Attribute Name: Close Price, Open Price, High Price, Low Price, Stock Trade Volume, Transaction Volume, Book Value, Market Value, Market Capital, P/E Ratio, P/BV Ratio

2. Technical indicators

We adopt the list of indicators proposed in [9] and generate them over our proposed periods, from short to long term: (5, 7, 10, 14, 20, 30, 50, 75, 100) days. The total number of generated features is 9 periods multiplied by 17 time series from 15 indicators (MACD provides three series), which equals 153 features. Table 3 shows the full list of our technical indicator features. We apply this indicator generation to every stock among our target stocks.

Table 3: List of technical indicators generated from historical data

RSI EMA TripleEMA MACD CMFI

William%R SMA CCI PPO DMI

WMA HMA CMO ROC PSI
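The multi-period feature generation can be sketched with two of the simpler indicators, SMA and ROC. This is an illustration under assumptions: the helper names are ours, the ROC definition used here is the common percent-change form, and the synthetic `close` series only stands in for real prices. The remaining indicators in Table 3 would be added to the `indicators` dictionary in the same way.

```python
PERIODS = (5, 7, 10, 14, 20, 30, 50, 75, 100)

def sma(close, period):
    """Simple moving average; None until a full window is available."""
    out = []
    for t in range(len(close)):
        if t + 1 < period:
            out.append(None)
        else:
            window = close[t + 1 - period : t + 1]
            out.append(sum(window) / period)
    return out

def roc(close, period):
    """Rate of change in percent versus the close `period` days earlier."""
    return [
        None if t < period
        else 100.0 * (close[t] - close[t - period]) / close[t - period]
        for t in range(len(close))
    ]

def build_features(close, indicators):
    """One series per (indicator, period) pair, keyed like 'SMA_14'."""
    return {
        f"{name}_{p}": fn(close, p)
        for name, fn in indicators.items()
        for p in PERIODS
    }

close = [float(p) for p in range(1, 121)]   # synthetic prices, for illustration only
features = build_features(close, {"SMA": sma, "ROC": roc})
```

With all 17 series in place, the same loop yields 9 x 17 = 153 features per stock, as stated in the text.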

Data Normalization

Standardization is applied to the input features because each of them has a different range of values. The z-score normalization formula is as follows.

z=(x-μ)/σ (2)

where μ is the mean of the input x and σ is its standard deviation.

Both μ and σ are calculated on the training and validation datasets only, so that the model never observes the distribution of the testing dataset.
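A minimal sketch of this normalization for one feature is shown below. The function names are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev

def fit_zscore(train_val_values):
    """Estimate mu and sigma from training and validation data only."""
    mu = mean(train_val_values)
    sigma = pstdev(train_val_values)
    if sigma == 0:
        raise ValueError("constant feature: standard deviation is zero")
    return mu, sigma

def apply_zscore(values, mu, sigma):
    """Apply z = (x - mu) / sigma with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

mu, sigma = fit_zscore([1.0, 2.0, 3.0, 4.0, 5.0])
test_scaled = apply_zscore([6.0], mu, sigma)
```

The test data are scaled with statistics fitted elsewhere, so test values may fall outside the range seen during training.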

Data scope

The end-of-day (EOD) stock data used in our research come from the Stock Exchange of Thailand (SET) and cover the period from 12 February 2008 to 28 December 2018. We use daily-frequency data due to computational and data access limitations. The studied period contains 2655 trading days, which are split into three sets, as summarized in Table 4. With 64 target stocks in our scope, the training, validating, and testing data for the model total approximately 92,000/31,500/15,000 records per period, respectively.

As suggested by [9,16], we split the data into sliding training, validating, and testing periods, as shown in Fig. 6. This split setting verifies the robustness of the model over time.

Table 4: Data set records summary

No.   Data period             Training records/stock   Validating records/stock   Testing records/stock
1     Feb-2008 to Dec-2016    1437                     509                        222
2     Jan-2009 to Dec-2017    1464                     487                        243
3     Jan-2010 to Dec-2018    1464                     488                        244
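The chronological split of one period can be sketched as below, using the record counts of period 1 in Table 4. The function name `split_by_counts` is hypothetical, and integer day indices stand in for real trading dates.

```python
def split_by_counts(days, n_train, n_val, n_test):
    """Split a chronologically ordered list into train, validation, and test parts."""
    if n_train + n_val + n_test > len(days):
        raise ValueError("not enough days for the requested split")
    train = days[:n_train]
    val = days[n_train : n_train + n_val]
    test = days[n_train + n_val : n_train + n_val + n_test]
    return train, val, test

# Period 1 of Table 4: 1437 / 509 / 222 records per stock.
days = list(range(1437 + 509 + 222))
train, val, test = split_by_counts(days, 1437, 509, 222)
records_per_period = [len(part) * 64 for part in (train, val, test)]  # 64 target stocks
```

Keeping the three parts in time order means the model is always validated and tested on days later than any day it was trained on.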


Figure 6: Illustration of the sliding-splitting periods

All data were provided through the SET SMART DATA protocol (static data), granted in collaboration with the Financial Laboratory, Chulalongkorn University.

Target stocks pre-selection

There are three considerations in our selection principles.

1. Stock information is available throughout the training to testing periods.

2. Liquidity is sufficient to assume that orders always get filled.

3. Volume and market capitalization are large enough to avoid price manipulation and to assume that the effect of our trading on the price can be neglected.

With the above criteria, we choose 64 target stocks out of the SET100 index, listed below.

Table 5: List of 64 target stocks within this research

BBL AP BH AMATA ERW TU QH CPALL

BJC HANA CENTEL RATCH KKP LPN BDMS AOT

SCC PTTEP INTUCH HMPRO KCE EGCO ADVANC WORK

TCAP PSL TVO PTT MINT CPN ROBINS TOP

KBANK SPALI TPIPL MAJOR UV IRPC BLAND BCH

SCB TRUE TASCO KTC LH DELTA GFPT SUPER

TMB BCP BTS THANI BANPU CK THAI GLOW

CPF ITD STA RS KTB SIRI STEC DTAC

The SET100 index includes the top stocks by market capitalization and trading volume. We use the SET100 list, based on Thai market capitalization, as of 7 February 2018. These 64 stocks started trading before 2008 and were still active in 2018.

[Figure 6 content: timeline from 2008 to 2018; period No. 1 is divided into Training, Validating, and Test segments]



Chapter 4 Proposed model

Our proposed model, the Dual Attentional Ranking model (DA-RANK), aims to simultaneously predict the returns of a set of stocks while inferring the relations between them. The model structure consists of two parts: (i) a feature-relevance and temporal attentional recurrent neural network and (ii) a stock relation inference framework.

Features Relevance and Temporal Attention.

We select the state-of-the-art attention model for time-series prediction, the "Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN)", to enhance feature and temporal relevance. Our core deep learning network is a modification of the original work of Qin [12]. We add a batch normalization [15] layer before the softmax layer in the input attention layer, as shown in Fig. 7, to enhance the calculation of the attention weights. We omit the full details of DA-RNN from this paper because we introduce only minor changes to the original work.

Figure 7: A simplified diagram of dual-stage-attention recurrent neural networks

Source: Qin, Y., et al. A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

Stock Relation Inference

The second part of our model structure integrates stock relations during model training. We apply two methods: (i) fixed-batch training for a shared-parameter model and (ii) a pair-wise ranking loss.

1. Fixed-batch Training for a Shared-parameter Model.

Developing one model per stock becomes tedious for a stock prediction task with a large number of target assets, because each model must be tuned individually for its optimal hyperparameters. Another method for multiple-stock prediction is to construct a single model that treats all stocks collectively as combined features. Such a model can capture the relations between those features without having to calibrate hyperparameters one stock at a time. Nonetheless, this single complex model involves an excessively large number of trainable parameters and, in turn, becomes harder to converge.

Alternatively, to capture stock relations while keeping the number of parameters feasible, we adopt the arrangement from [13]: we train a shared-parameter model and transform the input data into a fixed batch whose size equals the number of target stocks. Fig. 8 shows the proposed fixed-batch transformation. A slice of a single stock's features has a dimension of T x k, where k is the number of time-series features for each stock (e.g., technical indicators, financial parameter series) and T is the sliding window for those features. We prepare this slice for each of our N target stocks. This collection of N feature slices has the same size as the training batch and represents the information of multiple stocks during the same period. As a result of our fixed batch size setting, all N stocks share the same weights in the modified DA-RNN model, Fig. 8(b). These shared weights are updated when the model has observed all N slices of stock features in a batch during training.

There are three benefits to this design. First, it favors the ranking loss calculation, which we discuss in the next section. Second, the model becomes universal, with the ability to predict any particular stock independently. More specifically, the model treats each individual stock as one separate set of features within a training batch, so the trained model can still predict any stock without retraining the whole model when another stock ceases to trade in the market. Finally, it reduces the number of model weights per data record by a factor equal to the number of target stocks. Fewer model weights imply faster training and easier convergence to a solution.
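The fixed-batch transformation can be sketched with plain nested lists. This is an illustration only: the function name `make_fixed_batch` and the layout of `features_by_stock` (symbol mapped to a list of days, each day a list of k feature values) are assumptions, and a real pipeline would typically use tensors.

```python
def make_fixed_batch(features_by_stock, t_end, window):
    """Build one batch of N slices, each of shape window x k, all ending on day t_end."""
    start = t_end - window + 1
    if start < 0:
        raise ValueError("not enough history for the requested window")
    batch = []
    for symbol in sorted(features_by_stock):    # fixed stock order within every batch
        series = features_by_stock[symbol]
        batch.append(series[start : t_end + 1])
    return batch

# Synthetic example: N = 3 stocks, 10 days, k = 2 features per day.
features_by_stock = {
    sym: [[float(day), float(day) * 2] for day in range(10)]
    for sym in ("AAA", "BBB", "CCC")
}
batch = make_fixed_batch(features_by_stock, t_end=9, window=5)
```

Each batch then holds every target stock for the same time window, which is what allows a loss that compares stocks with one another to be computed per batch.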

2. Pair-wise Ranking Loss

We use a combination of a regression loss and a ranking loss to optimize our model. For the regression part, the widely used mean square error (MSE) loss is selected so that the model focuses on return prediction accuracy. The MSE loss for stock i is given in equation (3). Next, the pair-wise ranking loss is introduced to infer the relations among all target stocks from their relative ranking scores. Equation (4) calculates the relative ranking error for every pair of stocks. Finally, the combined loss in equation (5) is backpropagated through the model when it learns from one fixed-size batch, Fig. 8(c).


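$$\mathrm{MSEloss}^i = \left(\hat{y}_t^i - y_t^i\right)^2 \quad (3)$$

$$\text{Pairwise-RankingLoss} = \sum_{i=0}^{N} \sum_{j=0}^{N} \max\left(0,\, -\left(\hat{y}_t^i - \hat{y}_t^j\right)\left(y_t^i - y_t^j\right)\right) \quad (4)$$

$$\mathrm{CombinedLoss} = \frac{\sum_{i=0}^{N} \mathrm{MSEloss}^i}{N} + \alpha \left(\text{Pairwise-RankingLoss}\right) \quad (5)$$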

where N is the number of target stocks predicted simultaneously, $\hat{y}_t^i$ is the predicted return for stock $i$ at time step $t$, $y_t^i$ is the label described in equation (1), and α (alpha) is a weighting ratio that trades off regression accuracy against ranking accuracy and is one of the hyperparameters to be tuned.
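The MSE, pair-wise ranking, and combined losses described above can be sketched in plain Python for one batch. This is not the training code of this research: the function name `combined_loss` is ours, the lists are indexed from 0 to N-1, and a real implementation would use an automatic-differentiation framework so that the loss can be backpropagated.

```python
def combined_loss(pred, label, alpha):
    """Return (combined, mse, ranking) for one batch of N predicted and true returns."""
    n = len(pred)
    # Regression part: mean of the squared errors over the N stocks.
    mse = sum((p - y) ** 2 for p, y in zip(pred, label)) / n
    # Ranking part: penalize every pair whose predicted order contradicts the true order.
    ranking = sum(
        max(0.0, -(pred[i] - pred[j]) * (label[i] - label[j]))
        for i in range(n)
        for j in range(n)
    )
    return mse + alpha * ranking, mse, ranking

# Table 1 example: stocks A, B, C.
label = [30.0, 10.0, -50.0]
loss_rank_aware = combined_loss([50.0, -10.0, -50.0], label, alpha=1.0)
loss_mse_optimized = combined_loss([20.0, 30.0, -40.0], label, alpha=1.0)
```

On the Table 1 example, the ranking-aware prediction has a ranking loss of zero because it orders all three stocks correctly, while the MSE-optimized prediction is penalized for swapping stocks A and B.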

Figure 8: Diagram of the proposed model DA-RANK: (a) input feature slices, (b) a modified DA-RNN unit whose weights are shared among all stocks, (c) combination of loss functions

Chapter 5 Performance evaluation

Evaluation metrics

We compare the performance of each model with the following measures.

1. Root mean square error (RMSE): the standard evaluation of the predicted return.

2. Mean reciprocal ranking (MRR): evaluates the model on the ranking performance of the top stock (highest predicted return). MRR is calculated for the top stock only so that it aligns with our trading simulation, described below.

3. Trading simulation:

The trading strategy we select to evaluate model profitability is a daily buy-hold-sell strategy. We invest only in the stock with the highest predicted return for each day, buy it at the close price of the next day, and sell it at the close price of the following day. The amount of money invested is the same every day (e.g., 1000 dollars daily). The details of the trading simulation are as follows.

• At the end of day t, we run the model to predict returns for all target stocks; the output forecast implies the profit for day t+2, as described in equation (1).

• Before the end of day t+1, we pick the stock with the highest predicted return from the day t prediction and set the buying condition to the close price of day t+1. (The SET market allows investors to preset a trading condition at the at-the-open price (ATO) or at-the-close price (ATC) for each trading day.)

• On day t+2, we sell the stock bought on the previous day at the ATC price.
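The top-stock MRR and the trading simulation above can be sketched together. Two assumptions are made here and are not confirmed by the text: the reciprocal rank is taken as one over the position of the predicted top stock in the ordering of actual returns, and the total profit is the plain sum of daily returns because the same amount is invested every day. All function names are hypothetical.

```python
def top_stock(pred):
    """Symbol with the highest predicted return."""
    return max(pred, key=pred.get)

def reciprocal_rank_of_top(pred, actual):
    """1 / (position of the predicted top stock in the actual return ranking)."""
    pick = top_stock(pred)
    ranking = sorted(actual, key=actual.get, reverse=True)
    return 1.0 / (ranking.index(pick) + 1)

def evaluate(daily_preds, daily_actuals):
    """Return (MRR of the top stock, summed return of the daily top pick)."""
    pairs = list(zip(daily_preds, daily_actuals))
    mrr = sum(reciprocal_rank_of_top(p, a) for p, a in pairs) / len(pairs)
    total_return = sum(a[top_stock(p)] for p, a in pairs)
    return mrr, total_return

# Two illustrative days built from the Table 1 numbers (returns in percent).
actual = {"A": 30, "B": 10, "C": -50}
daily_preds = [{"A": 50, "B": -10, "C": -50}, {"A": 20, "B": 30, "C": -40}]
mrr, total_return = evaluate(daily_preds, [actual, actual])
```

In this example the first day picks the true best stock (reciprocal rank 1) and the second day picks the second best (reciprocal rank 0.5).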

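We neglect the fee in the experiments. However, the percent profit after fees can be recalculated with equation (6), where t is the number of trading days, Fee is the trading fee rate, and %Return_i is the return of the i-th trading day.

$$\%\mathrm{Return}_{\text{after fee}} = \frac{-2 \times t \times \mathrm{Fee} + (1 - \mathrm{Fee}) \times \sum_{i=1}^{t} \%\mathrm{Return}_i}{1 + \mathrm{Fee}} \quad (6)$$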
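The fee adjustment above can be sketched as a small function. Returns and the fee are expressed as fractions here (0.10 for 10%), which is our convention for the example, and the function name `return_after_fee` is hypothetical. The form follows from paying the fee on both the buy and the sell of each daily trade.

```python
def return_after_fee(daily_returns, fee):
    """Total return after fees for t daily round trips with an equal stake each day."""
    t = len(daily_returns)
    return (-2 * t * fee + (1 - fee) * sum(daily_returns)) / (1 + fee)

# One trade with a 10% gross return and a 1% fee on each side.
net = return_after_fee([0.10], fee=0.01)

# Direct check for the same trade: proceeds after the sell fee versus cost including the buy fee.
direct = ((1 + 0.10) * (1 - 0.01) - (1 + 0.01)) / (1 + 0.01)
```

With a fee of zero the function reduces to the plain sum of daily returns used in the experiments.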



Baselines

The comparative study includes the following forecasting and trading models.

1. Traditional trading
   a. Buy/hold
      i. SET Index: buy and hold the SET market index
      ii. SET100 Index: buy and hold the SET100 index
      iii. SET64 Index: buy and hold our 64 target stocks with equal investment in each (e.g., 10,000 dollars per stock)

2. Financial baseline
   a. GARCH model, implemented based on reference [18]
      i. Predicts the volatility of each stock and trades on the signal produced

3. Machine learning
   a. Artificial neural network: basic 2-layer dense network with linear output
   b. Random Forest: to represent machine learning

4. Deep learning
   a. LSTM: a 1-layer vanilla LSTM model
   b. DA-RANK: this research

Hyper parameter tuning

We optimize our model with the Adaptive Moment Estimation (Adam) algorithm with an initial learning rate of 0.001. Next, a grid search is applied over the following hyperparameter ranges: hidden units (16, 32), window size T (5), and regression-ranking tradeoff alpha α (0, 10, 100, 1000). We choose this alpha range because we observed that the average magnitude of the MSE loss in the first epoch is around 95 times larger than that of the ranking-aware loss. Moreover, the batch size is fixed at 64 (equal to the number of target stocks) to enable the relation inference.
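The grid above can be enumerated with a short sketch. The dictionary keys and the `select_best` helper are hypothetical names, and `validation_profit` stands for whatever routine trains a model with a given configuration and returns its trading simulation profit on the validation set.

```python
from itertools import product

GRID = {
    "hidden_units": (16, 32),
    "window_size": (5,),
    "alpha": (0, 10, 100, 1000),
}

def grid_configs(grid):
    """All combinations of the hyperparameter values, one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

def select_best(configs, validation_profit):
    """Pick the configuration with the highest validation profit."""
    return max(configs, key=validation_profit)

configs = grid_configs(GRID)   # 2 x 1 x 4 = 8 configurations
```

Setting alpha to 0 in this grid corresponds to training with the regression loss only.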

Chapter 6 Experimental results

We select the model with the best trading simulation profit on the validation dataset and evaluate it on the test dataset. Table 6 shows the test results of the models with the best validation performance. On the RMSE metric for return regression, RF performs best in all three years, while our model ranks second on average. On the mean reciprocal ranking (MRR) score, our model ranks best in 2016 and 2017; however, the RF model outperforms our model on the 2018 data. The table below compares the results with the baseline models.

Table 6: Experimental result for seven comparative models

Profit %
Model             2016       2017       2018       Avg.
SET               20.00%     12.20%     -12.10%    6.70%
SET100            20.20%     14.90%     -11.40%    7.90%
SET64 (target)    25.80%     19.10%     -16.00%    9.63%
GARCH             -10.61%    0.92%      2.35%      -2.44%
ANN               21.46%     -14.35%    44.58%     17.23%
RF                62.20%     49.20%     85.80%     65.73%
LSTM              53.20%     -50.19%    38.39%     13.80%
DA-RANK           112.28%    60.72%     41.01%     71.34%

MRR top stock
Model             2016       2017       2018       Avg.
ANN               0.083      0.07       0.124      0.092
RF                0.102      0.103      0.148      0.118
LSTM              0.088      0.068      0.097      0.084
DA-RANK           0.137      0.105      0.132      0.125

RMSE
Model             2016       2017       2018       Avg.
ANN               0.0346     0.0992     0.0221     0.05
RF                0.02       0.017      0.019      0.02
LSTM              0.629      0.464      0.0692     0.39
DA-RANK           0.043      0.023      0.022      0.03

Performance charts

Figure 9: Performance chart for 2016 data





Model Interpretability

One of the main reasons for using the dual-stage attention model as our base deep learning model is its ability to extract relevance weights for each feature and for each time step through the softmax layers of the model. At the input attention layer, highlighted in the red box in Figure 12 (a), every input time-series feature passes through and is multiplied by its attention weight from the softmax function. We can inspect these weights at every time step to observe which features are the most relevant for producing the prediction. Next, at the temporal attention layer, highlighted in the red box in Figure 12 (b), the feature representations from the input attention layer are fed into the temporal layer one time step at a time. Likewise, the temporal attention weights are applied to this encoded information before it passes to the LSTM prediction layer. With these input and temporal attention weights, we can visualize part of the model's reasoning to gain insight into how it produces its predictions.

Figure 12: Attention mechanism in dual-stage attention

Our proof-of-concept website hosts the results of our research, including the attention weights from our deep learning model.

Chapter 7 Deep-trade website

URL: https://deeptrade.cu-set.com/
User: investor
Password: cu-set

Website features

1. Portfolio summary: an interactive module that illustrates the performance of each model.

2. Daily prediction: a daily return prediction from our DA-RANK model for all 64 target stocks. The attention weights are also shown as relevance bar charts to provide insight into each stock.

3. Stock Insight: candlestick charts that provide a higher-level view of each stock and let the user explore the deep learning insight for any time step on the price chart.

Portfolio summary module

This module provides an interactive chart for studying the performance of each model, as well as a customizable portfolio tester with settings for the starting investment amount and the trading fee.

Figure 13: Website: Portfolio summary

Daily prediction module

The daily prediction module, shown below, displays 64 boxes of return predictions ranked by expected return. When a box is clicked, the temporal and feature relevance are shown at the top for the user's consideration. For example, PTTEP is predicted to provide the highest return, and the bar chart ranks the WILLR indicator with a 75-day period as contributing the most to today's prediction.

Figure 14: Website: Daily prediction

Stock Insight

This module provides a familiar candlestick plot for the selected stock at the top right. We convert the return prediction to a price prediction and overlay it on the actual candlestick plot.

Figure 15: Website: StockInsight

When a candlestick bar is clicked, the information below the chart shows the attention weights for the selected day. For example, after clicking on Apr 16, 2018, the model shows that the close price features contribute the most at this time step.

Figure 16: Website: Attention explainable

Chapter 8 Conclusion and discussion

On the topic of model performance We have found that our tuned DA-Rank model provides the highest average returns comparing to other methods with better ability to handle rich features inputs as well as stock ranking ability. Next, we also found that the RF model also consistently provides satisfying results we less effort and fewer resources to train. Interestingly the ANN and LSTM which are both a simpler version of the Neural Network model, made weak predictions, especially in the year 2017. We suspect that the model might be too over-fitted with the train data since the hyperparameter is fixed the same for all ANN, LSTM and DA-Rank model. Next for the GARCH model, with limited time, we follow the reference [18] to produce variance prediction and turn them into a trading signal for all 64 stocks. Unfortunately, it performs the poorest even comparing to buy and hold of the indexes. We suggest to re-implement the GARCH model for the ranking purpose to fairly compare with other models that trade only 1 stock per day. On the topic of model robustness We found that the more complex the model is the less robust the model becomes. For example, in the DA-RANK model, we found a high standard deviation of our result up to +- 48% annual profit. This might be a room for future works to develop a more robust framework On the topic of model interpretability Only two of the model in this research can provide model interpretability, RF model and our DA-RANK model. This model interpretability is also an important aspect to gain trust in the user of such a model. The RF model can provide insight from one of its decision trees constructed during training. However, it might be hard to understand the tree if the parameter of the RF is too complicated. On the other hand, over model provide a more intuitive comparison between features and temporal on bar charts.



Acknowledgement

We would like to express our appreciation to all the parties who supported us during this research. A big thank you to everyone involved:

• The Stock Exchange of Thailand (SET)

• Capital Market Research Institute (CMRI), all officers, and committees.

• Financial Laboratory, Department of Banking and Finance, Chulalongkorn Business School

o Asst. Prof. Tanakorn Likitapiwat, Ph.D.

• The Datamind Laboratory, Department of Computer Engineering, Chulalongkorn University

o Asst. Prof. Peerapon Vateekul, Ph.D.

o Tanawat Chiewhawan

o Sanathapon Sripilaopong

• GIPSIC Corporation Ltd. for website development

o Wisit Wongchaianukul

o Nutnicha Juntasri

Appendix

Recommended future works

• Explore market microstructure and intraday possibilities

• Integrate the model and framework with live data and run a forward test

• Study model robustness and avoid deep learning overfitting

• Explore dimensionality reduction methods to improve the model

• Explore the possibility of textual features

• Model explainability evaluation



Tools and resources

• GPU servers (Nvidia RTX 2080) supported by the CMRI budget

• PyTorch deep learning framework

• Python-based development

• GIPSIC Corporation Ltd. for website development

References

1. Fama, E.F.: Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance 25(2), 383-417 (1970). doi:10.2307/2325486

2. Malkiel, B.G.: Reflections on the efficient market hypothesis: 30 years later. Financial Review 40(1), 1-9 (2005).

3. Ballings, M., Van den Poel, D., Hespeels, N., Gryp, R.: Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications 42(20), 7046-7056 (2015).

4. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications 42(1), 259-268 (2015).

5. Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research 270(2), 654-669 (2018). doi:10.1016/j.ejor.2017.11.054

6. Chen, K., Zhou, Y., Dai, F.: A LSTM-based method for stock returns prediction: A case study of China stock market. In: 2015 IEEE International Conference on Big Data (Big Data), pp. 2823-2824. IEEE (2015).

7. Nelson, D.M., Pereira, A.C., de Oliveira, R.A.: Stock market's price movement prediction with LSTM neural networks. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1419-1426. IEEE (2017).

8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735-1780 (1997). doi:10.1162/neco.1997.9.8.1735

9. Sezer, O.B., Ozbayoglu, A.M.: Algorithmic financial trading with deep convolutional neural networks: Time series to image conversion approach. Applied Soft Computing 70, 525-538 (2018). doi:10.1016/j.asoc.2018.04.024

10. Hollis, T., Viscardi, A., Yi, S.E.: A Comparison of LSTMs and Attention Mechanisms for Forecasting Financial Time Series. CoRR abs/1812.07699 (2018).

11. Guo, T., Lin, T.: Multi-variable LSTM neural network for autoregressive exogenous model. arXiv preprint arXiv:1806.06384 (2018).

12. Qin, Y., Song, D., Chen, H., Cheng, W., Jiang, G., Cottrell, G.: A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction. arXiv e-prints (2017).

13. Feng, F., He, X., Wang, X., Luo, C., Liu, Y., Chua, T.-S.: Temporal Relational Ranking for Stock Prediction. ACM Transactions on Information Systems 37, 1-30 (2019). doi:10.1145/3309547

14. Akita, R., Yoshihara, A., Matsubara, T., Uehara, K.: Deep learning for stock prediction using numerical and textual information. In: 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), 26-29 June 2016, pp. 1-6 (2016).

15. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015).

16. Oncharoen, P., Vateekul, P.: Deep Learning Using Risk-Reward Function for Stock Market Prediction. Paper presented at the 2018 2nd International Conference on Computer Science and Artificial Intelligence, Shenzhen, China.

17. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention. In: International Conference on Machine Learning, pp. 2048-2057 (2015).

18. GARCH model implementation. https://medium.com/auquan/time-series-analysis-for-finance-arch-garch-models-822f87f1d755 (accessed 20 Dec 2019).
