MSE 448 Final Presentation
Transcript of MSE 448 Final Presentation
![Page 1: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/1.jpg)
MSE 448 Final Presentation
Aman Sawhney, Yang Fan, Chris Lazarus
June 2, 2021
![Page 2: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/2.jpg)
Outline
Overview
Data
Strategy Overview
Technical Features and Feature Engineering
Sanity Check with Zigzag patterns
Simulation Method
Learned Model Trading Results
GAN Results
Next Steps
2/29
![Page 3: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/3.jpg)
Overview
▶ (Refresher) HFT as an MDP
▶ Data
▶ Strategy Overview
▶ Simulation Assumptions
▶ Technical Features and Feature Engineering
▶ Learned Model Trading Results
▶ GAN Results
▶ Next Steps
3/29
![Page 4: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/4.jpg)
High Frequency Trading as an MDP
Since HFT strategies rely on taking and providing liquidity when it is appropriate, we make the modeling assumption that order book dynamics are a Markov process. Hence we may formulate an HFT strategy as a Markov Decision Process in the following manner:
▶ We assume discrete time intervals determined by our time-scale, T.
▶ For each time step 0 ≤ t ≤ T we have
▶ st := (Ot, qt), where Ot is the order book history at time t over a look-back period and qt is the amount of the asset the agent currently holds.
▶ at ∈ {T, P, N}, where T is the act of taking liquidity, P is the act of providing liquidity, and N is the act of doing nothing.
▶ rt+1, an appropriate reward function.
This framework generalizes to a multidimensional asset space.
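The state and action spaces above can be sketched as follows; a minimal illustration, with names of our own choosing rather than the project's code:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence

class Action(Enum):
    TAKE = "T"      # take liquidity
    PROVIDE = "P"   # provide liquidity
    NOTHING = "N"   # do nothing

@dataclass
class State:
    order_book_history: Sequence  # O_t: order book snapshots over the look-back window
    inventory: float              # q_t: amount of the asset currently held
```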
4/29
![Page 5: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/5.jpg)
Reinforcement Learning
There are two main approaches to solving RL problems: value-based methods (e.g. Q-learning) and policy search methods (e.g. policy gradient).
▶ Deep Q-Learning (DQN) - minimizes the mean squared Bellman error (MSBE), off-policy, sample efficient, generally good for discrete and low-dimensional action and state spaces
▶ Proximal Policy Optimization (PPO) - maximizes expected return, on-policy, sample inefficient, generally good for continuous action and state spaces
We tried DQN and PPO:
▶ DQN - showed good performance
▶ PPO - abandoned due to low performance and high computational cost
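The MSBE objective that DQN minimizes can be written out directly; a generic NumPy sketch for a batch of transitions, not the project's training code:

```python
import numpy as np

def msbe_loss(q_values, target_q_values, actions, rewards, dones, gamma=0.99):
    """Mean squared Bellman error over a batch of transitions.

    q_values:        (B, A) Q(s, .) from the online network
    target_q_values: (B, A) Q(s', .) from the target network
    """
    # Q(s, a) for the actions actually taken
    q_sa = q_values[np.arange(len(actions)), actions]
    # TD target r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
    td_target = rewards + gamma * (1.0 - dones) * target_q_values.max(axis=1)
    return float(np.mean((q_sa - td_target) ** 2))
```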
5/29
![Page 6: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/6.jpg)
Data
We pulled top-of-book data from MayStreet, aggregated by second, from 9:30 AM to 11:30 AM for the first 5 months of 2021 for the 5 S&P 500 stocks with the highest beta. This amounted to a massive data set with well over 100 million rows.
6/29
![Page 7: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/7.jpg)
Strategy Overview
We assume
▶ The starting account balance of our agent is the cash value of 6000 shares at the opening price of a given security
▶ The agent is able to trade with two times leverage
▶ All entered positions must be exited after a two-minute holding time
▶ At any given second the agent is able to buy the minimum of 100 shares and the ask size, and sell the minimum of 100 shares and the bid size
▶ The agent must always buy at the ask and sell at the bid price
▶ The agent must maintain a net worth greater than 0, i.e. the value of its positions plus the cash held as balance must be greater than 0. Otherwise, trading must end.
▶ The agent can trade from 9:32 am to 10:02 am
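Two of these constraints translate directly into code; a hypothetical sketch with helper names of our own invention:

```python
def can_continue_trading(cash, position_shares, mid_price):
    """Trading halts once net worth (cash plus mark-to-market
    position value) drops to 0 or below."""
    net_worth = cash + position_shares * mid_price
    return net_worth > 0

def executable_sizes(ask_size, bid_size, lot=100):
    """Per the assumptions, the agent can buy at most min(lot, ask_size)
    at the ask and sell at most min(lot, bid_size) at the bid."""
    return min(lot, ask_size), min(lot, bid_size)
```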
7/29
![Page 8: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/8.jpg)
Strategy Overview (count.)
Given these assumptions, our agent must optimize buy, sell, andhold actions to maximize the following reward function:
rt = 1{t<T} · α · returnt + 1{t=T} · β · RT
where t is the current second, T is the final time step, at is the action at time t, returnt is the two-minute return of at−120, RT is the overall return, and α and β are hyperparameters.
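The reward function above is a simple branch on whether the episode has ended; an illustrative sketch:

```python
def reward(t, T, step_return, overall_return, alpha, beta):
    """r_t = 1{t<T} * alpha * return_t + 1{t=T} * beta * R_T"""
    if t < T:
        return alpha * step_return      # intermediate step: scaled two-minute return
    return beta * overall_return        # terminal step: scaled overall return
```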
8/29
![Page 9: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/9.jpg)
Technical Features and Feature Engineering
▶ The order book data: bid/ask price, size, number of providers; adjusted volume
▶ Technical indicators: SMA, EMA, RSI, ROC, TRIX, PPO, PVO, AROON, DPO, MACD, SRI
▶ Re-normalize some of the indicators against the first value encountered at the beginning of each episode, to increase performance when fed into the NN
▶ Maximal Fourier modes are mostly 0 over short horizons, and require huge preprocessing time
▶ Custom NN structures as feature extractors, with every observation an F by L matrix, where F is the number of features and L is the amount of history we allow the agent to look back
▶ Large MLP networks
▶ LSTM + MLP (1D LSTM running over the L dimension)
▶ Transformer + MLP (with the same observation matrix feeding into encoder and decoder)
9/29
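The first-value re-normalization and the F-by-L observation slicing can be sketched as follows, assuming features are stored as an F x T matrix (layout and names are our assumptions):

```python
import numpy as np

def normalize_episode(features):
    """Re-normalize each indicator against its first value in the episode."""
    first = features[:, :1]
    return features / np.where(first == 0, 1.0, first)  # guard against zeros

def make_observation(features, t, lookback):
    """Observation at time t: an F x L slice of the feature matrix."""
    return features[:, t - lookback + 1 : t + 1]
```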
![Page 10: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/10.jpg)
Sanity Check with Zigzag patterns
▶ We create a counterfactual order book with oscillating linear patterns ranging from 10 to 20, with zero gap between the bid and ask price.
▶ Implemented technical indicators and normalization as specified before.
▶ Adapted holding periods (5 s) to match the period of oscillation (20 s).
▶ Compared strategies with and without forced liquidations.
▶ Trained different models with a similar amount of computing cost.
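One way to generate such an oscillating linear price path is a triangle wave; a sketch under the stated parameters (the exact waveform the project used may differ):

```python
import numpy as np

def zigzag_prices(n_seconds, low=10.0, high=20.0, period=20):
    """Triangle wave oscillating between `low` and `high` with the given
    period in seconds. Bid == ask (zero spread), as in the sanity check."""
    t = np.arange(n_seconds)
    phase = (t % period) / period      # position in cycle, in [0, 1)
    tri = 2 * np.abs(phase - 0.5)      # triangle wave in [0, 1]
    return low + (high - low) * tri
```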
10/29
![Page 11: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/11.jpg)
Sanity Check with Zigzag patterns
MLP without forced liquidation
Define rewards for every step
Figure: Portfolio Value. Figure: Actions (0: short, 1: buy, 2: hold)
11/29
![Page 12: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/12.jpg)
Sanity Check with Zigzag patterns
MLP with forced liquidation
Define rewards for every step
Figure: Portfolio Value. Figure: Actions (0: short, 1: buy, 2: hold)
12/29
![Page 13: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/13.jpg)
Sanity Check with Zigzag patterns
LSTM + MLP with forced liquidation
Define rewards for every step
Figure: Portfolio Value. Figure: Actions (0: short, 1: buy, 2: hold)
13/29
![Page 14: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/14.jpg)
Sanity Check with Zigzag patterns
Transformer + MLP with forced liquidation
Define rewards for every step
Figure: Portfolio Value. Figure: Actions (0: short, 1: buy, 2: hold)
14/29
![Page 15: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/15.jpg)
Sanity Check with Zigzag patterns
MLP without forced liquidation
Define rewards for only the terminal step as the total return
Figure: Portfolio Value. Figure: Actions (0: short, 1: buy, 2: hold)
15/29
![Page 16: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/16.jpg)
Sanity Check with Zigzag patterns
MLP with forced liquidation
Define rewards for only the terminal step as the total return
Figure: Portfolio Value. Figure: Actions (0: short, 1: buy, 2: hold)
16/29
![Page 17: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/17.jpg)
Sanity Check with Zigzag patterns
LSTM + MLP with forced liquidation
Define rewards for only the terminal step as the total return
Figure: Portfolio Value. Figure: Actions (0: short, 1: buy, 2: hold)
17/29
![Page 18: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/18.jpg)
Sanity Check with Zigzag patterns
Transformer + MLP with forced liquidation
Define rewards for only the terminal step as the total return
Figure: Portfolio Value. Figure: Actions (0: short, 1: buy, 2: hold)
18/29
![Page 19: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/19.jpg)
Sanity Check with Zigzag patterns
▶ An MLP might be sufficient compared to more complicated networks, with the added benefit of being more consistent. We can always add non-linearity in the feature engineering step.
▶ Forced liquidation after a short period of time helps the agent.
▶ There is room to try different reward functions, which might be the key to the agent's performance.
19/29
![Page 20: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/20.jpg)
Simulation Method
To train our RL agent we built an environment using OpenAI's Gym interface. Each episode of training loads a random day and random symbol's data from a train or test dataframe. Compared to other RL projects, such as FinRL and other RL environments, our environment has the following advantages:
▶ Homogeneous Trading Horizons: By sampling from the same time of day, as compared to randomly sampling along a stock path, our episodes contain very similar market microstructure
▶ Data Density: Most other academic projects use daily data. Since we use data aggregated at the second level, our data is far more dense.
▶ Realism: In the real world, high frequency trading models should be adept at providing sustainable profit across a variety of symbols. Hence the variety of symbols allows us to train a more realistic model.
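A skeleton of such an environment, following the Gym reset/step interface; class and attribute names are illustrative, and the reward logic is omitted:

```python
import numpy as np

class TopOfBookEnv:
    """Minimal environment in the Gym style. Each episode samples one
    (day, symbol) pair at random, so training sees a variety of symbols."""

    def __init__(self, episodes, horizon=1800):
        self.episodes = episodes   # list of per-(day, symbol) observation arrays
        self.horizon = horizon     # 30 minutes of 1-second steps
        self.rng = np.random.default_rng(0)

    def reset(self):
        self.data = self.episodes[self.rng.integers(len(self.episodes))]
        self.t = 0
        return self.data[self.t]

    def step(self, action):
        self.t += 1
        done = self.t >= min(self.horizon, len(self.data) - 1)
        reward = 0.0               # reward computation omitted in this sketch
        return self.data[self.t], reward, done, {}
```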
20/29
![Page 21: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/21.jpg)
Learned Model Trading Results
Figure: Model Results Using Multiple Symbols
21/29
![Page 22: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/22.jpg)
Learned Model Trading Results (cont.)
Figure: Model Results Using Only AMD
If we take the previous return sequences, we obtain a Sharpe ratio for the 1-second time period of 0.00229. Annualized, assuming you can only trade half an hour a day, that is a Sharpe ratio of 1.54.
22/29
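The annualization is a square-root-of-time scaling; assuming 1800 tradable seconds per day and 252 trading days per year (our assumptions for the conversion), the quoted numbers can be reproduced:

```python
def annualized_sharpe(per_second_sharpe, seconds_per_day=1800, trading_days=252):
    """Scale a per-second Sharpe ratio by the square root of the
    number of 1-second periods in a trading year."""
    return per_second_sharpe * (seconds_per_day * trading_days) ** 0.5
```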
![Page 23: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/23.jpg)
Learned Model Trading Results (cont.)
Figure: Model Results Using Only AMD Using A Larger Model
If we take the previous return sequences, we obtain a Sharpe ratio for the 1-second time period of 0.0131. Annualized, assuming you can only trade half an hour a day, that is a Sharpe ratio of 8.82.
23/29
![Page 24: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/24.jpg)
Data Augmentation using GANs [WKKK20]
▶ Autoencoder: embedding and recovery networks - a stacked RNN and a feedforward network
▶ Adversarial network: sequence generator and sequence discriminator components - an RNN as the generator and a bidirectional RNN with a feedforward output layer as the discriminator
Source: [Jan20]
24/29
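Two of the components described above can be sketched in PyTorch; layer sizes and hyperparameters here are our assumptions, not those of [WKKK20]:

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Stacked RNN mapping feature sequences into a latent space."""
    def __init__(self, n_features, hidden=24, layers=2):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, hidden)

    def forward(self, x):          # x: (batch, seq_len, n_features)
        h, _ = self.rnn(x)
        return torch.sigmoid(self.out(h))

class Discriminator(nn.Module):
    """Bidirectional RNN with a feedforward output layer."""
    def __init__(self, hidden=24):
        super().__init__()
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, h):          # h: (batch, seq_len, hidden)
        y, _ = self.rnn(h)
        return self.out(y)         # per-step real/fake logits
```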
![Page 25: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/25.jpg)
GAN Results - 1
25/29
![Page 26: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/26.jpg)
GAN Results - 2
26/29
![Page 27: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/27.jpg)
GAN Results - 3
27/29
![Page 28: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/28.jpg)
Next Steps
▶ Use the synthetic data to train the agent
▶ Analyze and compare the effect of reward function choice
▶ Finalize hyperparameter tuning
▶ Train more specialist traders
28/29
![Page 29: MSE 448 Final Presentation](https://reader035.fdocuments.us/reader035/viewer/2022070610/62c321e145c7706f45480372/html5/thumbnails/29.jpg)
References I
Stefan Jansen, https://github.com/stefan-jansen/synthetic-data-for-finance, Deep RL Workshop, NeurIPS 2020 (2020).
Magnus Wiese, Robert Knobloch, Ralf Korn, and Peter Kretschmer, Quant GANs: deep generation of financial time series, Quantitative Finance 20 (2020), no. 9, 1419-1440.
29/29