Analytics Lab Accenture Team Report



Abraham Eaton, Gabrielle Rappaport, Alexandru Socolov, and Nova Zhang


8 December 2019

Executive Summary

The goal of this project was to predict merger and acquisition (M&A) events using corporate jet flight data. After data processing and exploration, we applied predictive classification models, like Random Forest and Optimal Trees, to identify instances of M&A. Although results varied by industry, using advanced machine learning techniques allowed us to reliably identify mergers and acquisitions before their public announcement for companies in the Information Systems industry.

We found predictive factors that coincide with the intuition of the investing professionals at Eaton Vance Management. The significance of the proportion of flights to a destination over six months indicates the more immediate focus on that location while the year-long proportion of flights to that location provides an indication of its long-term importance. Taken together, the difference between the two variables highlights changes in the company’s focus that might indicate a merger or acquisition in that location.

Our best performing method heavily uses measurements of the short-term flight activity into a location, indicating whether the flight traffic has set a new record for company traffic into that region. A company flying into an area more than they ever have in the past clearly points towards abnormal activity occurring. While trade conferences or other corporate events might draw a yearly burst of flights that could skew other variables which only consider six months or a year in the past, this type of measurement accounts for all of the company’s past behavior. Thus, identifying these surge moments allowed our model to distinguish normal spikes in flight activity from significant new bursts.

Predicting that a company will be part of an M&A transaction in a specific city is valuable information for hedge funds. Since the number of false positives obtained in our analysis is reasonable and the predictive performance is strong, experts can manually inspect the output of our predictions. They can narrow down from the city where the transaction will happen to the exact target company by looking at the recent performance of companies in the same sector in that geographical area, since under-performing businesses tend to sell. Additionally, public comments, strategic reviews, and publicly disclosed earning reports are also a great source of complementary information.

Lastly, the monetary value of a strong predictive model is immense. Merger announcements typically involve a large premium over current prices (between 40% and 50% on average) and lead to a large and rapid change in market prices. Therefore, an improvement in the ability to predict which firms will be involved in a merger deal could turn out to be very profitable for an investor. Using the model developed in this report has great potential in identifying the next M&A with considerable confidence.


1 Project Summary

Our team explored the use of corporate jet flight data in order to gain tradeable insights on whether companies are planning to merge with or acquire another company. After data processing and exploration, we applied predictive classification models, like Random Forest and Optimal Trees, to identify instances of mergers and acquisitions. We found that although results vary by industry, using advanced machine learning techniques allowed us to reliably identify mergers and acquisitions before their public announcement for companies in the Information Systems industry.

1.1 Background

Both fundamental and quantitative investors are under pressure from market index funds, with the Wall Street Journal proclaiming, “Index Funds Are the New Kings of Wall Street.”¹ Stock pickers are finding additional ways to add value that will outperform passively managed funds. One promising area of work is using alternative sources of data, including jet movement, to gain an edge at predicting important announcements like mergers and acquisitions (M&A) before they occur.

The profits to be made by inferring business decisions in advance of their official announcements are huge. Companies and individuals are reticent when it comes to sharing details of their investments and returns, but the value of information is highlighted by the growth of the industry. Some companies have already seen success using alternative data to predict company announcements, and interest is only rising. In fact, according to Eagle Alpha Ltd, the market for alternative data sources will reach 900 million dollars by 2021.²

1.2 Data

Our dataset includes information about the movement of corporate jets worldwide between 2010 and 2017 inclusive. For each flight, we have a record of which company owned or leased the jet, the geographic location of the company headquarters, the takeoff and landing locations, the length of stay at the location, and the date and time of the flight. Over this eight-year span, corporations included in the data flew over 3.5 million times to locations all over the globe. Shown in Figure 1 are the first 100,000 flights, meaning this map represents a mere 3% of the total number of recorded flights between 2010 and 2017.

However, over the same time period our team only had records for 350 mergers or acquisitions, an average of 10,000 flights per instance of merger and acquisition activity. Thus, our challenge was to cut through the massive amounts of noise in the data and distill a strong signal indicating when companies are considering merging and acquiring. The company responsible for collecting all of the flight data provided several examples where significant company actions were preceded by unusual flight activity.³ However, using the data alone to predict which flights are important, instead of identifying them retrospectively with human insight, poses a great challenge.

¹ WSJ Article: Lim, Dawn. “Index Funds Are the New Kings of Wall Street.” The Wall Street Journal, Dow Jones & Company, 18 Sept. 2019

² Bloomberg Article: Kearns, Jeff. “How Big Investors Cash In on ‘Alternative Data.’” Bloomberg.com, Bloomberg, 9 Nov. 2019

³ JetTrack Case Studies


Figure 1: 100,000 flights between 2010 and 2017

1.3 Data Exploration

As a first step toward developing models, we attempted to identify factors correlated with future mergers and acquisitions (M&A) in order to develop metrics for a predictive model. This was done by plotting the number of flights that a company made to every airport in the year before a merger or acquisition. Although we manually examined the flight data for each company for the year before their merger or acquisition, we were unable to observe any significant trends. Our difficulties arose primarily from the huge amounts of noise in the data. While the first flight to a new location might indicate a merger or acquisition, many companies fly little enough that practically any flight not to New York or California is the first to that location. Meanwhile, companies that flew more also visited locations seemingly at random. Our challenge was thus to engineer features that mathematical models could use to predict M&A activity in advance better than a human looking in hindsight.

2 The final model

After exploring the data, we realized that it would be almost impossible to predict a merger or acquisition between two specific companies without additional data. Additionally, we found that it is extremely rare for both companies to make flights prior to an M&A. For each merger or acquisition, we typically had data on only one side of the transaction: the company doing the flying. Thus, we decided to predict, based on individual flights, whether the corresponding company will be part of an M&A in the next few months in the corresponding location. More precisely, for each flight in the dataset, we predict whether the flight is linked to an M&A that will happen within a fixed number of months involving a company whose headquarters is located near the destination airport.


2.1 Feature Engineering

The goal of the feature engineering was to generate a dataset that could be fed to different classification models.

2.1.1 Generating the dependent variable

The first step towards creating a dataset to use with classification algorithms was creating a binary variable indicating whether a flight was related to a transaction that happened within the following x months. We first converted the addresses of the headquarters of every company in our dataset into latitude and longitude coordinates using the geocoders of the open-source geopy library (geopy.geocoders).⁴ Once these coordinates were generated, we identified the closest airports to each headquarters by selecting up to three of the closest airports within a range of 50 kilometers. To compute the distance between airports and headquarters, we used the formula for distance on a sphere (see Section 4.4).

Finally, we combined our knowledge of each flight’s destination, the company headquarters located near each airport, and information about the companies involved in each M&A to generate a variable indicating whether that flight could be linked to M&A activity with a local company in that area in the following x months.
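As an illustration, the labelling step described above could be sketched as follows. This is a minimal sketch and not the code used in the project: the record layouts, the names `label_flight` and `hq_near_airport`, the toy data, and the approximation of a month as 30 days are all assumptions made for the example.

```python
from datetime import date, timedelta

def label_flight(flight, hq_near_airport, deals, x_months=3):
    """Return 1 if the flight can be linked to a deal announced within x months."""
    company, airport, flight_date = flight
    # Assumption: a month is approximated as 30 days.
    window_end = flight_date + timedelta(days=30 * x_months)
    # Companies whose headquarters were matched to this arrival airport.
    local_companies = hq_near_airport.get(airport, set())
    for party_a, party_b, announced in deals:
        if not (flight_date <= announced <= window_end):
            continue
        # The flying company must be one side of the deal, a local company the other.
        if company == party_a and party_b in local_companies:
            return 1
        if company == party_b and party_a in local_companies:
            return 1
    return 0

# Hypothetical toy data: one deal, three flights into the same airport.
hq_near_airport = {"BOS": {"TargetCo"}}
deals = [("FlyerCo", "TargetCo", date(2013, 6, 1))]
flights = [
    ("FlyerCo", "BOS", date(2013, 4, 15)),  # inside the 3-month window
    ("FlyerCo", "BOS", date(2012, 1, 10)),  # far too early
    ("OtherCo", "BOS", date(2013, 4, 15)),  # not a party to the deal
]
labels = [label_flight(f, hq_near_airport, deals) for f in flights]
```

Only the first toy flight receives a positive label, which mirrors how rare positive labels are in the real data.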

2.1.2 Generating the features

The second step was to generate features based on flight activity to gain insight into the flying behavior of the companies. This step was challenging for two reasons: we wanted as much information as possible to base our predictions on while still keeping the number of features small, and the size of the data, approximately 3.5 million instances, made it difficult to generate the features within a reasonable time.

We engineered the following features:

• The number of flights of the company to the same arrival airport in the past 1, 3, 6, and 12 months. This feature gives the frequency of travel to specific destinations.

• The proportion of flights of the company to the same arrival airport in the past 1, 3, 6, and 12 months. Compared to the previous metric, this one gives insight into the tendency of a company to fly to a specific destination, as the particular frequency is compared to general behavior.

• Three different booleans specifying whether the company flew for the first time to the destination in the past 1, 3 and 6 months. They serve as flags for new flight behavior, which may be a marker for new activities within the company and potentially M&A activities.

• Number of flights to the same arrival airport within the past 1, 3 and 6 months, from any company in the same industry as the one flying. This gives an indication of the concentration of the industry at a given location.

• Detection of potentially unusual clusters of a company’s flights to the arrival destination. In addition to the proportion of flights in the past few months, this metric shows abnormal behavior in a short period of time (2 weeks). The following equations show how this is modeled:

$$\text{peak size} = \frac{\text{number of flights in the current time period}}{\text{average number of flights in all individual time periods}}$$

$$\text{average number of flights in all individual time periods} = \frac{\text{total number of flights from the company to the location}}{\text{total number of periods}}$$

• Boolean variable indicating whether the flight comes from the headquarters of the flying company or not.

⁴ Geopy API
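The window-based features listed above could be computed along the following lines. This is a hedged sketch in plain Python rather than the project's implementation: the function names, the representation of a company's history as `(date, airport)` pairs, the 30-day month, and the way the total number of periods is counted (from the company's first flight to the location) are assumptions made for illustration.

```python
from datetime import date, timedelta
from math import ceil

def window_features(history, airport, as_of, months):
    """Count and proportion of one company's flights to `airport` in the window before `as_of`."""
    start = as_of - timedelta(days=30 * months)  # assumption: 30-day months
    in_window = [a for d, a in history if start <= d < as_of]
    to_airport = sum(1 for a in in_window if a == airport)
    proportion = to_airport / len(in_window) if in_window else 0.0
    return to_airport, proportion

def first_time_flag(history, airport, as_of, months):
    """True if the company's earliest flight to `airport` falls inside the window."""
    dates = [d for d, a in history if a == airport and d < as_of]
    if not dates:
        return False
    return min(dates) >= as_of - timedelta(days=30 * months)

def peak_size(history, airport, as_of, period_days=14):
    """Flights in the current period divided by the average number of flights per period."""
    dates = [d for d, a in history if a == airport and d < as_of]
    if not dates:
        return 0.0
    current = sum(1 for d in dates if d >= as_of - timedelta(days=period_days))
    # Assumption: periods are counted from the first flight to this location.
    total_periods = max(1, ceil((as_of - min(dates)).days / period_days))
    average = len(dates) / total_periods
    return current / average

# Hypothetical history of a single company.
history = [
    (date(2013, 1, 5), "BOS"),
    (date(2013, 3, 1), "JFK"),
    (date(2013, 5, 20), "BOS"),
    (date(2013, 5, 25), "BOS"),
]
as_of = date(2013, 6, 1)
count_1m, prop_1m = window_features(history, "BOS", as_of, 1)
count_6m, prop_6m = window_features(history, "BOS", as_of, 6)
peak = peak_size(history, "BOS", as_of)
```

Recomputing each window from the full history, as done here for clarity, is quadratic in the number of flights; at the scale of 3.5 million flights a sorted, incremental computation would likely be needed.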

2.2 Methods

After augmenting the provided dataset with the additional features and response variable as outlined above, we built several classification models: some interpretable models to explain the intuition behind the algorithm’s choice, i.e. logistic regression and optimal classification trees, and some ensemble methods for potentially higher predictive power like random forest and gradient boosted machines.

The dependent and independent variables were the same across all models. The response variable is a binary indicating whether the travelling company will be part of a merger or acquisition with a company located close to the arrival airport within the next x months. We initially picked x = 3 and then changed it to probe into the sensitivity of our results. The independent variables were all the features we engineered, together with the duration of stay and month information for each flight. The full list of independent variables is available in the Appendix.

2.2.1 Logistic regression

We fit a logistic regression and obtained the probability for each flight to be linked to an M&A. We tested a range of thresholds above which a flight would be classified as positive, i.e. associated with an M&A. Discussion with our project sponsors guided us to prioritize reducing the number of false negatives at the price of more false positives. In other words, we want the model to identify any unusual behaviour rather than miss an M&A connected flight, since our model’s predictions will be fed to an industry expert who will further examine the flight activity flagged by the model. We picked a threshold of 8.5% for the logistic regression.
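The effect of the classification threshold can be illustrated with a small sketch. The probabilities and labels below are made up for the example, and the helper names `classify` and `count_errors` are hypothetical; only the 8.5% threshold comes from the report.

```python
def classify(probabilities, threshold=0.085):
    """Flag a flight as positive when its predicted probability is above the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]

def count_errors(labels, predictions):
    """Return (false positives, false negatives) for binary labels and predictions."""
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return fp, fn

# Made-up predicted probabilities and true labels for five flights.
probabilities = [0.02, 0.09, 0.30, 0.07, 0.60]
labels = [0, 0, 1, 1, 1]

low_threshold_errors = count_errors(labels, classify(probabilities, 0.085))
high_threshold_errors = count_errors(labels, classify(probabilities, 0.5))
```

In this toy case the low threshold trades one extra false positive for one fewer false negative compared with a threshold of 0.5, which is the direction of trade-off the project sponsors asked for.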

2.2.2 Optimal Classification Tree (OCT)

Aiming for explainable results, we constructed an optimal classification tree, a recent method that improves on the predictive power of traditional CART by using a global optimization approach to build the classification tree. We used the InterpretableAI package in Julia to do so. We set up a grid search to pick the best minimum bucket size and maximum number of splits in a tree. Once built, we used the resulting decision tree to classify future flights.

2.2.3 Random Forest (RF)

Diving into the black box methods in search of improved predictive performance, we built a random forest model, which is an ensemble of classification trees, each built using a subset of data points and features. While tuning the parameters, we explored models with 500 trees, a minimum of 25 data points in each leaf and a range of 1 through 24 maximum splits, and picked the best model using 10-fold cross-validation. Once built, we looked at a table of how many times each feature appeared across all trees in order to rank variable importance.


2.2.4 Gradient Boosted Machines (GBM)

Another ensemble method, GBM consists of classification trees built on top of one another, each trying to predict the error from the previous tree. Because of sky-rocketing computational complexity and the lack of cloud computing infrastructure available, we manually searched for the best parameters. Experimentally, we found that 500 trees, 25 minimum observations in each leaf, 10 maximum splits and a learning rate of 0.001 worked well.

3 Results

3.1 Performance

We evaluated all models across the following four metrics:

• Accuracy - share of flights correctly classified as associated or not associated with an M&A between the flying company and some company located close to the arrival airport within the next 3 months.

• Area Under the Curve (AUC) - a measure of how well the model is able to distinguish between the flights related and unrelated to M&A.

• True Positive Rate (TPR) = $\frac{\text{number of true positives}}{\text{total number of actual positives}}$

• Precision = $\frac{\text{number of true positives}}{\text{total number of predicted positives}}$

The following table is a high-level summary of the results obtained. The models were developed using the flight data for the Information Technology industry in 2013. The performance metrics were computed on the testing data, which was set aside before training and tuning the models.

Model                         Accuracy  AUC     TPR     Precision
Logistic Regression           0.6035    0.7259  0.8364  0.0996
Optimal Classification Tree   0.9468    0.6214  0.2545  0.4667
Random Forest                 0.9496    0.8455  0.0182  1.0000
Gradient Boosted Machines     0.9431    0.8849  0.0909  0.3125

We see that the highest accuracy and precision are achieved by the random forest model. Since its precision is in fact 1.0000, every positive prediction it made on the test set was a true positive, meaning a positive prediction by this model can be trusted as a strong signal of some unusual activity. This of course comes at the cost of the lowest true positive rate: the model leaves many M&A related flights undetected. In other words, random forest is rather conservative in predicting 1, but when it does, it is a strong indication that the case is worth further investigation.

The highest area under the curve was attained by the gradient boosted machines (GBM) model. An AUC of 0.8849 means that, given one randomly chosen M&A related flight and one randomly chosen unrelated flight, the model assigns the higher score to the related flight in 88.49% of cases. The GBM model correctly identified 9.09% of all M&A related flights, with a precision of 31.25%. The Optimal Classification Tree achieves a high degree of accuracy on the testing data along with admirable interpretability, despite having the lowest AUC score. The benefit of this method is that it allows an intuitive understanding of what factors are important for indicating if an M&A will take place (Figure 3). This helps investors generalize the model even beyond the immediate dataset it was trained on. This can be seen in the fact that the Optimal Classification Tree built on data from the Information Systems industry sector still performed reasonably well across sectors in the Financial industry. While some industries are different enough to require new decision trees, observing consistent results across multiple slices of data confirms the value of this method.

The simplest model, logistic regression, does surprisingly well. Despite its low accuracy, the model was able to pick out 83.64% of the M&A related flights, at the cost of a low precision of 9.96%. Thus, logistic regression is an optimistic model that is best at catching most of the M&A related flights and can be used if the cost of investigating a false positive is lower than the foregone expected profit from missing an M&A, i.e. predicting a false negative. Moreover, its distinctive advantage is its relatively cheap computational requirements. Utilizing this model in real time seems feasible and could help in fast-paced environments. For example, combining the insights of this model trained on all recent information with the release of a new rumour could help the fundamental analyst make better informed investment decisions.

3.2 Confusion matrices

We can look at the predictions to see how well different methods classify the flights into M&A related (positive, predicted as 1) and unrelated (negative, predicted as 0). In addition, we take a deeper look at the precision of each model, since we would rely on the positive results as support to invest.

Predicted by            Logistic      OCT           RF            GBM
                        0     1       0     1       0     1       0     1
Actually unrelated      601   416     1001  16      1017  0       1006  11
Actually M&A related    9     46      41    14      54    1       50    5

Predicted by   Logistic  OCT     RF      GBM
Precision      0.0996    0.4667  1.0000  0.3125
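The accuracy, true positive rate and precision reported in Section 3.1 can be recomputed directly from the confusion matrix counts above. The sketch below is only a consistency check; the dictionary layout and the function name `metrics` are choices made for this example.

```python
# Counts taken from the confusion matrices above, in the order
# (true negatives, false positives, false negatives, true positives).
confusion = {
    "Logistic": (601, 416, 9, 46),
    "OCT": (1001, 16, 41, 14),
    "RF": (1017, 0, 54, 1),
    "GBM": (1006, 11, 50, 5),
}

def metrics(tn, fp, fn, tp):
    """Return (accuracy, true positive rate, precision), rounded to four decimals."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    tpr = tp / (tp + fn)
    # Precision is undefined when a model makes no positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return round(accuracy, 4), round(tpr, 4), round(precision, 4)

results = {name: metrics(*counts) for name, counts in confusion.items()}
for name, (accuracy, tpr, precision) in results.items():
    print(f"{name:<10} accuracy={accuracy:.4f} TPR={tpr:.4f} precision={precision:.4f}")
```

The test set contains 1,072 flights, of which 55 are M&A related, so a model that always predicts 0 would already reach an accuracy of about 0.949. This is why accuracy alone says little about how useful a model is here.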

As noted before, logistic regression is the most optimistic in its predictions, and its low precision reflects this. However, the model misses only a few flights that are actually related, identifying 46 out of all 55 related flights.

Within the context of the problem, the best confusion matrix belongs to the Optimal Classification Tree, which had the lowest AUC score. It correctly identified 14 M&A related flights while not decreasing its precision as drastically as logistic regression.

Other models are more cautious in predicting a positive outcome. Gradient Boosted Machines, having the best AUC metric, correctly identified 5 flights, yet wrongly flagged 11 as M&A related, resulting in the second lowest precision of 31.25%. On the other hand, Random Forest has the highest precision of 100%, but this is due to the fact that Random Forest flags only one observation as positive.

Taking the probabilities of each flight being classified as positive rather than the binary 1/0 predictions, we can adjust the threshold above which an observation is predicted as positive. Lowering this threshold would make the Random Forest predictions score lower on specificity, inevitably increasing the number of false positives, yet it should also increase the number of true positives, i.e. flights correctly predicted as M&A related. Adjusting this parameter allows the algorithm to adapt to its user’s appetite for specificity.

In general, since the positive predictions of the model are likely to be manually inspected by an expert investor, increasing the precision is the preferred trade-off against other metrics like AUC and true positive rate. Thus, in the context of our business problem, OCT seems to be the preferable model based solely on the confusion matrix results.


Figure 2: Out-of-sample Receiver Operating Characteristic curves for Information Technology sector in Q4 2013

3.3 Receiver Operating Characteristic Curve

We also plot the Receiver Operating Characteristic (ROC) curve, which gives more insight into the ability of the models to distinguish between the flights related and unrelated to M&A (Figure 2).
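For readers less familiar with these curves, the sketch below shows one way the points of an ROC curve and the corresponding AUC could be computed from predicted scores. It uses made-up scores and hypothetical helper names, and it relies on the pairwise-ranking interpretation of AUC rather than on any particular library, so it is an illustration and not the procedure used to produce Figure 2.

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Sweep the threshold from the highest score down to the lowest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(labels, scores):
    """Share of (positive, negative) pairs where the positive gets the higher score."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Made-up labels and scores for five flights.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
curve = roc_points(labels, scores)
area = auc(labels, scores)
```

Because AUC depends only on how flights are ranked, a model can have a high AUC and still flag almost nothing at a given threshold, which is the pattern seen for Random Forest and GBM in the results.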

3.4 Performance across industries and time

We apply a similar methodology to the flight data from the Healthcare sector from 2017. As before, we train on the first 9 months of flights and test on the last 3 months. We also apply it to the Financial sector, training on all years available, 2013 through the end of 2018, and evaluating on the first 9 months of 2019. The performance of the resulting models is shown below.

           Healthcare 2017                        Financial 2013 - 2019
Model      Accuracy  AUC     TPR     Precision    Accuracy  AUC     TPR     Precision
Logistic   0.8178    0.6193  0.3028  0.0438       0.8883    0.8387  0.5000  0.0189
OCT        0.9592    0.7009  0.0212  0.0140       0.9957    0.5316  0.0000  0.0000
RF         0.9760    0.8090  0.0352  1.0000       0.9958    0.6020  0.0000  0.0000
GBM        0.9760    0.7868  0.0915  0.3051       0.9958    0.7500  0.0000  0.0000

We observe promising performance for the Healthcare sector, testing on 5698 flights made in the last three months of 2017, 142 of which were labeled as related to M&A. The most conservative model, Random Forest, attained the highest AUC and tied with GBM for the highest accuracy. It identified 5 out of 142 M&A related flights without producing any false positives. GBM found 18 such flights, but created 41 false positives. Again, logistic regression recovered the most positives, 43 M&A related flights, but at a cost of 939 false positives. Thus, the model configuration developed for the Information Technology sector shows signs of being transferable to other industries.


However, some industries are harder to predict than others, especially if firm behaviour entails lots of flights and few M&A deals. For example, for the Financial sector, building a model on a larger time span, 2013 - 2018, and testing on the first 9 months of 2019, we observe that the Random Forest and GBM models never predict a positive outcome. While their performance is still better than the baseline (their AUCs are above 0.5), the probabilities of belonging to the positive class are too low for the models to confidently attribute a flight to an M&A. In this case, an optimistic logistic regression performs much better than more advanced techniques.

3.5 Implications

Our results show that several of the variables we engineered correlate with increased likelihood of a merger or acquisition in the next three months. These variables reinforce the common sense intuition that the way a company uses its jets can be used to identify corporate activity. For instance, logistic regression identified variables like the proportion of flights from a company to a location in the last six months, the number of flights flown to a destination in the last six months, and the proportion of flights to a location in the last year as the most important features (Table 1). This is exactly what we expected to see based on discussions with investing professionals from Eaton Vance Management. The proportion of flights to a destination over six months indicates the more immediate focus on that location while the year-long proportion of flights to that location provides an indication of its long-term importance. Taken together, we would expect a difference between the two variables to highlight changes in the company’s focus that might indicate a merger or acquisition in that location.

While logistic regression performs well, the Random Forest method identifies additional variables of interest, including the duration of an aircraft’s stay at its final destination and the number of flights to a particular location in the last month (Table 1). Given that executives typically live near their company headquarters, longer stays indicate activity that is important enough to keep them away from home for an extended period of time. While other explanations are possible, common sense argues that significant business deals like mergers and acquisitions take lots of time to work out, which requires longer duration stays. Similarly, a large number of flights to a location in the past month indicates higher executive presence in that area, which could indicate a significant deal being made. While each of these factors on its own could merely be the result of other events, taken together they begin to form a powerful prediction tool for identifying mergers and acquisitions.

Our best performing method, Gradient Boosted Machines, heavily uses a set of variables measuring the short-term activity into a location (Table 2). Essentially, the “Peak Last 6 Weeks” and “Peak Last 4 Weeks” variables indicate if the flight traffic into the location has set a new record for company traffic into that region. A company flying into an area more than they ever have in the past points clearly towards abnormal activity occurring. While trade conferences or other corporate events might draw a yearly burst of flights that could fool other variables which only consider six months or a year in the past, this variable accounts for all of the company’s past behavior. Thus, identifying these surge moments allows our models to distinguish normal spikes in flight activity from significant new bursts.


4 Conclusion

Predicting that a company will be part of an M&A transaction in a specific city is valuable information for hedge funds. Indeed, since the number of false positives is reasonable for the better performing models, experts can manually inspect the output of our predictions and go from the city where the transaction will happen to the exact company. This could typically be done by looking at the recent performance of companies in the same sector in the geographical area, since under-performing businesses tend to sell. Moreover, public comments, strategic reviews, and publicly disclosed earnings reports are a great source of information. Finally, Bloomberg is also a relevant tool to identify the company on the other side of the transaction, since it provides the overlap between companies, which would give some powerful insights on strategic fit.

We would like to emphasize that the prediction model has significant business value. Merger announcements typically involve a large premium over current prices (between 40% and 50% on average) and lead to a large and rapid change in market prices. Therefore, any improvement in the ability to predict which firms will be involved in a merger deal would prove to be very profitable for an investor in the stock market.


Appendix

4.1 List of the independent variables used across all models

• Duration of stay in seconds

• Month when the flight arrived

• Number of flights by this company to this destination in the last 1, 3, 6 and 12 months

• Number of flights by this company in the last 1, 3, 6 and 12 months

• Proportion of flights by this company to this destination in the last 1, 3, 6, 12 months

• Number of flights by all companies in the data set to this destination in the last 1, 3 and 6 months

• Whether the company flew to this destination for the first time in the last 1, 3 and 6 months

• Whether there has been a peak of flights in the last 2, 4 and 6 weeks

• Whether the flight has departed close to the company’s HQ

4.2 Variable importance

Figure 3: Prediction Tree for Information Systems 2013


Table 1: Logistic Regression and Random Forest

Variables                                                   Logistic Coeff   Random Forest Importance
(Intercept)                                                 -1.57            –
Stay duration in seconds                                    -0.00            4.91 §
#flights from company to destination last month             0.09*            1.30
#flights from company to destination last 3 months          0.04             0.44
#flights from company to destination last 6 months          -0.22*           0.89
#flights from company to destination last year              0.13*            0.82
#flights from company anywhere last month                   -0.03            1.23
#flights from company anywhere last 3 months                0.01             3.70 §
#flights from company anywhere last 6 months                -0.01            2.12
#flights from company anywhere last year                    0.01             2.17
% flights from company to destination last month            -5.80            1.79
% flights from company to destination last 3 months         -29.43           3.43 §
% flights from company to destination last 6 months         226.63*          7.78 §
% flights from company to destination last year             -208.03*         3.72 §
Flew there 1st time in last month                           -0.92*           1.06
Flew there 1st time in last 3 months                        1.11             0.63
Flew there 1st time in last 6 months                        -0.97            0.07
#flights to destination from all companies last month       0.01             5.94 §
#flights to destination from all companies last 3 months    -0.00            2.98
#flights to destination from all companies last 6 months    0.01*            10.20 §
Was there a peak of flights in last 2 weeks                 0.00*            5.07 §
Was there a peak of flights in last 4 weeks                 0.00             2.68
Was there a peak of flights in last 6 weeks                 -0.00            2.12
Month                                                       -0.13            0.54
Was the flight from HQ of the acquirer                      -1.48*           0.03

* indicates significant variables at the 5% significance level based on p-values from the Logistic Regression
§ indicates 8 most important features in Random Forest based on mean decrease in Gini measure

4.3 Future Work

• Build a second model that predicts which firm the flying company is likely to buy.

• Apply a hierarchical approach that combines the predictions of previous models (Logistic, OCT, Random Forest, etc.) to give a more nuanced prediction of the probability of a merger

• Subset analysis to only flights with a location near the headquarters of another publicly traded company, potentially removing many random one-time flights

• Incorporate flight data from multiple companies in each prediction to increase accuracy by identifying M&A indicators (like movement of banks to the area) versus noise (like multiple companies travelling to a trade show at the same time)


Table 2: Gradient Boosted Machines

Variables                                                   Gradient Boosted Machines importance
Stay duration in seconds                                    13.69 §
#flights from company to destination last month             0.81
#flights from company to destination last 3 months          0.36
#flights from company to destination last 6 months          0.52
#flights from company to destination last year              1.21
#flights from company anywhere last month                   4.02
#flights from company anywhere last 3 months                10.00 §
#flights from company anywhere last 6 months                6.14 §
#flights from company anywhere last year                    10.09 §
% flights from company to destination last month            5.60 §
% flights from company to destination last 3 months         8.42 §
% flights from company to destination last 6 months         10.65 §
% flights from company to destination last year             7.68 §
Flew there 1st time in last month                           0.52
Flew there 1st time in last 3 months                        0.24
Flew there 1st time in last 6 months                        0.34
#flights to destination from all companies last month       0.88
#flights to destination from all companies last 3 months    3.77
#flights to destination from all companies last 6 months    4.01
Was there a peak of flights in last 2 weeks                 3.50
Was there a peak of flights in last 4 weeks                 2.17
Was there a peak of flights in last 6 weeks                 1.79
Month                                                       2.84
Was the flight from HQ of the acquirer                      0.75

§ indicates 8 most important features in GBM based on mean computed relative influence using Breiman (2001) methodology

4.4 Formula for Distance on a Sphere

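$$a = \sin^2\left(\frac{\mathrm{lat}_{\text{headquarter}} - \mathrm{lat}_{\text{airport}}}{2}\right) + \cos(\mathrm{lat}_{\text{airport}})\,\cos(\mathrm{lat}_{\text{headquarter}})\,\sin^2\left(\frac{\mathrm{lon}_{\text{headquarter}} - \mathrm{lon}_{\text{airport}}}{2}\right)$$

$$c = 2\,\operatorname{atan}\left(\frac{\sqrt{a}}{\sqrt{1 - a}}\right)$$

$$\text{distance} = R \cdot c$$

where R is the approximate radius of Earth in km.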
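The distance formula above, together with the airport selection rule from Section 2.1.1 (up to three airports within 50 kilometers), could be sketched in Python as follows. This is an illustrative sketch: the value 6371 km for R, the function names, and the approximate coordinates in the toy data are assumptions, and `atan2(sqrt(a), sqrt(1 - a))` is used as a numerically safer equivalent of the arctangent of the ratio.

```python
from math import radians, sin, cos, atan2, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius; the report only says "approximate"

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat1 - lat2) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon1 - lon2) / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return EARTH_RADIUS_KM * c

def nearest_airports(hq, airports, max_km=50.0, k=3):
    """Up to k airport codes within max_km of the headquarters, closest first."""
    distances = [
        (haversine_km(hq[0], hq[1], lat, lon), code)
        for code, (lat, lon) in airports.items()
    ]
    return [code for d, code in sorted(distances) if d <= max_km][:k]

# Toy data with approximate coordinates (latitude, longitude).
headquarters = (42.3601, -71.0589)  # downtown Boston
airports = {
    "BOS": (42.3656, -71.0096),
    "BED": (42.4700, -71.2890),
    "JFK": (40.6413, -73.7781),
}
nearby = nearest_airports(headquarters, airports)
```

In this toy example the two Boston-area airports fall inside the 50 km radius while JFK does not, so the headquarters would be linked to flights arriving at either of the first two.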
