Modeling Airline Fares - École nationale de l'aviation...
Modeling Airline Fares: Evidence from the U.S. Domestic Airline Sector
Domingo Acedo Gomez, Arturs Lukjanovics, Joris van den Berg
31 January 2014
Introduction Generating the Dataset Descriptive Statistics Modeling Forecasting Conclusions
Motivation and Main Findings
Which Factors Influence Fares?
Distance
Competition
Seasonality
Carrier
Ticket class
Economic situation
Total passengers
Hubs
Our Model Results
22 explanatory factors included
The model explains about 50% of fare variation (R-squared of 0.50)
Adding Southwest raises the explained variation to about 55% (R-squared of 0.55)
ENAC Modeling Airline Fares 31 January 2014 2 / 16
DB1B Origin & Destination Database
Itinerary ID
200911307517
Main Information
Coupons
Carrier
Breaks
Itinerary $ fare
Fare class
A Look Inside the Database
Reducing the Database
Which tickets do we keep for our study?
1 Round trip
2 Two or four coupons
3 Single carrier
4 No extreme fares
5 Economy class
6 Main majors & lowcost carriers
7 Regular routes
8 Lower 48 states
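The eight selection rules above can be sketched as a single predicate over ticket records. This is a minimal illustration, not the authors' script: the `Ticket` layout, the fare bounds, the route-traffic threshold, and the "economy" label are all assumptions, and the carrier set is simply the list of codes that appear in the regression tables (the omitted baseline carrier is not named in the slides).

```python
from dataclasses import dataclass

# Hypothetical record layout; the real DB1B column names differ.
@dataclass
class Ticket:
    round_trip: bool
    coupons: int
    carriers: tuple      # operating carrier code of each coupon
    fare_usd: float
    fare_class: str      # "economy" is a placeholder label
    route_pax: int       # passengers observed on the route
    states: tuple        # state codes of every airport on the itinerary

# Carrier codes that appear in the regression tables (baseline carrier unknown).
STUDY_CARRIERS = {"AS", "FL", "AA", "DL", "NW", "UA", "US", "9E", "B6", "WN", "EV"}
# Assumed thresholds; the slides do not state the actual cut-offs.
MIN_FARE, MAX_FARE = 50.0, 2000.0
MIN_ROUTE_PAX = 10
# Outside the lower 48 (territories omitted here for brevity).
EXCLUDED_STATES = {"AK", "HI"}

def keep(t: Ticket) -> bool:
    """Return True if the ticket passes all eight selection rules."""
    return (
        t.round_trip                                   # 1 round trip
        and t.coupons in (2, 4)                        # 2 two or four coupons
        and len(set(t.carriers)) == 1                  # 3 single carrier
        and MIN_FARE <= t.fare_usd <= MAX_FARE         # 4 no extreme fares
        and t.fare_class == "economy"                  # 5 economy class
        and t.carriers[0] in STUDY_CARRIERS            # 6 majors and low-cost carriers
        and t.route_pax >= MIN_ROUTE_PAX               # 7 regular routes
        and not (set(t.states) & EXCLUDED_STATES)      # 8 lower 48 states
    )
```

Expressing the rules as one predicate makes it easy to apply them while streaming through the raw files, so the full database never has to be held in memory.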
Data Flow
Final Dataset
128,192 Observations
65% Direct Flights
35% Indirect Flights
Descriptive Statistics
Descriptive Statistics
Model Results
Dependent Variable: ln(AVGWEIGHTEDFARE)
Method: Least Squares
Sample (adjusted): 1 128191
Included observations: 128191 after adjustments

Variable                        Coefficient  Std. Error  t-Statistic   Prob.
C                                  42.99172    0.338122     127.1484  0.0000
AVERAGEROUTEPOPULATION/1000000     0.007107    0.000342     20.79077  0.0000
TOTALPAX/1000                     -0.099927    0.001948    -51.29007  0.0000
(TOTALPAX/1000)^2                  0.008593    0.000337     25.49255  0.0000
DISTANCE/1000                      0.610302    0.005283     115.5298  0.0000
(DISTANCE/1000)^2                 -0.107413    0.002108    -50.94444  0.0000
H-INDEX/0.1                        0.012098    0.000351     34.46632  0.0000
CLASSRATIO                         0.752086    0.033692     22.32214  0.0000
CLASSRATIO^2                      -1.165174    0.023031    -50.59143  0.0000
FREQAIRPORT=1                      0.096143    0.002240     42.92069  0.0000
DIRECT=1                          -0.019687    0.002006    -9.815180  0.0000
CARRIER="AS"                      -0.271279    0.005801    -46.76208  0.0000
CARRIER="FL"                      -0.331333    0.004582    -72.30864  0.0000
CARRIER="AA"                      -0.044150    0.003071    -14.37402  0.0000
CARRIER="DL"                      -0.026154    0.003019    -8.664515  0.0000
CARRIER="NW"                       0.115003    0.003405     33.77637  0.0000
CARRIER="UA"                       0.026226    0.003207     8.178843  0.0000
CARRIER="US"                      -0.063643    0.003238    -19.65542  0.0000
CARRIER="9E"                      -0.216606    0.270274    -0.801433  0.4229
CARRIER="B6"                      -0.543833    0.048076    -11.31193  0.0000
CARRIER="WN"                      -0.168659    0.005076    -33.22809  0.0000
CARRIER="EV"                       0.016828    0.035035     0.480301  0.6310
YEAR                              -0.018556    0.000169    -109.6910  0.0000

R-squared             0.499430    Mean dependent var      6.056327
Adjusted R-squared    0.499344    S.D. dependent var      0.381934
S.E. of regression    0.270246    Akaike info criterion   0.221208
Sum squared resid     9360.448    Schwarz criterion       0.222959
Log likelihood       -14155.42    Hannan-Quinn criter.    0.221733
F-statistic           5812.541    Durbin-Watson stat      1.858848
Prob(F-statistic)     0.000000
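Because the dependent variable is ln(AVGWEIGHTEDFARE), the coefficients are not dollar amounts. A dummy coefficient b corresponds to a fare change of 100 * (exp(b) - 1) percent relative to the omitted baseline, and a linear/squared pair implies a turning point at -b_linear / (2 * b_squared). The sketch below applies these standard log-linear conversions to coefficients copied from the table; the helper names are illustrative only.

```python
import math

def dummy_effect_pct(beta: float) -> float:
    """Percent fare change implied by a dummy coefficient in a log-fare model."""
    return 100.0 * (math.exp(beta) - 1.0)

def quadratic_turning_point(b_linear: float, b_squared: float) -> float:
    """Value of x where b_linear * x + b_squared * x**2 stops rising (or falling)."""
    return -b_linear / (2.0 * b_squared)

# Coefficients copied from the regression table above.
fl_effect = dummy_effect_pct(-0.331333)   # CARRIER="FL"
nw_effect = dummy_effect_pct(0.115003)    # CARRIER="NW"
distance_peak = quadratic_turning_point(0.610302, -0.107413)  # in units of DISTANCE/1000

print(f"FL vs. baseline carrier: {fl_effect:.1f}%")
print(f"NW vs. baseline carrier: {nw_effect:.1f}%")
print(f"Distance effect peaks near {distance_peak:.2f} thousand distance units")
```

Read this way, the FL coefficient implies fares roughly 28% below the baseline carrier, all else equal, and the positive distance effect flattens out at about 2.8 thousand distance units.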
Model Performance
Predicted fare 1993-2010
Including Southwest
Dependent Variable: ln(AVGWEIGHTEDFARE)
Method: Least Squares
Sample (adjusted): 1 148125
Included observations: 148054 after adjustments

Variable                        Coefficient  Std. Error  t-Statistic   Prob.
C                                  35.42660    0.318410     111.2608  0.0000
AVERAGEROUTEPOPULATION/1000000     0.009141    0.000317     28.81292  0.0000
TOTALPAX/1000                     -0.105992    0.001664    -63.69786  0.0000
(TOTALPAX/1000)^2                  0.007886    0.000262     30.04379  0.0000
DISTANCE/1000                      0.675759    0.004947     136.6076  0.0000
(DISTANCE/1000)^2                 -0.132959    0.001977    -67.25279  0.0000
H-INDEX/0.1                        0.015500    0.000325     47.64406  0.0000
CLASSRATIO                         0.865604    0.030551     28.33299  0.0000
CLASSRATIO^2                      -1.108471    0.021028    -52.71497  0.0000
FREQAIRPORT=1                      0.109104    0.002021     53.98739  0.0000
DIRECT                            -0.037399    0.001863    -20.07095  0.0000
CARRIER="AS"                      -0.250315    0.005753    -43.51145  0.0000
CARRIER="FL"                      -0.329620    0.004549    -72.45734  0.0000
CARRIER="AA"                      -0.036666    0.003045    -12.04244  0.0000
CARRIER="DL"                      -0.023884    0.002990    -7.986586  0.0000
CARRIER="NW"                       0.116872    0.003378     34.59792  0.0000
CARRIER="UA"                       0.031257    0.003181     9.827048  0.0000
CARRIER="US"                      -0.034285    0.003193    -10.73724  0.0000
CARRIER="9E"                      -0.186904    0.268389    -0.696391  0.4862
CARRIER="B6"                      -0.457841    0.047695    -9.599321  0.0000
CARRIER="WN"                      -0.438353    0.003345    -131.0549  0.0000
CARRIER="EV"                       0.070191    0.034784     2.017924  0.0436
YEAR                              -0.014885    0.000159    -93.48576  0.0000

R-squared             0.545766    Mean dependent var      5.999173
Adjusted R-squared    0.545698    S.D. dependent var      0.398153
S.E. of regression    0.268363    Akaike info criterion   0.207202
Sum squared resid     10660.99    Schwarz criterion       0.208741
Log likelihood       -15315.57    Hannan-Quinn criter.    0.207661
F-statistic           8084.567    Durbin-Watson stat      1.835690
Prob(F-statistic)     0.000000
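The summary statistics of both regressions can be cross-checked against each other with the textbook OLS identities F = (R² / k) / ((1 - R²) / (n - k - 1)) and adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the number of included observations and k = 22 is the number of regressors excluding the constant. The sketch below uses only figures reported in the two tables; small differences arise because the reported R² is rounded to six decimals.

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F-statistic of an OLS regression with k regressors plus a constant."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R-squared penalised for the number of regressors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

K = 22  # explanatory variables, excluding the constant

# (reported R-squared, included observations) for each model
base_model = (0.499430, 128191)
southwest_model = (0.545766, 148054)

for name, (r2, n) in [("base", base_model), ("with Southwest", southwest_model)]:
    print(f"{name}: F = {f_statistic(r2, n, K):.1f}, "
          f"adjusted R2 = {adjusted_r2(r2, n, K):.6f}")
```

Both recomputed F-statistics and adjusted R² values agree with the reported ones up to rounding, which suggests the flattened tables were transcribed consistently.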
Model Performance with Southwest
Predicted fare 1993-2010
Forecasting
Southwest enters a new route! LAS - ORD
Las Vegas - Chicago
Population: 4,296,645
B737-700: 140 PAX, 2 × week, 13 weeks: 3,640 passengers per quarter (364 in the 10% DB1B sample)
Currently 4 carriers
Distance: 1600NM
90% restricted class tickets
Direct flight
LAS is a "frequent" airport
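A forecast for this scenario amounts to plugging the route characteristics into the "Including Southwest" regression and exponentiating the predicted log fare. The sketch below shows the mechanics only. Several inputs are assumptions that the slides do not state: the forecast year, the H-index after entry (taken here as five carriers with equal shares), the reading of CLASSRATIO as the share of restricted tickets, the use of 3,640 rather than 364 for TOTALPAX, and the assumption that the slide's units match the units used in estimation.

```python
import math

# Coefficients copied from the "Including Southwest" regression table.
COEF = {
    "const": 35.42660,
    "pop_millions": 0.009141,
    "pax_thousands": -0.105992,
    "pax_thousands_sq": 0.007886,
    "dist_thousands": 0.675759,
    "dist_thousands_sq": -0.132959,
    "hhi_tenths": 0.015500,
    "class_ratio": 0.865604,
    "class_ratio_sq": -1.108471,
    "freq_airport": 0.109104,
    "direct": -0.037399,
    "carrier_WN": -0.438353,
    "year": -0.014885,
}

def predict_log_fare(pop, pax, distance, hhi, class_ratio,
                     freq_airport, direct, southwest, year):
    """Linear predictor of ln(AVGWEIGHTEDFARE) for one route."""
    p, d = pax / 1000.0, distance / 1000.0
    return (COEF["const"]
            + COEF["pop_millions"] * pop / 1_000_000
            + COEF["pax_thousands"] * p + COEF["pax_thousands_sq"] * p ** 2
            + COEF["dist_thousands"] * d + COEF["dist_thousands_sq"] * d ** 2
            + COEF["hhi_tenths"] * hhi / 0.1
            + COEF["class_ratio"] * class_ratio
            + COEF["class_ratio_sq"] * class_ratio ** 2
            + COEF["freq_airport"] * int(freq_airport)
            + COEF["direct"] * int(direct)
            + COEF["carrier_WN"] * int(southwest)
            + COEF["year"] * year)

scenario = dict(
    pop=4_296_645, pax=3_640, distance=1_600,   # from the slide
    hhi=5 * 0.2 ** 2,      # ASSUMED: five equal-share carriers after entry
    class_ratio=0.9,       # ASSUMED: CLASSRATIO = share of restricted tickets
    freq_airport=True, direct=True, southwest=True,
    year=2010,             # ASSUMED: last year shown in the fitted-fare plots
)

# exp() of the predicted log gives a point estimate without any
# retransformation correction for the regression error.
fare = math.exp(predict_log_fare(**scenario))
print(f"Predicted fare under these assumptions: ${fare:.0f}")
```

Changing any of the assumed inputs shifts the result, so the printed figure should be read as an illustration of the method rather than as the forecast presented in the talk.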
A Reality Check (Booking LAS - ORD)
Conclusion
We...
Processed 122 GB of DB1B data with Python
Constructed an econometric model with 22 variables
Were able to capture 55% of the observed fare variation