
Strictly CONFIDENTIAL

The end of Airline Revenue Management as we know it?

(Deep) Reinforcement Learning for Revenue Management

AGIFORS Symp 2017, London

October 2017

Speakers: Rodrigo ACUNA-AGOST and Thomas FIIG

© Amadeus IT Group and its affiliates and subsidiaries

INR and Airlines R&D

Credits

Nicolas BONDOUX

Quan NGUYEN

Thomas FIIG

Rodrigo ACUNA-AGOST

RM Expertise

AI Expertise

Amadeus Airlines R&D + Innovation and Research

(Deep) Reinforcement Learning for Revenue Management

Motivation – limitations of RMS

RMS assumptions

− RMS assumes that the future is accurately described by the past:

• Issue with changes in the business environment (new competitors)

• Issue with shifts in demand and willingness to pay (WTP)

• Issue with changes in customer behavior (for example: arrival pattern)

− RMS assumes that customers are rational:

• However, customers are irrational, influenced by psychological factors (framing, etc.).

• There is no model for irrationality.

− RMS assumes monopoly:

• Competitors' offers are accounted for only implicitly, through how they affect customers' behavior. Seen from the RMS, this corresponds to a monopoly.

− RMS assumes that a model exists that describes the "world". For general offers this is impossible:

• Increased complexity of offers (seat + ancillaries)

• Complex products (flexibility, time to think, etc.) and bundles of ancillaries are difficult to price.

• Interactions between the prices of ancillaries, bundles, fare families, etc.

Scientific worldview

Ability to model customers' behavior?

• Collect observations

• Experimentation

Yes / No

Low dimensions (degrees of freedom)

High dimensions (degrees of freedom)

Optimization - Multiple sequential decisions

• Dynamic Programming (Bellman 1950s)

• Balancing exploitation and exploration

Build forecasting model

• All future customers

Iterative solution = complete solution (proof in 1990s)

Classification

Prediction

Build forecasting model

• One given customer

REINFORCEMENT LEARNING

CLASSICAL REVENUE MANAGEMENT

SUPERVISED MACHINE LEARNING

… Many iterations

Application of RL

(1) Actions

(2) State

(3) Reward

How does it work?

Actions: { Left, no-change, Right }

Reward = stay alive as long as possible (Alive = no crash)

State: { Information of Sensors }

Sensors …
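The three ingredients above (actions, state, reward) can be sketched as a minimal interaction loop. This is only an illustration under stated assumptions: the `LaneWorld` environment, its 5-lane road, the single obstacle per step, and both policies are hypothetical and are not taken from the presentation.

```python
import random

# The three actions named on the slide, mapped to a lane change.
ACTIONS = {"Left": -1, "no-change": 0, "Right": +1}


class LaneWorld:
    """Hypothetical toy world: a car on a road with `width` lanes.

    State (the "sensor" reading): (car lane, lane of the next obstacle).
    Reward: +1 for every step survived, 0 on the step where the car crashes.
    """

    def __init__(self, width=5, seed=0):
        self.width = width
        self.rng = random.Random(seed)
        self.lane = width // 2
        self.obstacle = 0

    def reset(self):
        self.lane = self.width // 2
        self.obstacle = self.rng.randrange(self.width)
        return (self.lane, self.obstacle)

    def step(self, action):
        self.lane += ACTIONS[action]
        off_road = not 0 <= self.lane < self.width
        crashed = off_road or self.lane == self.obstacle
        reward = 0 if crashed else 1
        self.obstacle = self.rng.randrange(self.width)  # next obstacle ahead
        return (self.lane, self.obstacle), reward, crashed


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward (steps survived)."""
    state = env.reset()
    total = 0
    for _ in range(max_steps):
        state, reward, crashed = env.step(policy(state))
        total += reward
        if crashed:
            break
    return total


def careful_policy(state, width=5):
    """Hand-written policy: pick any move that stays on the road and avoids the obstacle."""
    lane, obstacle = state
    for name, move in ACTIONS.items():
        if 0 <= lane + move < width and lane + move != obstacle:
            return name
    return "no-change"


def random_policy(state, rng=random.Random(1)):
    """Baseline policy: ignore the sensors and act at random."""
    return rng.choice(list(ACTIONS))
```

A learning agent would replace `careful_policy` with a policy improved from the observed rewards; the loop in `run_episode` stays the same.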


So, what's new?

… it is very disruptive

• No modeling of passenger behavior (WTP)

• No demand forecast

• No RM optimization model

Reinforcement Learning: Mathematical details*)

Bellman (1950s)

$$V(t, x) = \max_f \Big\{ [1 - P(f)]\, V(t+1, x) + P(f)\, [f + V(t+1, x-1)] \Big\}$$

$$V(s) = \max_a \mathbb{E}\big[ f + V(s') \big]$$

Q-learning, Watkins (1989)

$$Q(s, a) = \mathbb{E}\big[ f + V(s') \big]$$

[Figure: backup diagrams relating V(s), Q(s,a) and V(s'): from state s, actions a1 and a2 lead to (s,a1) and (s,a2) with rewards f1, f2 or 0; V(s) and V(s') are obtained by a max over actions; one branch is marked as an explorative action.]

*) Reinforcement Learning, Sutton and Barto, 1998

$$Q(s, a) \leftarrow [1 - \alpha]\, Q(s, a) + \alpha\, [f + V(s')]$$

$$V(s) = \max_a Q(s, a)$$
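As a worked illustration of the Q-learning update, take purely hypothetical numbers (none of them come from the presentation): a current estimate Q(s,a) = 100, a learning rate α = 0.1, an observed reward f = 200 from a sale, and a next-state value V(s') = 50. One update then gives:

$$Q(s, a) \leftarrow (1 - 0.1) \cdot 100 + 0.1 \cdot (200 + 50) = 90 + 25 = 115$$

The estimate moves one tenth of the way from its old value (100) toward the newly observed target (250), which is why many iterations are needed before the estimates settle.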


Experimentation: our research journey …

1. Base Reinforcement Learning (scenario: monopoly)

2. + Deep Learning

3. + Scenario with 1 competitor, + GPUs

Complexity / realism increases from low (1) to high (3).

Experiments 1

Base Reinforcement Learning in a Monopoly

[Figure: one route between AAA and BBB, served by a single airline, AL1.]

Simulation set-up

Cap = 10

DCP = 20

Fare classes = 3

Fenceless fare structure

RMS base case

• AL1: Dynamic Programming

• Two customer segments with different frat5

• Forecaster = Q-forecasting
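The dynamic program used by the base case can be sketched as a backward induction over the state (t, x). This is a minimal sketch under stated assumptions: x is read as the number of remaining seats, each time step sees at most one booking, and the fares and purchase probabilities passed in are hypothetical inputs, not values from the presentation.

```python
def solve_dp(fares, buy_prob, capacity=10, periods=20):
    """Backward induction for V(t, x), with x = remaining seats (assumption).

    fares[i]    : price of fare class i (hypothetical values).
    buy_prob[i] : assumed probability of one booking in a time step
                  when class i is the fare on offer.
    Returns (V, policy); policy[t][x] is the index of the best fare,
    or None when the best decision is to close the flight.
    """
    V = [[0.0] * (capacity + 1) for _ in range(periods + 1)]
    policy = [[None] * (capacity + 1) for _ in range(periods)]
    for t in range(periods - 1, -1, -1):
        for x in range(1, capacity + 1):
            # "closed": no sale is possible, the state carries over unchanged.
            best_value, best_action = V[t + 1][x], None
            for i, (f, p) in enumerate(zip(fares, buy_prob)):
                value = (1 - p) * V[t + 1][x] + p * (f + V[t + 1][x - 1])
                if value > best_value:
                    best_value, best_action = value, i
            V[t][x], policy[t][x] = best_value, best_action
    return V, policy
```

For example, `V, policy = solve_dp([100, 200, 300], [0.5, 0.3, 0.1])` returns the value table and the fare to offer in every state; `V[0][10]` is the expected revenue of an empty flight at the start of the booking horizon under these assumed inputs.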

Reinforcement Learning

• No Forecaster or Optimizer

• AL1: Q-learning

• State (t,x)

• Action: f1,f2,f3, closed
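The set-up above (state (t, x), actions f1, f2, f3 or closed, no forecaster or optimizer) can be sketched as tabular Q-learning with an occasional explorative action. This is an illustrative sketch, not the authors' implementation: the simulated customer (`buy_prob`), the fares, the ε-greedy exploration rule, and all parameter values are assumptions, and x is read as remaining seats.

```python
import random


def q_learning(fares, buy_prob, capacity=10, periods=20,
               episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning on a simulated single flight.

    The learner never sees buy_prob; it only observes whether a sale happened.
    Action indices 0..len(fares)-1 offer a fare; the last index means "closed".
    """
    rng = random.Random(seed)
    n_actions = len(fares) + 1
    Q = {(t, x): [0.0] * n_actions
         for t in range(periods) for x in range(capacity + 1)}

    def value(t, x):
        # V(s) = max_a Q(s, a); nothing can be earned after departure.
        return 0.0 if t == periods else max(Q[(t, x)])

    for _ in range(episodes):
        x = capacity
        for t in range(periods):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explorative action
            else:
                a = max(range(n_actions), key=Q[(t, x)].__getitem__)
            # Simulated market response (hidden from the learner).
            sold = a < len(fares) and x > 0 and rng.random() < buy_prob[a]
            reward = fares[a] if sold else 0.0
            x_next = x - 1 if sold else x
            # Q(s,a) <- [1 - alpha] Q(s,a) + alpha [f + V(s')]
            Q[(t, x)][a] = ((1 - alpha) * Q[(t, x)][a]
                            + alpha * (reward + value(t + 1, x_next)))
            x = x_next
    return Q
```

Calling `q_learning([100, 200, 300], [0.5, 0.3, 0.1])` returns the learned table; states that are rarely visited (for instance a full cabin far from departure) receive few updates, so their entries stay close to their initial values.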

[Figure: revenue (60% to 110%) versus amount of training data (axis "Data (Years)", 0 to 296). Curves: "Revenue with RL" and "Theoretical optimum" (perfect demand forecast, perfect WTP models, no changes in the market).]

Experiments 1

• RL can solve the problem.

• RL converges to the optimal solution.

• Poor performance: RL needs a lot of data and computation.

RL Policies vs. Optimal Policies

[Figure: policy grids plotted over bookings (empty to full) and time to departure (ending at departure).]

• Good policies where we have lots of observations

• Lack of observations (cabin full far from departure)

• Lack of observations (cabin empty at departure)

Deep Reinforcement Learning: Deep Neural Network

Classical Artificial Neural Networks (Input → Output)

Accuracy: worse than other ML methods

Deep Neural Network (Input → Layer1 → Layer2 → Layer3 → Output)

Accuracy: better than other ML methods

[Figure: network diagrams; example output classes: man, woman.]

Deep Reinforcement Learning: Deep Neural Network as function approximation

What is function approximation?

$$(s, a) \rightarrow Q(s, a)$$

$$(s, a) \rightarrow Q(s, a, \theta) = \alpha s + \beta a, \qquad \theta = (\alpha, \beta)$$

$$(s, a) \rightarrow Q(s, a, \theta)$$
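The idea of function approximation can be sketched with the simplest case, a linear function with two parameters θ = (α, β) that is fitted by gradient steps instead of filling a table. This is a minimal sketch under assumptions: the two inputs are written `s` and `a` here, the squared-error objective and the step size `lr` are illustrative choices, and a deep neural network would replace `q_linear` with a far more flexible function trained in the same spirit.

```python
def q_linear(theta, s, a):
    """Linear approximation Q(s, a, theta) = alpha * s + beta * a."""
    alpha, beta = theta
    return alpha * s + beta * a


def sgd_step(theta, s, a, target, lr=0.1):
    """One gradient step on 0.5 * (target - Q(s, a, theta)) ** 2."""
    alpha, beta = theta
    error = target - q_linear(theta, s, a)
    # dQ/dalpha = s and dQ/dbeta = a, so each parameter moves along its input.
    return (alpha + lr * error * s, beta + lr * error * a)


def fit(samples, steps=2000, lr=0.1):
    """Cycle through (s, a, target) samples and return the fitted theta."""
    theta = (0.0, 0.0)
    for i in range(steps):
        s, a, target = samples[i % len(samples)]
        theta = sgd_step(theta, s, a, target, lr)
    return theta
```

In a Q-learning setting the `target` would be the observed reward plus the estimated value of the next state, so two numbers (α, β) stand in for an entire table of Q-values and generalize to states that were never visited.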


Experiments 2: + Deep Learning

[Figure: revenue versus amount of training data, two panels. "Deep RL": revenue axis 40% to 110%, data axis 0 to 15 (years). "Normal RL": revenue axis 80% to 110%, data axis 33 to 160 (thousand years).]

• Deep RL: 5-15Y of flight data, ~99% of opt. rev

• Normal RL: 135kY of flight data, ~97% of opt. rev

Experiments 3

Reinforcement Learning in Duopoly

Simulation set-up

Cap = 50

DCP = 20

Fare classes = 10

Fenceless fare structure

RMS base case

• AL1: Dynamic Programming

• AL2: AT80

• Two customer segments with different frat5

• Estimated frat5 (optimal revenue)

• Forecaster = Q-forecasting
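The frat5 parameter that distinguishes the two customer segments can be illustrated with a small sell-up function. Frat5 is the fare ratio (relative to the lowest fare) at which half of the customers are still willing to buy; the exponential shape used below is a common modeling convention and is an assumption here, since the presentation does not state which curve the simulator uses. The function name and the example fares are hypothetical.

```python
import math


def sellup_probability(fare, base_fare, frat5):
    """Probability that a customer willing to pay base_fare also accepts `fare`.

    Assumed exponential sell-up curve, calibrated so that the probability
    is exactly 0.5 when fare / base_fare == frat5 (requires frat5 > 1).
    """
    if fare <= base_fare:
        return 1.0
    ratio = fare / base_fare
    return math.exp(-math.log(2) * (ratio - 1) / (frat5 - 1))
```

With a base fare of 100, a price-sensitive segment with `frat5=1.5` accepts a fare of 150 half of the time, while a segment with `frat5=3.0` accepts the same fare more often. Multiplying frat5 by a scaling factor, as on the "Frat5 scaling factor" axis, shifts the whole curve toward more or less willingness to pay.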

Reinforcement Learning

• No Forecaster or Optimizer

• AL1: Deep RL

• State (t,x)

• Action: f1,f2,f3,…,f10, closed

[Figure: one route between AAA and BBB, served by two airlines, AL1 and AL2.]

[Figure: revenue (0.8 to 1) versus frat5 scaling factor (0.7 to 1.3).]

Experiments 3: One Competitor + GRUs

[Figure: revenue by month, two panels. "RMS vs Competitor": series RMS and AT80, months 1 to 91. "DRL vs Competitor": series DQL and AT80, months 1 to 81. Revenue levels marked on the charts: 100%, 89%, 80%, 102.5%.]

Why is RL better?

[Figure: lowest available fare (0 to 3000) for RMS and DRL, horizontal axis 0 to 50.]

[Figure: booking curves (0 to 1, horizontal axis 0 to 20), panels "RMS AT80" and "DRL AT80".]

• DRL closes classes far from departure

• DRL opens classes close to departure

• Remember that the RMS was optimal.

• DRL produces higher revenue by understanding the competitive game and swamping the competitors with low-yield passengers.

Conclusion

Today

• Classical RMS techniques are no longer sufficient.

• RL opens the door to a radically new approach:

  − Model free

  − No forecasting

  − No optimization

  − Learns by direct price testing

• Shown that RL = RMS for monopoly

• We discover the richness of RL

• Beats RMS against competition

Future

• More complex scenarios:

  − Full networks

  − Many competitors

  − Pricing of complex products

  − Pricing of psychological factors (irrational customers)

  − Shifts in demand / WTP

• Improve learning performance

• Add more information to the state (e.g., competitors and market)