Symbolic Perseus: a Generic POMDP Algorithm with ...ppoupart/publications/...2 Outline • Dynamic...


Symbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning

Pascal Poupart (University of Waterloo)

INFORMS 2009


Outline

• Dynamic Pricing as a POMDP
• Symbolic Perseus
  – Generic POMDP solver
  – Point-based value iteration
  – Algebraic decision diagrams
• Experimental evaluation
• Conclusion


Setting

• One or several firms (monopoly or oligopoly)
• Fixed capacity and fixed number of selling rounds (i.e., sale of seasonal items)
• Finite range of prices
• Unknown and varying demand

• Question: how to dynamically adjust prices to maximize sales?


POMDP Formulation (monopoly)

[Figure: network over four time slices with nodes Price, CC, Inv, Sales and Time; rows labeled Firm and Consumer.]


POMDP Formulation (oligopoly)

[Figure: network over four time slices with nodes Price, CC, Inv, Sales, Price-i, Inv-i and Time; rows labeled Firm, Competitors and Consumer.]


Unknown demand & competitors

[Figure: network over four time slices with nodes Price, CC, Inv, Sales, Price-i, Inv-i and Time; rows labeled Firm, Competitors and Consumer.]


Demand Model

• Probability that consumer chooses firm i:

$$\Pr(CC = i) = \frac{e^{a_i + b_i p_i}}{\sum_j e^{a_j + b_j p_j} + 1}$$

• Parameters $a_i$ and $b_i$ are unknown
• Learn them
  – From historical data
  – As process evolves
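A minimal sketch of the choice model above, assuming $p_i$ is the price posted by firm i. The function name and the parameter values are hypothetical and are not taken from the talk. The "+ 1" in the denominator is read here, as is usual for logit models, as the option of buying from no firm.

```python
import math

def choice_probabilities(a, b, prices):
    """Return ([Pr(CC = i) for each firm i], Pr(no purchase))."""
    # Unnormalized weight of each firm: exp(a_i + b_i * p_i)
    weights = [math.exp(ai + bi * pi) for ai, bi, pi in zip(a, b, prices)]
    # The extra 1.0 is the weight of the outside option (assumption: no purchase)
    denom = sum(weights) + 1.0
    return [w / denom for w in weights], 1.0 / denom

# Hypothetical parameters for two firms; b_i < 0 so demand falls as price rises
probs, no_purchase = choice_probabilities(
    a=[2.0, 1.5], b=[-0.1, -0.1], prices=[10.0, 12.0]
)
```

With these made-up numbers the first firm has the larger exponent (1.0 versus 0.3), so it receives the larger share of consumers.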


Competitors

• Model each competitor:
  – Pricing strategy: inv/time → price
  – Two thresholds: $t_{up}$ and $t_{down}$
    • If inv/time < $t_{up}$: price↑
    • If inv/time > $t_{down}$: price↓
• Learn thresholds
  – From historical data
  – As process evolves
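The threshold rule can be sketched as below. This assumes "time" means the number of selling rounds left and that $t_{up} \le t_{down}$; the function name and the +1 / 0 / -1 encoding of the price move are hypothetical.

```python
def competitor_price_move(inventory, time_left, t_up, t_down):
    """Return +1 (raise price), -1 (lower price) or 0 (keep price)."""
    ratio = inventory / time_left
    if ratio < t_up:      # little stock left per remaining round: raise the price
        return +1
    if ratio > t_down:    # too much stock left per remaining round: lower the price
        return -1
    return 0              # between the two thresholds: leave the price unchanged

# Hypothetical thresholds
moves = [competitor_price_move(inv, 10, t_up=1.0, t_down=2.0) for inv in (5, 15, 30)]
```

In the POMDP the two thresholds are hidden variables, so the firm never evaluates this rule directly; it only maintains a belief over which thresholds are consistent with the competitor moves it has observed.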


Expanded POMDP

[Figure: network over four time slices with nodes Price, CC, Inv, Sales, Price-i, Inv-i and Time, extended with the unknown demand parameters A, B and the unknown competitor thresholds T↑, T↓; rows labeled Firm, Consumer and Competitors.]


POMDPs

• Partially Observable Markov Decision Processes
  – S: set of states
    • Cross product of the domains of all variables
    • $|S| = \prod_i |\mathrm{dom}(V_i)|$ (exponentially large!)
  – A: set of actions
    • {price↑, price↓, price unchanged}
  – O: set of observations
    • Cross product of the domains of the observable variables
  – $T(s,a,s') = \Pr(s'|s,a)$: transition function
    • Factored rep: $\Pr(s'|s,a) = \prod_i \Pr(V_i | \mathrm{parents}(V_i))$
  – $R(s,a) = r$: reward function
    • Sale = price × CC
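To make the size of S concrete, here is a small sketch that multiplies domain sizes. The domain sizes are hypothetical; the slides do not list the actual ones.

```python
import math

# Hypothetical domain sizes (not from the talk)
domain_sizes = {"Price": 5, "Inv": 11, "Time": 21, "CC": 2}

def state_space_size(sizes):
    """|S| is the product of the domain sizes of all variables."""
    return math.prod(sizes.values())

n_states = state_space_size(domain_sizes)  # 5 * 11 * 21 * 2 = 2310

# Adding one more variable with 4 values multiplies |S| by 4
n_states_bigger = state_space_size({**domain_sizes, "A": 4})  # 9240
```

Every added variable multiplies the count, which is why |S| grows exponentially in the number of variables.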


Belief monitoring

• Belief: b(s)
  – Distribution over states
• Belief update: Bayes theorem
  – $b^{ao'}(s') = k \sum_{s \in S} b(s) \Pr(s'|s,a) \Pr(o'|a,s')$
  – $b^{ao'} = \langle o', a, b \rangle$
• Demand learning and opponent modeling:
  – Implicit learning by belief monitoring
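A flat (non-factored) sketch of the belief update, with nested lists standing in for the transition and observation tables of one fixed action. The two-state "demand level" model and its numbers are hypothetical and only illustrate how an unknown parameter is learned implicitly.

```python
def belief_update(b, T_a, Z_a, obs):
    """One Bayes update for a fixed action a.

    b[s]        current belief
    T_a[s][s2]  Pr(s2 | s, a)
    Z_a[s2][o]  Pr(o | a, s2)
    """
    n = len(b)
    unnorm = [
        Z_a[s2][obs] * sum(b[s] * T_a[s][s2] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnorm)  # the normalizer k is 1 / total
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnorm]

# Hypothetical example: hidden demand is low (state 0) or high (state 1) and never changes
T_a = [[1.0, 0.0], [0.0, 1.0]]
# Observation 0 = no sale, 1 = sale
Z_a = [[0.8, 0.2], [0.3, 0.7]]

b0 = [0.5, 0.5]
b1 = belief_update(b0, T_a, Z_a, obs=1)  # a sale shifts belief toward high demand
```

No separate learning step is needed: the posterior over the hidden demand state is exactly what the belief update maintains.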


Policy trees

• Policy π
  – Mapping from past actions & obs to next action
  – Tree representation
  – Problem: tree grows exponentially with time

[Figure: depth-3 policy tree with root action a1, children a2 and a3, and leaves a4 to a7; each branch is labeled with observation o1 or o2.]
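The growth can be counted directly. The sketch below assumes one action per node and one branch per observation, as in the tree with nodes a1 to a7; the function names are hypothetical.

```python
def policy_tree_nodes(num_obs, horizon):
    """Nodes in one policy tree: 1 + |O| + |O|^2 + ... + |O|^(horizon-1)."""
    return sum(num_obs ** k for k in range(horizon))

def num_policy_trees(num_actions, num_obs, horizon):
    """Each node independently picks one of |A| actions."""
    return num_actions ** policy_tree_nodes(num_obs, horizon)

nodes = policy_tree_nodes(num_obs=2, horizon=3)                 # 1 + 2 + 4 = 7
trees = num_policy_trees(num_actions=3, num_obs=2, horizon=3)   # 3 ** 7 = 2187
```

With two observations and three steps this gives the seven nodes of the tree above; with the three pricing actions there are already 2187 distinct trees at that depth.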


Policy Optimization

• Policy π : B → A
  – Mapping from beliefs to actions
• Value function: $V^\pi(b) = \sum_t \gamma^t E_{b_t|\pi}[R]$
• Optimal policy π*:
  – $V^*(b) \ge V^\pi(b)$ for all π, b
• Bellman's equation:
  – $V^*(b) = \max_a \left[ E_b[R] + \gamma \sum_{o'} \Pr(o'|b,a)\, V^*(b^{ao'}) \right]$


Difficulties

• Exponentially large state space
  – $|S| = \prod_i |\mathrm{dom}(V_i)|$
  – Solution: algebraic decision diagrams
• Complex policy space
  – Policy π : B → A
  – Continuous belief space
  – Solution: point-based Bellman backups


Symbolic Perseus

• Publicly available:
  – http://www.cs.uwaterloo.ca/~ppoupart/software.html
• Has been used to solve POMDPs with millions of states
• Currently used by
  – Intel, Toronto Rehabilitation Institute, Univ of Dundee, Technical Univ of Lisbon, Univ of British Columbia, Univ of Manchester, Univ of Waterloo

Point-based value iteration + algebraic decision diagrams


Piecewise linear & convex value function

• Value of a policy tree β is linear:

$$V^\beta(b_0) = \sum_{s \in S} b_0(s)\, V^\beta(s)$$

• Value of an optimal finite horizon policy is piecewise-linear and convex [SS73]

[Figure: value function plotted over the belief space, from b(s)=0 to b(s)=1.]
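Since each policy tree contributes one linear function of the belief, a finite-horizon value function can be stored as a set of vectors and evaluated as their upper surface. A minimal sketch with hypothetical vectors:

```python
def value(belief, alpha_vectors):
    """V(b) = max over vectors alpha of sum_s b(s) * alpha(s)."""
    return max(
        sum(b * a for b, a in zip(belief, alpha)) for alpha in alpha_vectors
    )

# Three hypothetical vectors over a two-state problem
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]

v_left = value([1.0, 0.0], alphas)   # first vector wins at this corner
v_mid = value([0.5, 0.5], alphas)    # third vector wins in the middle
v_right = value([0.0, 1.0], alphas)  # second vector wins at this corner
```

A maximum of linear functions is always convex, which is why the value at the midpoint cannot exceed the average of the two corner values.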


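Point-based value iteration

• Point-based backup (Pineau et al. 2003)

$$\alpha_{t-1}(b) = \max_a \left[ E_b[R] + \gamma \sum_{o'} \Pr(o'|b,a)\, \alpha_t(b^{ao'}) \right]$$

[Figure: backup at belief b, showing $V_t$, $V_{t-1}$ and the successor beliefs $b^{ao_1}$ and $b^{ao_2}$.]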
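A flat sketch of one point-based backup at a single belief, in the commonly used form where the best next-step vector is chosen separately for each action and observation. It uses plain nested lists rather than decision diagrams, so it is not how Symbolic Perseus itself stores these quantities; the table layout and names are assumptions.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def point_based_backup(b, Gamma, T, Z, R, gamma):
    """Return (alpha, value) for belief b.

    Gamma       list of next-step alpha vectors
    T[a][s][s2] Pr(s2 | s, a)
    Z[a][s2][o] Pr(o | a, s2)
    R[a][s]     immediate reward
    """
    n_states = len(b)
    n_obs = len(Z[0][0])
    best_alpha, best_value = None, float("-inf")
    for a in range(len(T)):
        alpha_a = list(R[a])
        for o in range(n_obs):
            # Project every candidate vector back through action a and observation o
            projected = [
                [sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in Gamma
            ]
            # Keep the projection that is best at this particular belief
            g_best = max(projected, key=lambda g: dot(b, g))
            alpha_a = [x + gamma * y for x, y in zip(alpha_a, g_best)]
        v = dot(b, alpha_a)
        if v > best_value:
            best_alpha, best_value = alpha_a, v
    return best_alpha, best_value

# Tiny hypothetical problem: 2 states, 1 action, 1 observation, static state
T = [[[1.0, 0.0], [0.0, 1.0]]]
Z = [[[1.0], [1.0]]]
R = [[1.0, 0.0]]
alpha, v = point_based_backup([0.9, 0.1], [[10.0, 0.0], [0.0, 10.0]], T, Z, R, gamma=0.9)
```

The backup returns one new vector per belief point, so the size of the value function is bounded by the number of sampled beliefs instead of growing with every iteration.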


Algebraic Decision Diagrams

• First use in MDPs: Hoey et al. 1999
• Factored representation
  – Exploit conditional independence
  – $\Pr(s'|s,a) = \prod_i \Pr(V_i | \mathrm{parents}(V_i))$
• Automatic state aggregation
  – Exploit context specific independence
  – Exploit sparsity


Factored Representation

• Transition fn: $\Pr(s'|s,a)$
  – Flat representation: matrix, $O(|S|^2)$
  – Factored representation: often $O(\log |S|)$

[Figure: network over four time slices with nodes Price, CC, Inv, Sales and Time.]
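A rough count shows where the gap between the two representations comes from. The sketch assumes n binary variables, each with at most k binary parents, and a single action; these assumptions and the numbers are illustrative only.

```python
def flat_entries(n_vars):
    """Full transition matrix for one action: |S| x |S| with |S| = 2^n."""
    return (2 ** n_vars) ** 2

def factored_entries(n_vars, max_parents):
    """One conditional table per variable: 2^k parent settings x 2 values."""
    return n_vars * 2 ** (max_parents + 1)

flat = flat_entries(20)              # 2^40 = 1,099,511,627,776 entries
factored = factored_entries(20, 3)   # 20 * 16 = 320 entries
```

With a bounded number of parents the factored size grows linearly in the number of variables, that is, logarithmically in |S|.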


Computation with Factored Rep

• Belief monitoring:
  – $b^{ao'}(s') = k \Pr(o'|a,s') \sum_s b(s) \Pr(s'|s,a)$
• Point-based Bellman backup:
  – $\alpha(s) = \max_a \left[ R(s,a) + \sum_{s',o'} \Pr(s'|s,a) \Pr(o'|a,s')\, \alpha^{ao'}(s') \right]$
• Flat representation: $O(|S|^2)$
• Factored representation: often $O(|S| \log |S|)$


Algebraic Decision Diagrams

• Tree-based representation
  – Acyclic directed graph
• Avoid duplicate entries
  – Exploit context specific independence
  – Exploit sparsity

Assignment   Value
 x  y  z     0
 x  y ~z     0
 x ~y  z     0
 x ~y ~z     2
~x  y  z     0
~x  y ~z     2
~x ~y  z     3
~x ~y ~z     3

[Figure: the same function as a decision diagram with decision nodes X, Y, Y, Z and leaves 0, 2, 3.]
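The eight-entry function on this slide can be compressed mechanically. Below is a minimal sketch of a reduced decision diagram with numeric leaves, built by sharing identical subgraphs and skipping tests whose two branches agree. It is an illustration only, not the ADD package used by Symbolic Perseus, and all names are hypothetical.

```python
def build_add(table, variables):
    """Build a reduced diagram from a full table.

    table maps tuples of booleans (one per variable, in order) to values.
    Returns (root, unique), where unique holds each distinct decision node once.
    """
    unique = {}  # (var, high, low) -> node, so identical subgraphs are shared

    def make(prefix):
        depth = len(prefix)
        if depth == len(variables):
            return ("leaf", table[prefix])
        high = make(prefix + (True,))
        low = make(prefix + (False,))
        if high == low:  # the variable does not matter in this context
            return high
        key = (variables[depth], high, low)
        return unique.setdefault(key, key)

    return make(()), unique

def evaluate(node, assignment):
    """Follow one path from the root to a leaf."""
    while node[0] != "leaf":
        var, high, low = node
        node = high if assignment[var] else low
    return node[1]

# The function from the slide, in the order x, y, z (True = x, False = ~x)
table = {
    (True, True, True): 0, (True, True, False): 0,
    (True, False, True): 0, (True, False, False): 2,
    (False, True, True): 0, (False, True, False): 2,
    (False, False, True): 3, (False, False, False): 3,
}
root, unique = build_add(table, ["x", "y", "z"])
decision_nodes = len(unique)
leaf_values = sorted(set(table.values()))
```

For this table the construction keeps four decision nodes (one for X, two for Y, one shared Z) and three leaves, instead of eight separate table entries.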


Empirical Results

• Monopolistic Dynamic Pricing

Inv / Time   |S|       SP Value   Upper bound   Runtime (min)
10 / 20       73,920   121        133            19
15 / 30      158,720   152        167            48
20 / 40      275,520   171        187            61
25 / 50      424,320   182        198           161
30 / 60      605,120   188        199           350
35 / 70      817,920   192        199           448


COACH project

• Automated prompting system to help elderly persons wash their hands
• IATSL: Alex Mihailidis, Jesse Hoey, Jennifer Boger et al.


Policy Optimization

• Partially observable MDP:
  – Handles noisy HandLocation and noisy WaterFlow
  – Can adapt to user responsiveness
  – 50,181,120 states, 20 actions, 12 observations
• Approximation: fully observable MDP
  – Assume HandLocation and WaterFlow are fully observable
  – Remove the user responsiveness variable
  – 25,090,560 states, 20 actions


Empirical Comparison (Simulation)


Conclusion

• Natural encoding of Dynamic Pricing as POMDP
  – Demand and competitor learning by belief monitoring
  – Factored model
• Symbolic Perseus (generic POMDP solver)
  – Point-based value iteration + algebraic decision diagrams
  – Exploit problem specific structure
• Future work
  – Bayesian reinforcement learning
  – Planning as inference