Designing States, Actions, and Rewards for Using POMDP in Session Search


Transcript of Designing States, Actions, and Rewards for Using POMDP in Session Search

Page 1: Designing States, Actions, and Rewards for Using POMDP in Session Search

DESIGNING STATES, ACTIONS, AND REWARDS FOR USING POMDP IN SESSION SEARCH

Jiyun Luo, Sicong Zhang, Xuchu Dong, Grace Hui Yang

InfoSense Department of Computer Science

Georgetown University

{jl1749,sz303,xd47}@georgetown.edu

[email protected]


Page 2: Designing States, Actions, and Rewards for Using POMDP in Session Search

DYNAMIC IR – A NEW PERSPECTIVE TO LOOK AT SEARCH

E.g., find what city and state Dulles airport is in, what shuttles, ride-sharing vans, and taxi cabs connect the airport to other cities, what hotels are close to the airport, what cheap off-airport parking is available, and what metro stops are close to the Dulles airport.

[Diagram: the user interacts with the search engine, driven by an information need]

Page 3: Designing States, Actions, and Rewards for Using POMDP in Session Search

CHARACTERISTICS OF DYNAMIC IR

• Trial-and-error
  - q1 – "dulles hotels"
  - q2 – "dulles airport"
  - q3 – "dulles airport location"
  - q4 – "dulles metrostop"

Page 4: Designing States, Actions, and Rewards for Using POMDP in Session Search

CHARACTERISTICS OF DYNAMIC IR

• Rich interactions
  - query formulation
  - document clicks
  - document examination
  - eye movements
  - mouse movements
  - etc.

Page 5: Designing States, Actions, and Rewards for Using POMDP in Session Search

CHARACTERISTICS OF DYNAMIC IR

• Temporal dependency

[Diagram: a session as a chain of iterations driven by an information need I — at iteration i the user issues query qi, the search engine returns ranked documents Di, and the user clicks documents Ci, for i = 1, 2, ..., n]

Page 6: Designing States, Actions, and Rewards for Using POMDP in Session Search

REINFORCEMENT LEARNING (RL)

• Fits well in this trial-and-error setting
• RL learns from repeated, varied attempts, continued until success
• The learner (also known as the agent) learns from its dynamic interactions with the world
  - rather than from a labeled dataset as in supervised learning
• The stochastic model assumes that the system's current state depends on the previous state and action in a non-deterministic manner

Page 7: Designing States, Actions, and Rewards for Using POMDP in Session Search

PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP)

[Diagram: a POMDP unrolled over time — hidden states s0, s1, s2, s3, ...; actions a0, a1, a2, ...; rewards r0, r1, r2, ...; and observations o1, o2, o3, ... emitted by the hidden states]

- Hidden states
- Actions
- Rewards
- Markov property
- Long-term optimization
- Observations, beliefs

[R. D. Smallwood et al., 1973]

Page 8: Designing States, Actions, and Rewards for Using POMDP in Session Search

GOAL OF THIS PAPER

• Study the design of states, actions, and reward functions for RL algorithms in session search

Page 9: Designing States, Actions, and Rewards for Using POMDP in Session Search

A MARKOV CHAIN OF DECISION MAKING STATES

[Luo, Zhang, and Yang SIGIR 2014]


Page 10: Designing States, Actions, and Rewards for Using POMDP in Session Search

WIN-WIN SEARCH: DUAL-AGENT STOCHASTIC GAME

• Partially Observable Markov Decision Process
  - hidden states
  - actions
  - rewards
  - Markov property
• Two agents
  - cooperative game
  - joint optimization

[Luo, Zhang, and Yang SIGIR 2014]

Page 11: Designing States, Actions, and Rewards for Using POMDP in Session Search

PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP)

• A tuple (S, M, A, R, γ, O, Θ, B)
  - S: state space
  - M: state transition function
  - A: actions
  - R: reward function
  - γ: discount factor, 0 < γ ≤ 1
  - O: observations, symbols emitted according to the hidden state
  - Θ: observation function; Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e., P(o|s, a)
  - B: belief space; a belief is a probability distribution over the hidden states

[R. D. Smallwood et al., 1973]
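The belief update implied by this tuple can be written down compactly. Below is a minimal sketch, not the authors' implementation; the function name, array layouts, and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def update_belief(belief, action, observation, M, Theta):
    """Bayesian belief update: b'(s') ∝ Θ(s', a, o) · Σ_s M(s, a, s') · b(s)."""
    n_states = len(belief)
    new_belief = np.zeros(n_states)
    for s_next in range(n_states):
        predicted = sum(M[s, action, s_next] * belief[s] for s in range(n_states))
        new_belief[s_next] = Theta[s_next, action, observation] * predicted
    total = new_belief.sum()
    return new_belief / total if total > 0 else belief  # renormalize over hidden states

# Toy example: 2 hidden states, 1 action, 2 observations (numbers are made up).
M = np.array([[[0.8, 0.2]],       # M[s, a, s']: transition probabilities
              [[0.3, 0.7]]])
Theta = np.array([[[0.9, 0.1]],   # Theta[s', a, o]: observation probabilities
                  [[0.2, 0.8]]])
b0 = np.array([0.5, 0.5])         # uniform initial belief
print(update_belief(b0, action=0, observation=0, M=M, Theta=Theta))
```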

Page 12: Designing States, Actions, and Rewards for Using POMDP in Session Search

HIDDEN DECISION MAKING STATES

[Diagram: four hidden decision-making states reachable from the current query q0]

• S_RT: Relevant & Exploitation
• S_RR: Relevant & Exploration
• S_NRT: Non-Relevant & Exploitation
• S_NRR: Non-Relevant & Exploration

Example query transitions:
• scooter price ⟶ scooter stores
• collecting old US coins ⟶ selling old US coins
• Philadelphia NYC travel ⟶ Philadelphia NYC train
• Boston tourism ⟶ NYC tourism

[Luo, Zhang, and Yang SIGIR 2014]
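As a small illustration (not code from the paper), the four hidden states are just the cross product of two binary dimensions, and a belief in the POMDP above is a distribution over them; the dictionary layout below is an assumption for readability.

```python
# The four hidden decision-making states as the cross product of
# (previous results relevant?) x (user exploring?).
HIDDEN_STATES = {
    (True,  False): "S_RT",   # Relevant & Exploitation
    (True,  True):  "S_RR",   # Relevant & Exploration
    (False, False): "S_NRT",  # Non-Relevant & Exploitation
    (False, True):  "S_NRR",  # Non-Relevant & Exploration
}

# A belief (cf. the POMDP tuple above) is a probability distribution over these states.
uniform_belief = {name: 0.25 for name in HIDDEN_STATES.values()}
print(uniform_belief)
```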

Page 13: Designing States, Actions, and Rewards for Using POMDP in Session Search

ACTIONS

• User actions (Au) (see the sketch after this slide)
  - add query terms (+Δq)
  - remove query terms (−Δq)
  - keep query terms (qtheme)
• Search engine actions (Ase)
  - increase / decrease / keep term weights
  - switch a search technique on or off, e.g., whether or not to use query expansion
  - adjust parameters in search techniques, e.g., select the best k for the top-k documents used in pseudo-relevance feedback (PRF)
• Messages from the user (Σu)
  - clicked documents
  - SAT-clicked documents
• Messages from the search engine (Σse)
  - top k returned documents

Messages are essentially documents that an agent thinks are relevant.

[Luo, Zhang, and Yang SIGIR 2014]
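The user actions above are defined in terms of the change between consecutive queries. A minimal sketch of that decomposition (illustrative; simple whitespace tokenization is an assumption, not the paper's preprocessing):

```python
def query_change(prev_query: str, curr_query: str):
    """Split the query change into kept theme terms, added terms, and removed terms."""
    prev_terms = set(prev_query.lower().split())
    curr_terms = set(curr_query.lower().split())
    return {
        "q_theme": prev_terms & curr_terms,   # terms kept across queries
        "+delta_q": curr_terms - prev_terms,  # terms the user added
        "-delta_q": prev_terms - curr_terms,  # terms the user removed
    }

# Example from the session shown earlier in the deck:
print(query_change("dulles airport", "dulles airport location"))
# {'q_theme': {'dulles', 'airport'}, '+delta_q': {'location'}, '-delta_q': set()}
```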


Page 14: Designing States, Actions, and Rewards for Using POMDP in Session Search

2ND MODEL: QUERY CHANGE MODEL

• Based on the Markov Decision Process (MDP)
• States: queries
  - observable
• Actions:
  - user actions:
    - add / remove / keep query terms
    - nicely correspond to our definition of query change
  - search engine actions:
    - increase / decrease / keep term weights
• Rewards:
  - nDCG

[Guan, Zhang, and Yang SIGIR 2013]

Page 15: Designing States, Actions, and Rewards for Using POMDP in Session Search

SEARCH ENGINE AGENT'S ACTIONS

• qtheme, term in Di−1: increase
  - e.g., "pocono mountain" in s6
• qtheme, term not in Di−1: increase
  - e.g., "france world cup 98 reaction" in s28 (france world cup 98 reaction stock market → france world cup 98 reaction)
• +Δq, term in Di−1: decrease
  - e.g., 'policy' in s37 (Merck lobbyists → Merck lobbyists US policy)
• +Δq, term not in Di−1: increase
  - e.g., 'US' in s37 (Merck lobbyists → Merck lobbyists US policy)
• −Δq, term in Di−1: decrease
  - e.g., 'reaction' in s28 (france world cup 98 reaction → france world cup 98)
• −Δq, term not in Di−1: no change
  - e.g., 'legislation' in s32 (bollywood legislation → bollywood law)

[Guan, Zhang, and Yang SIGIR 2013]
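A minimal sketch of the term-weight policy listed above (illustrative, not the paper's code): the search engine agent decides how to adjust a term's weight from the query-change type and whether the term appears in the previously retrieved documents Di−1.

```python
def term_weight_action(change_type: str, in_prev_results: bool) -> str:
    """change_type is one of 'q_theme', '+delta_q', '-delta_q'."""
    if change_type == "q_theme":
        return "increase"                     # theme terms are always boosted
    if change_type == "+delta_q":
        return "decrease" if in_prev_results else "increase"
    if change_type == "-delta_q":
        return "decrease" if in_prev_results else "no change"
    raise ValueError(f"unknown query-change type: {change_type}")

# e.g., 'US' added in s37 and absent from Di-1 -> increase its weight
print(term_weight_action("+delta_q", in_prev_results=False))  # increase
```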

Page 16: Designing States, Actions, and Rewards for Using POMDP in Session Search

QUERY CHANGE RETRIEVAL MODEL (QCM)

• The Bellman equation gives the optimal value for an MDP:

  V*(s) = max_a [ R(s,a) + γ Σ_{s'} P(s'|s,a) V*(s') ]

• The reward function serves as the document relevance score and is derived by working backwards from the Bellman equation:

  Score(q_i, d) = P(q_i|d) + γ Σ_a P(q_i|q_{i-1}, D_{i-1}, a) · max_{D_{i-1}} P(q_{i-1}|D_{i-1})

  - Score(q_i, d): document relevance score
  - P(q_i|d): current reward / relevance score
  - P(q_i|q_{i-1}, D_{i-1}, a): query transition model
  - max_{D_{i-1}} P(q_{i-1}|D_{i-1}): maximum past relevance

[Guan, Zhang, and Yang SIGIR 2013]
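For readers less familiar with the Bellman equation above, here is a minimal value-iteration sketch for a generic MDP (not the session-search model itself); the toy reward and transition arrays are made up for illustration.

```python
import numpy as np

def value_iteration(R, P, gamma=0.9, iters=100):
    """Iterate V(s) <- max_a [ R(s,a) + gamma * sum_{s'} P(s,a,s') * V(s') ]."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R + gamma * P @ V      # Q[s, a] from the Bellman backup
        V = Q.max(axis=1)          # V*(s) = max_a Q(s, a)
    return V

# Toy MDP with 2 states and 2 actions.
R = np.array([[1.0, 0.0], [0.0, 2.0]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])   # P[s, a, s']
print(value_iteration(R, P))
```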

Page 17: Designing States, Actions, and Rewards for Using POMDP in Session Search

CALCULATING THE TRANSITION MODEL

• The relevance score is computed according to the query change and the search engine actions:

  Score(q_i, d) = log P(q_i|d)
                + α Σ_{t ∈ q_theme} [1 − P(t|D*_{i−1})] log P(t|d)
                − β Σ_{t ∈ +Δq, t ∈ D*_{i−1}} P(t|D*_{i−1}) log P(t|d)
                + ε Σ_{t ∈ +Δq, t ∉ D*_{i−1}} idf(t) log P(t|d)
                − δ Σ_{t ∈ −Δq} P(t|D*_{i−1}) log P(t|d)

  - log P(q_i|d): current reward / relevance score
  - α term: increase weights for theme terms
  - β term: decrease weights for old added terms (added terms already in D*_{i−1})
  - ε term: increase weights for novel added terms
  - δ term: decrease weights for removed terms

[Guan, Zhang, and Yang SIGIR 2013]
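A minimal sketch of the scoring formula above (illustrative, not the authors' code): term probabilities, idf values, and the parameter settings are hypothetical placeholders, not the values used in the paper.

```python
import math

ALPHA, BETA, EPSILON, DELTA = 2.2, 1.8, 0.07, 0.4   # hypothetical parameter values

def qcm_score(p_qi_d, theme_terms, added_terms, removed_terms, p_t_d, p_t_dprev, idf):
    """Score(q_i, d) assembled from the query-change components above.

    p_t_d     : dict term -> P(t|d) for the candidate document d
    p_t_dprev : dict term -> P(t|D*_{i-1}); a term absent from this dict is
                treated as not occurring in D*_{i-1}
    idf       : dict term -> idf(t)
    """
    score = math.log(p_qi_d)
    for t in theme_terms:                       # alpha term: theme terms
        score += ALPHA * (1 - p_t_dprev.get(t, 0.0)) * math.log(p_t_d[t])
    for t in added_terms:
        if t in p_t_dprev:                      # beta term: added terms already in D*_{i-1}
            score -= BETA * p_t_dprev[t] * math.log(p_t_d[t])
        else:                                   # epsilon term: novel added terms
            score += EPSILON * idf.get(t, 1.0) * math.log(p_t_d[t])
    for t in removed_terms:                     # delta term: removed terms
        score -= DELTA * p_t_dprev.get(t, 0.0) * math.log(p_t_d[t])
    return score
```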

Page 18: Designing States, Actions, and Rewards for Using POMDP in Session Search

RELATED WORK


• Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. Balancing exploration and exploitation in learning to rank online. In ECIR '11.

• Xiaoran Jin, Marc Sloan, and Jun Wang. Interactive exploratory search for multi page search results. In WWW '13.

• Xuehua Shen, Bin Tan, and Chengxiang Zhai. Implicit user modeling for personalized search. In CIKM '05.

• Norbert Fuhr. A probability ranking principle for interactive information retrieval. Information Retrieval Journal, 11(3), 2008.


Page 19: Designing States, Actions, and Rewards for Using POMDP in Session Search

STATE DESIGN OPTIONS

• (S1) Fixed number of states
  - use two binary relevance states: "Relevant" or "Irrelevant"
  - use four states, encoding:
    - whether the previously retrieved documents are relevant
    - whether the user desires to explore
• (S2) Varying number of states
  - model queries as states: n queries → n states
  - infinitely many states:
    - document relevance score distributions as states
    - one document corresponds to one state

Page 20: Designing States, Actions, and Rewards for Using POMDP in Session Search

ACTION DESIGN OPTIONS

• (A1) Technology selection
  - a meta-level modeling of actions
    - implement multiple search methods and select the best method for each query
    - select the best parameters for each method
• (A2) Term weight adjustment
  - adjusted term weights
• (A3) Ranked list
  - one possible ranking of a list of documents is one single action
  - if the corpus size is N and the number of retrieved documents is n, then the size of the action space is (illustrated below):

    P(N, n) = N(N − 1)⋯(N − n + 1) = N! / (N − n)!
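To see why the (A3) design explodes, a quick computation of the action-space size (the corpus and ranking sizes below are illustrative, not from the paper):

```python
import math

# Size of the ranked-list action space: P(N, n) = N! / (N - n)!
def action_space_size(corpus_size: int, n_retrieved: int) -> int:
    return math.perm(corpus_size, n_retrieved)   # ordered selections of n documents

# Even a tiny corpus makes the (A3) action space astronomically large.
print(action_space_size(1000, 10))   # ≈ 9.6e29 possible ranked lists
```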

Page 21: Designing States, Actions, and Rewards for Using POMDP in Session Search

REWARD FUNCTION DESIGN OPTIONS

• (R1) Explicit feedback
  - rewards generated from the user's relevance assessments
    - nDCG, MAP, etc. (see the sketch below)
• (R2) Implicit feedback
  - rewards obtained from implicit user behavior
    - clicks, SAT clicks
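A minimal sketch of an explicit-feedback reward (R1): nDCG@k computed from graded relevance assessments. Illustrative only; TREC evaluation uses the official tools rather than this simplified function, and the example grades are made up.

```python
import math

def dcg_at_k(gains, k):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, k=10):
    """ranked_gains: relevance grades of the returned documents, in rank order."""
    ideal_dcg = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# e.g., reward for a ranking whose top documents have grades 2, 0, 1, ...
print(ndcg_at_k([2, 0, 1, 0, 0], k=10))
```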


Page 22: Designing States, Actions, and Rewards for Using POMDP in Session Search

SYSTEMS UNDER COMPARISON

• Luo, et al. Win-Win Search: Dual-Agent Stochastic Game in Session Search. SIGIR '14.

• Zhang, et al. A POMDP Model for Content-Free Document Re-ranking. SIGIR '14.

• Guan, et al. Utilizing Query Change for Session Search. SIGIR '13.

• Shen, et al. Implicit user modeling for personalized search. CIKM '05.

• Jin, et al. Interactive exploratory search for multi page search results. WWW '13.

State/action/reward design combinations:

• S1A1R1 (win-win)
• S1A3R2
• S2A2R1 (QCM)
• S2A1R1 (UCAIR)
• S2A3R1 (IES)
• S1A1R2
• S1A2R1
• S2A1R1

Page 23: Designing States, Actions, and Rewards for Using POMDP in Session Search

EXPERIMENTS

• Evaluate on the TREC 2012 and 2013 Session Tracks
  - The session logs contain:
    - session topic
    - user queries
    - previously retrieved URLs and snippets
    - user clicks, dwell time, etc.
  - Task: retrieve 2,000 documents for the last query in each session
  - The evaluation is based on the whole session. Metrics include:
    - nDCG@10, nDCG, nERR@10, and MAP
    - wall-clock time, CPU cycles, and Big O notation
• Datasets
  - ClueWeb09 CatB
  - ClueWeb12 CatB
  - spam documents are removed
  - duplicate documents are removed

Page 24: Designing States, Actions, and Rewards for Using POMDP in Session Search

EFFICIENCY VS. # OF ACTIONS ON TREC 2012

• When the number of actions increases, efficiency tends to drop dramatically

• S1A3R2, S1A2R1, S2A1R1 (UCAIR), S2A2R1 (QCM), and S2A1R1 are efficient

• S1A1R1 (win-win) and S1A1R2 are moderately efficient

• S2A3R1 (IES) is the slowest system

Page 25: Designing States, Actions, and Rewards for Using POMDP in Session Search

ACCURACY VS. EFFICIENCY

[Plots: accuracy vs. efficiency on TREC 2012 and TREC 2013]

• Accuracy tends to increase when efficiency decreases

• S2A1R1 (UCAIR) strikes a good balance between accuracy and efficiency

• S1A1R1 (win-win) gives impressive accuracy with a fair degree of efficiency

Page 26: Designing States, Actions, and Rewards for Using POMDP in Session Search

OUR RECOMMENDATION

• If the focus is on accuracy

• If the time limit is within one hour

• If you want a balance of accuracy and efficiency

Note: the number of actions heavily affects efficiency and needs to be designed carefully.

Page 27: Designing States, Actions, and Rewards for Using POMDP in Session Search

CONCLUSIONS

• POMDPs are good for modeling session search
  - information-seeking behaviors
• Design questions
  - States: what changes with each time step?
  - Actions: how does our system change the state?
  - Rewards: how can we measure feedback or effectiveness?
• Designing them lies somewhere between an art and empirical experimentation
• Balance efficiency against accuracy

Page 28: Designing States, Actions, and Rewards for Using POMDP in Session Search

RESOURCES

• InfoSense
  - http://infosense.cs.georgetown.edu/
• Dynamic IR website
  - Tutorials: http://www.dynamic-ir-modeling.org/
• Live online search engine – Dumpling
  - http://dumplingproject.org
• Upcoming book
  - Dynamic Information Retrieval Modeling
• TREC 2015 Dynamic Domain Track
  - http://trec-dd.org/
  - Please participate if you are interested in interactive and dynamic search

Page 29: Designing States, Actions, and Rewards for Using POMDP in Session Search

THANK YOU


InfoSense Georgetown University

[email protected]