Search Tasks, Proactive Search & Digital Assistants

47
ML Meetup Rishabh Mehrotra Motivation Search Tasks From Sessions to Tasks Multitasking User Groups Topical Heterogeneity Extracting Tasks & Subtasks Task Extraction Subtask Extraction Hierarchies of Tasks & Subtasks Search Tasks, Proactive Search & Digital Assistants Rishabh Mehrotra University College London [email protected] Machine Learning Meetup, Gurgaon Saturday 25 th June, 2016

Transcript of Search Tasks, Proactive Search & Digital Assistants

Page 1: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Search Tasks, Proactive Search & DigitalAssistants

Rishabh Mehrotra

University College [email protected]

Machine Learning Meetup, Gurgaon

Saturday 25th June, 2016

Page 2: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Modelling Search Tasks & Behaviors

1 Motivation

2 From Sessions to Tasks

3 Extracting Tasks & Subtasks

4 Hierarchies of Tasks & Subtasks

Page 3: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Evolution of Web Search

Page 4: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Search is Everywhere

Page 5: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Web Search Volume

Page 6: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Next Generation Systems

Towards Task-based SearchSystems

Page 7: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Slight Detour - Proactive IR & Digital Assistants

Page 8: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Slight Detour - Proactive IR & Digital Assistants

Page 9: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Slight Detour - Proactive IR & Digital Assistants

Information Cards:

From Queries to Cards (Shokouhi et al. SIGIR’15)Modelling User Interests for Zero-query Ranking (Yang etal. ECIR’16)

Intelligent Assistants:

Understanding User Satisfaction with Intelligent Assistants(Kiseleva et al. CHIIR’16)Automatic Online Evaluation of Intelligent Assistants(Jiang et al. WWW’15)

Page 10: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Slight Detour - Proactive IR & Digital Assistants

Modelling user interaction on Information Cards:

Click-based (pseudo) relevance labels may not beappropriate for evaluating all types of cards.

Features

Reactive HistoryProactive HistoryLexical/Topical FeaturesLocal/Temporal FeaturesConstant Features

Page 11: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Slight Detour - Proactive IR & Digital Assistants

Page 12: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Slight Detour - Proactive IR & Digital Assistants1

1Understanding User Satisfaction with Intelligent Assistants, Kiseleva etal. ACM CHIIR 2016

Page 13: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Putting it together

Digital Assistants - entry points for all web activities

Goal is to move towards Task Completion

Page 14: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Understanding Search Tasks

Search Tasks

An (atomic?) information need that may result in issuing ofone or more queries.

Image from Verma et al. 2014

Page 15: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Why Search Tasks

Image from Wang et al. 2014

Page 16: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Outline

Part 1 Understanding Search Sessions

Part 2 Extracting Tasks & Subtasks

Part 3 Hierarchies of Tasks & Subtasks

Page 17: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Understanding Search Behavior

IR research = sessions, topics, queries

We focus on:

1 Multitasking

2 User groups based on multitasking

3 Topical heterogeneity

Page 18: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Research Questions

1 RQ1: Extent of multi-tasking in search sessions?

2 RQ2: Evidence of user-level heterogeneity based on taskbehavior?

1 RQ2.1: Does search task effort vary across user groups?2 RQ2.2: Association between users’ interests and task

multiplicity?

3 RQ3: Characterizing various heterogeneities in searchbehavior.

Page 19: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Data Context

No of Queries 620M

No of Sessions 190M

No of Users 2M

Avg No of queries per session 3.18

Avg no of sessions per user 76.01

Avg no of tasks per session 2.08

Table: Data summary

Time Query SessionID TaskID Topic

05/29/2012 14:06:04 adele songs 1 1 Arts05/29/2012 14:11:49 wedding venue 1 2 Society05/29/2012 14:12:01 video download 1 3 Arts

05/29/2012 14:06:04 Obama care 2 4 News05/29/2012 14:11:49 running shoes 2 5 Shopping05/29/2012 14:12:01 sports shoes 2 5 Shopping05/29/2012 14:22:12 wedding cards 2 2 Society

Table: Sample search sessions

Page 20: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Extent of Multitasking

Quantifying the extent of multi-tasking

0%  

20%  

40%  

60%  

80%  

100%  

%  se

ssions  

No  of  tasks  in  a  session  

EXTENT  OF  MULTITASKING  IN  SESSIONS  

2   3   4   5   6   7   8   9   10   10+   1  110629692  

0  

10  

20  

30  

40  

50  

60  

No  of  Users  (In  %)  

Avg  Number  of  Tasks  Performed  in  a  Session  1   2   3   4   5   6   7   8   9   10   10+  

Page 21: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

User Groups

Focussed  Users  

Mul--­‐taskers  

Supertaskers  

0%  

20%  

40%  

60%  

80%  

100%  

%  of  U

ser  P

opula-

on  

Types  of  User  Groups  

TASK  CENTRIC  USER  GROUPS  

Focussed   Mul--­‐Taskers   Supertaskers  

Figure: User groups based on multi-tasking behaviors

Page 22: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

User Groups: Example Sessions

User Type Example Queries from a Typical Session

Focussed User

”test guide.com”, ”CNA Practice Test”, ”CNA StateBoard Exam”, ”CNA Testing Schedule and Loca-tions”, ”CNA State Board Practice Test”, ”CNAPractice Test 2014”, CNA 50 Questions Test”, ”FreeGED Practice Test 2014”

Multi-Tasking User”Gravity FSX 2.0”, ”Full Suspension MountainBikes”, ”Walmart Cards”, ”Walmart Instant CardApplication”, ”Gravity FSX 2.0 price”, ”full suspen-sion bike sale”

Supertasking User

”hairstyles for women over 50”, ”thin wavy hairstylesfor women”, ”facebook”, ”fb sign in”, pulledpork crock pot recipe easy”, ”Slow-Roasted PulledPork”, ”barefoot contessa”, ”miley cyrus hair styles”,”hairstyler.com”

Table: Example query sessions from the different user groups.

Page 23: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Effort Metrics across User Groups

TTFC: Time to First Click TTLC: Time to Last ClickPCC: Page Click Count PgCC: Pagination Click Count

-3

-2.5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

TTFC TTLC PCC PgCC

Focussed

Multi-taskers

Super-taskers

Stan

dar

dis

ed s

core

s

Figure: User groups based on multi-tasking behaviors

Page 24: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Topical Heterogeneity

0   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09  

Shopping  

Home  

Kids  

Health  

Extent  of  Mul,tasking  

(a) Focused

0.55   0.56   0.57   0.58   0.59   0.6   0.61   0.62   0.63   0.64   0.65   0.66  

News  

Sports  

Arts  

Kids  

Extent  of  Mul,tasking  

(b) Multitasker

0.88   0.9   0.92   0.94   0.96  

News  

Sports  

Arts  

Society  

Extent  of  Mul,tasking  

(c) Supertasker

0   0.1   0.2   0.3   0.4   0.5  

Arts  

Adult  

Games  

Computers  

Extent  of  Single-­‐Tasking  

(d) Focused

-­‐0.6   -­‐0.5   -­‐0.4   -­‐0.3   -­‐0.2   -­‐0.1   0  

Business  

Adult  

Computers  

Games  

Extent  of  Single-­‐Tasking  

(e) Multitasker

-­‐0.86   -­‐0.84   -­‐0.82   -­‐0.8   -­‐0.78   -­‐0.76   -­‐0.74  

Business  

Home  

Computers  

Games  

Extent  of  Single-­‐Tasking  

(f) Supertasker

Figure: Top topics prone to multi-tasking (Top) and single-tasking(Bottom) across different user groups.

Page 25: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Takeaways2

Sessions should NOT be the focal units in IR

More than 70% of sessions are multi-task

Users differ based on their multi-tasking habits

Search effort varies across user groups

Some topics prone to multi-tasking than others

2R. Mehrotra, P. Bhattacharya, E. Yilmaz; Characterizing UsersMulti-Tasking Behavior in Web Search, at ACM SIGIR ConferenceCHIIR 2016

Page 26: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Outline

Part 1 Understanding Search Sessions

Part 2 Extracting Tasks & Subtasks

Part 3 Hierarchies of Tasks & Subtasks

Page 27: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Extracting Tasks & Subtasks

1 Task extraction

2 Subtask extraction

Page 28: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Extracting Search Tasks

Various proposed strategies:

Clustering session based queries [Lucchese, WSDM’11]

Structured Learning Approach [Wang, WWW’13]

Page 29: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Extracting Search Tasks

Various proposed strategies:

Hawkes Process based Task Extraction [Li, KDD’14]

Entity-based Task Extraction [Verma, CIKM’14][White,CIKM’14]

Page 30: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Extracting Search Tasks

Problems abound:

Link query to on-going task = long chains

Impure tasks

Pairwise predictions need extra post-processing

Rely on large corpus of pre-tagged queries

Do not aggregate across users

Page 31: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Random Fields based Task Extraction3

Propose a latent variable model for extracting tasks:

Assumes k-underlying search tasks

Places a Markov Random Field over the latent task layer

Assumes access to a task-oracle

Each latent task is a multinomial distribution over queryterms

3R. Mehrotra, E. Yilmaz; Extracting Search Tasks via Task RandomField Model, SIGIR 2016 (under review)

Page 32: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Random Fields based Task Extraction

Figure: The proposed task-RF model for extracting search tasks.

The generative process:

Draw task proportions vector θ ∼ Dir(α)

Draw task labels z for all queries from the jointdistribution defined in Eq.(1)

For each query (qi ) in the session:

For each query term (wqi ,j)draw wqi ,j ∼ multi(βzi )

Page 33: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Random Fields based Task Extraction

A query marginal factorizes into its query terms:

p(q|z , β) =

Mq∏j=1

p(wj |z , β) (1)

The joint distribution of θ, z , q and w can be written as:

p(θ, z ,w |α, β, λ) = p(θ|α)p(z |θ, λ)N∏i=1

Mqi∏j=1

p(wqi ,j |zi , β) (2)

Under the MRF model, the joint probability of all taskassignments z = zi

Ni=1 can be written as:

p(z |θ, λ) =1

A(θ, λ)

N∏i=1

p(zi |θ) exp{λ∑

(m,n)∈E

1(zm = zn)} (3)

Page 34: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Random Fields based Task Extraction

Task Oracle: classifier which predicts whether or not 2 queriesbelong to the same task.

Query Based Featurescosine cosine similarity between the term sets of the queries

edit norm edit distance between query strings

Jac Jaccard coeff between the term sets of the queries

Term proportion of common terms between the queries

url-term-match-sum∑

u∈URLi m(qj , u) +∑

u∈URLj m(qi , u)

url-term-match-max maxu∈URLim(qj , u) + maxu∈URLjm(qi , u)

Embedding cosine distance between embedding vectors of the two queries

URL Based Featuresdom-match the number of domain matches between URLs from both queries

Min-edit-U Minimum edit distance between all URL pairs from the queries

Avg-edit-U Average edit distance between all URL pairs from the queries

Jac-U-min Minimum Jaccard coefficient between all URL pairs from the queries

Jac-U-avg Average Jaccard coefficient between all URL pairs from the queries

Table: Features used by the classifier.

Page 35: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Extracting Tasks & Subtasks

1 Task extraction

2 Subtask extraction

Page 36: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Subtask Extraction4

Complex task decompose to more focussed subtasks

Number of subtasks is unknown

Given: collection of on-task queries

Couple Bayesian Nonparametrics & Word Embeddings

4Subtask Extraction: R. Mehrotra, P. Bhattacharya, E. Yilmaz; ExploitingDistributional Representations with Distance Dependent CRPs for ExtractingSub-Tasks. Accepted at NAACL 2016

Page 37: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Chinese Restaurant Process

p(zi = j‖D, α) ∝{f (dij) if j 6= i

α if j = i

1 For each query i ∈ [1,N], draw assignmentzi ∼ dist − CRP(α, f ,D).

2 For each sub-task, k ∈ {1, ....}, draw a parameter θ∗k ∼ G0.

3 For each query i ∈ [1,N],1 If ci /∈ z∗q1:N

, set the parameter for the i th query to θi = θqi .Otherwise draw the parameter from the base distribution,θi ∼ Dirichlet(λ).

2 Draw the ith query, wi ∼ Mult(M, θi ).

Page 38: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Quantifying Task based Distances

Task: plan a weddingSample queries: wedding planning, wedding checklist, bridaldresses, wedding cards

Leverage word embeddings.

Classify each word as background word orsubtask-specific word.

Use a weighted combination of their embedding vectors toencode a query’s vector:

Vq =1

nterms

∑i

nqtiΣqnq

Vti (4)

Page 39: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Evaluating Quality of Subtasks

Qualitative Analysis:Proposed Approach LDA

sub-task 1 sub-task 2 sub-task 3 sub-task 1 sub-task 2 sub-task 3

wedding hairstyles used wedding dresses wedding card holders wedding insurance christian wedding vows make your own wedding invitations

wedding hair dos colorful bridal gowns indian wedding program cards destination wedding dresses brides wedding cakes pictures

curly wedding hairstyles preowned wedding dresses wedding program paper wedding planning book cheap wedding dresses planners

pictures of wedding hair wedding attire regency wedding cards party supply stores tea length wedding dresses wedding colors

CRP TW

sub-task 1 sub-task 2 sub-task 3 sub-task 1 sub-task 2 sub-task 3

wedding planning kit wedding theme wedding insurance wedding insurance christian wedding vows cheap dresses

destination wedding wedding guide weddings in vegas destination wedding plus size bridesmaid wedding cakes pictures

wedding table decorations save the date ideas wedding cakes pictures financing wedding rings wedding colors pricing weddings

1938 wedding pictures wedding vacation planning a wedding party supply stores tea length wedding dresses macys wedding dresses

User Study:

Page 40: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Takeaways

Latent variable model for task extraction based on Taskoracle

Extracts more coherent tasks

Nonparametric model for extracting subtasks

Couple nonparametric approaches with word embeddings

Human judgements & coherence estimates show improvedresults

Page 41: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Outline

Part 1 Understanding Search Sessions

Part 2 Extracting Tasks & Subtasks

Part 3 Hierarchies of Tasks & Subtasks

Page 42: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Hierarchies of Tasks & Subtasks

Complex search tasks places intense cognitive burden onthe userUsers need to explore domain-spaceFlat structure-less tasks lack insights about demarcationof subtasksProvides tremendous opportunity to contextually targetthe user (recommendations/advertisements)

Page 43: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Constructing Bayesian Hierarchies

Tree as partitions & mixtures:Tree: partition over the group of queries (Q) where probabilityof a query group is defined via Dynamic Programming as

p(Q|T ) = πT f (Q) + (1− πt)∏

Ti∈ch(T )

p(leaves(Ti )|Ti ) (5)

Gamma-Poisson Model of Query Affinities:

Query-Term Based Affinity (r1)cosine cosine similarity between the term sets of the queries

edit norm edit distance between query strings

Jac Jaccard coeff between the term sets of the queries

Term proportion of common terms between the queries

URL Based Affinity (r2)Min-edit-U Minimum edit distance between all URL pairs from the queries

Avg-edit-U Average edit distance between all URL pairs from the queries

Jac-U-min Minimum Jaccard coefficient between all URL pairs from the queries

Jac-U-avg Average Jaccard coefficient between all URL pairs from the queries

Session/User Based Affinity (r3)Same-U if the two queries belong to the same user

Same-S if the two queries belong to the same session

Embedding Based Affinity (r4)Embedding cosine distance between embedding vectors of the two queries

Table: Query-Query Affinities.

f (Q) =i=3∏i=1

p(Ri |αi , βi ) (6)

Page 44: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Constructing Bayesian Hierarchies

Forming Arbitrary Tree Structure:Bayesian Rose Trees (NIPS 2012)

Agglomerative Model Selection:

Bottom-up greedy agglomerative fashion

At each iteration: merges two of the trees in the forest

Which pair of tree to merge(& how) - merger that yieldsthe largest Bayes factor improvement

Page 45: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Takeaways5

We can extract recursive hierarchies of tasks & subtasks

Nonparametric tree partition model for hierarchies

Allows for arbitrary structures

Improves term prediction & task extraction

Human labelled judgements show:

we extract coherent tasksSubtasks are validSubtasks are useful for completing the overall task

5R. Mehrotra, E. Yilmaz; Towards Hierarchies of Search Tasks & Subtasks,In Proceedings of the Poster Track WWW 2015

Page 46: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

Thank You!

Page 47: Search Tasks, Proactive Search & Digital Assistants

ML Meetup

RishabhMehrotra

Motivation

Search Tasks

From Sessionsto Tasks

Multitasking

User Groups

TopicalHeterogeneity

ExtractingTasks &Subtasks

Task Extraction

SubtaskExtraction

Hierarchies ofTasks &Subtasks

References

R. Mehrotra, P. Bhattacharya, and E. Yilmaz.

Characterizing users’ multi-tasking behavior in web search.In Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval,2016.

R. Mehrotra, P. Bhattacharya, and E. Yilmaz.

Deconstructing complex search tasks: A bayesian nonparametric approach for extracting sub-tasks.In Proceedings of NAACL-HLT, pages 599–605, 2016.

R. Mehrotra, P. Bhattacharya, and E. Yilmaz.

Sessions; tasks & topics - uncovering behavioral heterogeneities in online search behavior.In Proceedings of the 2016 39th International ACM SIGIR Conference on Research and Developmentin Information Retrieval. ACM, 2016.

R. Mehrotra and E. Yilmaz.

Towards hierarchies of search tasks & subtasks.In WWW 2015.

R. Mehrotra and E. Yilmaz.

Terms, topics & tasks: Enhanced user modelling for better personalization.In Proceedings of the 2015 International Conference on The Theory of Information Retrieval, pages131–140. ACM, 2015.

R. Mehrotra, E. Yilmaz, and M. Verma.

Task-based user modelling for personalization via probabilistic matrix factorization.In RecSys Posters, 2014.

D. Odijk, R. W. White, A. Hassan Awadallah, and S. T. Dumais.

Struggling and success in web search.In CIKM 2015.

White, Bennett, and Dumais.

Predicting short-term interests using activity-based search context.In CIKM 2010.

B. Xiang, D. Jiang, J. Pei, X. Sun, E. Chen, and H. Li.

Context-aware ranking in web search.In SIGIR 2010.

Eugene Agichtein, Ryen W White, Susan T Dumais, and Paul N Bennet.

2012.Search, interrupted: understanding and predicting search task continuation.In Proceedings of the 35th international ACM SIGIR conference on Research and development ininformation retrieval, pages 315–324. ACM.

David M Blei and Peter I Frazier.

2011.Distance dependent chinese restaurant processes.The Journal of Machine Learning Research, 12:2461–2488.

David M Blei, Andrew Y Ng, and Michael I Jordan.

2003.Latent dirichlet allocation.the Journal of machine Learning research, 3:993–1022.

Matthias Hagen, Jakob Gomoll, Anna Beyer, and Benno Stein.

2013.From search session detection to search mission detection.In Proceedings of the 10th Conference on Open Research Areas in Information Retrieval, pages 85–92.LE CENTRE DE HAUTES ETUDES INTERNATIONALES D’INFORMATIQUE DOCUMENTAIRE.

Rosie Jones and Kristina Lisa Klinkner.

2008.Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs.In Proceedings of the 17th ACM conference on Information and knowledge management, pages699–708. ACM.

Diane Kelly, Jaime Arguello, and Robert Capra.

2013.Nsf workshop on task-based information search systems.In ACM SIGIR Forum, volume 47, pages 116–127. ACM.

Alexander Kotov, Paul N Bennett, Ryen W White, Susan T Dumais, and Jaime Teevan.

2011.Modeling and analysis of cross-session search tasks.In Proceedings of the 34th international ACM SIGIR conference on Research and development inInformation Retrieval, pages 5–14. ACM.

M. J. Kusner, Y. Sun, N. I. Kolkin, and K. Q. Weinberger.

2015.From word embeddings to document distances.In ICML.

Zhen Liao, Yang Song, Li-wei He, and Yalou Huang.

2012.Evaluating the effectiveness of search task trails.In Proceedings of the 21st international conference on World Wide Web, pages 489–498. ACM.

Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri, and Gabriele Tolomei.

2011.Identifying task-based sessions in search engine query logs.In Proceedings of the fourth ACM international conference on Web search and data mining, pages277–286. ACM.

Rishabh Mehrotra and Emine Yilmaz.

2015a.Terms, topics & tasks: Enhanced user modelling for better personalization.In Proceedings of the 2015 International Conference on The Theory of Information Retrieval, pages131–140. ACM.

Rishabh Mehrotra and Emine Yilmaz.

2015b.Towards hierarchies of search tasks & subtasks.In Proceedings of the 24th International Conference on World Wide Web Companion, pages 73–74.International World Wide Web Conferences Steering Committee.

Rishabh Mehrotra, Scott Sanner, Wray Buntine, and Lexing Xie.

2013.Improving lda topic models for microblogs via tweet pooling and automatic labeling.In Proceedings of the 36th international ACM SIGIR conference on Research and development ininformation retrieval, pages 889–892. ACM.

Rishabh Mehrotra, Emine Yilmaz, and Manisha Verma.

2014.Task-based user modelling for personalization via probabilistic matrix factorization.In RecSys Posters.

Rishabh Mehrotra, Prasanta Bhattacharya, and Emine Yilmaz.

2016.Characterizing users’ multi-tasking behavior in web search.In Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval,CHIIR ’16, pages 297–300, New York, NY, USA. ACM.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean.

2013.Distributed representations of words and phrases and their compositionality.In Advances in neural information processing systems, pages 3111–3119.

Greg Pass, Abdur Chowdhury, and Cayley Torgeson.

2006.A picture of search.In InfoScale, volume 152, page 1.

Amanda Spink, Sherry Koshman, Minsoo Park, Chris Field, and Bernard J Jansen.

2005.Multitasking web search on vivisimo. com.In Information Technology: Coding and Computing, 2005. ITCC 2005. International Conference on,volume 2, pages 486–490. IEEE.

Ming Sun, Yun-Nung Chen, and Alexander I Rudnicky.

2015.Understanding users cross-domain intentions in spoken dialog systems.In NIPS Workshop on Machine Learning for SLU and Interaction.

Manisha Verma and Emine Yilmaz.

2014.Entity oriented task extraction from query logs.In Proceedings of the 23rd ACM International Conference on Conference on Information andKnowledge Management, pages 1975–1978. ACM.

Chong Wang and David M Blei.

2009.Variational inference for the nested chinese restaurant process.In Advances in Neural Information Processing Systems, pages 1990–1998.

Hongning Wang, Yang Song, Ming-Wei Chang, Xiaodong He, Ryen W White, and Wei Chu.

2013.Learning to extract cross-session search tasks.In Proceedings of the 22nd international conference on World Wide Web, pages 1353–1364.International World Wide Web Conferences Steering Committee.