Source: ceur-ws.org/Vol-1033/extras/paper3-poster.pdf


Exploratory Search Missions for TREC Topics
Martin Potthast, Matthias Hagen, Michael Völske, Benno Stein

Bauhaus-Universität Weimar, 99421 Weimar, Germany

[Figure: diagram with the labeled components Author, Topic, Editor, ChatNoir SE, ClueWeb API, ClueWeb, Revision log, and Query log.]

We report on the construction of a new text reuse corpus comprising writing interactions and exploratory search missions.

- 150 essays (based on TREC Web Track topics 2009-2011)
- 12 professional writers hired on a crowdsourcing platform
- Long essay writing task, researching sources using a custom ClueWeb09 search engine
- Writing and search engine interactions recorded in high detail

Corpus Overview

Authors
Writer Demographics

Age                   Gender                Native language(s)
Minimum           24  Female           67%  English          67%
Median            37  Male             33%  Filipino         25%
Maximum           65                        Hindi            17%

Academic degree       Country of origin     Second language(s)
Postgraduate     41%  UK               25%  English          33%
Undergraduate    25%  Philippines      25%  French           17%
None             17%  USA              17%  Afrikaans, Dutch,
n/a              17%  India            17%  German, Spanish,
                      Australia         8%  Swedish     each 8%
                      South Africa      8%  None              8%

Years of writing      Search engines used   Search frequency
Minimum            2  Google           92%  Daily            83%
Median             8  Bing             33%  Weekly            8%
Standard dev.      6  Yahoo            25%  n/a               8%
Maximum           20  Others            8%

Topics
Example topic: Obama’s family.
Write about President Barack Obama’s family history, including genealogy, national origins, places and dates of birth, etc. Where did Barack Obama’s parents and grandparents come from? Also include a brief biography of Obama’s mother.

Original topic 001 of the TREC Web Track 2009:
Query. obama family tree
Description. Find information on President Barack Obama’s family history, including genealogy, national origins, places and dates of birth, etc.
Sub-topic 1. Find the TIME magazine photo essay “Barack Obama’s Family Tree.”
Sub-topic 2. Where did Barack Obama’s parents and grandparents come from?
Sub-topic 3. Find biographical information on Barack Obama’s mother.

Query log

Corpus                 Distribution                        Σ
Characteristic         min     avg     max   stdev
Writers                                                   12
Topics                                                   150
Topics / Writer          1    12.5      33     9.3
Queries                                               13 651
Queries / Topic          4    91.0     616    83.1
Clicks                                                16 739
Clicks / Topic          12   111.6     443    80.3
Clicks / Query           0     0.8      76     2.2
Sessions                                                 931
Sessions / Topic         1    12.3     149    18.9
Days                                                     201
Days / Topic             1     4.9      17     2.7
Hours                                                   2068
Hours / Writer           3   129.3     679   167.3
Hours / Topic            3     7.5      10     2.5
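The per-topic rows in the table are summary statistics over counts grouped by topic. A minimal sketch of that kind of aggregation follows, assuming a hypothetical list of (topic_id, query_text) records rather than the actual Webis-Query-Log-12 format. It uses the sample standard deviation, which is also an assumption, since the poster does not say which variant was reported.

```python
from collections import Counter
from statistics import mean, stdev


def per_topic_stats(records):
    """Summarize how many queries each topic received.

    `records` is a hypothetical iterable of (topic_id, query_text) pairs;
    the real log format may differ.
    """
    counts = Counter(topic for topic, _ in records)
    values = list(counts.values())
    return {
        "sum": sum(values),
        "min": min(values),
        "avg": mean(values),
        "max": max(values),
        # Sample standard deviation (assumption); needs at least two topics.
        "stdev": stdev(values) if len(values) > 1 else 0.0,
    }


# Toy data for illustration only, not taken from the corpus.
toy_log = [
    ("001", "obama family tree"),
    ("001", "obama mother biography"),
    ("001", "obama grandparents"),
    ("002", "some other query"),
]
stats = per_topic_stats(toy_log)
```

The same function could be applied to clicks, sessions, or days per topic by swapping in the corresponding records.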

Search mission data will be made available as the Webis-Query-Log-12 (http://www.webis.de/research/corpora)

Data Collection

[Figure: grid of 150 small plots, one per search mission, arranged in columns A to J and rows 1 to 15. Each plot is labeled with a three-digit topic number followed by a count that appears to be the number of queries for that topic, e.g., 049 with 616 and 130 with 4.]

Spectrum of search behavior
- Percentage of queries submitted over time for all 150 search missions
- Ranges from majority of queries issued at the start of the task (A1) to most queries towards the end (J15)
- In between, sets of queries submitted in bursts (e.g., F9) or linear increase (A10)
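A curve of this kind can be approximated by binning a mission's query times and accumulating the counts. The sketch below is one possible way to do so, not the procedure used for the poster: the timestamps are hypothetical, and normalizing time between the first and last query is an assumption (the plots may instead normalize over the whole task).

```python
def cumulative_query_fraction(timestamps, bins=10):
    """Return the fraction of a mission's queries issued by the end of each time bin.

    `timestamps` are hypothetical query times (any numeric unit) for one mission.
    Time is normalized between the first and the last query (assumption).
    """
    if not timestamps:
        return [0.0] * bins
    start, end = min(timestamps), max(timestamps)
    span = end - start
    counts = [0] * bins
    for t in timestamps:
        pos = (t - start) / span if span else 0.0
        # The last query sits at pos == 1.0 and is clamped into the final bin.
        index = min(int(pos * bins), bins - 1)
        counts[index] += 1
    total = len(timestamps)
    fractions, running = [], 0
    for c in counts:
        running += c
        fractions.append(running / total)
    return fractions


# Toy missions: one front-loaded, one back-loaded.
early = cumulative_query_fraction([0, 1, 2, 3, 100], bins=4)
late = cumulative_query_fraction([0, 97, 98, 99, 100], bins=4)
```

A front-loaded mission reaches a high fraction in the first bin, while a back-loaded one stays low until the last bin; bursts would show up as steps in between.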

Correlation of searching and writing
- Evidence of distinct text reuse strategies (build-up and boil-down)
- Only the former clearly reflected in the query log

[Figure: two panels, Author 5 (18 topics) and Author 24 (13 topics), with both axes ranging from 0 to 1. Curves: average query distribution and average text length over time.]
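The poster does not define how the two reuse strategies were told apart. Purely as an illustration, the sketch below assumes that a build-up essay grows until the end, whereas a boil-down essay first accumulates text and is then condensed. The function name, the input format, and the 0.8 threshold are all hypothetical choices, not the authors' method.

```python
def classify_strategy(lengths, shrink_threshold=0.8):
    """Label a revision history as 'build-up' or 'boil-down'.

    `lengths` is a hypothetical sequence of essay lengths (e.g., word counts),
    one per saved revision. The rule and the threshold are illustrative
    assumptions, not the procedure used for the corpus.
    """
    if not lengths:
        raise ValueError("need at least one revision")
    peak = max(lengths)
    final = lengths[-1]
    # Boil-down: the final text is clearly shorter than the longest draft.
    if peak > 0 and final / peak < shrink_threshold:
        return "boil-down"
    return "build-up"


# Toy revision histories, not taken from the corpus.
steady = classify_strategy([100, 400, 900, 1500, 2000])
condensed = classify_strategy([500, 3000, 5000, 3500, 2000])
```

Under these assumptions, the first toy history is labeled build-up and the second boil-down.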

First Conclusions
- Query frequency by itself poor predictor of task completion
- Heavy reliance on search engine indicates need to better support exploratory tasks

Main Findings

Web Technology and Information Systems, Bauhaus-Universität Weimar, www.webis.de