Exploratory Search Missions for TREC Topics
Martin Potthast, Matthias Hagen, Michael Völske, Benno Stein
Bauhaus-Universität Weimar, 99421 Weimar, Germany
Source: ceur-ws.org/Vol-1033/extras/paper3-poster.pdf
[Figure: data collection setup. Labeled components: Author, Topic, Editor, Revision log, ChatNoir SE, ClueWeb API, ClueWeb, Query log.]
We report on the construction of a new text reuse corpus comprising writing interactions and exploratory search missions.
- 150 essays (based on TREC Web Track topics 2009-2011)
- 12 professional writers hired on a crowdsourcing platform
- Long essay writing task, researching sources using a custom ClueWeb09 search engine
- Writing and search engine interactions recorded in high detail
Corpus Overview
Authors
Writer Demographics

Age                   Gender                Native language(s)
Minimum        24     Female        67%     English   67%
Median         37     Male          33%     Filipino  25%
Maximum        65                           Hindi     17%

Academic degree       Country of origin     Second language(s)
Postgraduate   41%    UK            25%     English   33%
Undergraduate  25%    Philippines   25%     French    17%
None           17%    USA           17%     Afrikaans, Dutch, German, Spanish, Swedish  each 8%
n/a            17%    India         17%     None      8%
                      Australia      8%
                      South Africa   8%

Years of writing      Search engines used   Search frequency
Minimum        2      Google        92%     Daily     83%
Median         8      Bing          33%     Weekly    8%
Standard dev.  6      Yahoo         25%     n/a       8%
Maximum        20     Others         8%
Topics
Example topic: Obama’s family.
Write about President Barack Obama’s family history, including genealogy, national origins, places and dates of birth, etc. Where did Barack Obama’s parents and grandparents come from? Also include a brief biography of Obama’s mother.
Original topic 001 of the TREC Web Track 2009:
Query. obama family tree
Description. Find information on President Barack Obama’s family history, including genealogy, national origins, places and dates of birth, etc.
Sub-topic 1. Find the TIME magazine photo essay “Barack Obama’s Family Tree.”
Sub-topic 2. Where did Barack Obama’s parents and grandparents come from?
Sub-topic 3. Find biographical information on Barack Obama’s mother.
Query log

Corpus                      Distribution                       Σ
Characteristic       min     avg     max    stdev
Writers                                                       12
Topics                                                       150
Topics / Writer        1    12.5      33      9.3
Queries                                                   13 651
Queries / Topic        4    91.0     616     83.1
Clicks                                                    16 739
Clicks / Topic        12   111.6     443     80.3
Clicks / Query         0     0.8      76      2.2
Sessions                                                     931
Sessions / Topic       1    12.3     149     18.9
Days                                                         201
Days / Topic           1     4.9      17      2.7
Hours                                                       2068
Hours / Writer         3   129.3     679    167.3
Hours / Topic          3     7.5      10      2.5
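Three of the averages in the query log table follow directly from the totals in the Σ column. The short sketch below recomputes them as a sanity check. The variable names are illustrative and not part of the corpus. The remaining averages are not simple ratios of the listed totals and are left as reported.

```python
# Totals taken from the Σ column of the query log table.
writers = 12
topics = 150
queries = 13_651
clicks = 16_739

# Each average is one total divided by another,
# rounded to one decimal place as in the table.
topics_per_writer = round(topics / writers, 1)
queries_per_topic = round(queries / topics, 1)
clicks_per_topic = round(clicks / topics, 1)

print(topics_per_writer, queries_per_topic, clicks_per_topic)
```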
Search mission data will be made available as the Webis-Query-Log-12 (http://www.webis.de/research/corpora)
Data Collection
[Figure: grid of 150 small plots, one per search mission, arranged in columns A to J and rows 1 to 15. Each plot carries a label that combines the three-digit TREC topic number with the number of queries for that topic, e.g. topic 049 with 616 queries and topic 130 with 4 queries.]
Spectrum of search behavior
- Percentage of queries submitted over time for all 150 search missions
- Ranges from majority of queries issued at the start of the task (A1) to most queries towards the end (J15)
- In between, sets of queries submitted in bursts (e.g., F9) or linear increase (A10)
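The per-mission curves described above can be sketched as a cumulative share of queries over normalized task time. This is a minimal illustration, not the authors' plotting code: the function name `query_share_over_time`, the timestamp format (seconds), the number of bins, and the sample timestamps are all assumptions.

```python
from bisect import bisect_right


def query_share_over_time(query_times, start, end, bins=10):
    """Return the cumulative share of queries issued by the end of each time bin.

    query_times: timestamps of the queries (hypothetical format: seconds).
    start, end:  start and end of the mission on the same time scale.
    """
    if end <= start:
        raise ValueError("end must be after start")
    times = sorted(query_times)
    total = len(times)
    shares = []
    for i in range(1, bins + 1):
        # End of the i-th bin on the original time scale.
        cutoff = start + (end - start) * i / bins
        # Number of queries issued at or before the cutoff.
        issued = bisect_right(times, cutoff)
        shares.append(issued / total if total else 0.0)
    return shares


# Made-up mission with most queries issued at the start of the task.
front_loaded = query_share_over_time([1, 2, 3, 4, 50, 95], start=0, end=100, bins=4)
print(front_loaded)
```

A front-loaded mission rises steeply in the first bins, while a mission with most queries near the end stays flat until the last bins.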
Correlation of searching and writing
- Evidence of distinct text reuse strategies (build-up and boil-down)
- Only the former clearly reflected in the query log
[Figure: two panels, "Author 5 (18 topics)" and "Author 24 (13 topics)", both axes ranging from 0 to 1. Curves: average query distribution and average text length over time.]
First Conclusions
- Query frequency by itself poor predictor of task completion
- Heavy reliance on search engine indicates need to better support exploratory tasks
Main Findings
Web Technology and Information Systems, Bauhaus-Universität Weimar, www.webis.de