Text REtrieval Conference (TREC) The TREC Conferences Ellen Voorhees.
The TREC Conferences
http://trec.nist.gov
Ellen Voorhees
TREC Philosophy
• TREC is a modern example of the Cranfield tradition
  – system evaluation based on test collections
• Emphasis on advancing the state of the art from evaluation results
  – TREC's primary purpose is not competitive benchmarking
  – experimental workshop: sometimes experiments fail!
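The Cranfield tradition the slide refers to scores systems offline against a test collection: documents, topics, and per-topic relevance judgments (qrels). A minimal sketch of that evaluation style, computing average precision for one topic; the document IDs, run, and qrels below are invented for illustration and are not TREC data:

```python
# Cranfield-style scoring sketch: average precision for one topic,
# given a system's ranked list and the topic's relevance judgments.
# All IDs and judgments here are invented toy data.

def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k where a relevant doc appears."""
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0

qrels = {"d3", "d7", "d9"}             # judged-relevant docs for the topic
run = ["d7", "d1", "d3", "d4", "d9"]   # one system's ranked output

print(round(average_precision(run, qrels), 4))  # prints 0.7556
```

Averaging this score over all topics gives mean average precision (MAP), one of the standard test-collection measures; the key Cranfield property is that any system can be scored against the same frozen judgments.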
Cranfield at Fifty
• Evaluation methodology is still valuable…
  • carefully calibrated level of abstraction
    – has sufficient fidelity to real user tasks to be informative
    – general enough to be broadly applicable, feasible, relatively inexpensive
• …but is showing some signs of age
  • size is overwhelming our ability to evaluate
  • new abstractions need to carefully accommodate variability to maintain power
Evaluation Difficulties
• Variability
  • despite stark abstraction, the user effect still dominates Cranfield results
• Size matters
  • effective pooling has a corpus-size dependency
  • test collection construction costs depend on the number of judgments
• Model coarseness
  • even slightly different tasks may not be a good fit
    – e.g., legal discovery, video features
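Pooling, mentioned above, is what bounds construction cost: assessors judge only the union of each submitted run's top-k documents rather than the whole corpus. A toy sketch of depth-k pool formation; the run data and system names are invented for illustration:

```python
# Depth-k pooling sketch: the judgment pool for a topic is the union
# of the top-k documents from every submitted run.
# Runs below are invented toy data, not TREC submissions.

def build_pool(runs, k):
    """Union of the top-k documents across all runs for one topic."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool

runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d3", "d5", "d1", "d6"],
    "sysC": ["d7", "d2", "d8", "d9"],
}

pool = build_pool(runs, k=2)
print(sorted(pool))  # docs assessors must judge for this topic
```

The pool holds at most (number of runs × k) documents per topic, so judging cost is tied to the number of judgments, not to corpus size; conversely, as the corpus grows, a fixed-depth pool covers a shrinking fraction of the truly relevant documents, which is the size dependency the slide notes.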
TREC 2009
• All tracks used some new, large document set
• Different trade-offs in adapting the evaluation strategy
  • tension between evaluating current participants' ability to do the task and building reusable test collections
  • variety of tasks that are not simple ranked-list retrieval
ClueWeb09 Document Set
• Snapshot of the WWW in early 2009
  • crawled by CMU with support from NSF
  • distributed through CMU
  • used in four TREC 2009 tracks: Web, Relevance Feedback, Million Query, and Entity
• Full corpus
  • about one billion pages and 25 terabytes of text
  • about half is in English
• Category B
  • English-only subset of about 50 million pages (including Wikipedia) to permit wider participation
TREC 2009 Participants
Applied Discovery; Beijing Institute of Technology; Beijing U. of Posts and Telecommunications; Cairo Microsoft Innovation Center; Carnegie Mellon University; Chinese Academy of Sciences (2); Clearwell Systems, Inc.; Cleary Gottlieb Steen & Hamilton, with Backstop LLC; Dalian University of Technology; Delft University of Technology; EMC - CMA - R&D; Equivio; Fondazione Ugo Bordoni; Fraunhofer SCAI; Fudan University; H5; Heilongjiang Inst. of Technology; Integreon; International Inst. of Information Technology, Hyderabad; Know-Center; Lehigh University; Logik Systems, Inc.; Microsoft Research Asia; Microsoft Research Cambridge; Milwaukee School of Engineering; Mugla University; National Institute of Information and Communications Technology; Northeastern University; Open Text Corporation; Peking University; Pohang U. of Science & Technology; Purdue University; Queensland University of Technology; RMIT University; Sabir Research; South China University of Technology; SUNY Buffalo; Tsinghua University; Universidade do Porto; University College Dublin; University of Alaska, Fairbanks; University of Amsterdam (2); University of Applied Science Geneva; University of Arkansas, Little Rock; University of California, Santa Cruz; University of Delaware (2); University of Glasgow; University of Illinois, Urbana-Champaign; University of Iowa; University of Lugano; University of Maryland, College Park; University of Massachusetts, Amherst; The University of Melbourne; University of Padova; University of Paris; University of Pittsburgh; University of Twente; University of Waterloo (2); Ursinus College; Yahoo! Research; York University (2); ZL Technologies, Inc.
The TREC Tracks
[Figure: timeline of TREC tracks by year, 1992–2009, grouped by theme]
• Static text: Ad Hoc, Robust
• Streamed text: Filtering, Routing
• Human-in-the-loop: Interactive, HARD, Feedback
• Beyond just English: Cross-language, Chinese, Spanish
• Beyond text: Video, Speech, OCR
• Size, efficiency, & web search: Terabyte, Million Query, Web, VLC
• Searching corporate repositories: Legal, Enterprise
• Answers, not documents: Novelty, QA, Entity
• Retrieval in a domain: Chemical IR, Genomics
• Personal documents: Blog, Spam
TREC 2010
• Blog, Chemical IR, Entity, Legal, Relevance Feedback, and Web tracks continuing
• Million Query merged with Web
• New "Sessions" track: investigate search behavior over a series of queries (series of length 2 for the first running in 2010)
TREC 2011
• Track proposals due Monday (Sept 27)
• A new track on searching the free-text fields of medical records is likely