Transcript of "Text REtrieval Conference (TREC): The TREC Conferences" by Ellen Voorhees.

Page 1: Title slide

Text REtrieval Conference (TREC)
The TREC Conferences
http://trec.nist.gov
Ellen Voorhees

Page 2: TREC Philosophy

• TREC is a modern example of the Cranfield tradition
  – system evaluation based on test collections
• Emphasis on advancing the state of the art from evaluation results
  – TREC's primary purpose is not competitive benchmarking
  – experimental workshop: sometimes experiments fail!
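The Cranfield-style evaluation the slide refers to scores a system's ranked output against fixed relevance judgments for each topic. A minimal sketch of one common measure, average precision, is below; the document IDs and judgments are invented for illustration, not TREC data.

```python
# Minimal sketch of Cranfield-style scoring: evaluate one system's ranked
# list for one topic against a set of relevance judgments (qrels).
# All document IDs and judgments here are made up for illustration.

def average_precision(ranking, relevant):
    """Average precision of a ranked list, given the set of relevant doc IDs.

    Precision is accumulated at each rank where a relevant document
    appears, then averaged over the total number of relevant documents.
    """
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Hypothetical judgments and system output for a single topic.
qrels = {"d3", "d7", "d9"}            # documents judged relevant
run = ["d7", "d1", "d3", "d4", "d9"]  # the system's ranked list

print(round(average_precision(run, qrels), 4))  # prints 0.7556
```

Averaging this value over all topics gives mean average precision (MAP), one of the standard measures reported for TREC runs; the point of the Cranfield abstraction is that the same fixed judgments can score any number of systems.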

Page 3: Cranfield at Fifty

• Evaluation methodology is still valuable…
  • carefully calibrated level of abstraction
    – has sufficient fidelity to real user tasks to be informative
    – general enough to be broadly applicable, feasible, relatively inexpensive
• …but is showing some signs of age
  • size is overwhelming our ability to evaluate
  • new abstractions need to carefully accommodate variability to maintain power

Page 4: Evaluation Difficulties

• Variability
  • despite stark abstraction, the user effect still dominates Cranfield results
• Size matters
  • effective pooling has a corpus-size dependency
  • test collection construction costs depend on the number of judgments
• Model coarseness
  • even slightly different tasks may not be a good fit
    – e.g., legal discovery, video features
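The pooling mentioned above is the mechanism TREC uses to limit judging cost: only the union of each run's top-k documents is judged, and everything outside the pool is assumed non-relevant. A sketch, with invented run names and rankings:

```python
# Sketch of depth-k pooling for a single topic: the judgment pool is the
# union of the top-k documents from every submitted run. Run names and
# rankings below are hypothetical.

def build_pool(runs, depth):
    """Union of the top-`depth` documents across all runs (one topic)."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool

runs = {
    "runA": ["d1", "d2", "d3", "d4"],
    "runB": ["d2", "d5", "d1", "d6"],
    "runC": ["d7", "d2", "d8", "d9"],
}

pool = build_pool(runs, depth=2)
print(sorted(pool))  # unique documents to judge: ['d1', 'd2', 'd5', 'd7']
```

Judging effort thus scales with the number of runs and the pool depth, not with corpus size, which is exactly the tension the slide notes: as corpora grow, a fixed-depth pool covers a shrinking fraction of the truly relevant documents.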

Page 5: TREC 2009

• All tracks used some new, large document set
• Different trade-offs in adapting evaluation strategy
  • tension between evaluating current participants' ability to do the task and building reusable test collections
  • variety of tasks that are not simple ranked-list retrieval

Page 6: ClueWeb09 Document Set

• Snapshot of the WWW in early 2009
  • crawled by CMU with support from NSF
  • distributed through CMU
  • used in four TREC 2009 tracks: Web, Relevance Feedback, Million Query, and Entity
• Full corpus
  • about one billion pages and 25 terabytes of text
  • about half is in English
• Category B
  • English-only subset of about 50 million pages (including Wikipedia) to permit wider participation

Page 7: TREC 2009 Participants

Applied Discovery
Beijing Institute of Technology
Beijing U. of Posts and Telecommunications
Cairo Microsoft Innovation Center
Carnegie Mellon University
Chinese Academy of Sciences (2)
Clearwell Systems, Inc.
Cleary Gottlieb Steen & Hamilton, with Backstop LLC
Dalian University of Technology
Delft University of Technology
EMC - CMA - R&D
Equivio
Fondazione Ugo Bordoni
Fraunhofer SCAI
Fudan University
H5
Heilongjiang Inst. of Technology
Integreon
International Inst. of Information Technology, Hyderabad
Know-Center
Lehigh University
Logik Systems, Inc.
Microsoft Research Asia
Microsoft Research Cambridge
Milwaukee School of Engineering
Mugla University
National Institute of Information and Communications Technology
Northeastern University
Open Text Corporation
Peking University
Pohang U. of Science & Technology
Purdue University
Queensland University of Technology
RMIT University
Sabir Research
South China University of Technology
SUNY Buffalo
Tsinghua University
Universidade do Porto
University College Dublin
University of Alaska, Fairbanks
University of Amsterdam (2)
University of Applied Science Geneva
University of Arkansas, Little Rock
University of California, Santa Cruz
University of Delaware (2)
University of Glasgow
University of Illinois, Urbana-Champaign
University of Iowa
University of Lugano
University of Maryland, College Park
University of Massachusetts, Amherst
The University of Melbourne
University of Padova
University of Paris
University of Pittsburgh
University of Twente
University of Waterloo (2)
Ursinus College
Yahoo! Research
York University (2)
ZL Technologies, Inc.

Page 8: The TREC Tracks

[Figure: timeline of the TREC tracks, 1992–2009, grouped by theme]

• Static text: Ad Hoc, Robust
• Streamed text: Filtering, Routing
• Human-in-the-loop: Interactive, HARD, Feedback
• Beyond just English: Cross-language, Chinese, Spanish
• Beyond text: Video, Speech, OCR
• Size, efficiency, & web search: VLC, Terabyte, Million Query, Web
• Searching corporate repositories: Enterprise, Legal
• Answers, not documents: Novelty, QA, Entity
• Retrieval in a domain: Genomics, Chemical IR
• Personal documents: Spam, Blog

Page 9: TREC 2010

• Blog, Chemical IR, Entity, Legal, Relevance Feedback, and Web tracks continuing
• Million Query merged with Web
• New "Sessions" track: investigating search behavior over a series of queries (series of length 2 for the first running in 2010)

Page 10: TREC 2011

• Track proposals due Monday (Sept 27)
• New track on searching the free-text fields of medical records likely