Special Topics in Computer Science Advanced Topics in Information Retrieval Chapter 1: Introduction...

Posted on 27-Mar-2015




Special Topics in Computer Science

Advanced Topics in Information Retrieval

Chapter 1: Introduction

Alexander Gelbukh

www.Gelbukh.com


Motivation

First for libraries, but now: the WWW!!!
Info: representation, storage, organization, access
Search engines (IR systems)
User information need
o Plain English description → query

Concerns of modern IR:
o modeling
o classification, categorization, filtering
o system architecture
o user interfaces, visualization, query languages


Data vs. Information Retrieval

Data Retrieval
o Precise description
o Well-structured data
o Precise results
o Yes-or-no results
Science

Information Retrieval
o Vague information need
o Natural language, images, ...
o Semantic interpretation
o Approximate results
o Relevance ranking
Art!
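A minimal Python sketch of this contrast, assuming a hypothetical toy collection (`DOCS`) and a crude word-overlap count standing in for relevance; neither comes from the slides. Data retrieval answers yes or no for each document; IR scores the documents and returns an ordered list.

```python
# Hypothetical toy collection: structured fields plus free text.
DOCS = [
    {"id": 1, "author": "Smith", "text": "fast search engines for the web"},
    {"id": 2, "author": "Jones", "text": "library catalogs and classification"},
    {"id": 3, "author": "Smith", "text": "ranking web pages by relevance"},
]

def data_retrieval(docs, field, value):
    """Exact match: a document either satisfies the condition or it does not."""
    return [d["id"] for d in docs if d[field] == value]

def ir_ranking(docs, query):
    """Approximate match: score each document by the number of query words
    it contains (a deliberately crude relevance estimate) and return ids
    ordered by decreasing score; documents with score 0 are left out."""
    q = set(query.lower().split())
    scored = []
    for d in docs:
        score = len(q & set(d["text"].lower().split()))
        if score > 0:
            scored.append((score, d["id"]))
    # Highest score first; ties broken by smaller id so output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

print(data_retrieval(DOCS, "author", "Smith"))   # [1, 3]
print(ir_ranking(DOCS, "web relevance ranking"))  # [3, 1]
```

Document 1 shares only "web" with the query, yet it is still returned, just lower in the list: the approximate, ranked behaviour the slide attributes to IR.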


Basic Concepts

User task (search)
o Can formulate what they need: Retrieval (classical)
o Cannot (or do not know how to): Browsing (new to IR)
Still not very well integrated
o Filtering (user passive, contents active)

Logical view of docs
o ... Added linguistic info... not clear if it helps
o Full text
o Text operations: reduce complexity to index terms
Keywords, stopwords
Stemming, noun groups (linguistic processing needed)
o Categories

Trade-off along this list: slow but good (full text) → fast but bad (categories)
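The "text operations" step can be sketched in Python as below. This is a toy under stated assumptions: a tiny hand-picked stopword list and a naive suffix stripper, both hypothetical. It is not the Porter stemmer or any standard algorithm, and it does not attempt noun groups, which, as noted above, need real linguistic processing.

```python
import re

# Hypothetical, tiny stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on"}

# Naive suffix stripping for illustration only (NOT the Porter stemmer).
# Suffixes are tried in this order, so "ing" is checked before "s".
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    for suffix in SUFFIXES:
        # Keep a stem of at least 3 characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Full text -> index terms: lowercase, tokenize on letters,
    drop stopwords, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

print(index_terms("The users are searching libraries and indexed documents"))
# ['user', 'search', 'librari', 'index', 'document']
```

Note that "libraries" becomes "librari": crude stemming gives up readable terms in exchange for a smaller, more uniform vocabulary of index terms.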


Past, Present, and Future

Since clay tablets
o Alphabetical index (formal)
o Table of Contents (by storing order)
o Classifications (by meaning)

Libraries
o Automation of classical techniques. Catalogs.
o Search by fields (exact match: author, title, keywords)

Web & Digital Libraries: interactive
o Cheaper → huge amounts of data
o Networks → remote access, wider audience
o Free publishing → unprepared, heterogeneous data

Artificial Intelligence and Linguistic methods


Main concerns

Open audience
o Help people formulate their information need
o Improve retrieval quality. Intelligent methods

Efficiency (speed)
o Development of fast techniques

Interaction
o Watch user behavior to improve quality
o Privacy!

Open content
o Legal issues. Copyright. Responsibility for info quality
o Intelligent methods


Retrieval process

Database
o Define the logical view: text operations, text model

Index (e.g., inverted file)

User query
o Query operations (users are not good at this!)

Retrieved docs
o Ranked by likelihood (relevance)

Feedback cycle
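The index and ranking steps of this process can be sketched in Python with an inverted file. The ranking function is a plain tf-idf sum, one common textbook choice swapped in here for illustration; the slides only say documents are ranked by likelihood of relevance. The document ids and texts are hypothetical.

```python
import math
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase letter runs; a fuller system would also apply the
    # text operations (stopwords, stemming) at this point.
    return re.findall(r"[a-z]+", text.lower())

def build_inverted_index(docs):
    """Inverted file: map each term to its postings {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, n_docs, query):
    """Score documents by a tf-idf sum over the query terms and return
    (doc_id, score) pairs, best first. Only the postings of the query
    terms are read; documents sharing no query term are never scored."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})  # .get does not insert missing terms
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    "d1": "information retrieval and search engines",
    "d2": "data retrieval from a database",
    "d3": "search engines rank documents by relevance",
}
index = build_inverted_index(docs)
for doc_id, score in search(index, len(docs), "search engines relevance"):
    print(doc_id, round(score, 3))
```

Here d3 ranks above d1 because it also contains the rarer term "relevance", while d2 shares no query term and is never touched: this is why an inverted file helps with the efficiency concerns listed earlier.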


The Textbook: Text IR

Models and Evaluation
o Modeling (basic concepts)
o Retrieval Evaluation

Improvements on Retrieval
o Query Languages
o Query Operations
o Text Languages and Properties
o Text Operations

Efficiency
o Indexing and Searching


Conferences & Journals

Confs on IR
o IR
o ACM SIGIR
o TREC
o SPIRE

Journal
o IR

General conferences on text processing
o ACL
o COLING
o CICLing
o DEXA (databases)
o NLDB


Conclusions

User Information Need
o Vague
o Semantic, not formal

Document Relevance
o Order, not retrieve

Huge amount of information
o Efficiency concerns
o Tradeoffs

IR is more art than science


Thank you!