Post on 27-Mar-2015
Special Topics in Computer Science
Advanced Topics in Information Retrieval
Chapter 1: Introduction
Alexander Gelbukh
www.Gelbukh.com
Motivation
First for libraries, but now: the WWW!
Info: representation, storage, organization, access
Search Engines (IR systems)
User information need
o Plain English description, translated into a query
Concerns of modern IR:
o modeling
o classification, categorization, filtering
o system architecture
o user interfaces, visualization, query languages
Data vs. Information Retrieval
Data Retrieval
o Precise description
o Well-structured data
o Precise results
o Yes-or-no results
o Science
Information Retrieval
o Vague information need
o Natural language, images, ...
o Semantic interpretation
o Approximate results
o Relevance ranking
o Art!
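The contrast above can be made concrete with a small sketch. It is only an illustration: the records, the field names, and the use of word overlap (Jaccard similarity) as a stand-in for relevance are assumptions, not something the slides prescribe.

```python
# Toy collection; the records and field names are invented for illustration.
records = [
    {"author": "Smith", "title": "Indexing clay tablets"},
    {"author": "Jones", "title": "Searching the web with ranked retrieval"},
    {"author": "Smith", "title": "Ranked retrieval for digital libraries"},
]

def data_retrieval(records, field, value):
    """Exact match: each record either satisfies the condition or does not."""
    return [r for r in records if r[field] == value]

def information_retrieval(records, need):
    """Approximate match: score titles by word overlap and return them ranked."""
    need_words = set(need.lower().split())
    scored = []
    for r in records:
        title_words = set(r["title"].lower().split())
        # Jaccard similarity, used here as a crude relevance estimate.
        overlap = len(need_words & title_words) / len(need_words | title_words)
        if overlap > 0:
            scored.append((overlap, r["title"]))
    return [title for overlap, title in sorted(scored, key=lambda p: -p[0])]

exact = data_retrieval(records, "author", "Smith")
ranked = information_retrieval(records, "ranked retrieval on the web")
```

Here `exact` holds the two records whose author is exactly "Smith", while `ranked` orders the partially matching titles by score and drops the title that shares no words with the need.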
Basic Concepts
User task (search)
o Can formulate what they need: Retrieval (classical)
o Can't (or do not know how): Browsing (new to IR)
Still not very well integrated
o Filtering (user passive, contents active)
Logical view of docs (from slow, good to fast, bad)
o ... Added linguistic info ... not clear if it helps
o Full text
o Text operations: reduce complexity to index terms
Keywords, stopwords
Stemming, noun groups (linguistic processing needed)
o Categories
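The text operations mentioned above (stopword removal and stemming to obtain index terms) can be sketched as follows. This is a minimal illustration under stated assumptions: the stopword list is a tiny invented sample, and `toy_stem` is a hypothetical crude suffix stripper, not a real stemming algorithm.

```python
import re

# Tiny illustrative stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Suffixes tried in order by the toy stemmer (an assumption for this sketch).
SUFFIXES = ("ing", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Lowercase, tokenize, drop stopwords, and stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]

terms = index_terms("The indexing of documents")
# terms == ["index", "document"]
```

The reduced list is smaller and faster to index than the full text, at the cost of discarding information, which is the trade-off the slide points to.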
Past, Present, and Future
Since clay tablets
o Alphabetical index (formal)
o Table of Contents (by storing order)
o Classifications (by meaning)
Libraries
o Automation of classical techniques. Catalogs.
o Search by fields (exact match: author, title, keywords)
Web & Digital Libraries: interactive
o Cheaper: huge amount of data
o Networks: remote access, wider audience
o Free publishing: unprepared, heterogeneous data
Artificial Intelligence and Linguistic methods
Main concerns
Open audience
o Help people to formulate their information need
o Improve retrieval quality. Intelligent methods
Efficiency (speed)
o Development of fast techniques
Interaction
o Watch user behavior to improve quality
o Privacy!
Open content
o Legal issues. Copyright. Responsibility for info quality
o Intelligent methods
Retrieval process
Database
o Define the logical view: text operations, text model
Index (e.g., inverted file)
User query
o Query operations (users are not good at this!)
Retrieved docs
o Ranked by likelihood (relevance)
Feedback cycle
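The index and ranking steps of this process can be sketched as below. It is a rough illustration only: the three documents are invented, and summing raw term frequencies is an assumed stand-in for a relevance model, not the ranking method of any particular system.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Inverted file: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index

def search(index, query):
    """Return doc ids ranked by the summed frequency of the query terms."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Highest score first; ties broken by doc id for a stable order.
    ordered = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return [doc_id for doc_id, _ in ordered]

# Invented sample collection.
docs = {
    "d1": "information retrieval ranks documents",
    "d2": "data retrieval returns exact matches",
    "d3": "information need and information retrieval",
}
index = build_index(docs)
results = search(index, "information retrieval")
# results == ["d3", "d1", "d2"]
```

Only the posting lists of the query terms are touched at search time, which is why an inverted file is a common choice when speed matters. A feedback cycle would adjust the query using the documents the user marks as relevant and run `search` again.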
The Textbook: Text IR
Models and Evaluation
o Modeling (basic concepts)
o Retrieval Evaluation
Improvements on Retrieval
o Query Languages
o Query Operations
o Text Languages and Properties
o Text Operations
Efficiency
o Indexing and Searching
Conferences & Journals
Confs on IR
o IR
o ACM SIGIR
o TREC
o SPIRE
Journal
o IR
General conferences on text processing
o ACL
o COLING
o CICLing
o DEXA (databases)
o NLDB
Conclusions
User Information Need
o Vague
o Semantic, not formal
Document Relevance
o Order, not retrieve
Huge amount of information
o Efficiency concerns
o Tradeoffs
IR is art more than science
Thank you!