Structure of IR Systems INST 734 Module 1 Doug Oard.




Segments

1. The nature of Information Retrieval (IR)

2. What IR systems do

3. The structure of interactive IR systems


Systems: The Memex


Types of Information Needs

• Retrospective (“Retrieval”)
  – “Searching the past”
  – Different queries posed against a static collection

• Prospective (“Recommendation”)
  – “Searching the future”
  – A static query posed against a dynamic collection
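The two kinds of need differ mainly in which side stays fixed. The sketch below makes that concrete. It is an illustrative assumption, not a method from the slides: the toy `matches` test (any query term appears in the document) stands in for a real retrieval model, and all function names are hypothetical.

```python
# Minimal sketch contrasting retrospective and prospective information needs.
# The "any query term appears" matcher is a toy assumption, not a real model.


def matches(query: str, document: str) -> bool:
    """True if any query term appears in the document (toy matcher)."""
    doc_terms = set(document.lower().split())
    return any(term in doc_terms for term in query.lower().split())


def retrospective_search(query: str, collection: list[str]) -> list[str]:
    """Retrieval: each new query is posed against the same static collection."""
    return [doc for doc in collection if matches(query, doc)]


def prospective_filter(profile: str, stream):
    """Recommendation/filtering: one standing query (a profile) is applied
    to each new document as it arrives in a dynamic collection."""
    for doc in stream:
        if matches(profile, doc):
            yield doc


collection = ["memex stores books and records", "databases use structured data"]
print(retrospective_search("memex", collection))       # query 1, same collection
print(retrospective_search("structured", collection))  # query 2, same collection

incoming = iter(["new memex paper", "weather report", "memex follow-up"])
print(list(prospective_filter("memex", incoming)))     # same query, new documents
```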


Two Ways of Searching

• Content-based query-document matching
  – Author: writes the document using terms to convey meaning
  – Free-text searcher: constructs the query from terms that may appear in documents
  – Matching: query terms are compared with document terms

• Metadata-based query-document matching
  – Indexer: chooses appropriate concept descriptors for each document
  – Controlled-vocabulary searcher: constructs the query from the available concept descriptors
  – Matching: query descriptors are compared with document descriptors

Either form of matching yields a Retrieval Status Value for each document.
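The two matching paths can be sketched side by side. The code below is a toy illustration under stated assumptions. The documents, the controlled vocabulary, and the indexer's descriptor assignments are all hypothetical. The retrieval status value is simply a count of shared terms or shared descriptors. Real systems use far richer scoring.

```python
# Toy sketch of free-text vs. controlled-vocabulary matching.
# Documents, vocabulary, and descriptor assignments are hypothetical.

documents = {
    "d1": "Bush proposed the memex as a personal device for storing records",
    "d2": "relational databases answer structured queries exactly",
}

# Hypothetical controlled vocabulary and the descriptors an indexer assigned.
CONTROLLED_VOCABULARY = {"Information Retrieval", "Databases", "Hypertext"}
descriptors = {
    "d1": {"Information Retrieval", "Hypertext"},
    "d2": {"Databases"},
}


def content_based_rsv(query_terms: set[str], doc_id: str) -> int:
    """Free-text path: count query terms that also occur in the document text."""
    doc_terms = set(documents[doc_id].lower().split())
    return len({term.lower() for term in query_terms} & doc_terms)


def metadata_based_rsv(query_descriptors: set[str], doc_id: str) -> int:
    """Controlled-vocabulary path: count descriptors shared by query and document.
    The searcher may only use descriptors from the vocabulary."""
    unknown = query_descriptors - CONTROLLED_VOCABULARY
    if unknown:
        raise ValueError(f"not in the controlled vocabulary: {unknown}")
    return len(query_descriptors & descriptors[doc_id])


for doc_id in documents:
    print(doc_id,
          content_based_rsv({"memex", "records"}, doc_id),
          metadata_based_rsv({"Hypertext"}, doc_id))
```

The free-text path depends on the searcher guessing the author's words. The metadata path depends instead on the searcher and the indexer agreeing on descriptors.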


The IR Black Box

Documents + Query → [IR system] → Hits


Inside the IR Black Box

• Query → (Representation Function) → Query Representation
• Documents → (Representation Function) → Document Representation → Index
• Query Representation + Index → (Comparison Function) → Hits
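Here is a minimal sketch of the components inside the black box. One representation function is applied to both the query and the documents. The document representations are stored in an index, and a comparison function scores documents to produce ranked hits. Several choices are assumptions for illustration and are not prescribed by the slides: lowercase bag-of-words terms, an inverted index mapping each term to per-document term frequencies, and scoring by summed term frequencies.

```python
from collections import defaultdict


def representation_function(text: str) -> list[str]:
    """Map text to a list of lowercase terms with surrounding punctuation removed."""
    terms = []
    for token in text.split():
        term = token.strip(".,;:!?\"'()").lower()
        if term:
            terms.append(term)
    return terms


def build_index(documents: dict[str, str]) -> dict[str, dict[str, int]]:
    """Inverted index built from document representations: term -> {doc_id: tf}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in representation_function(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def comparison_function(query: str, index: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Score each document by the summed frequencies of the distinct query terms,
    then return the hits ranked by score (ties broken by doc_id)."""
    scores: dict[str, int] = defaultdict(int)
    for term in set(representation_function(query)):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "d1": "The memex stores records. Records are linked.",
    "d2": "Databases store structured records.",
    "d3": "Weather today.",
}
hits = comparison_function("memex records", build_index(docs))
print(hits)
```

The index is built once, offline, from the documents. The query is represented and compared only at search time, which is why the two representation steps are drawn separately.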


• Query Processing
  – Information Need → (Query Formulation) → Query
  – Query → (Representation Function) → Query Representation

• Document Processing
  – Document → (Representation Function) → Document Representation

• Query Representation + Document Representation → (Comparison Function) → Retrieval Status Value

• Information Need + Document → (Human Judgment) → Utility


Relevance

• Relevance relates a topic and a document
  – Duplicates are equally relevant, by definition
  – Constant over time and across users

• Pertinence relates a task and a document
  – Accounts for quality, complexity, language, …

• Utility relates a user and a document
  – Accounts for prior knowledge


Comparing Databases and IR

• Nature of the content
  – IR: Updates can often be processed offline.
  – Databases: Able to handle real-time updates.

• Interaction with system
  – IR: An interaction sequence can help resolve vagueness.
  – Databases: A single query produces a complete answer.

• Results we get
  – IR: Sometimes relevant, often not.
  – Databases: Exact; always correct in a formal sense.

• Queries we’re posing
  – IR: Vague, imprecise queries (and information needs).
  – Databases: Unambiguous, formally (mathematically) defined queries.

• What we’re retrieving
  – IR: Mostly unstructured; free text with some metadata.
  – Databases: Structured data with clear semantics based on a formal model.
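The contrast between exact database answers and ranked IR results can be seen in a few lines. The sketch below is illustrative only. The table, its rows (apart from Bush's 1945 article "As We May Think"), the free-text snippets, and the overlap score are hypothetical. The score is a toy stand-in for a retrieval status value. The database side uses Python's standard `sqlite3` module.

```python
import sqlite3

# Database side: structured rows and a formally defined query.
# Table contents are illustrative; the second row is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("As We May Think", 1945), ("A Hypothetical Survey", 2010)],
)
# The answer is exact: every returned row satisfies the predicate, and no other row does.
exact = conn.execute("SELECT title FROM articles WHERE year < 1950").fetchall()
conn.close()

# IR side: free text and a vague query; the result is a ranking, not a set.
texts = {
    "t1": "a machine to help people remember and find records",
    "t2": "notes on finding records in a personal archive",
    "t3": "recipes for a quick dinner",
}


def overlap_score(query: str, text: str) -> float:
    """Fraction of query terms found in the text (a toy retrieval status value)."""
    query_terms = set(query.lower().split())
    return len(query_terms & set(text.lower().split())) / len(query_terms)


query = "find old records"
ranked = sorted(texts, key=lambda doc_id: overlap_score(query, texts[doc_id]), reverse=True)

print(exact)
print([(doc_id, round(overlap_score(query, texts[doc_id]), 2)) for doc_id in ranked])
```

Note that "finding" in t2 does not match "find" under this naive representation. That kind of near miss is part of why IR results are "sometimes relevant, often not."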

