Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Information Retrieval Evaluation and the Retrieval Process.
Why evaluate an IR system?
To select between alternative systems
To determine if a system meets expressed and unexpressed needs of current users and non-users
To improve IR systems and determine if improvement actually occurred
To develop cost models
4 levels of evaluation - Lancaster
Effectiveness
Benefits
Cost effectiveness
Cost benefits
Effectiveness
What a system does well, e.g., percentage of reference questions answered accurately, the recall and precision of a literature search
There are a number of measures of effectiveness
Measuring Effectiveness
                 Relevant       Not Relevant
Retrieved        a. Hits        b. Noise or fallout
Not Retrieved    c. Misses      d. Correctly rejected
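The four cells a through d can be counted directly from sets of document identifiers. The following is a minimal sketch, assuming the retrieved set, the relevant set, and the whole collection are all known; the function name `contingency_counts` and the ten-document collection are hypothetical and used only for illustration.

```python
def contingency_counts(retrieved, relevant, collection):
    """Return the cells (a, b, c, d) of the retrieval table for one query."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    collection = set(collection)
    a = len(retrieved & relevant)               # hits
    b = len(retrieved - relevant)               # noise or fallout
    c = len(relevant - retrieved)               # misses
    d = len(collection - retrieved - relevant)  # correctly rejected
    return a, b, c, d


# Hypothetical collection of ten documents, numbered 1 to 10.
collection = range(1, 11)
relevant = {1, 2, 3, 4}
retrieved = {1, 2, 5}

counts = contingency_counts(retrieved, relevant, collection)
print(counts)
```

In practice the relevant set is rarely known for a whole database, so c and d usually have to be estimated from a sample.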
Measures of Effectiveness
Recall
Precision
Relevance
Pertinence or utility
Novelty ratio
Fallout and noise
Timeliness
Coverage
Generality
Recall and Precision Ratios
Recall a/(a+c): proportion of relevant items retrieved out of the total number of relevant items contained in a database
Precision a/(a+b): a signal-to-noise ratio--proportion of retrieved materials that are relevant to a query
Used together, the 2 ratios express the filtering capacity of the system
Recall and precision tend to be inversely related
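The two ratios can be computed from the counts a, b, and c. The sketch below uses made-up counts for a narrow and a broad version of the same search to illustrate the inverse tendency; returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def recall(a, c):
    """Recall = a / (a + c): share of all relevant items that were retrieved."""
    total_relevant = a + c
    return a / total_relevant if total_relevant else 0.0


def precision(a, b):
    """Precision = a / (a + b): share of retrieved items that are relevant."""
    total_retrieved = a + b
    return a / total_retrieved if total_retrieved else 0.0


# Hypothetical narrow search: 30 hits, 20 noise, 10 misses.
narrow = (recall(30, 10), precision(30, 20))

# Hypothetical broad search for the same query: 38 hits, 62 noise, 2 misses.
broad = (recall(38, 2), precision(38, 62))

print(narrow)
print(broad)
```

With these assumed numbers, broadening the search raises recall and lowers precision, which is the usual direction of the trade-off.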
Relevance and Pertinence
Relevance (or generality ratio) (a+c)/(a+b+c+d): the number or proportion of materials in a system that are relevant to a query. Can be hard to ascertain without scanning the entire database.
Pertinence: the relationship between a document and an information need. Utility refers to the subset of a that is actually used.
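The generality ratio can be worked through with the same four counts. This is a small sketch with hypothetical numbers, assuming d is known or has been estimated; the function name `generality` is illustrative.

```python
def generality(a, b, c, d):
    """Generality = (a + c) / (a + b + c + d): share of the database relevant to the query."""
    total = a + b + c + d
    return (a + c) / total if total else 0.0


# Hypothetical database of 1,000 items, 40 of them relevant to the query.
g = generality(a=30, b=20, c=10, d=940)
print(g)
```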
Novelty Ratio, Fallout and Noise
Novelty ratio: a subset of a that is actually new to the person evaluating relevance
Fallout and Noise: the subset b of retrieved items that are not relevant
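These measures can also be expressed as ratios. In the sketch below, the novelty ratio follows the definition above (new relevant items over all relevant retrieved items). The slides do not give formulas for noise and fallout, so the definitions used here, noise as b/(a+b) and fallout as b/(b+d), are the conventional ones and should be read as assumptions. All names and counts are hypothetical.

```python
def novelty_ratio(relevant_retrieved, already_known):
    """Share of relevant retrieved items that are new to the searcher."""
    hits = set(relevant_retrieved)
    if not hits:
        return 0.0
    return len(hits - set(already_known)) / len(hits)


def noise_ratio(a, b):
    """Assumed definition: share of retrieved items that are not relevant."""
    return b / (a + b) if (a + b) else 0.0


def fallout_ratio(b, d):
    """Assumed definition: share of non-relevant items that were retrieved."""
    return b / (b + d) if (b + d) else 0.0


# Hypothetical search: four relevant items retrieved, one already known.
novelty = novelty_ratio({"doc1", "doc2", "doc3", "doc4"}, {"doc2"})
noise = noise_ratio(a=30, b=20)
fallout = fallout_ratio(b=20, d=940)
print(novelty, noise, fallout)
```

Under these definitions, noise is the complement of precision, while fallout depends on the size of the non-relevant part of the database.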
Timeliness, Coverage, and Generality
Timeliness and coverage: factors that affect assessments of relevance and pertinence
Generality: the number of documents related to a particular request in the entire database. The higher the ratio (the denser the relevant material), the easier a search should be
Accuracy
Criteria Commonly Used to Evaluate Retrieval Performance
Recall
Precision
User effort
Amount of time a user spends conducting a search
Amount of time a user spends negotiating his inquiry and then separating relevant from irrelevant items
Response time
Benefits
Search costs
Cost effectiveness
Cost benefits
Objective vs. Subjective Knowledge
Factual or artifactual knowledge vs. how knowledge is constructed or modeled within an individual’s mind
Subjective knowledge (and therefore relevance judgments) varies from person to person, e.g., individual aesthetic judgments or problem solving methods
Benefits
What good a system does, e.g., how an information system benefits its users
Hard to measure
Search Costs
Economics of using different databases
Using natural language indexing can shift effort onto the searcher
Cost Effectiveness
Relationship of cost criteria to quality criteria, e.g., unit cost per relevant or new item retrieved
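A unit cost of this kind is the total cost of a search divided by the number of items of interest. The sketch below assumes a single total cost figure per search; the function name `unit_cost` and all figures are hypothetical, and returning None when nothing was retrieved is a choice made here.

```python
def unit_cost(total_cost, items):
    """Cost per item; None when no items were retrieved (ratio undefined)."""
    if items <= 0:
        return None
    return total_cost / items


# Hypothetical search costing 45.00 that retrieved 30 relevant items,
# 12 of which were new to the user.
search_cost = 45.0
cost_per_relevant = unit_cost(search_cost, 30)
cost_per_new = unit_cost(search_cost, 12)
print(cost_per_relevant, cost_per_new)
```

Comparing these figures across systems or databases ties the cost criteria to the quality criteria.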
Cost Benefits
Cost savings through use of one information system over another
Increased, or avoidance of loss of, productivity
Improved decision-making or reduction of personnel needed to make decisions
Avoidance of duplication of effort
Components of an Evaluation
1. Defining the scope of the evaluation: formative vs. summative
2. Designing the evaluation program
3. Execution of the evaluation
4. Analysis and interpretation of the results
5. Modifying the system based on the results
6. Iteration if necessary (go back to step 3)
Real Life vs. Experimental Systems
Experiments and benchmark tests - standardized collections, queries, and relevance judgments tested against multiple systems; evaluated on recall and precision; biases often built into system design
Predictive evaluation - expert reviews; usage simulation such as walkthroughs
Real life - observing users’ interactions with the system; eliciting users’ opinions
Classic IR Model - Bates
Document --> Document representation matched up with
Query <-- Information need
Problems with Classic IR Model
Users cannot use their own language
Different users have different needs
Users have different information needs at different times
Users are not always able to read and write
Information need may evolve during the search process
Some users are not concerned about precision and recall
Users may want to eliminate known items
Users may want more cues to assist in assessing relevance
Other factors influencing use
Accessibility - physical, intellectual, and psychological - and ease of use are the most important determinants of whether an information service is used
Principle of Least Effort
Perceived technical quality also affects the choice of first source
Perceptions of accessibility are influenced by experience
Berrypicking
Search queries are not static, but evolve
Searchers gather information in bits and pieces
Searchers use a variety of search techniques
Searchers use a variety of other sources as well as databases
Search Strategies
Footnote chasing
Citation searching
Journal run
Area scanning
Subject searches in bibliographies, abstracts, and indexes
Author searching
Making Retrieval More Effective
The more techniques used, the more effective a search is likely to be
Users should be able to search in ways that are already familiar or that they have found to be effective
A visual representation of the contents of a system may aid users in orienting themselves