Transcript of Slides
![Page 1: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/1.jpg)
Slides
• Please download the slides from:
  – www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt
  – www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf
![Page 2: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/2.jpg)
Interactions
LBSC 796/CMSC 838o
Daqing He, Douglas W. Oard
Session 5, March 8, 2004
![Page 4: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/4.jpg)
Agenda
• Interactions in retrieval systems
• Query formulation
• Selection
• Examination
• Document delivery
![Page 5: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/5.jpg)
System Oriented Retrieval Model
• Acquisition → Collection → Indexing → Index
• Query → Search (over the Index) → Ranked List
![Page 6: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/6.jpg)
Whose Process Is It?
• Who initiates a search process?
• Who controls the progress?
• Who ends a search process?
![Page 7: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/7.jpg)
User Oriented Retrieval Model
• User side: Source Selection → Query Formulation → Query → Search → Ranked List → Document Selection → Document Examination → Document Delivery → Document
• IR System side: Collection Acquisition → Collection → Collection Indexing → Index → Search
![Page 8: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/8.jpg)
Taylor’s Conceptual Framework
• Four levels of “information needs”
  – Visceral: what you really want to know
  – Conscious: what you recognize that you want to know
  – Formalized (e.g., TREC topics): how you articulate what you want to know
  – Compromised (e.g., TREC queries): how you express what you want to know to a system
[Taylor 68]
![Page 9: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/9.jpg)
Belkin’s ASK model
• Users are concerned with a problem
• But do not clearly understand
  – the problem itself
  – the information need to solve the problem
• This is an Anomalous State of Knowledge (ASK)
• A clarification process is needed to form a query
[Belkin 80, Belkin, Oddy, Brooks 82]
![Page 10: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/10.jpg)
What are humans good at?
• Sense low-level stimuli
• Recognize patterns
• Reason inductively
• Communicate through multiple channels
• Apply multiple strategies
• Adapt to changes or unexpected events
From Ben Shneiderman’s Designing the User Interface
![Page 11: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/11.jpg)
What are computers good at?
• Sense stimuli outside the human range
• Calculate quickly and mechanically
• Store large quantities of data and recall them accurately
• Respond rapidly and consistently
• Perform repetitive actions reliably
• Maintain performance under heavy load and over extended periods
From Ben Shneiderman’s Designing the User Interface
![Page 12: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/12.jpg)
What should Interaction be?
• Synergistic
• Humans do things that humans are good at
• Computers do things that computers are good at
• The strength of one covers the weakness of the other
![Page 13: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/13.jpg)
Source Selection
• People have their own preference
• Different tasks require different sources
• Possible choices
  – ask for help from people or machines
  – browsing, search, or a combination
  – general-purpose vs. domain-specific IR systems
  – different collections
![Page 14: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/14.jpg)
Query Formulation
• User → Query Formulation → Query → Search
• Collection Indexing → Index → Search
![Page 15: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/15.jpg)
User’s Goals
• Identify the right query for the current need
  – conscious/formalized need => compromised need
• How can the user achieve this goal?
  – Infer the right query terms
  – Infer the right composition of terms
![Page 16: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/16.jpg)
System’s Goals
• Help the user
  – build links between needs
  – know more about the system and the collection
![Page 17: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/17.jpg)
How does System Achieve Its Goals?
• Ask more from the user
  – Encourage long/complex queries
    • Provide a large text entry area
    • Use form filling or direct manipulation
  – Initiate interactions
    • Ask questions related to the needs
    • Engage in a dialogue with the user
• Infer from relevant items
  – Infer from previous queries
  – Infer from previously retrieved documents
![Page 18: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/18.jpg)
Query Formulation Interaction Styles
• Shneiderman 97
  – Command Language
  – Form Fillin
  – Menu Selection
  – Direct Manipulation
  – Natural Language
Credit: Marti Hearst
![Page 19: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/19.jpg)
Form-Based Query Specification (Melvyl)
Credit: Marti Hearst
![Page 20: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/20.jpg)
Form-based Query Specification (Infoseek)
Credit: Marti Hearst
![Page 21: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/21.jpg)
Direct Manipulation Spec.: VQUERY (Jones 98)
Credit: Marti Hearst
![Page 22: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/22.jpg)
High-Accuracy Retrieval of Documents
• Topic Statement → Search Engine → Baseline Results
• Clarification Questions → Answers to Clarification Questions → Search Engine → HARD Results
![Page 23: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/23.jpg)
UMD HARD 2003 retrieval model
• Clarification questions elicit:
  – Preference among subtopic areas
  – Recently viewed relevant documents
  – Preference for sub-collections or genres
  – Desired result formats
• HARD retrieval process: Query Expansion → Document Reranking → Passage Retrieval → Ranked List Merging → Refined Ranked List
[He & Demner, 2003]
![Page 24: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/24.jpg)
Dialogues in Need Negotiation
• 1. Formulate a Query from the Information Need
• 2. Need negotiation with the Search Engine
• 3. Find Documents Matching the Query in the Document Collection → Search Results
![Page 25: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/25.jpg)
Personalization through User’s Search Contexts
• Queries such as “African Queen” and “Casablanca” share a context (“Romantic Films”)
• An Incremental Learner passes this learned context to the Information Retrieval System
[Goker & He, 2000]
![Page 26: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/26.jpg)
Things That Hurt
• Obscure ranking methods
  – Unpredictable effects of adding or deleting terms
  – Only single-term queries avoid this problem
• Counterintuitive statistics
  – “clis”: AltaVista says 3,882 docs match the query
  – “clis library”: 27,025 docs match the query!
  – Every document with either term was counted
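The jump from 3,882 to 27,025 matches follows from OR semantics: any document containing either term is counted, so adding a term can only grow the count. A minimal sketch (the toy corpus is invented for illustration):

```python
def count_or(docs, terms):
    """Documents matching ANY of the terms (OR semantics)."""
    return sum(any(t in d.split() for t in terms) for d in docs)

def count_and(docs, terms):
    """Documents matching ALL of the terms (AND semantics)."""
    return sum(all(t in d.split() for t in terms) for d in docs)

docs = ["clis library", "clis", "library books", "other"]
# Adding "library" under OR semantics increases the match count,
# even though the user meant to narrow the query.
```

Under AND semantics the second query could only shrink the result set, which is what most users intuitively expect.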
![Page 27: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/27.jpg)
Browsing Retrieved Set
• User → Query Formulation → Query → Search → Ranked List → Document Selection → Document Examination → Document
• Loops: Query Reformulation feeds a revised Query; Document Reselection returns to the Ranked List
![Page 28: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/28.jpg)
Indicative vs. Informative
• Terms often applied to document abstracts
  – Indicative abstracts support selection: they describe the contents of a document
  – Informative abstracts support understanding: they summarize the contents of a document
• Applies to any information presentation
  – Presented for indicative or informative purposes
![Page 29: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/29.jpg)
User’s Browsing Goals
• Identify documents for some form of delivery
  – An indicative purpose
• Query enrichment
  – Relevance feedback (indicative)
    • User designates “more like this” documents
    • System adds terms from those documents to the query
  – Manual reformulation (informative)
    • Better approximation of the visceral information need
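The relevance-feedback step above can be sketched as follows: count the terms in the documents the user marked “more like this” and append the most frequent ones to the query. This is a simplified expansion, not the full Rocchio formula; the stop list is a stand-in for a real one:

```python
import re
from collections import Counter

# Illustrative stop list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}

def enrich_query(query_terms, relevant_docs, n_new=3):
    """Indicative relevance feedback: count terms in the user's
    'more like this' documents and append the most frequent terms
    not already in the query."""
    counts = Counter()
    for doc in relevant_docs:
        for tok in re.findall(r"[a-z]+", doc.lower()):
            if tok not in STOPWORDS and tok not in query_terms:
                counts[tok] += 1
    return list(query_terms) + [t for t, _ in counts.most_common(n_new)]
```

A production system would also weight the added terms rather than treating them as equal to the user's original terms.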
![Page 30: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/30.jpg)
System’s Goals
• Assist the user to
  – Identify relevant documents
  – Identify potentially useful terms
    • for clarifying the right information need
    • for generating better queries
![Page 31: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/31.jpg)
Browsing Retrieved Set
(Browsing-loop diagram repeated: query formulation/reformulation and document selection/reselection.)
![Page 32: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/32.jpg)
A Selection Interface Taxonomy
• One-dimensional lists
  – Content: title, source, date, summary, ratings, ...
  – Order: retrieval status value, date, alphabetic, ...
  – Size: scrolling, specified number, RSV threshold
• Two-dimensional displays
  – Construction: clustering, starfields, projection
  – Navigation: jump, pan, zoom
• Three-dimensional displays
  – Contour maps, fishtank VR, immersive VR
![Page 33: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/33.jpg)
Extraction-Based Summarization
• Robust technique for making disfluent summaries
• Four broad types:
  – Single-document vs. multi-document
  – Term-oriented vs. sentence-oriented
• Combination of evidence for selection:
  – Salience: similarity to the query
  – Selectivity: IDF or chi-squared
  – Emphasis: title, first sentence
• For multi-document, suppress duplication
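The evidence combination above can be sketched as a sentence-oriented extractor: salience from query overlap, selectivity from an IDF-like weight (computed here over the document's own sentences for simplicity), and emphasis as a first-sentence bonus. The weights are illustrative, not from the slides:

```python
import math
import re
from collections import Counter

def extract_summary(doc, query, k=2):
    """Extractive summarizer combining salience (query overlap),
    selectivity (IDF-like weight over the document's sentences),
    and emphasis (a first-sentence bonus); returns the top-k
    sentences in their original order."""
    sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    toks = [set(re.findall(r"[a-z]+", s.lower())) for s in sents]
    n = len(sents)
    df = Counter(w for ts in toks for w in ts)      # sentence frequency
    idf = {w: math.log(n / df[w]) for w in df}      # selectivity weight
    qset = set(re.findall(r"[a-z]+", query.lower()))
    scored = []
    for i, ts in enumerate(toks):
        salience = sum(idf[w] for w in ts & qset)
        emphasis = 0.5 if i == 0 else 0.0           # first-sentence bonus
        scored.append((salience + emphasis, i))
    top = sorted(sorted(scored, reverse=True)[:k], key=lambda p: p[1])
    return " ".join(sents[i] for _, i in top)
```

Because whole sentences are lifted out of context, the result is robust but often disfluent, which is exactly the trade-off the slide names.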
![Page 34: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/34.jpg)
Generated Summaries
• Fluent summaries for a specific domain
• Define a knowledge structure for the domain
  – Frames are commonly used
• Analysis: process documents to fill the structure
  – Studied separately as “information extraction”
• Compression: select which facts to retain
• Generation: create fluent summaries
  – Templates for initial candidates
  – Use a language model to select an alternative
![Page 35: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/35.jpg)
Google’s KWIC Summary
• For Query “University of Maryland College Park”
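A KWIC (keyword-in-context) summary shows each query-term hit inside a few words of surrounding context, the style Google uses in its snippets. A minimal sketch (window size and ellipsis handling are assumptions, not Google's actual algorithm):

```python
import re

def kwic_snippet(text, query_terms, window=4):
    """Keyword-in-context: keep `window` words of context around each
    query-term occurrence, merge overlapping fragments, and mark the
    elided stretches with '...'."""
    words = text.split()
    terms = {t.lower() for t in query_terms}
    keep = set()
    for i, w in enumerate(words):
        if re.sub(r"\W", "", w).lower() in terms:
            keep.update(range(max(0, i - window), min(len(words), i + window + 1)))
    out, prev = [], None
    for i in sorted(keep):
        if prev is not None and i > prev + 1:
            out.append("...")      # mark a gap between fragments
        out.append(words[i])
        prev = i
    return " ".join(out)
```

The snippet is indicative, not informative: it helps the user decide whether to open the document, not to understand it.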
![Page 36: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/36.jpg)
Teoma’s Query Refine Suggestions
url: www.teoma.com
![Page 37: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/37.jpg)
Vivisimo’s Clustering Results
url: vivisimo.com
![Page 38: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/38.jpg)
Kartoo’s Cluster Visualization
url: kartoo.com
![Page 39: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/39.jpg)
Cluster Formation
• Based on inter-document similarity
  – Computed using the cosine measure, for example
• Heuristic methods can be fairly efficient
  – Pick any document as the first cluster “seed”
  – Add the most similar document to each cluster
    • Adding the same document will join two clusters
  – Check to see if each cluster should be split
    • Does it contain two or more fairly coherent groups?
• Lots of variations on this have been tried
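The heuristic above can be sketched as a single-pass seed clusterer over cosine similarity; the join and split refinements from the slide are omitted for brevity, and the 0.3 threshold is an arbitrary choice:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def seed_cluster(docs, threshold=0.3):
    """Single-pass heuristic clustering: the first document seeds the
    first cluster; each later document joins the cluster whose seed it
    is most similar to, or starts a new cluster when no seed reaches
    the similarity threshold."""
    vecs = [Counter(d.lower().split()) for d in docs]
    clusters = []  # each cluster is a list of doc indices; [0] is the seed
    for i, v in enumerate(vecs):
        best, best_sim = None, threshold
        for c, members in enumerate(clusters):
            sim = cosine(v, vecs[members[0]])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            clusters[best].append(i)
    return clusters
```

One pass over the collection makes this fast enough to cluster a retrieved set at query time, which is why engines like Vivisimo could afford it.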
![Page 40: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/40.jpg)
Starfield
![Page 41: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/41.jpg)
Dynamic Queries
• IVEE/Spotfire/FilmFinder (Ahlberg & Shneiderman 93)
![Page 42: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/42.jpg)
Constructing Starfield Displays
• Two attributes determine the position
  – Can be dynamically selected from a list
• Numeric position attributes work best
  – Date, length, rating, …
• Other attributes can affect the display
  – Displayed as color, size, shape, orientation, …
• Each point can represent a cluster
  – Interactively specified using “dynamic queries”
![Page 43: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/43.jpg)
Projection
• Depict many numeric attributes in 2 dimensions
  – While preserving important spatial relationships
• Typically based on the vector space model
  – Which has about 100,000 numeric attributes!
• Approximates multidimensional scaling
  – Heuristic approaches are reasonably fast
• Often visualized as a starfield
  – But the dimensions lack any particular meaning
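One fast heuristic in this family (a pivot-based sketch, not any specific published algorithm): pick the two most dissimilar documents as pivots, then place each document at its similarity to each pivot. The result is not true multidimensional scaling, but similar documents land near each other, and the axes carry no inherent meaning, just as the slide notes:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def project_2d(docs):
    """Heuristic 2-D projection: choose the two most dissimilar
    documents as pivots, then plot every document at
    (similarity to pivot A, similarity to pivot B)."""
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(vecs)
    a, b, worst = 0, min(1, n - 1), 2.0
    for i in range(n):               # find the least-similar pair
        for j in range(i + 1, n):
            s = cosine(vecs[i], vecs[j])
            if s < worst:
                a, b, worst = i, j, s
    return [(cosine(v, vecs[a]), cosine(v, vecs[b])) for v in vecs]
```

The pivot search is quadratic; real systems sample or iterate to stay fast on large retrieved sets.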
![Page 44: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/44.jpg)
Contour Map Displays
• Display cluster density as terrain elevation
  – Fit a smooth opaque surface to the data
• Visualize in three dimensions
  – Project to 2-D and allow manipulation
  – Use stereo glasses to create a virtual “fishtank”
  – Create an immersive virtual reality experience
    • Head-mounted stereo monitors and head tracking
    • “Cave” with wall projection and body tracking
![Page 45: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/45.jpg)
ThemeView
Credit to: Pacific Northwest National Laboratory
![Page 46: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/46.jpg)
Browsing Retrieved Set
(Browsing-loop diagram repeated: query formulation/reformulation and document selection/reselection.)
![Page 47: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/47.jpg)
Full-Text Examination Interfaces
• Most use scroll and/or jump navigation
  – Some experiments with zooming
• Long documents need special features
  – A “best passage” function helps users get started
    • Overlapping 300-word passages work well
  – A “next search term” function facilitates browsing
• Integrated functions for relevance feedback
  – Passage selection, query term weighting, …
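The “best passage” idea above can be sketched by sliding a fixed-size word window over the document in half-window steps (so consecutive passages overlap) and scoring each window by how many distinct query terms it contains. The scoring is an assumption; real systems use the full retrieval model on each passage:

```python
import re

def best_passage(doc, query, size=300, step=150):
    """Slide a `size`-word window over the document in `step`-word
    increments (overlapping passages), score each window by the
    number of distinct query terms it contains, and return the
    top-scoring passage as the reader's starting point."""
    words = doc.split()
    qset = set(re.findall(r"[a-z]+", query.lower()))
    best, best_score = "", -1
    for start in range(0, max(1, len(words) - step), step):
        window = words[start:start + size]
        hits = len({w.strip(".,;:!?").lower() for w in window} & qset)
        if hits > best_score:
            best, best_score = " ".join(window), hits
    return best
```

Overlap matters: without it, a dense cluster of query terms straddling a window boundary would be split across two weak passages.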
![Page 48: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/48.jpg)
A Long Document
![Page 49: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/49.jpg)
Document lens
Robertson & Mackinlay, UIST'93, Atlanta, 1993
![Page 50: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/50.jpg)
TileBar
[Hearst et al 95]
![Page 51: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/51.jpg)
SeeSoft
[Eick 94]
![Page 52: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/52.jpg)
Things That Help
• Show the query in the selection interface
  – It provides context for the display
• Explain what the system has done
  – It is hard to control a tool you don’t understand
  – Highlight search terms, for example
• Complement what the system has done
  – Users add value by doing things the system can’t
  – Expose the information users need to judge utility
![Page 53: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/53.jpg)
Document Delivery
• Document Examination → Document Delivery → Document → User
![Page 54: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/54.jpg)
Delivery Modalities
• On-screen viewing
  – Good for hypertext, multimedia, cut-and-paste, …
• Printing
  – Better resolution, portability, annotations, …
• Fax-on-demand
  – Really just another way to get to a printer
• Synthesized speech
  – Useful for telephone and hands-free applications
![Page 55: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/55.jpg)
Take-Away Messages
• The IR process belongs to users
• Matching documents to a query is only part of the whole IR process
• But IR systems can help users
• And IR systems need to support
  – Query formulation/reformulation
  – Document selection/examination
![Page 56: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/56.jpg)
Two Minute Paper
• When examining documents in the selection and examination interfaces, which type of information need (visceral, conscious, formalized, or compromised) guides the user’s decisions? Please justify your answer.
• What was the muddiest point in today’s lecture?
![Page 57: Slides](https://reader036.fdocuments.us/reader036/viewer/2022062721/568137e7550346895d9f9655/html5/thumbnails/57.jpg)
Alternate Query Modalities
• Spoken queries
  – Used for telephone and hands-free applications
  – Reasonable performance with limited vocabularies
    • But some error-correction method must be included
• Handwritten queries
  – Palm Pilot Graffiti, touch-screens, …
  – Fairly effective if some form of shorthand is used
    • Ordinary handwriting often has too much ambiguity