SIMS 296a-3: Aids for Source Selection Carol Butler Fall ‘98.
SIMS 296a-3: Aids for Source Selection
Carol Butler
Fall ‘98
Carol ButlerFall 98
Outline
• IA Interfaces
• Design Principles
• Aids for Source Selection
• SavvySearch
• HITS
• Kohonen maps
• Implications for New Research
IA Interface should help User:
• Express information needs and/or formulate queries.
• Select among available sources.
• Understand search results.
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.
IA Interface should allow User to:
• Reassess goals and adjust search strategy.
• Follow trails with unanticipated results.
• Monitor the progress of a search strategy.
• Use output of one action as input to the next.
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.
Role of Visualization:
• Communicate more rapidly and effectively.
• Techniques
  • icons and color highlighting
  • brushing and linking
  • panning and zooming
  • focus-plus-context
  • animation
• Interactivity
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.
“Visualization of inherently abstract information is more difficult, and visualization of textually represented information is especially challenging.”
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.
Starting Points for Search
• Lists of sources (Lexis-Nexis)
• Overviews
  • Clusters
  • Category Hierarchies/Subject Codes
  • Co-citation Links
• Examples
• Automatic source selection
Last Week’s Readings
• Overviews via Category Hierarchies
  • HIBROWSE (Pollitt 97)
  • Cat-A-Cone (Hearst 97)
Today’s Readings
• Automatic Source Selection
  • SavvySearch (Howe & Dreilinger 97)
• Overviews via co-citation hyperlinks
  • HITS (Kleinberg et al. 97)
• Overviews via clusters
  • Kohonen maps (Chen et al. 97)
SavvySearch
• Addresses problems with meta-search engines.
  • reduce burden on user … but
  • may waste computational and Web resources
• Carefully selects search engines likely to return useful results.
Options provided by interface
• Sources and types of information.
• Treatment of query terms.
• Display of results.
• Interface language.
• View interface.
Query Processing
• Reasoning about available resources
  • modify concurrency (number of search engines queried in parallel)
  • network load estimates (lookup table, time)
  • local CPU load (UNIX uptime command)
• Ranking search engines
  • learned associations between search engines and query terms (stored in a meta-index)
  • recent data on performance
Meta-Index
• No Results
  • search engine failed to return links
  • reduces confidence that this engine is appropriate for particular query
  • effectiveness values are reduced
• Visits
  • number of links explored by user
  • indicates user found some links to be interesting and increases confidence
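The meta-index bookkeeping described on this slide can be sketched roughly as follows. This is a minimal illustration, not SavvySearch's actual formula: the class name `MetaIndex`, the unit penalty and reward, and the additive scoring are all assumptions made for the example.

```python
from collections import defaultdict


class MetaIndex:
    """Hypothetical meta-index: effectiveness per (query term, engine) pair."""

    def __init__(self, no_result_penalty=1.0, visit_reward=1.0):
        # The weights are illustrative assumptions, not values from the paper.
        self.no_result_penalty = no_result_penalty
        self.visit_reward = visit_reward
        self.effectiveness = defaultdict(float)

    def record_no_results(self, terms, engine):
        # "No Results" event: the engine returned no links, so confidence drops.
        for term in terms:
            self.effectiveness[(term, engine)] -= self.no_result_penalty

    def record_visits(self, terms, engine, visits):
        # "Visits" event: the user explored `visits` links, so confidence rises.
        for term in terms:
            self.effectiveness[(term, engine)] += self.visit_reward * visits

    def rank(self, terms, engines):
        # Order engines by summed effectiveness over the query terms.
        def score(engine):
            return sum(self.effectiveness.get((t, engine), 0.0) for t in terms)

        return sorted(engines, key=score, reverse=True)
```

An engine that has never been seen for a term scores 0, so it ranks between engines with visits and engines that returned nothing.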
Future Development
• Meta-search will need to be personalized and embedded in other systems.
• Experimental version divides search into categories, with separate sets of rules for creating a search plan.
  • Web Indexes
  • Web Directories
  • Usenet News
  • Software
  • People
  • Reference
  • Entertainment
  • Technical Reports
Hyperlink-Induced Topic Search (HITS)
• System for locating authoritative web sources
• Two premises:
  • Implicit annotation provided by creators of hyperlinks contains sufficient information to infer a notion of “authority.”
  • Sufficiently broad topics contain embedded communities of hyperlinked pages.
HITS
• Two types of pages
  • Authorities
    • highly referenced pages on the topic
  • Hubs
    • pages that “point” to many of the authorities
• Mutually reinforcing relationships
• Starts from a user-supplied query
HITS method
• Base set of pages returned by search engine
• Add pages that point to, or are pointed to by, any page in base set
• Assign each page a hub weight h(p) and authority weight a(p) (initialize to 1)
• For each page:
  • Replace a(p) by the sum of the h()’s of all pages pointing to it
  • Replace h(p) by the sum of the a()’s of all pages pointed to by it
• Repeat
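The update loop on this slide can be sketched as below. It is a minimal illustration that assumes the expanded page set is already available as a dictionary of outgoing links. The function name `hits`, the fixed iteration count, and the normalization step are additions for the example; the slide does not mention normalization, which is included here only to keep the weights from growing without bound.

```python
import math


def hits(links, iterations=20):
    """links maps each page to the pages it points to (hypothetical input format)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Precompute, for each page, the pages that point to it.
    incoming = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # a(p): sum of h() over all pages pointing to p.
        auth = {p: sum(hub[q] for q in incoming[p]) for p in pages}
        # h(p): sum of a() over all pages pointed to by p.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize each weight vector to unit length (assumption, see lead-in).
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm

    return hub, auth
```

Calling `hits({"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]})` gives `a1` the largest authority weight, since all three hubs point to it.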
HITS results
• Broad topics tend to have robust structure
  • astrophysics
  • Michael Jordan
• Generalizes topics not sufficiently broad
  • Dennis Ritchie
• Density of linkage on a topic influences authority/hub structure
  • English literature vs. German literature
• Web-centric topics
  • cryptography
• Commercialization
  • tennis
Future Development
• Study temporal evolution of communities on the Web.
• Combining text and the structure of hyperlinks.
  • text within <href>
  • text near hyperlink
• CLEVER project at IBM Almaden Research Center
Automatically Generated Concept Space (Kohonen map and ET-Space Thesaurus)
• IR users need:
  • Working knowledge of the system where the information is stored
    • how to navigate
    • how info is categorized or organized
  • Knowledge of the subject of interest
    • particularly the vocabulary of the subject domain
Browsing vs. Searching
• Browsing
  • users rely on mental models
  • embedded digression problem
• Searching
  • content-based
  • two basic approaches
    • keyword search
    • combined keyword search and categorization
  • vocabulary differences problem
User Aids for Browsing
• Directories
  • categories limited in granularity
  • categories limited in timeliness
  • creating categories is manual, slow, and cumbersome
• Kohonen self-organizing map (SOM)
  • generates clusters of important concepts
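How a self-organizing map groups similar inputs can be sketched with a small pure-Python example. This is a generic textbook-style SOM, not the configuration used by Chen et al.: the grid size, learning-rate schedule, neighborhood radius, and the function names `train_som` and `best_matching_unit` are all assumptions. In the document setting, each input vector would be a term-weight vector for one page.

```python
import math
import random


def best_matching_unit(grid, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(grid, key=lambda node: math.dist(grid[node], vector))


def train_som(vectors, rows=3, cols=3, epochs=50, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}

    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrinks linearly toward 0
        rate = 0.5 * decay                    # learning rate (assumed schedule)
        radius = 0.5 + (max(rows, cols) / 2.0) * decay
        for vector in vectors:
            bmu = best_matching_unit(grid, vector)
            for node, weights in grid.items():
                distance = math.dist(node, bmu)
                if distance > radius:
                    continue
                # Nodes near the winner on the grid move toward the input too,
                # which is what makes neighboring map regions hold similar items.
                influence = math.exp(-(distance ** 2) / (2 * radius ** 2))
                for i in range(dim):
                    weights[i] += rate * influence * (vector[i] - weights[i])
    return grid
```

After training, `best_matching_unit(grid, v)` gives the map cell for an input `v`; inputs that land in the same or adjacent cells form the clusters shown as regions on the map.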
Concept “Landscapes”
[Figure: concept landscape with regions labeled Pharmacology, Anatomy, Legal, Disease, Hospitals]
Built using Kohonen Feature Maps (Xia Lin, H.C. Chen; slide by Marti Hearst)
User Aids for Searching
• Query expansion
• Relevance feedback
• Multidimensional scaling
  • metric similarity modeling
  • latent semantic indexing
• Thesauri use
  • incorporating existing thesauri
  • automatic thesaurus generation
Automatic Thesaurus Generation
• Statistical co-occurrence
• Cluster analysis further groups terms
• Chen et al.
  • document collection
  • automatic indexing
  • co-occurrence analysis
  • associative retrieval
• ET-Space Webpage
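The co-occurrence step in this pipeline can be sketched as follows. This is a simplified illustration, assuming documents have already been reduced to lists of index terms. The weighting (the fraction of a term's documents that also contain the other term) is a stand-in chosen for the example, not the weighting function used by Chen et al., and the names `build_thesaurus` and `suggest` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_thesaurus(documents):
    """documents: list of lists of index terms. Returns {term: {related: weight}}."""
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms of a pair
    for terms in documents:
        unique = sorted(set(terms))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    thesaurus = defaultdict(dict)
    for (a, b), together in pair_freq.items():
        # Asymmetric weights: how strongly seeing `a` suggests `b`, and vice versa.
        thesaurus[a][b] = together / doc_freq[a]
        thesaurus[b][a] = together / doc_freq[b]
    return thesaurus


def suggest(thesaurus, term, k=3):
    """Return up to k related terms, strongest first (ties broken alphabetically)."""
    related = thesaurus.get(term, {})
    return sorted(related, key=lambda t: (-related[t], t))[:k]
```

Suggested terms from such a structure are what a searcher would see when refining a query; the associative retrieval step on the slide would follow these weighted links further.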
Experiment with Yahoo
• Browsing tested with Kohonen SOM
  • subjects who started with Yahoo were less successful in repeating the task with the SOM than vice versa
  • useful more for broad exploring than for searching
• Searching tested with AGT
  • suggested terms came from web pages
  • most useful in further refining an initially too broad search
Future Development
• Effects of different information sources
  • cohesion
  • consistent with user’s mental model
• User Interface design
  • flexibility
  • spelling errors and typos
  • pan-zoom
  • help screens or instructions (or more intuitive design, or both)
Review and Discussion
• Overviews
  • Category Labels
    • when docs stored “inside” categories, users cannot create queries based on combinations of categories
    • display of hierarchies takes up large amounts of screen space
    • tightly coupled with queries?
  • Other starting points
Overviews in the User Interface
• Unsupervised Groupings
  • Clustering
  • Kohonen Feature Maps
• Supervised Categories
  • Yahoo!
  • Superbook
  • HiBrowse
  • Cat-a-Cone
• Combinations
  • DynaCat
  • SONIA
Category Labels (from Hearst slide)
• Advantages:
  • Interpretable
  • Capture summary information
  • Describe multiple facets of content
  • Domain dependent, and so descriptive
• Disadvantages:
  • Do not scale well (for organizing documents)
  • Domain dependent, so costly to acquire
  • May mis-match users’ interests
Other Starting Points Approaches
• Co-citation Links
• Examples, Guided Tours
Review and Discussion (cont.)
• Interface Design
  • Visualization
    • textual vs. 2D spatial representation
  • Search Strategies
    • integration with non-search parts of process (reading, annotating, analysis)
• Evaluation Methodology