Searching in Hypertext Prof. Marti Hearst SIMS 202, Lecture 28.
Searching in Hypertext
Prof. Marti Hearst
SIMS 202, Lecture 28
Marti A. Hearst, SIMS 202, Fall 1997

Today
- Hypertext
  - What is it?
  - What does it mean to search on it?

What is Hypertext?
- Non-sequentially-linked pieces of text or other information type
  - Each “piece” of information is called a node
  - Nodes are connected by computer-supported links to create an information network
- Goals:
  - Access and read text from multiple viewpoints
  - Link related pieces of information together
  - Annotate existing texts

What is Hypertext?
- Links can be uni- or bi-directional
- Links can have types
  - argument structure: causation, support
  - metadata: author, publisher
- A browser provides navigational aids
  - table of contents
  - history lists
  - bookmarks
  - guided tours
  - maps/overviews

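The node-and-link model described above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the design of any particular hypertext system: the names `Link`, `Hypertext`, `add_link`, and `outgoing`, the link type "annotation", and the example nodes are all made up for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str      # node the link starts from
    target: str      # node the link points to
    link_type: str   # e.g. "support", "causation", "author"


class Hypertext:
    """Nodes connected by typed links (hypothetical sketch)."""

    def __init__(self):
        self.nodes = {}                  # node id -> text
        self.links = defaultdict(list)   # node id -> outgoing links

    def add_node(self, node_id, text):
        self.nodes[node_id] = text

    def add_link(self, source, target, link_type, bidirectional=False):
        self.links[source].append(Link(source, target, link_type))
        if bidirectional:
            # Assumption: a bi-directional link is stored as two one-way links.
            self.links[target].append(Link(target, source, link_type))

    def outgoing(self, node_id, link_type=None):
        """Targets reachable in one step, optionally filtered by link type."""
        return [link.target for link in self.links[node_id]
                if link_type is None or link.link_type == link_type]


# Invented example nodes and links.
net = Hypertext()
net.add_node("claim", "Browsing takes less effort than querying.")
net.add_node("evidence", "Most subjects preferred to browse.")
net.add_node("memo", "Notes on the study.")
net.add_link("evidence", "claim", "support")
net.add_link("claim", "memo", "annotation", bidirectional=True)

print(net.outgoing("claim"))
print(net.outgoing("memo", "annotation"))
```

Filtering by `link_type` is what typed links make possible: a reader could, for instance, follow only "support" links to trace an argument.
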
What is Hypertext?
- Influential writers on the topic
  - Vannevar Bush ‘64 (US Director of Science and Research)
  - Ted Nelson ‘67: Coined the term
  - Doug Engelbart ‘63: a variant approach
  - Conklin ‘87: Definitive overview
  - Halasz et al. ‘87: Notecards in a nutshell
- Systems:
  - Notecards, HyperCard, Superbook, many others

The WWW and Hypertext
- Finally realized the dream of visionaries!
- HTML allows only impoverished functionality
  - no back links
  - only one type of link
- Standard browsers are lacking as well
  - No history list
  - No overview of sites
  - Inconsistent marking of already visited pages
  - Bookmarks are uniform in appearance

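The "no back links" point can be made concrete. An HTML link is recorded only on the page it leaves from, so finding the pages that point to a given page means scanning every page's outgoing links and inverting them. The sketch below assumes the forward links have already been collected into a dictionary; the function name `back_links` and the three page names are hypothetical.

```python
from collections import defaultdict


def back_links(forward):
    """Invert a page -> [linked pages] map into page -> [pages linking to it]."""
    incoming = defaultdict(list)
    for source, targets in forward.items():
        for target in targets:
            incoming[target].append(source)
    return dict(incoming)


# Hypothetical three-page site with one-way links only.
forward = {
    "home.html": ["syllabus.html", "lecture28.html"],
    "syllabus.html": ["lecture28.html"],
    "lecture28.html": [],
}

incoming = back_links(forward)
print(incoming)
```

Pages that nothing links to (here `home.html`) simply do not appear in the result.
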
What does it mean to search hypertext?
- Nelson ‘67 explicitly dissociates hypertext from document retrieval
- Nice for browsing, but bad for fact retrieval?
- A contrast: planning vs. browsing
  - Simply follow links
    - less effort
    - learn by serendipity (maybe)
  - Formulate query and show pages of hits
    - as in standard IR
    - requires planning
    - systems assumed a professional intermediary
- Hypertext allows for a novel interaction between navigating links and formulating queries

Major challenge: (Marchionini & Shneiderman 88)
- Balancing the power of analytic search with the ease of browsing
- Solution: flexible, powerful user interfaces that allow for selection feedback
- But: flexibility leads to complexity

How to evaluate search in hypertext?
- One strategy -- evaluate a hypertexted book, manual, or help system.
  - Compare against paper
  - Compare query formulation with link navigation
  - Ignore hyperlinks and simply compare ranking algorithms
- Problem: can end up evaluating the quality of the hyperlinks

Hypertext Source Texts
- Encyclopedia (Marchionini 89)
- Software manual (Egan et al. 89, Campagnoni & Ehrlich 89)
- Library of Chemistry Journal Articles (Egan et al.)

Superbook (Remde et al. 87)
- Next-generation hyper-media book
- Functions:
  - Word Lookup: Show a list of query words, stems, and word combinations
  - Table of Contents: Dynamic fisheye view of the hierarchical topics list
    - Search words can be highlighted here too
  - Page of Text: show selected page and highlighted search terms
- Hypertext features linking through search words rather than page links

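The idea of "linking through search words" can be illustrated roughly as follows: count the hits for a word in each section, so that a table of contents could show where the word occurs, and mark the hits in the page text. This is a sketch of the general idea only, not Superbook's implementation; it does no stemming, and the function names and section texts are invented for the example.

```python
import re


def hits_per_section(sections, word):
    """Count whole-word, case-insensitive occurrences of `word` per section."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return {heading: len(pattern.findall(text))
            for heading, text in sections.items()}


def highlight(text, word, marker="*"):
    """Wrap each occurrence of `word` in `marker` to stand in for highlighting."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: marker + m.group(0) + marker, text)


# Hypothetical sections of a statistics manual.
sections = {
    "Graphics": "Pie charts show proportions. A pie chart is rarely ideal.",
    "Data": "The dataset is stored as a data structure.",
}

counts = hits_per_section(sections, "pie")
marked = highlight(sections["Graphics"], "pie")
print(counts)
print(marked)
```

Here the per-section counts play the role of the hit annotations in the table of contents, and the marked text plays the role of the highlighted page.
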
Superbook (http://superbook.bellcore.com/SB)

Egan et al. Study
- Goal: compare Superbook with paper book
- Tasks:
  - structured search: find answer to a specific question using an unfamiliar reference text
  - open-book essay: synthesize material from different places in the document
  - incidental learning: how much useful information about the document is acquired while doing other tasks
  - subjective ratings: user reactions to the form and content

Egan et al. Study
- Factors for structured search:
  - Does the user’s question correspond to the author’s organization of the material?
    - Half the study search questions contained cues as to which topic heading to use, half did not
  - Does the user’s query as stated contain some of the same words as those used by the author?
    - Half the questions contained words taken from the text surrounding the target text, half did not

Egan et al. Study
- Example search questions:
  - Find the section discussing the basic concept that the value of any expression, however complicated, is a data structure.
  - The dataset ‘murder’ contains murder rates per 100,000 population. Find the section that says which states are included in this dataset.
  - Find the section that describes pie charts and states whether or not they are a good means for analyzing data.
  - Find the section that describes the first thing you have to do to get S to print pictorial output.
- blue boldface: terms taken from text
- pink italics: terms taken from topic heading

Egan et al. Study
- Hypotheses:
  - Conventional document would require good cues from the topic headings, but Superbook would not.
  - Word lookup function hypothesized to allow circumvention of author’s organization scheme.
  - Superbook’s search facility would result in open-book essays that include more information.

Egan et al. Study
- Source text: statistics package manual (562 pp.)
- Compare:
  - Superbook vs. paper versions
- Four sets of search questions of mixed type
- 20 university students with stats background
- Superbook training tutorial
- 15 minutes per structured query
- One open-book essay retained

Egan et al. Study
- Results: Superbook had an advantage in:
  - overall average accuracy (75% vs. 62%)
    - Superbook did better on questions with words from text but not in topic headings
    - Print version did better on questions with no search hits
  - speed (5.4 vs. 5.6 min/query on average)
    - Superbook faster for text-only cues
    - Paper faster for questions with no hits
  - essay creation
    - average score of 5.8 vs. 3.6 points out of 7
    - average 8.8 facts vs. 6.0 out of 15

Egan et al. Study
- Results:
  - Subjective ratings:
    - Superbook users rated it easier than paper (5.8 vs. 3.1 out of 7)
    - Superbook users gave higher ratings on the stat system
  - Incidental learning:
    - Superbook users recalled more chapter headings (maybe because these were continually displayed)
    - No other differences were significant
- Problems with study:
  - Did not compare against non-hypertext computerized version
  - Did not show if/how hyperlinks affected results

Campagnoni & Ehrlich Study
- Data source was somewhat more hypertext-like
- Eight handbooks for a computer help desk
- Still only had three levels plus an index
  - Top level: list of the eight handbooks
  - Second level: t-o-c of a given handbook
  - Third level: pages with reference information
- Index: a list of concepts, objects, and procedures described throughout

Campagnoni & Ehrlich Study: Sample index

Mail index
C:
  Cancel button ........................ Writing and sending mail, 2
  cancelling a mail message ............ Writing and sending mail, 2
    in Stay Up mode .................... Writing and sending mail, 4
  Change Directory
    button ............................. Saving and Loading Mail, 3
    option ............................. Saving and Loading Mail, 2
    panel .............................. Saving and Loading Mail, 2
  changing composition
    window modes ....................... Writing and Sending Mail, 3
  changing the current
    Mail directory ..................... Reading Mail, 1
  …

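An index of this kind is a mapping from a term to a place in a handbook, which is what makes it suited to the analytic strategy: the user names a term and jumps straight to the page. A minimal sketch follows, using entries from the sample index above. Flattening the sub-entries into single strings is an assumption made here for simplicity, and the function name `look_up` is hypothetical.

```python
# Entries from the sample index: term -> (handbook section, page).
# Assumption: sub-entries are joined to their parent entry as one string.
MAIL_INDEX = {
    "Cancel button": ("Writing and sending mail", 2),
    "cancelling a mail message": ("Writing and sending mail", 2),
    "Change Directory button": ("Saving and Loading Mail", 3),
    "Change Directory option": ("Saving and Loading Mail", 2),
    "Change Directory panel": ("Saving and Loading Mail", 2),
    "changing composition window modes": ("Writing and Sending Mail", 3),
    "changing the current Mail directory": ("Reading Mail", 1),
}


def look_up(index, query):
    """Return (term, location) pairs whose term contains `query`, ignoring case."""
    q = query.lower()
    return sorted((term, loc) for term, loc in index.items()
                  if q in term.lower())


matches = look_up(MAIL_INDEX, "directory")
for term, (section, page) in matches:
    print(f"{term}: {section}, {page}")
```

A question phrased with the index's own vocabulary ("directory") finds its page in one step; a question phrased differently finds nothing, which is one reason users may fall back on browsing the table of contents.
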
Campagnoni & Ehrlich Study
- Questions:
  - Previous studies suggest users would rather just browse links than search an index
    - Will this happen when questions are designed to be easier with the index?
  - It has been suggested that ability to perceive and manipulate spatial patterns should reduce disorientation in hyperlinked systems
    - Is there a correlation between efficient navigation and good visualization ability?

Campagnoni & Ehrlich Study
- Browsing defined as:
  - scanning the table of contents and paging through relevant topics to find answers
- Analytic strategy defined as:
  - using indexes to look up specific query terms and following the links to the appropriate page
- Therefore not really looking at formulating a search query.
- Also, the hierarchy is very shallow.

Campagnoni & Ehrlich Study
- Test questions
  - Three meant to be answerable with browsing
  - Three meant to be answered with index *
- I just deleted a mail message. Where can I find how to get it back? *
- I need to find all my files that were modified after August 8, 1988. Where should I look?
- How do I change the size of a window so it fits the height of the screen? *
- Where can I find information on how to configure how my desktop behaves when I log in?

Campagnoni & Ehrlich Study
- Results
  - Most users preferred to browse even though all questions could be answered with the index
    - subjects used index 1.92 times out of six questions
    - subjects successful with index 1.6 times out of 6
  - Initial probability of succeeding with browsing quite high (75%) but then declined sharply with subsequent attempts
  - Probability of success with repeated attempts on index increased with number of attempts

Campagnoni & Ehrlich Study
- Results, continued
  - Searches meant to be analytic could be resolved using browsing
    - 2 out of 4 of the most efficient searchers did not use indexes
    - The other 2 used them for 2 questions.
  - Indexes mainly used when phrasing of question was not amenable to the t-o-c
  - Probably can be partly attributed to the shallowness of the information architecture

Campagnoni & Ehrlich Study
- Results, continued
  - Found a strong correlation between visualization ability and search time.
    - Interestingly, most of the extra time seemed to be spent by the poorer visualizers in returning to the top level node rather than just going up one level
- Commentary
  - The study is flawed by not allowing query formulation
  - Also, should test conditions in which users are forced to use analytic vs. browsing, rather than just observing their choices.

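The cost of returning to the top-level node can be illustrated by counting moves in a small hierarchy. The sketch below rests on a simplifying assumption: every change of level, up or down, counts as one move. It says nothing about the times measured in the study. The hierarchy, the page names, and the function names are hypothetical.

```python
# Hypothetical three-level hierarchy: top -> handbook t-o-c -> reference pages.
PARENT = {
    "Mail t-o-c": "top",
    "Desktop t-o-c": "top",
    "Reading Mail, 1": "Mail t-o-c",
    "Saving and Loading Mail, 2": "Mail t-o-c",
    "Windows, 1": "Desktop t-o-c",
}


def ancestors(node):
    """Return the chain node, parent, ..., "top"."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def moves_direct(current, target):
    """Climb only as far as the nearest shared ancestor, then descend."""
    up = ancestors(current)
    down = ancestors(target)
    shared = next(n for n in up if n in down)
    return up.index(shared) + down.index(shared)


def moves_via_top(current, target):
    """Climb all the way to the top node, then descend to the target."""
    return (len(ancestors(current)) - 1) + (len(ancestors(target)) - 1)


same_book = ("Reading Mail, 1", "Saving and Loading Mail, 2")
other_book = ("Reading Mail, 1", "Windows, 1")
print(moves_direct(*same_book), moves_via_top(*same_book))
print(moves_direct(*other_book), moves_via_top(*other_book))
```

Under this counting, going up one level costs 2 moves between pages of the same handbook while going through the top costs 4; the two strategies cost the same only when the target is in a different handbook.
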
Other studies (Marchionini & Shneiderman 88)
- Hyperties Dataset
  - 14 of 16 subjects asked to search for factual information used the alphabetical index
    - but it did not seem to have a topic index -- only embedded hyperlinked terms
    - No query formulation either
- Museum Dataset
  - In undirected use, 2/3 of all selections were made via hyperlinks
  - When asked to search for specific facts, index users were much faster than link users
    - This difference became smaller with longer use

Other studies (Marchionini & Shneiderman 88)
- Hyperties on a large encyclopedia
  - Elementary school students used querying successfully even though their queries were not well-formulated
  - Results were listed as alphabetic titles showing term frequencies
  - Students scanned list, picked promising docs, and chose terms from the docs (manual relevance feedback)

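The result display and the manual feedback loop described above can be sketched as follows. This is an illustration only, not how Hyperties computed its listings; the function name `result_list`, the whitespace tokenization, and the three toy articles are assumptions.

```python
def result_list(articles, query_terms):
    """Alphabetical list of (title, term frequency) for articles with a hit."""
    rows = []
    for title, text in articles.items():
        words = text.lower().split()
        freq = sum(words.count(term.lower()) for term in query_terms)
        if freq > 0:
            rows.append((title, freq))
    return sorted(rows)


# Hypothetical encyclopedia articles.
articles = {
    "Whale": "the blue whale is a marine mammal",
    "Dolphin": "a dolphin is a small toothed whale",
    "Volcano": "lava flows from a volcano",
}

first = result_list(articles, ["whale"])
# Manual relevance feedback: the reader scans the "Whale" article,
# notices the word "mammal", and adds it to the query by hand.
second = result_list(articles, ["whale", "mammal"])
print(first)
print(second)
```

After the added term, the frequency shown next to "Whale" rises from 1 to 2 while "Dolphin" stays at 1, which gives the reader a cue about which title to open next.
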
Other studies (Marchionini & Shneiderman 88)
- Compared two groups of novices (high school)
  - Scan and select strategy, or
  - Analytic strategy (using Boolean connectors and planning search in advance)
  - Effectiveness was pretty much even, but analytical searchers were slightly faster
- Navigation and Disorientation
  - After finishing with an article, many subjects moved all the way to the top of the menu hierarchy to the query formulation screen rather than moving up just one level.

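The analytic strategy's "Boolean connectors" correspond to set operations over an inverted index: AND is intersection, OR is union, and NOT is set difference. The sketch below is a generic illustration of that correspondence, not the system used in the study; the function name `build_index` and the toy documents are hypothetical.

```python
def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


# Hypothetical documents.
docs = {
    "Whale": "the blue whale is a marine mammal",
    "Dolphin": "a dolphin is a small toothed whale",
    "Bat": "the bat is a flying mammal",
}
index = build_index(docs)


def posting(word):
    """Documents containing `word`, or an empty set if the word is unknown."""
    return index.get(word, set())


whale_and_mammal = posting("whale") & posting("mammal")   # AND
whale_or_bat = posting("whale") | posting("bat")          # OR
mammal_not_whale = posting("mammal") - posting("whale")   # NOT
print(whale_and_mammal, whale_or_bat, mammal_not_whale)
```

Planning a search in advance, in this picture, means deciding which terms to combine and with which operator before looking at any results.
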
More Recent: CHI Browse-off
- CHI: Computer-Human Interaction Conference
- This was an informal contest
- Goal: Compare GUI browsers against each other and paper
- Started with a set of hierarchically organized “facts” (not documents)
- System designers raced to find answers to questions, a winner for each question
- Then novices took a try at it

CHI Browse-off
- Problems with experiment design
  - Doing a search task on a browsing system
  - Answers are in the titles themselves
  - Small collections
- Results (my interpretation)
  - Those system designers who knew the most facts/were smartest won
  - Novices did better with familiar systems
  - Paper won.

Searching on Hypertext
- Conclusions: More studies needed
  - Bad visualizers don’t take advantage of the link structure
  - Most people would rather select links
- Complication: how good are the links?
- Most studies not done on systems with the heterogeneity of the web
  - compare Yahoo and AltaVista, for example