WWW Challenges : Supporting Users in Search and Navigation Natasa Milic-Frayling Microsoft Research,...
WWW Challenges: Supporting Users in Search and Navigation
Natasa Milic-Frayling
Microsoft Research, Cambridge UK
SOFSEM 2004
January 28, 2004
Introduction
Research:
• Web usage and interfaces
• Optimization of service architectures
• Text Classification – support for document classification, routing, filtering
Presentation Focus: WWW challenges in designing effective services and applications.
Intersection:
• Browser Interface – Internet, Intranet, services, local drives.
• Devices and applications: TabletPC, PDA, eBook
• Services: MSN Portal and Search - on-line searching, reading, and browsing
Characteristics of the Web
• Highly distributed: distributed data and processes
• Highly dynamic
• Evolving content, with still inadequate content publishing practice.
IMPLICATIONS
On-line Experience
Web access is a combination of search and navigation
• Search to find URL of relevant pages
• Navigation to explore result space
• Reading on devices of various display sizes.
Only limited “context” in both activities preserved and exposed
• Ineffective search
• Lost in hyperspace
• Lost within a document, on small screen devices.
‘Diagnoses’
Three aspects of the Web
• Separation of search and document delivery
• Separation of document authoring and generation of metadata about the documents required by services and applications
• Lack of generic publishing format to support flexible display of content across devices.
Part I
Separation of search and document delivery
Ineffective Search
MIDAS - SiteExplorer
[Diagram labels: User’s Information Need; Query; Search Engine; URLs; HTTP Request; Web Server; Search processes; Web page delivery; MS READ Service]
Highlighting - How is it done?
[Diagram labels: User’s Information Need; Topic Description; Query; Search Engine; URLs; HTTP Request; Web Server; MS READ Service]
• Query syntactic Analysis
• Semantic Expansion
• Highlighting Regime
• Thumbnail Creation
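The steps named on this slide (query analysis, semantic expansion, highlighting) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the MS READ implementation: the function names, the `EXPANSIONS` table and the `[[ ]]` markers are all hypothetical and introduced here only for the example.

```python
import re

# Hypothetical expansion table standing in for the "Semantic Expansion" step.
EXPANSIONS = {"car": ["automobile"]}


def analyse_query(query):
    """Split a query into lowercase terms (a stand-in for syntactic analysis)."""
    return re.findall(r"\w+", query.lower())


def expand_terms(terms, expansions=EXPANSIONS):
    """Add related terms after each query term, keeping first-seen order."""
    expanded = []
    for term in terms:
        for candidate in [term, *expansions.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def highlight(text, terms, open_mark="[[", close_mark="]]"):
    """Wrap whole-word, case-insensitive matches of any term in markers."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, text)


# The topic is kept once and can be reapplied to every page the user views.
topic = expand_terms(analyse_query("car rental"))
print(highlight("Rental of an automobile or a Car.", topic))
```

Keeping `topic` separate from any one page reflects the idea of preserving the user's topic while pages change; a real service would also need to handle markup rather than plain text.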
Link Evaluation - How is it done?
• NLP
• Indexing
• Search Over Local Index
Web Server
Topic Storage: Topic 1, Topic 2, Topic 3, Topic 4
HTTP Requests for Text Only
Download Text Only
Mark Links for Relevance
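One way to picture link evaluation is: index the text of the linked pages locally, then score each link against a stored topic. The sketch below assumes the text-only downloads have already happened (no network code is shown), and it uses a plain term-count score; the scoring rule, the function names and the sample pages are assumptions for illustration, not the method used by the service.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a very small stand-in for the NLP step."""
    return re.findall(r"\w+", text.lower())


def build_local_index(pages):
    """Map each linked URL to the term counts of its downloaded text."""
    return {url: Counter(tokenize(text)) for url, text in pages.items()}


def mark_links(index, topic_terms):
    """Return (url, score) pairs, highest score first, ties ordered by URL.

    The score is the total number of occurrences of the topic terms.
    """
    terms = [t.lower() for t in topic_terms]
    scores = {url: sum(counts[t] for t in terms) for url, counts in index.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical text-only content of three pages linked from the current page.
linked_pages = {
    "a.htm": "Hotel prices in Cambridge. Cambridge hotel guide.",
    "b.htm": "Train timetable.",
    "c.htm": "Cambridge colleges.",
}
index = build_local_index(linked_pages)
ranked = mark_links(index, ["Cambridge", "hotel"])
print(ranked)
```

Links with a score of zero could be left unmarked, and the rest marked in proportion to their score.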
MS Read
• Users have difficulty locating relevant parts of a Web page while reviewing search results (MSN Search Diary and Field Interviews)
• Users have difficulty evaluating search results and refining their search (Anne Cohen-Kiel’s ethnographic study in Spain, UK and Canada; MSN Search Diary Study and Site Interviews).
Solution:
• Preserve user’s topic of interest and provide highlighting of topic terms on the pages that the user is viewing.
• Allow the users to enhance the topic by adding new query terms or resources (lists of concepts, entities, etc.) and perform search over the page content
• Allow the user to search the content of the pages that are linked to the current page.
• When the page is the search result page, this is equivalent to refining the search over the previous top N search results.
Part II
MIDAS and SiteExplorer
Separation of document authoring and generation of metadata about the documents required by services and applications
User lost in the hyperspace
Problem
Crawling - Services, such as search engines, collect the data and create metadata but do not deliver the content
• Out of sync with the data on the Web servers
• ‘broken links’
Services can perform only basic analysis of the context
• No information about structure of information resources
• No sophisticated linguistic process.
Solution: MIDAS Framework
Distributed metadata generation
Generate & store meta-information alongside contents
• At authoring or publishing time
• Synchronised with publishing
Deliver metadata upon request
In case of centralized services
• Services do not crawl for data but only for metadata
• Obtain data through ‘push’ by authors/web servers.
METADATA:
• Site structure
• Page structure
• Linguistic analysis
• Statistical analysis
• Visual representation
[Diagram labels: AUTHOR; SERVER; CLIENT; Web Server; Web Content; Metadata Server; Automatically Generated Metadata; Author generated metadata; Web metadata (XML); SiteExplorer]
FrontPage Site Template and Structure in XML Format
    <dxf:views>
      <dxf:view title="Main">
        <dxf:node url="index.htm">
          <dxf:node url="aboutme.htm" />
          <dxf:node url="interest.htm" />
          <dxf:node url="favorite.htm" />
          <dxf:node url="photo.htm" />
          <dxf:node url="feedback.htm" />
        </dxf:node>
      </dxf:view>
    </dxf:views>
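A client could turn site-structure metadata of this kind into a simple sitemap. The sketch below is one possible reading in Python, not SiteExplorer's code. The slide does not show a namespace declaration for the `dxf` prefix, so the URI `urn:example:dxf` is a placeholder assumption added only so that a standard XML parser accepts the fragment; the `outline` helper and its `current` marker are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed placeholder: the real namespace URI is not shown on the slide.
DXF_NS = "urn:example:dxf"

SITE_XML = f"""
<dxf:views xmlns:dxf="{DXF_NS}">
  <dxf:view title="Main">
    <dxf:node url="index.htm">
      <dxf:node url="aboutme.htm" />
      <dxf:node url="interest.htm" />
      <dxf:node url="favorite.htm" />
      <dxf:node url="photo.htm" />
      <dxf:node url="feedback.htm" />
    </dxf:node>
  </dxf:view>
</dxf:views>
"""


def outline(node, depth=0, current=None):
    """Yield one indented line per page, marking the page the user is on."""
    url = node.get("url")
    marker = "  <- you are here" if url == current else ""
    yield "  " * depth + url + marker
    for child in node.findall(f"{{{DXF_NS}}}node"):
        yield from outline(child, depth + 1, current)


root = ET.fromstring(SITE_XML.strip())
view = root.find(f"{{{DXF_NS}}}view")
top = view.find(f"{{{DXF_NS}}}node")
sitemap = list(outline(top, current="photo.htm"))
print("\n".join(sitemap))
```

Because the structure is read from metadata produced at authoring time, the client does not have to infer it by following links.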
MIDAS is NOT…
…an element of the Semantic Web
• Not adding “knowledge” explicitly into the Web
Simple metadata
• Easily authored/easily computable at authoring/publishing time
• Presently available but dismissed
Problems addressed
• Users have difficulty choosing the right website from the result set
• Users want overviews of sites in a list of search results (Anne Cohen-Kiel’s ethnographic study in Spain, UK and Canada)
• Users have difficulty evaluating search results and refining their search (MSN Search Diary Study and Site Interviews)
• Users have difficulty locating relevant information within a destination site once they get to the site (MSN Search Diary Study and Site Interviews)
Site Explorer’s Solutions:
• Providing users with an overview of the site content as interactive sitemap
• Supporting exploration of the site through local search
External studies
“Anyone who has been to a shopping mall knows the value of the ‘you are here’ dot on the map …
Site maps must become more aware of users’ website navigation…”
Jakob Nielsen, Site Map Usability, January 6, 2002
Part III
SmartView and SearchMobil: Viewing Web on PDAs and Mobile Phones
Lack of generic publishing format to support flexible display of content across devices
Ineffective reading on mobile devices
Lost in Hyperspace - Small
Complex pages on small screens
• Overview – none provided at the moment
• Extensive horizontal/vertical scrolling
Lost in Hyperspace - Small
Location of search hits on result page
• Difficulty even on desktop screens
• Reason: disassociation of search service and document delivery
SearchMobil
SearchMobil Web Service
• Collection of search results – “booklet” of Web pages
• Creation of the “local” full text index
• Search within a designated set of pages
• Annotated booklets (hit highlighting)
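The “booklet” with its local full text index can be pictured as a small inverted index restricted to the collected pages. The following Python sketch is illustrative only: the index layout, the all-terms-must-match rule, the function names and the sample pages are assumptions, not details taken from the SearchMobil service.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_booklet_index(booklet):
    """Inverted index: term -> set of booklet page URLs containing the term."""
    index = defaultdict(set)
    for url, text in booklet.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def local_search(index, query):
    """Return the sorted URLs of booklet pages that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# Hypothetical booklet: text of three pages collected from a result list.
booklet = {
    "p1": "Cheap flights to Prague",
    "p2": "Prague hotels and flights",
    "p3": "Weather in Brno",
}
booklet_index = build_booklet_index(booklet)
print(local_search(booklet_index, "prague flights"))
```

Since the index covers only the designated pages, a follow-up query narrows the booklet rather than starting a new Web search.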
SearchMobil Features
Web Search
• On-line search: Google
• Automatic download of pages
• Processing of pages – structure discovery and content indexing
• Creation of a booklet of overviews
• Indicators of search hits
• Indicator of the best region – scroll down the ‘red’ section
• Select the region and access the detailed view
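The hit indicators and the “best region” indicator can be approximated by counting query-term hits per page region. The Python sketch below makes several assumptions that are not in the slides: regions are found by splitting on blank lines (a crude stand-in for structure discovery), the best region is simply the one with the most hits, and all names are hypothetical.

```python
import re


def split_regions(page_text):
    """Split a page into regions on blank lines (stand-in for structure discovery)."""
    return [r.strip() for r in re.split(r"\n\s*\n", page_text) if r.strip()]


def hit_counts(regions, query_terms):
    """Count occurrences of query terms in each region, case-insensitively."""
    terms = {t.lower() for t in query_terms}
    return [
        sum(1 for word in re.findall(r"\w+", region.lower()) if word in terms)
        for region in regions
    ]


def best_region(counts):
    """Index of the region with the most hits, or None when nothing matches.

    Ties go to the earliest region.
    """
    if not counts or max(counts) == 0:
        return None
    return counts.index(max(counts))


# Hypothetical page text with three regions.
page = (
    "Welcome to our site.\n\n"
    "SOFSEM 2004 programme and SOFSEM venue.\n\n"
    "Contact: programme chair."
)
regions = split_regions(page)
counts = hit_counts(regions, ["SOFSEM", "programme"])
print(counts, best_region(counts))
```

On a small screen, the per-region counts could drive the hit markers in the overview and the best region could be the one offered first in the detailed view.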
Web Search – Detail View
SearchMobil Features – Cont.
Local Search
• Local search – focussed on the set of pages in the booklet
• Indicators of relevance at the page and the booklet level
Summary
Simple proposition:
• Save metadata about structure and content generated by authoring applications
Benefits on the client side:
• Rich context for search and navigation
• Interactive download of document elements and metadata for small devices
Benefit for services:
• Metadata collected and in s
• Opportunity for new services based on rich metadata
• Opportunity for push based services – reduce the need for crawling.