Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen, CS Division, UC Berkeley Susan Dumais, Microsoft Research ACM:CHI April 4, 2000


Page 1

Bringing Order to the Web: Automatically Categorizing Search Results

Hao Chen, CS Division, UC Berkeley
Susan Dumais, Microsoft Research

ACM:CHI April 4, 2000

Page 2

Organizing Search Results

List Organization vs. Category Organization (SWISH)

Query: jaguar

Page 3

Outline
• Background
   - Using category structure to organize information
• SWISH System: Searching With Information Structured Hierarchically
   - Text classification
   - User interface
• User Study
• Future Work

Page 4

Using Category Structure To Organize Information

• Superbook, Cat-a-Cone, etc.
• To help Web search: Yahoo!, Northern Light

What’s New in SWISH?
• Automatic categorization of new documents
• User interface that tightly couples hierarchical category structure with search results
• User study for the new user interface

Page 5

SWISH System
Combines the advantages of:
• Manually crafted & easily understood directory structure
• Broad coverage from search engines

System Components
• Text classification models
• User interface

Page 6

Text Classification
• Assign documents to one or more of a predefined set of categories
• E.g., news feeds, email (spam/no-spam), Web data
• Manually vs. automatically

Inductive Learning for Classification
• Training set: a set of manually classified documents
• Learning: learn classification models
• Classification: use the models to automatically classify new documents
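The train-then-classify loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SWISH code: it uses perceptron updates as a simple stand-in for the SVM learner the talk actually uses, a crude whitespace tokenizer, and a made-up four-document training set. Because each category gets its own binary model, a document can end up in zero, one, or several categories.

```python
from collections import defaultdict


def tokenize(text):
    """Crude bag of words: lowercase, split on whitespace, drop duplicates."""
    return set(text.lower().split())


def train(labeled_docs, categories, epochs=10):
    """Inductive learning: fit one linear model (word weights + bias) per category.

    labeled_docs is the manually classified training set, a list of
    (text, set_of_category_names) pairs. Perceptron updates stand in for SVM.
    """
    models = {c: {"weights": defaultdict(float), "bias": 0.0} for c in categories}
    for _ in range(epochs):
        for text, labels in labeled_docs:
            words = tokenize(text)
            for category, model in models.items():
                target = 1 if category in labels else -1
                score = sum(model["weights"][w] for w in words) + model["bias"]
                if target * score <= 0:  # wrong side of (or on) the boundary: update
                    for w in words:
                        model["weights"][w] += target
                    model["bias"] += target
    return models


def classify(models, text):
    """Assign a new document to every category whose model scores it above 0."""
    words = tokenize(text)
    return {
        category
        for category, model in models.items()
        if sum(model["weights"].get(w, 0.0) for w in words) + model["bias"] > 0
    }


# Hypothetical toy training set (not LookSmart data).
training_set = [
    ("used car parts for honda and porsche", {"Automotive"}),
    ("motorcycle and vehicle repair", {"Automotive"}),
    ("windows software downloads for your pc", {"Computers & Internet"}),
    ("web hosting provider for users", {"Computers & Internet"}),
]
models = train(training_set, ["Automotive", "Computers & Internet"])
print(classify(models, "cheap honda parts"))  # {'Automotive'}
```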

Page 7

Training Set: LookSmart Web Directory

Category Structure (spring 99)
• 13 top-level categories
• 150 second-level categories

Training Set
• ~50k web pages, chosen randomly from all categories

Top-level Categories
• Automotive
• Business & Finance
• Computers & Internet
• Entertainment & Media
• Health & Fitness
• Hobbies & Interests
• Home & Family
• People & Chat
• Reference & Education
• Shopping & Services
• Society & Politics
• Sports & Recreation
• Travel & Vacations

Page 8

Learning & Classification
Support Vector Machine (SVM)
• Accurate and efficient for text classification (Dumais et al., Joachims)
• Model = weighted vector of words
   - “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche …
   - “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ...

Hierarchical Models
• 1 model for N top-level categories
• N models for second-level categories
• Very useful in conjunction w/ user interaction
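To make "model = weighted vector of words" and the two-level scheme concrete, here is a hedged Python sketch. All weights and biases are invented for illustration (only some of the words come from the slide), the second-level category names are hypothetical, and the logistic mapping from linear score to probability is an assumption. A page is scored by summing the weights of its words plus a bias; second-level models run only under top-level categories that clear the threshold.

```python
import math

# Hypothetical models: each is (word -> weight, bias). The words echo the
# slide's examples, but every number here is invented for illustration.
TOP_LEVEL = {
    "Automotive": ({"car": 1.2, "auto": 1.0, "porsche": 0.9, "honda": 0.8}, -1.0),
    "Computers & Internet": ({"software": 1.1, "windows": 0.9, "pc": 0.8, "os": 0.7}, -1.0),
}
# One second-level model set per top-level category (names are hypothetical).
SECOND_LEVEL = {
    "Automotive": {
        "Sports Cars": ({"porsche": 1.5, "racing": 1.0}, -1.0),
        "Motorcycles": ({"motorcycle": 1.5, "harley": 1.2}, -1.0),
    },
    "Computers & Internet": {
        "Software": ({"software": 1.4, "downloads": 1.0}, -1.0),
        "Operating Systems": ({"windows": 1.0, "os": 1.3}, -1.0),
    },
}


def probability(model, words):
    """Linear score (sum of word weights + bias) squashed by a sigmoid (assumed)."""
    weights, bias = model
    score = sum(weights.get(w, 0.0) for w in words) + bias
    return 1.0 / (1.0 + math.exp(-score))


def classify_hierarchical(text, threshold=0.5):
    """Return (top, second, p_top, p_second) for every accepted category path."""
    words = text.lower().split()
    paths = []
    for top, model in TOP_LEVEL.items():
        p_top = probability(model, words)
        if p_top < threshold:
            continue  # prune: this subtree's second-level models never run
        for second, sub_model in SECOND_LEVEL[top].items():
            p_second = probability(sub_model, words)
            if p_second >= threshold:
                paths.append((top, second, p_top, p_second))
    return paths


print([p[:2] for p in classify_hierarchical("porsche car for sale")])
# [('Automotive', 'Sports Cars')]
```

Pruning at the top level is one reason a hierarchy pairs well with user interaction: the interface can show top-level categories first and only expand the branches a user opens.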

Page 9

SWISH Architecture

[Diagram: manually classified web pages → Train (offline) → SVM model; web search results, local search results, ... → Classify (online) with the SVM model]
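The online half of this architecture amounts to classifying each incoming search result and grouping results under their categories. A minimal Python sketch, assuming a classifier trained offline is available as a function from text to a set of category names; the toy keyword classifier, the example.com URLs, and the "Other" bucket for unclassified results are all hypothetical.

```python
def organize_results(results, classify):
    """Group ranked search results under their predicted categories.

    results:  list of (url, snippet) pairs in the search engine's rank order.
    classify: function mapping text -> set of category names (trained offline).
    A result may appear under several categories; results with no category go
    to a hypothetical "Other" group. Rank order is preserved within each group.
    """
    groups = {}
    for rank, (url, snippet) in enumerate(results, start=1):
        for category in sorted(classify(snippet)) or ["Other"]:
            groups.setdefault(category, []).append((rank, url))
    return groups


def toy_classify(text):
    """Hypothetical keyword classifier standing in for the trained SVM model."""
    words = set(text.lower().split())
    categories = set()
    if words & {"car", "cars", "dealer"}:
        categories.add("Automotive")
    if words & {"software", "download"}:
        categories.add("Computers & Internet")
    return categories


results = [  # e.g. results for the query "jaguar"
    ("http://example.com/cars", "new jaguar car models"),
    ("http://example.com/soft", "jaguar software download"),
    ("http://example.com/zoo", "the jaguar is a big cat"),
]
print(organize_results(results, toy_classify))
```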

Page 10

Interface Characteristics
Problems
• Large amount of information to display: search results, category structure
• Limited screen real estate

Solutions
• Information overlay
• Distilled information display

Page 11

Information Overlay
Use tooltips to show:
• Summaries of web pages
• Category hierarchy

Page 12

Expansion of Category Structure

Page 13

Expansion of Web Page List

Page 14

User Study - Conditions
Category Interface | List Interface

Page 15

User Study

Page 16

User Study
Participants
• 18 intermediate Web users

Tasks
• 30 search tasks, e.g., “Find home page for Seattle Art Museum”
• Search terms are fixed for each task

Experimental Design
• Category/List: within subjects; 15 search tasks with each interface
• Order (Category/List first): counterbalanced between subjects
• Both subjective and objective measures
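The within-subjects, counterbalanced design can be sketched as an assignment plan. This Python sketch assumes the 30 tasks are split into two fixed halves of 15 and that participants alternate between Category-first and List-first orders; the slide does not say how specific tasks were paired with each interface, so the participant and task IDs and the splitting rule are illustrative only.

```python
def assign_conditions(participants, tasks):
    """Plan a within-subjects study with counterbalanced interface order.

    Every participant uses both interfaces, each for half of the tasks.
    Even-indexed participants start with Category, odd-indexed with List,
    so each order is used by half of the participants.
    """
    half = len(tasks) // 2
    blocks = (tasks[:half], tasks[half:])  # assumed fixed task halves
    plan = {}
    for i, participant in enumerate(participants):
        order = ("Category", "List") if i % 2 == 0 else ("List", "Category")
        plan[participant] = list(zip(order, blocks))
    return plan


participants = [f"P{n:02d}" for n in range(1, 19)]  # 18 participants (hypothetical IDs)
tasks = [f"T{n:02d}" for n in range(1, 31)]         # 30 search tasks (hypothetical IDs)
plan = assign_conditions(participants, tasks)
print(plan["P01"][0][0], plan["P02"][0][0])  # Category List
```

With alternating orders, the first task half is done with Category by half of the participants and with List by the other half, so practice effects are balanced across interfaces.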

Page 17

Subjective Results
7-point rating scale (1 = disagree; 7 = agree)

Question | Category | List | Significance
It was easy to use this software. | 6.4 | 3.9 | p < .001
I liked using this software | 6.7 | 4.3 | p < .001
I prefer this to my usual Web Search engine | 6.4 | 4.3 | p < .001
It was easy to get a good sense of the range of alternatives. | 6.4 | 4.2 | p < .001
I was confident that I could find information if it was there. | 6.3 | 4.4 | p < .001
The "More" button was useful | 6.5 | 6.1 | n.s.
The display of summaries was useful | 6.5 | 6.4 | n.s.

Page 18

Use of Interface Features
Average Number of Uses of Feature per Task

Interface Feature | Category | List | Significance
Expanding / Collapsing Structure | 0.78 | 0.48 | p < .003
Viewing Summaries in Tooltips | 2.99 | 4.60 | p < .001
Viewing Web Pages | 1.23 | 1.41 | p < .053

Page 19

Search Time

Category: 56 secs; List: 85 secs (p < .002)

50% faster with Category interface

[Figure: RT for Category vs. List. Bar chart of Average Median RT (y-axis, 0 to 100) by Interface Condition (x-axis: Category, List).]
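A quick check of the "50% faster" figure, using the 56 s and 85 s times above. The claim reads most naturally as the List interface taking about 50% longer than the Category interface; expressed as a reduction in search time, the Category interface saves roughly a third:

```latex
\frac{\mathrm{RT}_{\text{List}}}{\mathrm{RT}_{\text{Category}}} = \frac{85}{56} \approx 1.52
\qquad
\frac{\mathrm{RT}_{\text{List}} - \mathrm{RT}_{\text{Category}}}{\mathrm{RT}_{\text{List}}} = \frac{85 - 56}{85} = \frac{29}{85} \approx 0.34
```

So List searches took about 52% longer than Category searches; equivalently, the Category interface cut search time by about 34%.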

Page 20

Search Time by Query Difficulty

Top20: 57 secs; NotTop20: 98 secs

•No reliable interaction between query difficulty and interface condition

•Category interface is helpful for both easy and difficult queries

[Figure: RT by Interface and Query Difficulty. Bar chart of Average Median RT (y-axis, 0 to 160) by Interface Condition (x-axis: Category, List), with separate bars for Easy (Top20) and Hard (NotTop20) queries.]

Page 21

Summary
Text Classification
• Organize search results
• Use hierarchical category models
• Classify new web pages on-the-fly

User Interface
• Tightly couple search results with category structure
• Allow manipulation of presentation of category structure

User Study
• Results suggest strong preference and performance advantages for categorically organized presentation of search results

Page 22

Open Issues
• Improve accuracy of classification algorithms
• Enhance user interface
   - Heuristics for selecting categories and pages to display
        Query_Match: rank of page, and sometimes match score
        Categ_Match: p(category) for each page
   - Integration with non-content information
• Conduct end-to-end user study

More info:
http://research.microsoft.com/~sdumais
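The Query_Match and Categ_Match signals listed under Open Issues could be combined in many ways; the slide leaves the heuristic open. One hypothetical combination, sketched in Python: keep a page in a category only if p(category) clears a threshold, order categories by the best rank among their pages, and show each category's top few pages in rank order. The thresholds, limits, and data layout are assumptions for illustration.

```python
def select_for_display(pages, min_prob=0.5, max_categories=3, pages_per_category=2):
    """Pick which categories and pages to show (hypothetical heuristic).

    pages: list of dicts with "url", "rank" (Query_Match; 1 = best) and
    "probs" mapping category -> p(category) for that page (Categ_Match).
    """
    members = {}
    for page in pages:
        for category, p in page["probs"].items():
            if p >= min_prob:  # Categ_Match filter
                members.setdefault(category, []).append(page)
    # Order categories by the best (lowest) rank among their member pages.
    ordered = sorted(members.items(), key=lambda item: min(pg["rank"] for pg in item[1]))
    return [
        (category, [pg["url"] for pg in sorted(group, key=lambda pg: pg["rank"])[:pages_per_category]])
        for category, group in ordered[:max_categories]
    ]


pages = [  # made-up example data
    {"url": "example.com/1", "rank": 1, "probs": {"Automotive": 0.9, "Sports & Recreation": 0.2}},
    {"url": "example.com/2", "rank": 2, "probs": {"Computers & Internet": 0.8}},
    {"url": "example.com/3", "rank": 3, "probs": {"Automotive": 0.7}},
    {"url": "example.com/4", "rank": 4, "probs": {"Automotive": 0.6, "Computers & Internet": 0.55}},
]
print(select_for_display(pages))
```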

Page 23

Searching With Information Structured Hierarchically

SWISH