IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal...

23
IDM 2003 Workshop Stuff I’ve Seen: Stuff I’ve Seen: Susan Dumais Microsoft Research http://research.microsoft.com /~sdumais A System for Personal A System for Personal Information Retrieval and Information Retrieval and Re-Use Re-Use

Transcript of IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal...

Page 1: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

Stuff I’ve Seen:Stuff I’ve Seen:

Susan DumaisMicrosoft Research

http://research.microsoft.com/~sdumais

A System for Personal A System for Personal Information Retrieval and Information Retrieval and

Re-UseRe-Use

Page 2: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

OutlineOutline

Search today Search with Stuff I’ve Seen (SIS)

With: Edward Cutrell, JJ Cadiz, Gavin Jancke, Raman Sarin, Daniel Robbins

Experiences with SIS Deployment Usage data UI innovations

Next steps for SIS

Page 3: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

Search Today …Search Today …

Many locations, interfaces for

finding things (e.g., web, mail, local files, help, history, notes)

Often slow

“… the No.1 question we're trying to solve [in Longhorn] is ‘Where's my stuff?’ Right now, file space on any PC is a cesspool. “

Bill Gates, FORTUNE interview, June 23, 2002

Page 4: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

Search With SISSearch With SIS Unified index of stuff you’ve seen

All types of information, e.g., files of all types, email, calendar, contacts, web pages, etc.

Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)

Automatic and immediate update of index Rich UI possibilities, since it’s your content

Get back to information you’ve seen Re-use vs. initial discovery

Page 5: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

Related WorkRelated Work Several systems for improving access for specific

sources (e.g., web, mail, files, photos, music) Some integration across sources

KFTF [Jones et al., 2002] Lifestreams/Scopeware [Fertig, Freeman, Gelernter, 1996] MyLife Bits [Gemmell et al., 2002] Haystack [Adar et al., 1999; Huynh et al. 2002] Commercial products

OS: Mac Sherlock, Windows Indexing Service Apps: Enfish, 80-20 retriever, dtSearch, X1, etc.

What’s new with SIS … Full content and metadata for many different sources Extensible architecture Usage experiences and experimental data UI focus

Page 6: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS ArchitectureSIS Architecture Indexing infrastructure uses MS Search

components (note: IR platform) Gatherer – interface to content sources, e.g.,

files, http, MAPI Filters – decode different file types, e.g., word,

powerpoint, html, pdf, journal Tokenizer – break into words, including date

normalization, stemming, etc. Indexer – standard inverted index Retriever – Boolean, best match (Okapi)

User interface Client side indexing and storage

Page 7: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS Design PrinciplesSIS Design Principles Indexing …

No additional work is required User sees something, and it gets indexed

Retrieval … Fast, flexible Interactive refinement

Sort and filter on metadata Note: Sort/filter automatically triggers query

UI experiments Previews, Top/Side, Previews, Richer visualizations Richer visualizations

Page 8: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS DemoSIS Demo

Page 9: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

Evaluating SISEvaluating SIS Internal deployment

~1500 downloads Users include: program management, test, sales,

development, administrative, executives, etc. Research techniques

Free-form feedback Questionnaires; Structured interviews Usage patterns from log data UI experiments (randomly deploy different

versions) Lab studies for richer UI (e.g., timeline, trends)

But even here must work with users’ own content

Page 10: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

Top vs. Side Views

Previews vs. Not

Sort By Date vs. Rank

Page 11: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS Usage DataSIS Usage DataDetailed analysis for 234 people, 6 weeks usage Personal store characteristics

5k – 100k items; index <150 meg Query characteristics

Short queries (1.59 words) Few advanced operators or fielded search in query box

(7.5%) Frequent use of query iteration (48%)

50% refined queries involve filters – type, date most common

35% refined queries involve changes to query 13% refined queries involve re-sort

Query content Vs. Spink et al.’s analysis of web queries Importance of people

29% of the queries involve people’s names

Page 12: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS Usage Data, cont’dSIS Usage Data, cont’dCharacteristics of items

opened File types opened

76% Email 14% Web pages 10% Files

Age of items opened 7% today 22% within the last week 46% within the last month

Ease of finding information Easier after SIS for web, email, files Non-SIS search decreases for web,

email, files Files Email Web Pages

0

1

2

3

4

5

6

Pre-usage

Post-usage

0

20

40

60

80

100

120

0 500 1000 1500 2000 2500

Freq

uenc

y

Days Since Item First Seen

Log(Freq) = -0.68 * log(DaysSinceSeen) + 2.02

Page 13: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS Usage, cont’dSIS Usage, cont’dUI Usage Small effects of

Top/Side, Previews Sort order

Date by far the most common sort field, even for people who had Okapi Rank as default

Importance of time Few searches for

“best” match; many other criteria …

Num

ber

of Q

uerie

s Is

sued

Date Rank0

500

1000

1500

2000

2500

3000

3500

Starting Default Sort Column

Date

Rank

Page 14: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS Usage, cont’dSIS Usage, cont’dObservations about unified access Metadata quality is variable

Email: rich, pretty clean Web: little, not very useful for retrieval Files: some, but often wrong Human annotation: don’t depend on it …

Need abstractions, e.g., “Useful date” Initially, used ‘date seen’ But …

Appointment, when it happens Email and Web, seen Files, changed

What do people remember about time? Memory landmarks

Page 15: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS, Timeline w/ SIS, Timeline w/ LandmarksLandmarks

Timeline interfaceTimeline interface Augmented with landmarks as Augmented with landmarks as

pointers into human memorypointers into human memory General: holidays, world eventsGeneral: holidays, world events Personal: important photos, Personal: important photos,

appointmentsappointments Heuristics or Bayesian models to Heuristics or Bayesian models to

identify memorable eventsidentify memorable events

Page 16: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS, Timeline w/ SIS, Timeline w/ LandmarksLandmarks

Search Results

Distribution of Results Over Time

Memory Landmarks- General (world, calendar)- Personal (appts, photos)<linked by time to results>

Page 17: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS, Timeline ExperimentSIS, Timeline Experiment

Dates Only Landmarks + Dates0

5

10

15

20

25

30

Sea

rch

Tim

e (s

)

With Landmarks Without Landmarks

Page 18: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS, Visualizing TrendsSIS, Visualizing Trends

Summarize the results of a searchSummarize the results of a search Abstraction beyond individual resultsAbstraction beyond individual results

Grid-based designGrid-based design Axes represent topic, time, peopleAxes represent topic, time, people Cells encode frequency, recencyCells encode frequency, recency

Supports activities like:Supports activities like: What newsgroups are active (on topic x)?What newsgroups are active (on topic x)? What people are active, authoritative (on topic What people are active, authoritative (on topic

x)? x)? When did I last interact w/ people?When did I last interact w/ people?

Page 19: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS, Visualizing TrendsSIS, Visualizing Trends

Page 20: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS, Grid vs. List SIS, Grid vs. List ExperimentExperiment

Grid View

List View

Page 21: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

Next StepsNext Steps Continue explorations of rich UI Augment index with “usage” data SIS as service, with many entry points

“Contextualize” retrieval Retrieve using Implicit Queries Identify Stuff I Should See Flat-land

Good search makes filing less important Attributes rather than directory locations

Page 22: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

SIS SummarySIS Summary Unified index of stuff you’ve seen

Fast access to full-text and metadata Heterogeneous content: files, email, web, etc. Automatic and immediate update of index

Studied usage with several techniques Ease of finding improves with SIS Importance of people and time Short queries, quick iteration Novel UI to leverage personal memories

New capabilities for personal information management

More info, http://research.microsoft.com/~sdumais

Page 23: IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal Information Retrieval and.

IDM 2003 Workshop

Vannevar Bush’s VisionVannevar Bush’s Vision Consider a future device for individual use, Consider a future device for individual use,

which is a sort of mechanized private file and which is a sort of mechanized private file and library. It needs a name, and, to coin one at library. It needs a name, and, to coin one at random, "memex" will do. random, "memex" will do. A memex is a A memex is a device in which an individual stores all his device in which an individual stores all his books, records, and communications, and books, records, and communications, and which is mechanized so that it may be which is mechanized so that it may be consulted with exceeding speed and consulted with exceeding speed and flexibility. It is an enlarged intimate flexibility. It is an enlarged intimate supplement to his memory.supplement to his memory.

V. Bush (1945). As we may think. V. Bush (1945). As we may think. Atlantic Monthly, 176Atlantic Monthly, 176, July 1945, 101-, July 1945, 101-108.108.