IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal...
-
Upload
ambrose-carter -
Category
Documents
-
view
214 -
download
0
Transcript of IDM 2003 Workshop Stuff I’ve Seen: Susan Dumais Microsoft Research sdumais A System for Personal...
IDM 2003 Workshop
Stuff I’ve Seen:Stuff I’ve Seen:
Susan DumaisMicrosoft Research
http://research.microsoft.com/~sdumais
A System for Personal A System for Personal Information Retrieval and Information Retrieval and
Re-UseRe-Use
IDM 2003 Workshop
OutlineOutline
Search today Search with Stuff I’ve Seen (SIS)
With: Edward Cutrell, JJ Cadiz, Gavin Jancke, Raman Sarin, Daniel Robbins
Experiences with SIS Deployment Usage data UI innovations
Next steps for SIS
IDM 2003 Workshop
Search Today …Search Today …
Many locations, interfaces for
finding things (e.g., web, mail, local files, help, history, notes)
Often slow
“… the No.1 question we're trying to solve [in Longhorn] is ‘Where's my stuff?’ Right now, file space on any PC is a cesspool. “
Bill Gates, FORTUNE interview, June 23, 2002
IDM 2003 Workshop
Search With SISSearch With SIS Unified index of stuff you’ve seen
All types of information, e.g., files of all types, email, calendar, contacts, web pages, etc.
Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
Automatic and immediate update of index Rich UI possibilities, since it’s your content
Get back to information you’ve seen Re-use vs. initial discovery
IDM 2003 Workshop
Related WorkRelated Work Several systems for improving access for specific
sources (e.g., web, mail, files, photos, music) Some integration across sources
KFTF [Jones et al., 2002] Lifestreams/Scopeware [Fertig, Freeman, Gelernter, 1996] MyLife Bits [Gemmell et al., 2002] Haystack [Adar et al., 1999; Huynh et al. 2002] Commercial products
OS: Mac Sherlock, Windows Indexing Service Apps: Enfish, 80-20 retriever, dtSearch, X1, etc.
What’s new with SIS … Full content and metadata for many different sources Extensible architecture Usage experiences and experimental data UI focus
IDM 2003 Workshop
SIS ArchitectureSIS Architecture Indexing infrastructure uses MS Search
components (note: IR platform) Gatherer – interface to content sources, e.g.,
files, http, MAPI Filters – decode different file types, e.g., word,
powerpoint, html, pdf, journal Tokenizer – break into words, including date
normalization, stemming, etc. Indexer – standard inverted index Retriever – Boolean, best match (Okapi)
User interface Client side indexing and storage
IDM 2003 Workshop
SIS Design PrinciplesSIS Design Principles Indexing …
No additional work is required User sees something, and it gets indexed
Retrieval … Fast, flexible Interactive refinement
Sort and filter on metadata Note: Sort/filter automatically triggers query
UI experiments Previews, Top/Side, Previews, Richer visualizations Richer visualizations
IDM 2003 Workshop
SIS DemoSIS Demo
IDM 2003 Workshop
Evaluating SISEvaluating SIS Internal deployment
~1500 downloads Users include: program management, test, sales,
development, administrative, executives, etc. Research techniques
Free-form feedback Questionnaires; Structured interviews Usage patterns from log data UI experiments (randomly deploy different
versions) Lab studies for richer UI (e.g., timeline, trends)
But even here must work with users’ own content
IDM 2003 Workshop
Top vs. Side Views
Previews vs. Not
Sort By Date vs. Rank
IDM 2003 Workshop
SIS Usage DataSIS Usage DataDetailed analysis for 234 people, 6 weeks usage Personal store characteristics
5k – 100k items; index <150 meg Query characteristics
Short queries (1.59 words) Few advanced operators or fielded search in query box
(7.5%) Frequent use of query iteration (48%)
50% refined queries involve filters – type, date most common
35% refined queries involve changes to query 13% refined queries involve re-sort
Query content Vs. Spink et al.’s analysis of web queries Importance of people
29% of the queries involve people’s names
IDM 2003 Workshop
SIS Usage Data, cont’dSIS Usage Data, cont’dCharacteristics of items
opened File types opened
76% Email 14% Web pages 10% Files
Age of items opened 7% today 22% within the last week 46% within the last month
Ease of finding information Easier after SIS for web, email, files Non-SIS search decreases for web,
email, files Files Email Web Pages
0
1
2
3
4
5
6
Pre-usage
Post-usage
0
20
40
60
80
100
120
0 500 1000 1500 2000 2500
Freq
uenc
y
Days Since Item First Seen
Log(Freq) = -0.68 * log(DaysSinceSeen) + 2.02
IDM 2003 Workshop
SIS Usage, cont’dSIS Usage, cont’dUI Usage Small effects of
Top/Side, Previews Sort order
Date by far the most common sort field, even for people who had Okapi Rank as default
Importance of time Few searches for
“best” match; many other criteria …
Num
ber
of Q
uerie
s Is
sued
Date Rank0
500
1000
1500
2000
2500
3000
3500
Starting Default Sort Column
Date
Rank
IDM 2003 Workshop
SIS Usage, cont’dSIS Usage, cont’dObservations about unified access Metadata quality is variable
Email: rich, pretty clean Web: little, not very useful for retrieval Files: some, but often wrong Human annotation: don’t depend on it …
Need abstractions, e.g., “Useful date” Initially, used ‘date seen’ But …
Appointment, when it happens Email and Web, seen Files, changed
What do people remember about time? Memory landmarks
IDM 2003 Workshop
SIS, Timeline w/ SIS, Timeline w/ LandmarksLandmarks
Timeline interfaceTimeline interface Augmented with landmarks as Augmented with landmarks as
pointers into human memorypointers into human memory General: holidays, world eventsGeneral: holidays, world events Personal: important photos, Personal: important photos,
appointmentsappointments Heuristics or Bayesian models to Heuristics or Bayesian models to
identify memorable eventsidentify memorable events
IDM 2003 Workshop
SIS, Timeline w/ SIS, Timeline w/ LandmarksLandmarks
Search Results
Distribution of Results Over Time
Memory Landmarks- General (world, calendar)- Personal (appts, photos)<linked by time to results>
IDM 2003 Workshop
SIS, Timeline ExperimentSIS, Timeline Experiment
Dates Only Landmarks + Dates0
5
10
15
20
25
30
Sea
rch
Tim
e (s
)
With Landmarks Without Landmarks
IDM 2003 Workshop
SIS, Visualizing TrendsSIS, Visualizing Trends
Summarize the results of a searchSummarize the results of a search Abstraction beyond individual resultsAbstraction beyond individual results
Grid-based designGrid-based design Axes represent topic, time, peopleAxes represent topic, time, people Cells encode frequency, recencyCells encode frequency, recency
Supports activities like:Supports activities like: What newsgroups are active (on topic x)?What newsgroups are active (on topic x)? What people are active, authoritative (on topic What people are active, authoritative (on topic
x)? x)? When did I last interact w/ people?When did I last interact w/ people?
IDM 2003 Workshop
SIS, Visualizing TrendsSIS, Visualizing Trends
IDM 2003 Workshop
SIS, Grid vs. List SIS, Grid vs. List ExperimentExperiment
Grid View
List View
IDM 2003 Workshop
Next StepsNext Steps Continue explorations of rich UI Augment index with “usage” data SIS as service, with many entry points
“Contextualize” retrieval Retrieve using Implicit Queries Identify Stuff I Should See Flat-land
Good search makes filing less important Attributes rather than directory locations
IDM 2003 Workshop
SIS SummarySIS Summary Unified index of stuff you’ve seen
Fast access to full-text and metadata Heterogeneous content: files, email, web, etc. Automatic and immediate update of index
Studied usage with several techniques Ease of finding improves with SIS Importance of people and time Short queries, quick iteration Novel UI to leverage personal memories
New capabilities for personal information management
More info, http://research.microsoft.com/~sdumais
IDM 2003 Workshop
Vannevar Bush’s VisionVannevar Bush’s Vision Consider a future device for individual use, Consider a future device for individual use,
which is a sort of mechanized private file and which is a sort of mechanized private file and library. It needs a name, and, to coin one at library. It needs a name, and, to coin one at random, "memex" will do. random, "memex" will do. A memex is a A memex is a device in which an individual stores all his device in which an individual stores all his books, records, and communications, and books, records, and communications, and which is mechanized so that it may be which is mechanized so that it may be consulted with exceeding speed and consulted with exceeding speed and flexibility. It is an enlarged intimate flexibility. It is an enlarged intimate supplement to his memory.supplement to his memory.
V. Bush (1945). As we may think. V. Bush (1945). As we may think. Atlantic Monthly, 176Atlantic Monthly, 176, July 1945, 101-, July 1945, 101-108.108.