Google Tech Talk: Reconsidering Relevance
-
Upload
daniel-tunkelang -
Category
Documents
-
view
165 -
download
3
description
Transcript of Google Tech Talk: Reconsidering Relevance
![Page 1: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/1.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.
Reconsidering Relevance
Daniel TunkelangChief Scientist, Endeca
![Page 2: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/2.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.2
howdy!
• 1988 – 1992
• 1993 – 1998
• 1999 -
![Page 3: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/3.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.3
overview
what is relevance?
what’s wrong with relevance?
what are the alternatives?
![Page 4: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/4.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.4
but first let’s set the stage
![Page 5: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/5.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.5
iconic businesses of the 20th and 21st centuries
I’m Feeling Lucky
![Page 6: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/6.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.6
process and scale orchestration
![Page 7: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/7.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.7
but there’s a dark side
![Page 8: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/8.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.8
users are satisfied
![Page 9: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/9.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.9
an interesting contrast
“Search on the internet is solved. I always find what I need.
But why not in the enterprise?
Seems like a solution waiting to happen.”
- a Fortune 500 CTO
![Page 10: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/10.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.10
the real questions
• What is “search on the internet” and why is it perceived a solved problem?
• What is “search in the enterprise” and why is it perceived as an unsolved problem?
• And what does this have to do with relevance?
![Page 11: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/11.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.11
easy vs. hard search problems
• easywhere to buy Ender in Exile?
• hardgood novel to read on the beach?
• easyproof that sorting has n log n lower bound?
• hardalgorithm to sort partially ordered set, given a constant-time comparator?
![Page 12: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/12.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.12
what is relevance?
what’s wrong with relevance?
what are the alternatives?
![Page 13: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/13.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.13
defining relevance
Relevance is defined as a measure of information conveyed by a document relative to a query.
It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance.
William Goffman, On relevance as a measure, 1964.
![Page 14: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/14.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.14
we need more definitions
![Page 15: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/15.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.15
let’s work top-down
• information retrieval (IR) =
study of retrieval of information (not data) from collection of written documents
retrieved documents aim at satisfying user information need
![Page 16: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/16.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.16
IR assumes information needs
• user information need =
natural language declaration of informational need of user
• query =
expression of user information need in input language provided by information system
![Page 17: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/17.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.17
relevance drives IR modeling
• modeling =
studies algorithms used for ranking documents according to system assigned likelihood of relevance
• model =
a set of premises and an algorithm for ranking documents with regard to a user query
![Page 18: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/18.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.18
a relevance-centric approach
information Need query select from results
rank using IR model
USER:
SYSTEM:tf-idf PageRank
![Page 19: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/19.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.19
what is relevance?
what’s wrong with relevance?
what are the alternatives?
![Page 20: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/20.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.20
our first communication problem
information need query
• 2 words?• natural language?• telepathy?
![Page 21: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/21.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.21
and the game of telephone continues
query rank using IR model
• cumulative error• relevance is subjective• what Goffman said
![Page 22: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/22.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.22
and hopefully users feel lucky
rank using IR model
• selection bias• inefficient channel• backup plan?
select from results
![Page 23: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/23.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.23
queries are misinterpreted
Results 1-10 out of about 344,000,000 for ir
![Page 24: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/24.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.24
ranked lists are inefficient
![Page 25: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/25.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.25
assumptions of relevance-centric approach
• self-awareness
• self-expression
• model knows best
• answer is a document
• one-shot query
![Page 26: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/26.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.26
can we do better?
![Page 27: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/27.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.27
what is relevance?
what’s wrong with relevance?
what are the alternatives?
![Page 28: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/28.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.28
human-computer information retrieval
• don’t just guess the user’s intent– optimize communication
• increase user responsibility and control– require and reward human intellectual effort
“Toward Human-Computer Information Retrieval”
Gary Marchionini
![Page 29: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/29.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.29
human computer information retrieval
![Page 30: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/30.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.30
a concrete use case
• Colleague:
Hey Daniel! You should check out what this guy Steve Pollitt’s been researching. Sounds right up your alley.
• Daniel:
Sure thing, I’ll look into it.
![Page 31: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/31.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.31
google him!
![Page 32: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/32.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.32
google scholar him?
![Page 33: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/33.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.33
rexa him?
![Page 34: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/34.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.34
getting better
![Page 35: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/35.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.35
hcir-inspired interface
![Page 36: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/36.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.36
tags provide summarization and guidance
![Page 37: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/37.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.37
my information need evolves as i learn
![Page 38: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/38.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.38
hcir – implementing the vision
![Page 39: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/39.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.39
scatter/gather: a search for “star”
![Page 40: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/40.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.40
faceted search
![Page 41: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/41.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.41
practical considerations
• which facets to show
• which facet values to show
• when to suggest faceted refinement
• how to automate faceted classification
![Page 42: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/42.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.42
showing the right facets: microwaves
![Page 43: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/43.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.43
showing the right facets: ceiling fans
![Page 44: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/44.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.44
query-driven clarification before refinement
Matching Categories include:
Appliances > Small Appliances > Irons & Steamers
Appliances > Small Appliances > Microwaves & Steamers
Bath > Sauna & Spas > Steamers
Kitchen > Bakeware & Cookware > Cookware >Open Stock Pots > Double Boilers & Steamers
Kitchen > Small Appliances > Steamers
![Page 45: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/45.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.45
results-driven clarification before refinement
Search: storage
![Page 46: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/46.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.46
crowd-sourcing to tag documents
![Page 47: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/47.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.47
recall
precision
hcir cheats the precision / recall trade-off
![Page 48: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/48.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.48
set retrieval 2.0
• set retrieval that responds to queries with– overview of the user's current context– organized set of options for exploration
• contextual summaries of document sets– optimize system’s communication with user
• query refinement options– optimize user’s communication with system
![Page 49: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/49.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.49
hcir using set retrieval 2.0
emphasize set summaries over ranked lists
establish a dialog between the user and the data
enable exploration and discovery
![Page 50: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/50.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.50
think outside the (search) box
• relevance-centric search solves many use cases
• but not some of the most valuable ones
• support interaction, exploration
• human-computer information retrieval
![Page 51: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/51.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.51
one more thing
…
![Page 52: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/52.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.52
“Google's mission is to organize the
world's information and make it
universally accessible and useful.”
![Page 53: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/53.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.53
organizer or referee?
![Page 54: Google Tech Talk: Reconsidering Relevance](https://reader034.fdocuments.us/reader034/viewer/2022051515/55284b484979592e048b46f5/html5/thumbnails/54.jpg)
© 2009 Endeca Technologies, Inc. All rights reserved.54
thank you
communication 1.0email: [email protected]
communication 2.0blog: http://thenoisychannel.com
twitter: http://twitter.com/dtunkelang