Transcendence: Enabling A Personal View of the Deep Web
Enabling a Personal View of the Deep Web
Jeffrey P. Bigham
Anna C. Cavender, Ryan S. Kaminsky, Craig M. Prince, and Tyler S. Robison
University of Washington, Computer Science and Engineering
What is the Deep Web?
o The deep web
  – Built from underlying databases
  – Accessible by querying web forms
  – 400-550x larger than the surface web [1]
o The surface web
  – Accessible by following links
  – Indexed by traditional search engines
[1] Bergman, M. K. The deep web: Surfacing hidden value, 2001.
Introduction
Deep Web Resources
Problems
o Web interfaces are inflexible
  – Your query might not be supported
o Many searches are often required
  – Multiple queries, multiple tabs/windows
o Aggregate queries are difficult
  – Data is technically available but hard to access
Introduction
Outline
o Introduction
o Transcending Craigslist
o Related Work
o 3 Steps of Transcendence
o Additional Examples
o User Evaluation
Scenario
o Jane is a new student at UW
o Looking for an apartment on Craigslist
o Aware of two neighborhoods in Seattle
  – “University District” and “University Village”
o Looking for the cheapest apartment near UW
Transcending Craigslist
Generalize a Form Field
Add a Value
Add Another Value
Automatically Generate More Values
Results only for “University Village”
Fields Automatically Chosen
Extract for All Inputs
Review Extractions in Place
Extractions Sorted by Price
Transcending Craigslist
o Provided personal view of Craigslist
o Multiple queries/results in single window
o Cheapest neighborhood not originally entered
o Required only a little more than a single search
Outline
o Introduction
o Transcending Craigslist
o Related Work
o 3 Steps of Transcendence
o Additional Examples
o User Evaluation
Crawling the Deep Web
o Crawling the Deep Web
  1. Find interesting web forms
  2. Find appropriate values to provide
     – Determine schema and appropriate queries [1]
     – Find keywords likely to elicit interesting results [2]
o Don’t involve users
[1] Madhavan et al. “Structured data meets the web: A few observations.” 2006.
[2] Ntoulas et al. “Downloading textual hidden web content through keyword queries.” 2005.
Related Work
User Interfaces for the Web
o Collect, Manage and Use Web Information
  – Sifter [1]
    o Augment sorting/filtering
  – Clip, Connect, Clone [2]
    o Manipulate web forms
    o Clone to specify multiple values
  – CREO [3]
    o Web macros that generalize using Open Mind Repository
Related Work
[1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
[2] Fujima et al. “Clip, connect, clone: combining application elements to build custom interfaces for information access.” UIST 2004.
[3] Faaborg et al. “A goal-oriented web browser.” CHI 2006.
Outline
o Introduction
o Transcending Craigslist
o Related Work
o 3 Steps of Transcendence
o Additional Examples
o User Evaluation
1. Generalize Form
o Choose input fields to generalize
o Enter multiple values for those fields
  – Either enter manually, or
  – Use subset of values in selection/radio/checkbox
o Optionally add more values automatically
  – Prior Input of Other Users
  – Unsupervised Information Extraction
3 Steps of Transcendence
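The generalization step can be read as expanding one form submission into one submission per combination of the values entered for the generalized fields. The following is a minimal sketch of that reading, not the system's actual implementation; the field names (`category`, `max_rent`, `neighborhood`) and the `generalize` helper are hypothetical.

```python
from itertools import product


def generalize(base_query, generalized):
    """Return one query dict per combination of generalized field values.

    base_query:  fields left as the user originally filled them in
    generalized: maps each generalized field name to its list of values
    """
    fields = list(generalized)
    queries = []
    for combo in product(*(generalized[f] for f in fields)):
        query = dict(base_query)          # copy so the base query is untouched
        query.update(zip(fields, combo))  # fill in this combination of values
        queries.append(query)
    return queries


# Hypothetical form fields for the apartment scenario.
base = {"category": "apartments", "max_rent": "1500"}
values = {
    "neighborhood": ["University District", "University Village", "Wedgewood"],
}

queries = generalize(base, values)
for q in queries:
    print(q)
```

Generalizing a second field multiplies the number of queries, which is one reason a user might want to restrict each field to a subset of its possible values.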
1-a) Finding Values with UIE
Input: phrases
Output: (similar) phrases
o Google Sets [1]
  – Up to 10 inputs, returns 15 or 50 results
  – Based on contextual similarity (probably)
o KnowItAll List Extractor [2]
  – Finds inputs in lists in unstructured web text
  – Extracts other items in the lists
  – Proceeds in iterations to potentially find many more
[1] http://labs.google.com/sets/
[2] Etzioni et al. “Methods for domain-independent information extraction from the web: an experimental comparison.” 2008.
3 Steps of Transcendence
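The list-based idea described for finding values (locate lists that contain the inputs, take the other items, repeat) can be sketched roughly as below. This is an illustrative simplification and not the KnowItAll List Extractor itself: the `expand_seeds` function, the `min_overlap` threshold, and the in-memory `lists` standing in for lists found in web text are all assumptions.

```python
def expand_seeds(seeds, lists, min_overlap=2, iterations=2):
    """Grow a seed set from lists that share enough items with it.

    A list contributes its remaining items only when at least
    `min_overlap` of its items are already known. Newly found items
    become known for the next iteration, so later passes can reach
    lists that the original seeds alone could not.
    """
    seed_set = {s.lower() for s in seeds}
    known = set(seed_set)
    for _ in range(iterations):
        found = set()
        for items in lists:
            normalized = {i.lower() for i in items}
            if len(normalized & known) >= min_overlap:
                found |= normalized - known
        if not found:
            break  # nothing new, further iterations would not help
        known |= found
    return sorted(known - seed_set)


# Hypothetical lists, standing in for lists discovered in web pages.
lists = [
    ["Allen", "Smith", "Johnson", "Williams"],
    ["Smith", "Williams", "Brown", "Davis"],
    ["Rocky", "Star Wars", "The Matrix"],
]

print(expand_seeds(["Allen", "Smith", "Johnson"], lists))
```

In this toy data the second list is only reached on the second pass, after "Williams" has been added, which illustrates why iterating can surface many more values than a single pass.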
2. Choose Fields & Extract
o Submit form with single combination of values
o Result fields identified automatically [1]
  – Identified by XPath
  – Pre-processing adds structure
o Users optionally edit fields
o Begin extraction process
  – Multi-threaded extraction
3 Steps of Transcendence
[1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
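A rough sketch of the extraction step follows: each generalized input is submitted on its own worker thread, and result fields are pulled out of each result page with XPath expressions. Everything concrete here is an assumption made for illustration: `fetch_results` is a stub that returns canned markup instead of submitting a real form, and the paths in `FIELDS` are hypothetical. The sketch uses the limited XPath subset supported by Python's `xml.etree.ElementTree`.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET


def fetch_results(neighborhood):
    """Hypothetical stand-in for submitting the form with one input value."""
    pages = {
        "University District": (
            "<ul><li class='row'><span class='price'>$950</span>"
            "<a>1br near campus</a></li></ul>"
        ),
        "Wedgewood": (
            "<ul><li class='row'><span class='price'>$800</span>"
            "<a>Quiet studio</a></li>"
            "<li class='row'><span class='price'>$1200</span>"
            "<a>2br house</a></li></ul>"
        ),
    }
    return pages.get(neighborhood, "<ul></ul>")


# Hypothetical result fields, each located relative to one result row.
FIELDS = {"price": "span[@class='price']", "title": "a"}


def extract(neighborhood):
    """Return one record per result row for a single input value."""
    root = ET.fromstring(fetch_results(neighborhood))
    rows = []
    for item in root.findall(".//li[@class='row']"):
        row = {"input": neighborhood}
        for name, path in FIELDS.items():
            node = item.find(path)
            row[name] = node.text if node is not None else None
        rows.append(row)
    return rows


inputs = ["University District", "University Village", "Wedgewood"]
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order, so records stay grouped by input.
    records = [row for rows in pool.map(extract, inputs) for row in rows]

for record in records:
    print(record)
```

Keeping the originating input on every record is what later allows results to be filtered by the specific input that produced them.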
3. Visualize Data
o In place on the web page
  – Sort and select results within the web page
o External Visualizers
  – Histogram, Google Map, line graphs, scatter plot, and table of values
3 Steps of Transcendence
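Two of the views named above, sorting results and a histogram, can be sketched over extracted records as follows. The records, the `parse_price` helper, and the bin width are made-up illustrations under the assumption that prices arrive as display strings; this is not how the system's visualizers are implemented.

```python
from collections import Counter

# Hypothetical extracted records; prices are display strings from a page.
records = [
    {"neighborhood": "University District", "price": "$950"},
    {"neighborhood": "University Village", "price": "$1,400"},
    {"neighborhood": "Wedgewood", "price": "$800"},
    {"neighborhood": "Wedgewood", "price": "$1200"},
]

BIN_WIDTH = 500  # dollars per histogram bin (arbitrary choice)


def parse_price(text):
    """Convert a string such as '$1,400' to the integer 1400."""
    return int(text.replace("$", "").replace(",", ""))


def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower bound."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Sorting across all inputs puts the cheapest listing first, whichever
# neighborhood it came from.
by_price = sorted(records, key=lambda r: parse_price(r["price"]))
bins = histogram([parse_price(r["price"]) for r in records], BIN_WIDTH)

for record in by_price:
    print(record["price"], record["neighborhood"])
for start, count in bins.items():
    print(f"${start}-${start + BIN_WIDTH - 1}: {'#' * count}")
```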
Outline
o Introduction
o Transcending Craigslist
o Related Work
o 3 Steps of Transcendence
o Additional Examples
o User Evaluation
Examples: IMDB [1] Rating Distribution
Additional Examples
[1] http://www.imdb.com
Entered: “Scent of a Woman,” “Rocky,” “Star Wars,” and “The Matrix”
Generated more than 7,000 additional titles
Examples: Directory Diving
o Supplied 3 surnames:
  – “Allen,” “Smith,” and “Johnson”
  – Generated 10,063 more names
o A few hours later…
  – 51,233 unique names and emails
  – Also address information, position
Additional Examples
Outline
o Introduction
o Transcending Craigslist
o Related Work
o 3 Steps of Transcendence
o Additional Examples
o User Evaluation
User Evaluation
o 9 Potential Users Evaluated Transcendence
  – 5 programmers, 4 non-programmers
o 3 Tasks
  – Search for a flight
    o Multiple destinations, departure and return dates
  – Map REI Stores in the U.S.
  – Search Craigslist for an apartment
User Evaluation
User Reaction & Comments
o Agreed that Transcendence:
  “could be used to find useful information”
  “is powerful (would allow me to easily accomplish difficult tasks).”
o Most compelling task varied by user
  – Craigslist suggested by preliminary user
  – Many related to flight task
o Questioned value of incomplete database reconstructions
  – Pleasantly surprised by values automatically supplied
o Wanted to use Transcendence in the future
User Evaluation
Future Work
o Implicit Resource Descriptions
  – Eliminate Need to Choose Result Fields
    o Share result schemas between users
    o Eliminate Step 2
  – Custom vertical search engines
    o User-created Kayaks, Metacrawlers, and Froogles
o Improved Deep Web Crawling
  – Use UIE to find appropriate values for forms
Future Work
Conclusion
o Transcendence makes forms more flexible
o Transcendence automatically finds input values
o Unsupervised information extraction useful for crawling
o Transcendence enables new queries not possible before
o Participants wanted to use Transcendence
Conclusion
Transcendence
Jeffrey P. Bigham
[email protected]/homes/jbigham/
Thanks to: Mira Dontcheva, UW Turing Center, anonymous reviewers, and our study participants.
The End
Some Extra Slides
Show Those Resulting from Specific Inputs
Show Wedgewood Results
System Description
[Figure: architecture diagram of the Transcendence System. Labeled components: Firefox Extension, The Web, Extraction Database, Generalizers (KnowItAll, Google Sets, Prior Input), Google Maps, Java Applet. Labeled steps: Step 1: Generalize, Step 2: Extract, Step 3: Visualize.]
3 Steps of Transcendence
1. Generalize
2. Choose Fields & Extract
3. Visualize
Examples: Mapping Stores
Additional Examples
Examples: Kayak Flights
Additional Examples
[Figure: participant ratings of the statements below, each on a scale from 1 (strongly disagree) to 7 (strongly agree). Panels: Ease of Use, Value. Series: Programmers, Non-Programmers, Combined.]
1. Transcendence is difficult to learn how to use.
2. Transcendence is tedious to use.
3. I could use Transcendence to find useful information.
4. Transcendence is powerful (it could allow me to easily accomplish difficult tasks).
5. Manually recreating Transcendence’s functionality for a specific web site would be difficult.
6. Manually recreating Transcendence’s functionality would be time-consuming.
7. Transcendence is useful for performing the tasks in this study.
8. Generalization of input fields is useful.
9. Automatic selection of fields is useful.
10. Transcendence would save me time.
11. I would use Transcendence in the future if it was available.
User Evaluation