Transcendence: Enabling A Personal View of the Deep Web

Enabling a Personal View of the Deep Web. Jeffrey P. Bigham, Anna C. Cavender, Ryan S. Kaminsky, Craig M. Prince, and Tyler S. Robison, University of Washington Computer Science and Engineering.


Transcendence talk at IUI given by Jeffrey P. Bigham. See http://www.cs.washington.edu/homes/jbigham/

Transcript of Transcendence: Enabling A Personal View of the Deep Web

Page 1: Transcendence:  Enabling A Personal View of the Deep Web

Enabling a Personal View of the Deep Web

Jeffrey P. Bigham

Anna C. Cavender, Ryan S. Kaminsky, Craig M. Prince, and Tyler S. Robison

University of Washington Computer Science and Engineering

Page 2

What is the Deep Web?

o The deep web
  – Built from underlying databases
  – Accessible by querying web forms
  – 400-550x larger than surface web [1]

o The surface web
  – Accessible by following links
  – Indexed by traditional search engines

[1] Bergman, M. K. “The deep web: Surfacing hidden value.” 2001.

Page 3

Deep Web Resources

Page 4
Page 5

Deep Web Resources

Page 6

Problems

o Web interfaces are inflexible
  – Your query might not be supported

o Many searches are often required
  – Multiple queries, multiple tabs/windows

o Aggregate queries are difficult
  – Data is technically available but hard to access

Page 7

Outline

o Introduction

o Transcending Craigslist

o Related Work

o 3 Steps of Transcendence

o Additional Examples

o User Evaluation

Page 8

Scenario

o Jane is a new student at UW

o Looking for an apartment on Craigslist

o Aware of two neighborhoods in Seattle
  – “University District” and “University Village”

o Looking for the cheapest apartment near UW

Page 9
Page 10
Page 11

Generalize a Form Field

Page 12

Add a Value

Page 13

Add a Value

Page 14

Add Another Value

Page 15

Automatically Generate More Values

Page 16

Results only for “University Village”

Page 17

Fields Automatically Chosen

Page 18

Extract for All Inputs

Page 19

Review Extractions in Place

Page 20

Extractions Sorted by Price

Page 21

Transcending Craigslist

o Provided personal view of Craigslist

o Multiple queries/results in single window

o Cheapest neighborhood not originally entered

o Required only a little more than a single search
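The “multiple queries/results in a single window” view can be pictured as merging the rows that each generalized query returns, tagging each row with the input value that produced it, and sorting the combined list. The Python sketch below is only an illustration, not Transcendence’s implementation: the `Listing` record, its fields, and the function names are hypothetical, and it assumes each price has already been extracted as an integer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One extracted result row (hypothetical schema)."""
    source_input: str  # form value that produced this row, e.g. a neighborhood
    title: str
    price: int         # monthly rent in dollars


def merge_results(results_by_input: dict[str, list[tuple[str, int]]]) -> list[Listing]:
    """Flatten per-query results into one list, remembering which input produced each row."""
    merged = []
    for value, rows in results_by_input.items():
        for title, price in rows:
            merged.append(Listing(value, title, price))
    # One combined view, cheapest first (ties broken by title).
    merged.sort(key=lambda row: (row.price, row.title))
    return merged


def cheapest_per_input(merged: list[Listing]) -> dict[str, Listing]:
    """Return the cheapest listing found for each input value."""
    best: dict[str, Listing] = {}
    for row in merged:
        current = best.get(row.source_input)
        if current is None or row.price < current.price:
            best[row.source_input] = row
    return best
```

Because every row keeps its `source_input`, the cheapest overall row also names the neighborhood it came from, including a neighborhood the user never typed in if it was added automatically.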

Page 22

Page 23

Crawling the Deep Web

o Crawling the Deep Web
  1. Find interesting web forms
  2. Find appropriate values to provide
     – Determine schema and appropriate queries [1]
     – Find keywords likely to elicit interesting results [2]

o Don’t involve users

[1] Madhavan et al. “Structured data meets the web: A few observations.” 2006.
[2] Ntoulas et al. “Downloading textual hidden web content through keyword queries.” 2005.
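Choosing keywords likely to elicit results can be approximated with a simple frequency heuristic: issue next the term that appears in the most documents downloaded so far. The sketch below is a simplified stand-in, loosely inspired by the query-selection idea in Ntoulas et al. but not the algorithm from that paper (which estimates how many new documents each query will return); `next_query` and its parameters are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter


def next_query(downloaded_docs: list[str], issued: set[str],
               stopwords: frozenset[str] = frozenset()) -> str | None:
    """Pick the not-yet-issued term that occurs in the most downloaded documents."""
    doc_freq: Counter[str] = Counter()
    for doc in downloaded_docs:
        # Count each term once per document (document frequency, not raw counts).
        terms = set(re.findall(r"[a-z]+", doc.lower()))
        doc_freq.update(terms - stopwords)
    candidates = [(freq, term) for term, freq in doc_freq.items() if term not in issued]
    if not candidates:
        return None
    # Highest document frequency wins; ties go to the alphabetically first term.
    return min(candidates, key=lambda ft: (-ft[0], ft[1]))[1]
```

A crawler would loop: issue the returned term, add the new result pages to `downloaded_docs`, add the term to `issued`, and stop when `None` comes back or a query budget runs out.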

Page 24

User Interfaces for the Web

o Collect, Manage and Use Web Information
  – Sifter [1]
    o Augment sorting/filtering
  – Clip, Connect, Clone [2]
    o Manipulate web forms
    o Clone to specify multiple values
  – CREO [3]
    o Web macros that generalize using Open Mind Repository


[1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
[2] Fujima et al. “Clip, connect, clone: combining application elements to build custom interfaces for information access.” UIST 2004.
[3] Faaborg et al. “A goal-oriented web browser.” CHI 2006.

Page 25

Page 26

1. Generalize Form

o Choose input fields to generalize

o Enter multiple values for those fields
  – Either enter manually, or
  – Use subset of values in selection/radio/checkbox

o Optionally add more values automatically
  – Prior Input of Other Users
  – Unsupervised Information Extraction
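Generalizing fields turns one form submission into one submission per combination of values, so the number of queries is the product of the number of values entered for each generalized field (two neighborhoods and two price limits give four queries). A minimal sketch, assuming the form is submitted with GET; `generalized_queries`, the action URL, and any field names passed to it are hypothetical, and real forms may use POST or hidden fields that this ignores.

```python
from __future__ import annotations

from itertools import product
from urllib.parse import urlencode


def generalized_queries(action: str, fixed: dict[str, str],
                        generalized: dict[str, list[str]]) -> list[str]:
    """Build one GET URL per combination of the generalized field values."""
    names = list(generalized)
    urls = []
    # product() enumerates every combination: one value per generalized field.
    for combo in product(*(generalized[name] for name in names)):
        params = dict(fixed)              # fields the user left unchanged
        params.update(zip(names, combo))  # this combination's values
        urls.append(f"{action}?{urlencode(params)}")
    return urls
```

With no generalized fields, `product()` yields a single empty combination, so the function degrades to the original single submission.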

Page 27

1-a) Finding Values with UIE

Input: phrases
Output: (similar) phrases

o Google Sets [1]
  – Up to 10 inputs, returns 15 or 50 results
  – Based on contextual similarity (probably)

o KnowItAll List Extractor [2]
  – Finds inputs in lists in unstructured web text
  – Extracts other items in the lists
  – Proceeds in iterations to potentially find many more

[1] http://labs.google.com/sets/
[2] Etzioni et al. “Methods for domain-independent information extraction from the web: an experimental comparison.” 2008.
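The list-extraction idea can be illustrated on a small scale: find lists that already contain several known items, harvest their other items, and repeat with the enlarged set. This is a much simplified stand-in for KnowItAll’s List Extractor, not its method: it reads only HTML `<ul>`/`<ol>` lists from pages already in hand (no search engine queries), matches items exactly, scores nothing, and assumes every `<li>` is explicitly closed. `expand_set`, `min_seeds`, and `rounds` are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _ListCollector(HTMLParser):
    """Collect the text of <li> items, grouped by their enclosing <ul>/<ol>."""

    def __init__(self) -> None:
        super().__init__()
        self.lists: list[list[str]] = []
        self._stack: list[list[str]] = []
        self._item: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._stack.append([])
        elif tag == "li" and self._stack:
            self._item = []

    def handle_endtag(self, tag):
        if tag == "li" and self._item is not None and self._stack:
            text = " ".join("".join(self._item).split())  # normalize whitespace
            if text:
                self._stack[-1].append(text)
            self._item = None
        elif tag in ("ul", "ol") and self._stack:
            self.lists.append(self._stack.pop())

    def handle_data(self, data):
        if self._item is not None:
            self._item.append(data)


def expand_set(pages: list[str], seeds: list[str],
               min_seeds: int = 2, rounds: int = 2) -> list[str]:
    """Return new items from lists containing at least `min_seeds` known items."""
    lists: list[list[str]] = []
    for page in pages:
        collector = _ListCollector()
        collector.feed(page)
        collector.close()
        lists.extend(collector.lists)
    known = {s.lower() for s in seeds}
    found: list[str] = []
    for _ in range(rounds):
        added = False
        for items in lists:
            hits = sum(1 for item in items if item.lower() in known)
            if hits >= min_seeds:
                for item in items:
                    if item.lower() not in known:
                        known.add(item.lower())  # new items count as seeds next round
                        found.append(item)
                        added = True
        if not added:
            break  # nothing new, so further rounds cannot help
    return found
```

Each round can unlock lists that share only one original seed with the input, which is how iteration lets a few seeds grow into many values.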

Page 28

2. Choose Fields & Extract

o Submit form with single combination of values

o Result fields identified automatically [1]
  – Identified by XPath
  – Pre-processing adds structure

o Users optionally edit fields

o Begin extraction process
  – Multi-threaded extraction


[1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
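Once result fields are identified as paths, extraction is one parse per result page, and the pages can be fetched in parallel. A sketch under stated assumptions: pages are well-formed XHTML (Python’s `xml.etree.ElementTree` rejects ordinary tag-soup HTML and supports only a subset of XPath), the record and field paths are supplied rather than discovered automatically, and `fetch` is a caller-supplied function standing in for the network. All names here are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def extract_records(page: str, record_path: str, fields: dict[str, str]) -> list[dict[str, str]]:
    """Return one dict per record element; `fields` maps a column name to a path relative to the record."""
    root = ET.fromstring(page)
    rows = []
    for record in root.iterfind(record_path):
        row = {}
        for name, path in fields.items():
            node = record.find(path)
            # itertext() joins the text of the node and its children; missing nodes become "".
            row[name] = "".join(node.itertext()).strip() if node is not None else ""
        rows.append(row)
    return rows


def extract_all(urls: list[str], fetch: Callable[[str], str], record_path: str,
                fields: dict[str, str], workers: int = 4) -> list[dict[str, str]]:
    """Fetch and extract every generalized query concurrently, keeping input order."""
    def job(url: str) -> list[dict[str, str]]:
        rows = extract_records(fetch(url), record_path, fields)
        for row in rows:
            row["_source"] = url  # remember which query produced the row
        return rows

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of `urls` even though jobs run in parallel.
        return [row for rows in pool.map(job, urls) for row in rows]
```

Threads suit this workload because each job spends most of its time waiting on the network rather than computing.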

Page 29

3. Visualize Data

o In place on the web page
  – Sort and select results within the web page

o External Visualizers
  – Histogram, Google Map, line graphs, scatter plot, and table of values
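For a histogram visualizer, the extracted text (for example a price such as “$1,100”) has to be parsed into numbers and counted per bin. The sketch below shows one way to do that in Python, with a text bar chart standing in for the real visualizer; the parsing rule (take the first number in the string), the bin width, and the function names are assumptions.

```python
from __future__ import annotations

import re
from collections import Counter


def parse_number(text: str) -> float | None:
    """Turn strings like '$1,100' or '8.3/10' into a number (the first number found)."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def histogram(values: list[str], bin_width: float) -> list[tuple[float, int]]:
    """Count parsed values per bin; each bin is labeled by its lower edge."""
    counts: Counter[float] = Counter()
    for text in values:
        number = parse_number(text)
        if number is not None:  # skip rows where no number was extracted
            counts[(number // bin_width) * bin_width] += 1
    return sorted(counts.items())


def render(bins: list[tuple[float, int]]) -> str:
    """Draw one row of '#' marks per bin."""
    return "\n".join(f"{low:>8g} | {'#' * count}" for low, count in bins)
```

Rows that fail to parse are dropped rather than counted as zero, so a histogram of partial extractions understates the total instead of distorting the low bins.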

Page 30

Page 31

Examples: IMDB [1] Rating Distribution

o Entered: “Scent of a Woman,” “Rocky,” “Star Wars,” and “The Matrix”

o Generated more than 7,000 additional titles

[1] http://www.imdb.com

Page 32

Examples: IMDB [1] Rating Distribution


[1] http://www.imdb.com

Page 33

Examples: Directory Diving

o Supplied 3 surnames:
  – “Allen,” “Smith,” and “Johnson”
  – Generated 10,063 more names

o A few hours later…
  – 51,233 unique names and emails
  – Also address information, position
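Counting “unique” names and emails implies de-duplicating rows that come back from more than one query. A minimal sketch, assuming each extracted row is a dictionary with hypothetical `name` and `email` keys and that two rows with the same e-mail address (compared case-insensitively) describe the same person; rows without an e-mail fall back to the name.

```python
from __future__ import annotations


def unique_people(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep the first row seen for each person, keyed by e-mail (else by name)."""
    seen: set[str] = set()
    unique = []
    for row in rows:
        key = row.get("email", "").strip().lower() or row.get("name", "").strip().lower()
        if key and key not in seen:  # rows with neither field are skipped
            seen.add(key)
            unique.append(row)
    return unique
```

Keeping the first occurrence preserves whatever extra fields (address, position) that row carried; merging fields across duplicates would be a natural refinement.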

Page 34

Page 35

User Evaluation

o 9 Potential Users Evaluated Transcendence
  – 5 programmers, 4 non-programmers

o 3 Tasks
  – Search for a flight
    o Multiple destinations, departure and return dates
  – Map REI Stores in the U.S.
  – Search Craigslist for an apartment

Page 36

User Reaction & Comments

o Agreed that Transcendence:
  – “could be used to find useful information”
  – “is powerful (would allow me to easily accomplish difficult tasks).”

o Most compelling task varied by user
  – Craigslist suggested by preliminary user
  – Many related to flight task

o Questioned value of incomplete database reconstructions
  – Pleasantly surprised by values automatically supplied

o Wanted to use Transcendence in the future

Page 37

Future Work

o Implicit Resource Descriptions
  – Eliminate Need to Choose Result Fields
    o Share result schemas between users
    o Eliminate Step 2
  – Custom vertical search engines
    o User-created Kayaks, Metacrawlers, and Froogles

o Improved Deep Web Crawling
  – Use UIE to find appropriate values for forms

Page 38

Conclusion

o Transcendence makes forms more flexible

o Transcendence automatically finds input values

o Unsupervised information extraction useful for crawling

o Transcendence enables new queries not possible before

o Participants wanted to use Transcendence

Page 39

Transcendence
Jeffrey P. Bigham

http://www.cs.washington.edu/homes/jbigham/

Thanks to: Mira Dontcheva, UW Turing Center, anonymous reviewers, and our study participants.

The End

Page 40

Some Extra Slides

Page 41

Show Those Resulting from Specific Inputs

Page 42

Show Those Resulting from Specific Inputs

Page 43

Show Wedgwood Results

Page 44
Page 45
Page 46
Page 47

System Description

[Figure: system diagram with the labels Transcendence System, Firefox Extension, The Web, Step 1: Generalize, Step 2: Extract, Step 3: Visualize, Generalizers (KnowItAll, Google Sets, Prior Input), Extraction Database, Google Maps, and Java Applet.]

Page 48

3 Steps of Transcendence

1. Generalize
2. Choose Fields & Extract
3. Visualize

Page 49

Examples: Mapping Stores

Page 50

Examples: Kayak Flights

Page 51

[Figure: participant ratings on a scale from 1 (strongly disagree) to 7 (strongly agree), shown for Programmers, Non-Programmers, and Combined, with statements grouped under “Ease of Use” and “Value.”]

Survey statements:
1. Transcendence is difficult to learn how to use.
2. Transcendence is tedious to use.
3. I could use Transcendence to find useful information.
4. Transcendence is powerful (it could allow me to easily accomplish difficult tasks).
5. Manually recreating Transcendence’s functionality for a specific web site would be difficult.
6. Manually recreating Transcendence’s functionality would be time-consuming.
7. Transcendence is useful for performing the tasks in this study.
8. Generalization of input fields is useful.
9. Automatic selection of fields is useful.
10. Transcendence would save me time.
11. I would use Transcendence in the future if it were available.

Page 52

Page 53

Page 54

Page 55

Page 56

Page 57

Page 58

Deep Web Resources