Transcendence: Enabling A Personal View of the Deep Web

Post on 08-May-2015

Transcendence talk at IUI given by Jeffrey P. Bigham. See http://www.cs.washington.edu/homes/jbigham/

Transcript of Transcendence: Enabling A Personal View of the Deep Web

Enabling a Personal View of the Deep Web

Jeffrey P. Bigham

Anna C. Cavender, Ryan S. Kaminsky, Craig M. Prince, and Tyler S. Robison

University of Washington
Computer Science and Engineering

What is the Deep Web?

o The deep web
– Built from underlying databases
– Accessible by querying web forms
– 400-550x larger than surface web [1]

o The surface web
– Accessible by following links
– Indexed by traditional search engines

[1] Bergman, M. K. The deep web: Surfacing hidden value, 2001.

Introduction

Deep Web Resources

Introduction

Problems

o Web interfaces are inflexible
– Your query might not be supported

o Many searches are often required
– Multiple queries, multiple tabs/windows

o Aggregate queries are difficult
– Data is technically available but hard to access

Introduction

Outline

o Introduction

o Transcending Craigslist

o Related Work

o 3 Steps of Transcendence

o Additional Examples

o User Evaluation

Scenario

o Jane is a new student at UW

o Looking for an apartment on Craigslist

o Aware of two neighborhoods in Seattle
– “University District” and “University Village”

o Looking for the cheapest apartment near UW

Transcending Craigslist

Generalize a Form Field

Add a Value

Add Another Value

Automatically Generate More Values

Results only for “University Village”

Fields Automatically Chosen

Extract for All Inputs

Review Extractions in Place

Extractions Sorted by Price

Transcending Craigslist

o Provided personal view of Craigslist

o Multiple queries/results in single window

o Cheapest neighborhood not originally entered

o Required only a little more than a single search

Outline

o Introduction

o Transcending Craigslist

o Related Work

o 3 Steps of Transcendence

o Additional Examples

o User Evaluation

Crawling the Deep Web

o Crawling the Deep Web
1. Find interesting web forms
2. Find appropriate values to provide
– Determine schema and appropriate queries [1]
– Find keywords likely to elicit interesting results [2]

o Don’t involve users

[1] Madhavan et al. “Structured data meets the web: A few observations.” 2006.
[2] Ntoulas et al. “Downloading textual hidden web content through keyword queries.” 2005.

Related Work

User Interfaces for the Web

o Collect, Manage and Use Web Information
– Sifter [1]
  o Augment sorting/filtering
– Clip, Connect, Clone [2]
  o Manipulate web forms
  o Clone to specify multiple values
– CREO [3]
  o Web macros that generalize using Open Mind Repository

Related Work

[1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
[2] Fujima et al. “Clip, connect, clone: combining application elements to build custom interfaces for information access.” UIST 2004.
[3] Faaborg et al. “A goal-oriented web browser.” CHI 2006.

Outline

o Introduction

o Transcending Craigslist

o Related Work

o 3 Steps of Transcendence

o Additional Examples

o User Evaluation

1. Generalize Form

o Choose input fields to generalize

o Enter multiple values for those fields
– Either enter manually, or
– Use subset of values in selection/radio/checkbox

o Optionally add more values automatically
– Prior Input of Other Users
– Unsupervised Information Extraction
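Generalizing a form amounts to submitting it once per combination of the chosen values. The sketch below illustrates that idea only; it is not the Transcendence implementation, and the field names (`category`, `neighborhood`, `bedrooms`) and the helper `expand_form_queries` are hypothetical.

```python
from itertools import product


def expand_form_queries(fixed_fields, generalized_fields):
    """Yield one dict of form values per combination of generalized values."""
    names = list(generalized_fields)
    for combo in product(*(generalized_fields[name] for name in names)):
        query = dict(fixed_fields)        # fields the user left as single values
        query.update(zip(names, combo))   # one value for each generalized field
        yield query


# Hypothetical field names, loosely modelled on the apartment-search scenario.
fixed = {"category": "apartments"}
generalized = {
    "neighborhood": ["University District", "University Village"],
    "bedrooms": ["1", "2"],
}

queries = list(expand_form_queries(fixed, generalized))
for q in queries:
    print(q)
```

Two generalized fields with two values each give four form submissions; the number of queries grows multiplicatively with each additional generalized field.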

3 Steps of Transcendence

Input: phrases Output: (similar) phrases

o Google Sets [1]
– Up to 10 inputs, returns 15 or 50 results
– Based on contextual similarity (probably)

o KnowItAll List Extractor [2]
– Finds inputs in lists in unstructured web text
– Extracts other items in the lists
– Proceeds in iterations to potentially find many more
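The list-based expansion described above can be illustrated with a toy sketch. This is not the KnowItAll List Extractor itself: the in-memory `web_lists`, the function name `expand_seeds`, and the rule that a list qualifies once it shares at least two items with the known set are all assumptions made for illustration.

```python
def expand_seeds(seeds, lists, min_overlap=2, max_iterations=3):
    """Grow a seed set from lists that already contain several known items."""
    original = {s.lower() for s in seeds}
    known = set(original)
    for _ in range(max_iterations):
        found = set()
        for items in lists:
            normalized = {item.lower() for item in items}
            # Assumed rule: trust a list only if it overlaps the known set enough.
            if len(normalized & known) >= min_overlap:
                found |= normalized - known
        if not found:
            break  # nothing new this round, so further iterations cannot help
        known |= found
    return known - original


# Toy stand-ins for lists found in web pages (hypothetical data).
web_lists = [
    ["University District", "Wallingford", "University Village"],
    ["Wallingford", "University District", "Fremont", "Ballard"],
    ["Fremont", "Ballard", "Ravenna"],
    ["apples", "oranges"],
]

new_values = expand_seeds(["University District", "University Village"], web_lists)
print(sorted(new_values))
```

Each iteration can unlock lists that did not qualify before, which is how a handful of seeds can grow into many values; the unrelated list never qualifies because it shares nothing with the known set.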

1-a) Finding Values with UIE

[1] http://labs.google.com/sets/
[2] Etzioni et al. “Methods for domain-independent information extraction from the web: an experimental comparison.” 2008

3 Steps of Transcendence

2. Choose Fields & Extract

o Submit form with single combination of values

o Result fields identified automatically [1]
– Identified by XPATH
– Pre-processing adds structure

o Users optionally edit fields

o Begin extraction process
– Multi-threaded extraction
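A minimal sketch of XPath-based field extraction run across several threads, assuming well-formed result pages. It is not the Transcendence code: the in-memory `PAGES`, the `FIELDS` paths, and the `fetch` stub (which stands in for submitting the form) are hypothetical, and Python's `xml.etree.ElementTree` supports only a subset of XPath.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical result pages, keyed by the input value that produced them.
PAGES = {
    "University District": (
        "<html><body>"
        "<p class='row'><a>Studio near campus</a><span class='price'>$700</span></p>"
        "<p class='row'><a>1BR on the Ave</a><span class='price'>$850</span></p>"
        "</body></html>"
    ),
    "University Village": (
        "<html><body>"
        "<p class='row'><a>2BR townhouse</a><span class='price'>$1200</span></p>"
        "<p class='row'><a>Quiet 1BR</a><span class='price'>$900</span></p>"
        "</body></html>"
    ),
}

ROW_PATH = ".//p[@class='row']"                            # one node per result
FIELDS = {"title": "a", "price": "span[@class='price']"}   # paths relative to a row


def fetch(value):
    """Stand-in for submitting the form with one input value."""
    return PAGES[value]


def extract(value):
    """Extract every configured field from each result row of one page."""
    root = ET.fromstring(fetch(value))
    records = []
    for row in root.findall(ROW_PATH):
        record = {"input": value}
        for name, path in FIELDS.items():
            node = row.find(path)
            record[name] = node.text if node is not None else None
        records.append(record)
    return records


# One task per input value; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = [rec for page in pool.map(extract, PAGES) for rec in page]

for rec in results:
    print(rec)
```

Keeping the input value on every record is what later allows results to be filtered or grouped by the query that produced them.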

3 Steps of Transcendence

[1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.

3. Visualize Data

o In place on the web page
– Sort and select results within the web page

o External Visualizers
– Histogram, Google Map, line graphs, scatter plot, and table of values
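Sorting and a histogram are the simplest of these views to sketch. The example below assumes extracted records shaped like dictionaries with a `price` string; the record layout, the sample data, and the helper names are hypothetical and only illustrate the idea.

```python
from collections import Counter

BIN_WIDTH = 100  # assumed bin size in dollars


def parse_price(text):
    """Turn a string such as '$1,200' into the integer 1200."""
    return int(text.replace("$", "").replace(",", ""))


def sort_by_price(records):
    """Return the records ordered from cheapest to most expensive."""
    return sorted(records, key=lambda r: parse_price(r["price"]))


def price_histogram(records, bin_width=BIN_WIDTH):
    """Count records per price bin, keyed by the lower edge of each bin."""
    counts = Counter(
        (parse_price(r["price"]) // bin_width) * bin_width for r in records
    )
    return dict(sorted(counts.items()))


# Hypothetical extractions merged from several queries.
records = [
    {"input": "University District", "price": "$850"},
    {"input": "University Village", "price": "$1,200"},
    {"input": "Wallingford", "price": "$780"},
    {"input": "University District", "price": "$795"},
]

ordered = sort_by_price(records)
histogram = price_histogram(records)

for low, count in histogram.items():
    print(f"${low}-${low + BIN_WIDTH - 1}: {'#' * count}")
```

Because the records from all queries are merged before sorting, the cheapest result can come from an input value the user never typed.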

3 Steps of Transcendence

Outline

o Introduction

o Transcending Craigslist

o Related Work

o 3 Steps of Transcendence

o Additional Examples

o User Evaluation

Examples: IMDB [1] Rating Dist.

Additional Examples

[1] http://www.imdb.com

Entered: “Scent of a Woman,” “Rocky,” “Star Wars,” and “The Matrix”

Generated > 7,000 more titles

Examples: Directory Diving

o Supplied 3 surnames:
– “Allen,” “Smith,” and “Johnson”
– Generated 10,063 more names

o A few hours later…
– 51,233 unique names and emails
– Also address information, position

Additional Examples

Outline

o Introduction

o Transcending Craigslist

o Related Work

o 3 Steps of Transcendence

o Additional Examples

o User Evaluation

User Evaluation

o 9 Potential Users Evaluated Transcendence
– 5 programmers, 4 non-programmers

o 3 Tasks
– Search for a flight
  o Multiple destinations, departure and return dates
– Map REI Stores in the U.S.
– Search Craigslist for an apartment

User Evaluation

User Reaction & Comments

o Agreed that Transcendence:
– “could be used to find useful information”
– “is powerful (would allow me to easily accomplish difficult tasks).”

o Most compelling task varied by user
– Craigslist suggested by preliminary user
– Many related to flight task

o Questioned value of incomplete database reconstructions
– Pleasantly surprised by values automatically supplied

o Wanted to use Transcendence in the future

User Evaluation

Future Work

o Implicit Resource Descriptions
– Eliminate Need to Choose Result Fields
  o Share result schemas between users
  o Eliminate Step 2
– Custom vertical search engines
  o User-created Kayaks, Metacrawlers, and Froogles

o Improved Deep Web Crawling
– Use UIE to find appropriate values for forms

Future Work

Conclusion

o Transcendence makes forms more flexible

o Transcendence automatically finds input values

o Unsupervised information extraction useful for crawling

o Transcendence enables new queries not possible before

o Participants wanted to use Transcendence

Conclusion

Transcendence

Jeffrey P. Bigham

jbigham@cs.washington.edu
www.cs.washington.edu/homes/jbigham/

Thanks to: Mira Dontcheva, UW Turing Center, anonymous reviewers, and our study participants.

The End

Some Extra Slides

Show Those Resulting from Specific Inputs

Show Wedgewood Results

System Description

[Figure: system architecture diagram. Labels: Transcendence System, Firefox Extension, The Web, Step 1: Generalize, Step 2: Extract, Step 3: Visualize, Generalizers, KnowItAll, Google Sets, Prior Input, Extraction Database, Google Maps, Java Applet.]

3 Steps of Transcendence

[Figure: the three steps in sequence. Labels: Generalize, Choose Fields & Extract, Visualize.]

Examples: Mapping Stores

Additional Examples

Examples: Kayak Flights

Additional Examples

[Figure: participant ratings of each statement on a scale from 1 (strongly disagree) to 7 (strongly agree), grouped under Ease of Use and Value, and shown for Programmers, Non-Programmers, and Combined.]

1. Transcendence is difficult to learn how to use.
2. Transcendence is tedious to use.
3. I could use Transcendence to find useful information.
4. Transcendence is powerful (it could allow me to easily accomplish difficult tasks).
5. Manually recreating Transcendence’s functionality for a specific web site would be difficult.
6. Manually recreating Transcendence’s functionality would be time-consuming.
7. Transcendence is useful for performing the tasks in this study.
8. Generalization of input fields is useful.
9. Automatic selection of fields is useful.
10. Transcendence would save me time.
11. I would use Transcendence in the future if it was available.

User Evaluation
