Efficiently Incorporating User Feedback into Information Extraction and Integration Programs

Xiaoyong Chai, Ba-Quy Vuong, AnHai Doan, Jeffrey F. Naughton, University of Wisconsin-Madison


Transcript of Efficiently Incorporating User Feedback into Information Extraction and Integration Programs

Page 1: Efficiently Incorporating User Feedback into Information Extraction and Integration Programs

Xiaoyong Chai, Ba-Quy Vuong, AnHai Doan, Jeffrey F. Naughton

University of Wisconsin-Madison

Efficiently Incorporating User Feedback into Information Extraction and Integration Programs

Page 2

The Need for Incorporating User Feedback

Panels Chair

Page 3

Current Approach

Code

Data

Page 4

This Is Not Just For DBLife

A growing number of applications use IE and II
– Avatar@IBM Almaden
– AliBaba@Humboldt Univ. of Berlin
– YAGO@MPI
– Kylin@Univ. of Washington
– …

A systematic user-feedback solution could significantly benefit them

Page 5

What User Feedback To Incorporate?

Types of User Feedback

[Figure labels: Flagging an Error, Fixing an Error; Editing Data, Editing Code; Input, Intermediate Results, Output]

Page 6

Challenges

How to expose program data for user feedback?

How to incorporate user feedback?

How to efficiently execute a program?

Page 7

Exposing Program Data for User Feedback

[Figure: Extracting conference services. Tables: dataSources (url, date; e.g., http://.../cidr09/, 09/01/2008), roles (name, role, page), and services (name, conf, role; e.g., Joe Hellerstein, CIDR 2009, PC Chair). Operators: crawl, extractConf, extractNames, findRoles. Views: (url), (name, role, page), (name, conf, role). User Interfaces: Form, Spreadsheet, Wiki.]

Page 8

Writing User-Feedback Rules to Expose Program Data

Write extraction program, e.g., in xlog [Shen et al, 07]

R1: pages(page) :- dataSources(url, date), crawl(url, page)

R2: conferences(conf, page) :- pages(page), extractConf(page, conf)

R3: names(name, page) :- pages(page), extractNames(page, name)

R4: roles(name, role, page) :- names(name, page), findRoles(name, page, role)

R5: services(name, conf, role) :- conferences(conf, page), roles(name, role, page)

Write user-feedback rules to specify views and user interfaces

R6: dataSourcesForUserFeedback(url)#form-UI :- dataSources(url, date), date >= “01/01/2009”

R7: rolesForUserFeedback(pos, page#no-edit)#spreadsheet-UI :- roles(role, page)

R8: servicesForUserFeedback(name, conf, role)#wiki-UI :- services(name, conf, role)
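To make the dataflow of R1 to R5 and the view in R6 concrete, here is a minimal Python sketch that evaluates them over in-memory sets. It is not xlog and not the authors' implementation: the URLs, the page texts in FAKE_WEB, and the bodies of crawl, extract_conf, extract_names, and find_roles are hypothetical stand-ins for the real blackbox operators.

```python
# Toy evaluation of rules R1-R5 and the feedback view R6 over Python sets.
# Every operator body and every data value below is a hypothetical stand-in.
from datetime import date

# dataSources(url, date)
data_sources = [
    ("http://example.org/conf-a", date(2009, 2, 1)),
    ("http://example.org/conf-b", date(2008, 9, 1)),
]

# Hypothetical page contents keyed by URL (stands in for a real crawler).
FAKE_WEB = {
    "http://example.org/conf-a": "ConfA 2009 | PC Chair: Ann Lee",
    "http://example.org/conf-b": "ConfB 2008 | Panels Chair: Bob Ray",
}

def crawl(url):
    return [FAKE_WEB[url]]

def extract_conf(page):
    return [page.split(" | ")[0]]

def extract_names(page):
    return [page.split(": ")[1]]

def find_roles(name, page):
    return [page.split(" | ")[1].split(":")[0]]

def run_program(sources):
    pages = {p for (url, _d) in sources for p in crawl(url)}            # R1
    conferences = {(c, p) for p in pages for c in extract_conf(p)}      # R2
    names = {(n, p) for p in pages for n in extract_names(p)}           # R3
    roles = {(n, r, p) for (n, p) in names for r in find_roles(n, p)}   # R4
    services = {(n, c, r)                                               # R5: join on page
                for (c, p) in conferences
                for (n, r, p2) in roles if p == p2}
    return roles, services

def data_sources_for_user_feedback(sources, cutoff=date(2009, 1, 1)):  # R6
    # Only the url column is exposed, and only for sufficiently recent sources.
    return {url for (url, d) in sources if d >= cutoff}

roles, services = run_program(data_sources)
feedback_view = data_sources_for_user_feedback(data_sources)
```

In this sketch the view is recomputed from the base table on demand; how edits made through a form, spreadsheet, or wiki interface are written back is outside what the sketch covers.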

Page 9

Program Semantics

[Figure: the conference-services workflow (tables dataSources, roles, services; operators crawl, extractConf, extractNames, findRoles) together with its Views (url), (name, role, page), (name, conf, role) and User Interfaces (Form, Spreadsheet, Wiki).]

Page 10

Incorporating Previous User Feedback

[Figure: operator p maps input I to output O; a user edit changes tuple t in O into t’, giving O’.]

Interpretation: for operator p, if t is in the output, change t into t’

Example with extractNames:
– Page p1 (“… D.Smith, A. Jones, ...”) yields the names “A. Smith” and “A. Jones”
– User feedback: Change “A. Smith” to “D. Smith”
– Page p2 (“Dr. A. Smith is ...”) yields the name “A. Smith”

Page 11

Interpreting User Feedback Based On Tuple Provenance

Provenance of output tuple t:
– the set of input tuples that operator p used to produce t

[Figure: extractNames over page p1 produces “A. Smith” and “A. Jones”, each with provenance p1. User feedback: Change “A. Smith” to “D. Smith”.]

If the operator produces {“A. Smith”, “A. Jones”} from {p1},

then replace {“A. Smith”, “A. Jones”} with {“D. Smith”, “A. Jones”}

[Figure: extractNames over pages p1 and p2 produces “A. Smith” (provenance p1), “A. Jones” (provenance p1), and “A. Smith” (provenance p2).]
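The difference between matching on the tuple alone and matching on its provenance can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: RAW_OUTPUT hard-codes what a hypothetical extractNames blackbox returns per page, each tuple's provenance is simplified to a single page id, and the function names are invented for the example.

```python
# Contrast a tuple-only interpretation of feedback with a provenance-based one.
# RAW_OUTPUT is a hypothetical stand-in for the extractNames blackbox:
# it misreads "D.Smith" on page p1 as "A. Smith".
from collections import defaultdict

RAW_OUTPUT = {"p1": ["A. Smith", "A. Jones"], "p2": ["A. Smith"]}

def extract_names(page_ids):
    """Return (name, provenance) pairs; provenance is the source page id."""
    return [(name, pid) for pid in page_ids for name in RAW_OUTPUT[pid]]

def apply_tuple_only(output, old, new):
    """Rewrite every occurrence of `old`, regardless of where it came from."""
    return [(new if name == old else name, pid) for name, pid in output]

def apply_with_provenance(output, feedback):
    """feedback maps a page id to (expected names, replacement names).

    The replacement fires only if the operator produced exactly the expected
    names from that page, mirroring "if the operator produces X from {p1},
    then replace X with X'".
    """
    by_page = defaultdict(list)
    for name, pid in output:
        by_page[pid].append(name)
    result = []
    for pid, names in by_page.items():
        rule = feedback.get(pid)
        if rule is not None and set(names) == set(rule[0]):
            names = rule[1]
        result.extend((name, pid) for name in names)
    return result

output = extract_names(["p1", "p2"])
feedback = {"p1": (["A. Smith", "A. Jones"], ["D. Smith", "A. Jones"])}

tuple_only = apply_tuple_only(output, "A. Smith", "D. Smith")
with_provenance = apply_with_provenance(output, feedback)
```

With the tuple-only interpretation the correct “A. Smith” from p2 is also rewritten; keying the correction on provenance leaves it alone.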

Page 12

Challenges

How to expose program data for user feedback?

How to incorporate user feedback?

How to efficiently execute a program?
– Incremental execution
– Improved concurrency control

Page 13

Incrementally Executing the Program

[Figure: extractNames has been run over pages p1 and p2; when page p3 is added, what is the new output?]

Similar problem in incremental view maintenance

Incremental-update properties
– Closed-form insertion
– Closed-form deletion
– Input partitionability
– Partition correlation
– Attribute independence

extractNames(I + ΔI) = extractNames(I) + extractNames(ΔI)
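A minimal Python sketch of the insertion case: if an operator's output on an enlarged input is its old output plus its output on the newly added tuples alone, only the new tuples need to be processed. The regex-based extract_names, the PAGES_SEEN log, and the sample pages are hypothetical and serve only to show that the blackbox is not rerun on old pages.

```python
# Incremental execution for an operator with the insertion property:
# new output = old output + operator applied to the inserted pages only.
import re

PAGES_SEEN = []  # records which pages the blackbox was actually run on

def extract_names(pages):
    """Hypothetical blackbox: finds 'X. Surname' patterns, page by page."""
    out = set()
    for pid, text in pages:
        PAGES_SEEN.append(pid)
        for name in re.findall(r"[A-Z]\. [A-Z][a-z]+", text):
            out.add((name, pid))
    return out

def insert_incrementally(old_output, inserted_pages):
    """Reuse the cached output and run the operator on the insertions only."""
    return old_output | extract_names(inserted_pages)

existing = [("p1", "A. Jones chairs the panel"), ("p2", "Dr. A. Smith is ...")]
inserted = [("p3", "B. Lee and C. Wu")]

cached = extract_names(existing)
PAGES_SEEN.clear()
updated = insert_incrementally(cached, inserted)
pages_touched = list(PAGES_SEEN)   # only the inserted page
```

The shortcut is valid only for operators that process each page independently; an operator whose output on one page depends on other pages would need a different strategy.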

Page 14

Concurrently Executing Transactions

[Figure: the conference-services workflow (tables dataSources, roles, services; operators crawl, extractConf, extractNames, findRoles) with two concurrent transactions, T1 and T2.]

Table-Locking
– Locks only the input and output tables of the crawl operator

Operator-Skipping
– Skips executing the join operator after updating the roles table
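The table-locking idea can be sketched in Python as follows: a transaction that runs one operator takes locks only on that operator's input and output tables, rather than on every table in the workflow graph. The OPERATORS map is read off rules R1 to R5; the function names, the use of plain exclusive locks (no read/write distinction), and the sorted acquisition order are assumptions of this sketch, not details from the talk.

```python
# Lock only the tables an operator touches instead of the whole workflow graph.
import threading
from contextlib import ExitStack

# operator -> (input tables, output tables), following rules R1-R5
OPERATORS = {
    "crawl": ({"dataSources"}, {"pages"}),
    "extractConf": ({"pages"}, {"conferences"}),
    "extractNames": ({"pages"}, {"names"}),
    "findRoles": ({"names"}, {"roles"}),
    "join": ({"conferences", "roles"}, {"services"}),
}
ALL_TABLES = sorted(set().union(*(ins | outs for ins, outs in OPERATORS.values())))
LOCKS = {table: threading.Lock() for table in ALL_TABLES}

def graph_lock_set(_op):
    """Coarse baseline: lock every table, whichever operator runs."""
    return set(ALL_TABLES)

def table_lock_set(op):
    """Lock only the operator's own input and output tables."""
    ins, outs = OPERATORS[op]
    return ins | outs

def conflicts(op_a, op_b, lock_set):
    """Two operator executions block each other if their lock sets overlap."""
    return bool(lock_set(op_a) & lock_set(op_b))

def run_locked(op, body, lock_set=table_lock_set):
    """Run body() while holding the locks chosen by lock_set for op."""
    with ExitStack() as stack:
        for table in sorted(lock_set(op)):  # fixed order avoids deadlock
            stack.enter_context(LOCKS[table])
        return body()

result = run_locked("crawl", lambda: "crawled")
```

Under this sketch a transaction running crawl and another running findRoles can proceed in parallel, whereas the whole-graph baseline would serialize them.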

Page 15

Experiment Setup

Testbed
– A 5-stage DBLife workflow
– 13 blackbox operators: 6 IE operators and 3 II operators

Wrote xlog program and user-feedback rules in < 1 hr

Simulated user-feedback transactions
– On each stage of the workflow
– Each transaction randomly deletes, inserts, or modifies 1/10 of the tuples in a table

Page 16

Incremental-Update Properties are Broadly Applicable

[Table: DBLife operators against the incremental-update properties ci (closed-form insertion), cd (closed-form deletion), ip (input partitionability), ai (attribute independence), and pc (partition correlation); the per-operator marks are not recoverable here.]

DBLife Operators
– Get Data Pages
– Get People Variations
– Get Publication Variations
– Get Organization Variations
– Find People Variations
– Find Publication Variations
– Find Organization Variations
– Find People Entities
– Find Publication Entities
– Find Organization Entities
– Find Related People
– Find Authorship
– Find Related Organizations

Page 17

Incremental Update Reduces Execution Time

Page 18

Table-Locking and Operator-Skipping Improve Concurrency Degree

Increase transaction throughput by 50% and 500%

Reduce transaction response time by 43% and 98%

                     Min    Max      Average
Graph-locking        ~0s    7,584s   3,203s
Table-locking        1s     5,485s   1,841s   (-43%)
Operator-skipping    ~0s    457s     43s      (-98%)

Page 19

Related Work

User feedback in IE and II
– [Doan et al, 01], [Chiticariu et al, 08], [Jeffery et al, 08]
– Leveraging user feedback to improve results of individual operations

Provenance
– [Woodruff & Stonebraker, 97], [Cui & Widom, 01], [Buneman et al, 01], [Bohannon et al, 08], [Huang et al, 08]

Incremental execution
– View maintenance [Blakeley et al, 86], [Griffin & Libkin, 95], [Gupta & Mumick, 95]
– Schema matching [Bernstein et al, 06], IE [Chen et al, 07]

Page 20

Conclusions and Future Work

Incorporating user feedback into IE and II programs is important

Identify key issues and provide initial solutions:
– Write user-feedback rules to expose program data to UIs
– Model and incorporate user feedback
– Efficiently execute program to process user feedback

Future work:
– Handle unreliable user feedback
– Propagate user feedback down in the workflow
– Conduct user study