Efficiently Incorporating User Feedback into Information Extraction and Integration Programs
Xiaoyong Chai, Ba-Quy Vuong, AnHai Doan, Jeffrey F. Naughton
University of Wisconsin-Madison
The Need for Incorporating User Feedback
Panels Chair
Current Approach
[Figure labels: Code, Data]
This Is Not Just For DBLife
A growing number of applications use information extraction (IE) and information integration (II)
– Avatar@IBM Almaden
– AliBaba@Humboldt Univ. of Berlin
– YAGO@MPI
– Kylin@Univ. of Washington
– …
A systematic user-feedback solution could significantly benefit them
What User Feedback To Incorporate?
Types of User Feedback
– Flagging an Error, Fixing an Error
– Editing Data, Editing Code
– Input, Intermediate Results, Output
Challenges
How to expose program data for user feedback?
How to incorporate user feedback?
How to efficiently execute a program?
Exposing Program Data for User Feedback
[Figure: Extracting conference services. The program runs the operators crawl, extractConf, extractNames, and findRoles over the dataSources table to produce the roles and services tables; views over these tables are shown in user interfaces (Form, Spreadsheet, Wiki).]
Example tuples:
– dataSources(url, date): (http://.../cidr09/, 09/01/2008)
– services(name, conf, role): (Joe Hellerstein, CIDR 2009, PC Chair)
Writing User-Feedback Rules to Expose Program Data
Write extraction program, e.g., in xlog [Shen et al, 07]
R1: pages(page) : dataSources(url, date), crawl(url, page)
R2: conferences(conf, page) : pages(page), extractConf(page, conf)
R3: names(name, page) : pages(page), extractNames(page, name)
R4: roles(name, role, page) : names(name, page), findRoles(name, page, role)
R5: services(name, conf, role) : conferences(conf, page), roles(name, role, page)
Write user-feedback rules to specify views and user interfaces
R6: dataSourcesForUserFeedback(url)#form-UI : dataSources(url, date), date >= “01/01/2009”
R7: rolesForUserFeedback(pos, page#no-edit)#spreadsheet-UI : roles(role, page)
R8: servicesForUserFeedback(name, conf, role)#wiki-UI : services(name, conf, role)
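Rules R1 through R5 can be read as a chain of joins between tables and blackbox operators. The following is a minimal Python sketch of that reading, not the xlog engine itself. The operator bodies are hypothetical stubs that return fixed values, and the example URL is a placeholder chosen for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical stand-ins for the blackbox operators. Real operators would
# crawl the web and run extractors; these return fixed values for illustration.
def crawl(url: str) -> List[str]:
    return [f"page-of:{url}"]

def extract_conf(page: str) -> List[str]:
    return ["CIDR 2009"]

def extract_names(page: str) -> List[str]:
    return ["Joe Hellerstein"]

def find_roles(name: str, page: str) -> List[str]:
    return ["PC Chair"]

def run_program(data_sources: Set[Tuple[str, str]]) -> Set[Tuple[str, str, str]]:
    """Evaluate rules R1..R5 bottom-up over a set of (url, date) tuples."""
    # R1: pages(page) : dataSources(url, date), crawl(url, page)
    pages = {page for (url, _date) in data_sources for page in crawl(url)}
    # R2: conferences(conf, page) : pages(page), extractConf(page, conf)
    conferences = {(conf, page) for page in pages for conf in extract_conf(page)}
    # R3: names(name, page) : pages(page), extractNames(page, name)
    names = {(name, page) for page in pages for name in extract_names(page)}
    # R4: roles(name, role, page) : names(name, page), findRoles(name, page, role)
    roles = {(name, role, page)
             for (name, page) in names
             for role in find_roles(name, page)}
    # R5: services(name, conf, role) : conferences(conf, page), roles(name, role, page)
    # The two relations are joined on the shared page attribute.
    return {(name, conf, role)
            for (conf, conf_page) in conferences
            for (name, role, role_page) in roles
            if conf_page == role_page}
```

Each intermediate set (pages, conferences, names, roles) corresponds to a table that a user-feedback rule could expose as a view.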
Program Semantics
[Figure: the workflow (dataSources, crawl, extractConf, extractNames, findRoles, roles, services) with two layers on top of its tables: Views, and User Interfaces (Form, Spreadsheet, Wiki).]
Example tuples:
– dataSources(url, date): (http://.../cidr09/, 09/01/2008)
– services(name, conf, role): (Joe Hellerstein, CIDR 2009, PC Chair)
Incorporating Previous User Feedback
Interpretation: for operator p, if t is in the output, change t into t’
[Figure: operator p maps input I to output O; applying the feedback (t, t’) turns O into O’.]
Example feedback: change “A. Smith” to “D. Smith”
– extractNames on page p1 (“… D.Smith, A. Jones, ...”) outputs the names A. Smith and A. Jones
– extractNames on page p2 (“Dr. A. Smith is ...”) outputs the name A. Smith
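The tuple-level interpretation above can be sketched in a few lines of Python. This is an illustrative sketch only: the rule representation as (t, t') pairs and the function name are assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# A feedback rule for one operator: (t, t_prime) means
# "if t is in the output, change t into t_prime".
FeedbackRule = Tuple[str, str]

def apply_feedback(output: List[str], rules: List[FeedbackRule]) -> List[str]:
    """Rewrite every output tuple that matches a recorded feedback rule."""
    rewrite = dict(rules)
    return [rewrite.get(t, t) for t in output]

rules = [("A. Smith", "D. Smith")]
out_p1 = apply_feedback(["A. Smith", "A. Jones"], rules)  # output for page p1
out_p2 = apply_feedback(["A. Smith"], rules)              # output for page p2
```

Because the rule looks only at the output tuple, it also rewrites the name extracted from page p2, whose text really does mention an A. Smith. A rule keyed on the tuple alone cannot tell the two cases apart.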
Interpreting User Feedback Based On Tuple Provenance
Provenance of output tuple t:
– the set of input tuples that operator p used to produce t
Feedback: change “A. Smith” to “D. Smith”
If the operator produces {“A. Smith”, “A. Jones”} from {p1},
then replace {“A. Smith”, “A. Jones”} with {“D. Smith”, “A. Jones”}
[Figure: extractNames over pages p1 and p2; output tuples with their provenance: (A. Smith, p1), (A. Jones, p1), (A. Smith, p2).]
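One way to picture the provenance-based interpretation is a lookup keyed on both the provenance and the original output set. The sketch below assumes that representation; the class and method names are hypothetical and not taken from the authors' system.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Key: (provenance, original output set) -> corrected output set
Key = Tuple[FrozenSet[str], FrozenSet[str]]

class ProvenanceFeedback:
    """Stores feedback as: if the operator produces `old` from `provenance`,
    replace `old` with `new`."""

    def __init__(self) -> None:
        self._rules: Dict[Key, FrozenSet[str]] = {}

    def record(self, provenance: Iterable[str], old: Iterable[str],
               new: Iterable[str]) -> None:
        self._rules[(frozenset(provenance), frozenset(old))] = frozenset(new)

    def apply(self, provenance: Iterable[str],
              output: Iterable[str]) -> FrozenSet[str]:
        out = frozenset(output)
        # Only an exact match on provenance and output triggers the rewrite.
        return self._rules.get((frozenset(provenance), out), out)

fb = ProvenanceFeedback()
fb.record({"p1"}, {"A. Smith", "A. Jones"}, {"D. Smith", "A. Jones"})
```

With this keying, the correction applies to the names extracted from p1 and leaves the A. Smith extracted from p2 untouched.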
Challenges
How to expose program data for user feedback?
How to incorporate user feedback?
How to efficiently execute a program?
– Incremental execution
– Improved concurrency control
Incrementally Executing the Program
[Figure: extractNames applied to pages p1 and p2, and then to pages p1, p2, and p3; a “?” marks the output that must be recomputed.]
Similar problem in incremental view maintenance
Incremental-update properties
– Closed-form insertion
– Closed-form deletion
– Input partitionability
– Partition correlation
– Attribute independence
extractNames(I + ΔI) = extractNames(I) + extractNames(ΔI)
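The closed-form insertion property can be illustrated with a toy operator that processes each page independently. The extractor below is a hypothetical stand-in (it splits page text on commas), and page p3 with its content is invented for the example.

```python
from typing import Dict, Set, Tuple

Pages = Dict[str, str]          # page id -> page text
Names = Set[Tuple[str, str]]    # (name, page id)

def extract_names(pages: Pages) -> Names:
    """Toy extractor: treats each comma-separated item on a page as a name.
    It looks at one page at a time, so its output over a union of pages is
    the union of its outputs over each part."""
    return {(item.strip(), pid)
            for pid, text in pages.items()
            for item in text.split(",")
            if item.strip()}

def incremental_insert(old_output: Names, delta_pages: Pages) -> Names:
    """Closed-form insertion: reuse the old output and run the operator
    only on the newly inserted pages."""
    return old_output | extract_names(delta_pages)

existing = {"p1": "A. Smith, A. Jones", "p2": "A. Smith"}
delta = {"p3": "B. Lee"}  # hypothetical new page

full = extract_names({**existing, **delta})
incremental = incremental_insert(extract_names(existing), delta)
```

Here `full` and `incremental` are equal, but the incremental path never re-reads p1 or p2. An operator whose output for one page depends on other pages would not satisfy this property.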
Concurrently Executing Transactions
[Figure: transactions T1 and T2 running over the workflow (dataSources, crawl, extractConf, extractNames, findRoles, roles, services).]
Table-Locking
– Locks only the input and output tables of the crawl operator
Operator-Skipping
– Skips executing the join operator after updating the roles table
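The difference between locking the whole workflow graph and locking only an operator's tables can be sketched by comparing lock sets. This sketch assumes exclusive locks only and takes the table names from rules R1 through R5; the operator name "join" for rule R5 and the function names are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

# operator -> (input tables, output tables), following rules R1..R5
OPERATOR_TABLES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "crawl": ({"dataSources"}, {"pages"}),
    "extractConf": ({"pages"}, {"conferences"}),
    "extractNames": ({"pages"}, {"names"}),
    "findRoles": ({"names"}, {"roles"}),
    "join": ({"conferences", "roles"}, {"services"}),
}

def graph_lock_set() -> Set[str]:
    """Graph-locking: a transaction locks every table in the workflow."""
    tables: Set[str] = set()
    for inputs, outputs in OPERATOR_TABLES.values():
        tables |= inputs | outputs
    return tables

def table_lock_set(operator: str) -> Set[str]:
    """Table-locking: lock only the operator's input and output tables."""
    inputs, outputs = OPERATOR_TABLES[operator]
    return inputs | outputs

def can_run_concurrently(op_a: str, op_b: str) -> bool:
    """Two operators can run at once if their lock sets do not overlap
    (assuming every lock is exclusive)."""
    return table_lock_set(op_a).isdisjoint(table_lock_set(op_b))
```

Under graph-locking every pair of transactions conflicts, since each holds all tables. Under table-locking, a transaction running crawl and one running findRoles touch disjoint tables and can proceed together.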
Experiment Setup
Testbed
– A 5-stage DBLife workflow
– 13 blackbox operators: 6 IE operators and 3 II operators
Wrote xlog program and user-feedback rules in < 1 hr
Simulated user-feedback transactions
– On each stage of the workflow
– Each transaction randomly deletes, inserts, or modifies 1/10 of the tuples in a table
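A simulated feedback transaction of this kind might look like the sketch below. It assumes each transaction picks a single action and applies it to one tenth of a table's tuples; the slides do not say how tuples were chosen or what inserted and modified tuples contained, so the placeholder values here are invented.

```python
import random
from typing import List, Tuple

Row = Tuple[str, ...]
ACTIONS = ("delete", "insert", "modify")

def apply_action(table: List[Row], action: str, rng: random.Random,
                 one_in: int = 10) -> List[Row]:
    """Apply one action to len(table) // one_in randomly chosen tuples."""
    k = len(table) // one_in
    chosen = set(rng.sample(range(len(table)), k))
    if action == "delete":
        return [row for i, row in enumerate(table) if i not in chosen]
    if action == "insert":
        # Placeholder tuples stand in for user-supplied insertions.
        return table + [("synthetic-%d" % i,) for i in sorted(chosen)]
    if action == "modify":
        # Overwrite the first field of each chosen tuple.
        return [("modified",) + row[1:] if i in chosen else row
                for i, row in enumerate(table)]
    raise ValueError(f"unknown action: {action}")

def simulate_transaction(table: List[Row], rng: random.Random) -> Tuple[str, List[Row]]:
    """Pick one of the three actions at random and apply it."""
    action = rng.choice(ACTIONS)
    return action, apply_action(table, action, rng)
```

Passing a seeded `random.Random` instance keeps a simulated workload repeatable across runs.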
Incremental-Update Properties are Broadly Applicable
Table columns: DBLife Operators; Inc. Update Properties (ci, cd, ip, ai, pc)
DBLife Operators:
– Get Data Pages
– Get People Variations
– Get Publication Variations
– Get Organization Variations
– Find People Variations
– Find Publication Variations
– Find Organization Variations
– Find People Entities
– Find Publication Entities
– Find Organization Entities
– Find Related People
– Find Authorship
– Find Related Organizations
Incremental Update Reduces Execution Time
Table-Locking and Operator-Skipping Improve Concurrency Degree
Increase transaction throughput by 50% and 500%
Reduce transaction response time by 43% and 98%
Response time       Min    Max      Average
Graph-locking       ~0s    7,584s   3,203s
Table-locking       1s     5,485s   1,841s   (-43%)
Operator-skipping   ~0s    457s     43s      (-98%)
Related Work
User feedback in IE and II
– [Doan et al, 01], [Chiticariu et al, 08], [Jeffery et al, 08]
– Leveraging user feedback to improve results of individual operations
Provenance
– [Woodruff & Stonebraker, 97], [Cui & Widom, 01], [Buneman et al, 01], [Bohannon et al, 08], [Huang et al, 08]
Incremental execution
– View maintenance [Blakeley et al, 86], [Griffin & Libkin, 95], [Gupta & Mumick, 95]
– Schema matching [Bernstein et al, 06], IE [Chen et al, 07]
Conclusions and Future Work
Incorporating user feedback into IE and II programs is important
Identify key issues and provide initial solutions:
– Write user-feedback rules to expose program data to UIs
– Model and incorporate user feedback
– Efficiently execute program to process user feedback
Future work:
– Handle unreliable user feedback
– Propagate user feedback down in the workflow
– Conduct user study