sherlock - DA QCRI (da.qcri.org/ntang/pubs/sherlock.slides.pdf)
Proof Positive and Negative in Data Cleaning
Matteo Interlandi, Nan Tang
Sherlock Rules
Outline
• Motivation
• Sherlock Rules
• Fundamental problems
• Algorithms
Roadblocks to Get Value from Data?
High Quality Data
Data Mining
Machine Learning
Rule Discovery
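D:

| name | nation | capital |
| --- | --- | --- |
| Si | China | Beijing |
| Yan | China | Shanghai |
| Ian | China | Tokyo |

Data repairing with nation -> capital gives a consistent D’:

| name | nation | capital |
| --- | --- | --- |
| Si | China | Beijing |
| Yan | China | Beijing |
| Ian | China | Beijing |

Proof positive and negative gives an annotated D”:

| name | nation | capital |
| --- | --- | --- |
| Si | China | Beijing |
| Yan | China | Shanghai |
| Ian | China | Tokyo |

Sherlock Rules help with proof positive and negative.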
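The data-repairing path on this slide can be sketched in a few lines. This is a minimal illustration, assuming the repair picks the most frequent capital per nation and breaks ties by first occurrence; the slides do not say how the consistent D’ is chosen, so `repair_fd` and its tie-breaking are hypothetical.

```python
from collections import Counter, defaultdict

# Table D from the slide: (name, nation, capital).
D = [
    ("Si", "China", "Beijing"),
    ("Yan", "China", "Shanghai"),
    ("Ian", "China", "Tokyo"),
]

def repair_fd(rows):
    """Enforce nation -> capital by keeping one capital per nation.

    Assumption: the most frequent capital wins, ties go to the value seen first.
    """
    counts = defaultdict(Counter)
    for _, nation, capital in rows:
        counts[nation][capital] += 1
    # max() returns the first maximal item, and Counter keeps insertion order.
    chosen = {
        nation: max(c.items(), key=lambda kv: kv[1])[0]
        for nation, c in counts.items()
    }
    return [(name, nation, chosen[nation]) for name, nation, _ in rows]

repaired = repair_fd(D)
```

With the rows in slide order the result matches D’, but only because Beijing happens to come first. Feeding the same rows in reverse order makes every capital Tokyo, which is still consistent with nation -> capital and still wrong. The constraint alone cannot tell which value is correct.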
Proof Positive and Negative

| | name | dept | nation | capital | bornat | officePhn |
| --- | --- | --- | --- | --- | --- | --- |
| t1 | Si | DA | China | Beijing | ChenYang | 28098001 |
| t2 | Yan | DA | China | Shanghai | Chengdu | 24038698 |
| t3 | Ian | ALT | China | Beijing | Hangzhou | 33668323 |

| | name | officePhn | mobile |
| --- | --- | --- | --- |
| r1 | Si | 28098001 | 66700541 |
| r2 | Yan | 24038698 | 66706563 |
| r3 | Ian | 27364928 | 33668323 |

| | country | capital |
| --- | --- | --- |
| s1 | China | Beijing |
| s2 | Japan | Tokyo |
| s3 | Chile | Santiago |

Proof Positive/Negative, Correction
t3[name] (Ian) is correct, t3[officePhn] = 27364928
Proof Positive/Negative
t3[name] (Ian) is correct, t3[officePhn] is wrong
Proof Positive
t1[nation, capital] is correct, t3[nation, capital] is correct
Sherlock Rules
D: t1, t2, t3 over (name, dept, nation, capital, bornat, officePhn)
Dm: r1, r2, r3 over (name, officePhn, mobile) and s1, s2, s3 over (country, capital)
Evidence from Dm: positive, negative
Point of Innovation
Integrity Constraints
There does not exist t1[X1] = t2[X2] but t1[B1] <> t2[B2]
Example: (China, Shanghai) and (China, Beijing), where China = China but Shanghai <> Beijing
Sherlock Rules
If t1[X1] = t2[X2] and t1[B] = t2[B-], then t1[B] := t2[B+]
Example: (China, Shanghai) matched against (China, Beijing, Shanghai) gives (China, Beijing)
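The rule shape on this slide can be made concrete with a short sketch. It assumes exact equality as the matching operator and reads the master tuple (China, Beijing, Shanghai) as (X, B+, B-); the class `SherlockRule`, the function `apply_rule`, and the master attribute name `not_capital` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SherlockRule:
    x: str      # attribute of the data tuple compared with the master tuple
    xm: str     # master attribute that x is compared with
    b: str      # attribute of the data tuple being judged
    b_neg: str  # master attribute holding a known wrong value for b
    b_pos: str  # master attribute holding the correct value for b

def apply_rule(rule, t, master):
    """Apply one rule to one tuple.

    Returns (repaired copy of t, attributes proven correct, corrections made).
    """
    repaired = dict(t)
    positive, corrections = set(), {}
    for m in master:
        if t[rule.x] == m[rule.xm] and t[rule.b] == m[rule.b_neg]:
            positive.add(rule.x)  # the evidence attribute is proven correct
            corrections[rule.b] = (t[rule.b], m[rule.b_pos])  # proven wrong, with its fix
            repaired[rule.b] = m[rule.b_pos]
            positive.add(rule.b)  # correct once the update is applied
            break
    return repaired, positive, corrections

# Hypothetical attribute names for the master tuple (China, Beijing, Shanghai).
MASTER = [{"country": "China", "capital": "Beijing", "not_capital": "Shanghai"}]
RULE = SherlockRule(x="nation", xm="country", b="capital",
                    b_neg="not_capital", b_pos="capital")

t2 = {"name": "Yan", "nation": "China", "capital": "Shanghai"}
repaired, positive, corrections = apply_rule(RULE, t2, MASTER)
```

Unlike the integrity constraint, the rule fires only when the data value equals a value already known to be wrong, so a tuple such as (China, Beijing) is left untouched rather than being flagged as one half of a violation.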
Applying Multiple Rules
[Figure: the attributes of a tuple t divided into Pos(t), Neg(t), and Free(t), with + and - marks.]
Sherlock Rules in Action
t1 (Si, DA, China, Beijing, ChenYang, 28098001)
t1 (Si+, DA, China, Beijing, ChenYang-, 28098001+)
t1 (Si+, DA, China, Beijing, ShenYang+, 28098001+)
Pos(t1)
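The three states of t1 can be reproduced with a small bookkeeping class for Pos(t), Neg(t), and Free(t). This is a sketch under assumptions: the class `AnnotatedTuple` and its methods are hypothetical, the guard that a positive attribute may never be changed again is an assumed policy, and the value ShenYang is simply taken from the slide rather than derived from any rule.

```python
ATTRS = ("name", "dept", "nation", "capital", "bornat", "officePhn")

class AnnotatedTuple:
    """A tuple plus the sets of attributes proven correct (pos) or wrong (neg)."""

    def __init__(self, values):
        self.values = dict(zip(ATTRS, values))
        self.pos = set()
        self.neg = set()

    @property
    def free(self):
        # Attributes with no evidence either way.
        return set(ATTRS) - self.pos - self.neg

    def mark_positive(self, attr):
        if attr in self.neg:
            raise ValueError(f"{attr} is already marked negative")
        self.pos.add(attr)

    def mark_negative(self, attr):
        if attr in self.pos:
            raise ValueError(f"{attr} is already marked positive")
        self.neg.add(attr)

    def correct(self, attr, value):
        # Assumed policy: a value that is proven correct is never overwritten.
        if attr in self.pos:
            raise ValueError(f"{attr} is already marked positive")
        self.values[attr] = value
        self.neg.discard(attr)
        self.pos.add(attr)

    def render(self):
        def mark(a):
            return "+" if a in self.pos else "-" if a in self.neg else ""
        return "(" + ", ".join(f"{self.values[a]}{mark(a)}" for a in ATTRS) + ")"

t1 = AnnotatedTuple(("Si", "DA", "China", "Beijing", "ChenYang", "28098001"))
states = [t1.render()]
t1.mark_positive("name")
t1.mark_positive("officePhn")
t1.mark_negative("bornat")
states.append(t1.render())
t1.correct("bornat", "ShenYang")  # value taken from the slide
states.append(t1.render())
```

After the last step Free(t1) still contains dept, nation, and capital, so further rules may annotate them, while the attributes in Pos(t1) are closed to later changes under the assumed policy.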
Transformation Rules
Fundamental Problems
Termination
Determinism
Consistency
Implication
(coNP-complete)
(coNP-complete)
Algorithms
Naive Repairing
• chase-based
• O(|R| x |Sigma| x |M|)
Fast Repairing
• Similarity indices to reduce |M| (BK-tree, FastSS, n-gram)
• Inverted index to reduce |Sigma| (hash map)
• O(|R| x |Sigma| x com(S))
• Caching similarity index accesses
• Rule pruning based on dependency
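Of the similarity indices named above, the n-gram variant is the easiest to sketch. The code below is an illustration rather than the authors' implementation: it indexes a hypothetical master column by character bigrams, uses the standard count filter to discard candidates, and verifies survivors with edit distance, so a lookup touches only master values that share a bigram with the query instead of all of M.

```python
from collections import Counter, defaultdict

def ngrams(s, n=2):
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def edit_distance(a, b):
    # Classic dynamic program, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class NGramIndex:
    def __init__(self, values, n=2):
        self.n = n
        self.values = list(values)
        self.postings = defaultdict(set)  # n-gram -> positions of master values
        for i, v in enumerate(self.values):
            for g in ngrams(v, n):
                self.postings[g].add(i)

    def similar(self, query, k=1):
        """Master values within edit distance k of query.

        Only values sharing at least one n-gram are considered, so very short
        strings can be missed; a fuller index would fall back to a scan.
        """
        q = Counter(ngrams(query, self.n))
        candidates = set()
        for g in q:
            candidates |= self.postings.get(g, set())
        hits = []
        for i in sorted(candidates):
            v = self.values[i]
            # Count filter: each edit destroys at most n n-grams.
            need = max(len(query), len(v)) - self.n + 1 - k * self.n
            shared = sum((q & Counter(ngrams(v, self.n))).values())
            if shared >= need and edit_distance(query, v) <= k:
                hits.append(v)
        return hits

# Hypothetical master column of birthplaces.
index = NGramIndex(["ShenYang", "Chengdu", "Hangzhou"])
matches = index.similar("ChenYang", k=1)
```

For the query ChenYang the count filter rejects Chengdu and Hangzhou before any edit distance is computed, and only ShenYang is verified.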
Rule Pruning Example
[Figure: dependency graph over the rules R1, R2, R3; the rule definitions are not preserved in the transcript.]
t3 (Ian, ALT, Chine, Beijing, Hangzhou, 33668323)
iteration 1: {(R1, Yes), (R2, Yes), (R3, No)}
iteration 2: {(R1, Yes), (R2, No), (R3, No)}
iteration 3: {(R1, Yes), (R2, No), (R3, No)}
Conclusion
• Sherlock rules for accurately annotating and repairing data
• Fundamental problems
• Efficient algorithms
Future Work
• Let SQL drive the Sherlock workhorse
• Extend Sherlock rules to more data such as RDF (knowledge bases)