Learning Semantic String Transformations from Examples
![Page 1: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/1.jpg)
Learning Semantic String Transformations from Examples
Rishabh Singh and Sumit Gulwani
![Page 2: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/2.jpg)
FlashFill
![Page 3: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/3.jpg)
![Page 4: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/4.jpg)
Transformations
• Syntactic Transformations
– Concatenation of regular expression based substrings
– “VLDB2012” → “VLDB”
• Semantic Transformations
– More than just characters
– “1/5/2010” → “May 1st 2010”
![Page 5: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/5.jpg)
Semantic Transformations
• Semantic information as relational tables
– 1 → January, 2 → February
• Learn table lookup queries
– VLOOKUP macro 2nd most problematic
![Page 6: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/6.jpg)
Outline
• Lookup Transformations
• Lookup + Syntactic Transformations
• Case Studies
![Page 7: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/7.jpg)
Table Lookup Transformations
Demo
![Page 8: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/8.jpg)
Learning Framework
Input Strings → F → Output String
F1, …, Fn
1. Domain-specific Language L
2. Algorithm to learn all Fs from (i,o)
![Page 9: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/9.jpg)
Lookup Transformation Language
![Page 10: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/10.jpg)
Example - Lookup

EmpRecord

| SSN | EmpId | Name |
| --- | --- | --- |
| 027-36-4557 | 1254 | John Henry |
| 034-83-7683 | 2412 | William Johnson |
| 044-58-3429 | 1125 | Steve Russell |
| 018-45-8949 | 4257 | Ian Jordan |
| 023-34-3254 | 6418 | Mary Dina |

| Input v1 | Output |
| --- | --- |
| 044-58-3429 | Steve Russell |

Select(Name, EmpRecord, (SSN = v1))
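A minimal sketch of how the lookup expression on this slide could be evaluated. The dict-per-row encoding and the helper names `select` and `name_for_ssn` are assumptions made for illustration; the slides do not show an implementation.

```python
# EmpRecord table from the slide, one dict per row.
EMP_RECORD = [
    {"SSN": "027-36-4557", "EmpId": "1254", "Name": "John Henry"},
    {"SSN": "034-83-7683", "EmpId": "2412", "Name": "William Johnson"},
    {"SSN": "044-58-3429", "EmpId": "1125", "Name": "Steve Russell"},
    {"SSN": "018-45-8949", "EmpId": "4257", "Name": "Ian Jordan"},
    {"SSN": "023-34-3254", "EmpId": "6418", "Name": "Mary Dina"},
]


def select(column, table, condition):
    """Evaluate Select(column, table, condition).

    `condition` maps column names to required values (a conjunction of
    equalities). Exactly one row must match, otherwise the lookup is rejected.
    """
    matches = [row for row in table
               if all(row[col] == val for col, val in condition.items())]
    if len(matches) != 1:
        raise LookupError(f"{condition} matches {len(matches)} rows, expected 1")
    return matches[0][column]


def name_for_ssn(v1):
    # Select(Name, EmpRecord, (SSN = v1))
    return select("Name", EMP_RECORD, {"SSN": v1})


print(name_for_ssn("044-58-3429"))  # Steve Russell
```

Requiring exactly one matching row is a design choice of this sketch: it treats the condition as a key, so an ambiguous lookup fails loudly instead of returning an arbitrary row.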
![Page 11: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/11.jpg)
Example – Transitive Lookup

ItemRec

| ItemId | Item |
| --- | --- |
| ST-340 | Stroller |
| BI-567 | Bib |
| DI-328 | Diapers |
| WI-989 | Wipes |
| AS-469 | Aspirator |

PriceRec

| ItemId | Price |
| --- | --- |
| ST-340 | $145.67 |
| BI-567 | $3.56 |
| DI-328 | $21.45 |
| WI-989 | $5.12 |
| AS-469 | $2.56 |

| Input v1 | Output |
| --- | --- |
| Stroller | $145.67 |

Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))
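The nested query on this slide can be read as a small expression tree. The sketch below is one possible encoding, assuming a tuple-based representation (`("Var", name)` and `("Select", column, table, (key_column, key_expr))`) and an `evaluate` helper, both hypothetical and not taken from the slides.

```python
# Tables from the slide, keyed by table name.
TABLES = {
    "ItemRec": [
        {"ItemId": "ST-340", "Item": "Stroller"},
        {"ItemId": "BI-567", "Item": "Bib"},
        {"ItemId": "DI-328", "Item": "Diapers"},
        {"ItemId": "WI-989", "Item": "Wipes"},
        {"ItemId": "AS-469", "Item": "Aspirator"},
    ],
    "PriceRec": [
        {"ItemId": "ST-340", "Price": "$145.67"},
        {"ItemId": "BI-567", "Price": "$3.56"},
        {"ItemId": "DI-328", "Price": "$21.45"},
        {"ItemId": "WI-989", "Price": "$5.12"},
        {"ItemId": "AS-469", "Price": "$2.56"},
    ],
}


def evaluate(expr, inputs):
    """Recursively evaluate a lookup expression against TABLES."""
    kind = expr[0]
    if kind == "Var":
        return inputs[expr[1]]
    if kind == "Select":
        _, column, table, (key_column, key_expr) = expr
        key = evaluate(key_expr, inputs)  # inner lookup runs first
        rows = [r for r in TABLES[table] if r[key_column] == key]
        if len(rows) != 1:
            raise LookupError(f"{key_column} = {key!r} is not a unique key in {table}")
        return rows[0][column]
    raise ValueError(f"unknown expression kind: {kind}")


# Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))
TRANSITIVE = ("Select", "Price", "PriceRec",
              ("ItemId", ("Select", "ItemId", "ItemRec", ("Item", ("Var", "v1")))))

print(evaluate(TRANSITIVE, {"v1": "Stroller"}))  # $145.67
```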
![Page 12: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/12.jpg)
Learn Query
Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))
![Page 13: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/13.jpg)
Synthesis Algorithm
• Input: (input state, output string)
• Output: all conforming expressions
• Reachability algorithm from input strings
![Page 14: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/14.jpg)
$\text{GenerateStr}_t$
Strings reachable from input row: 044-58-3429

Emp Record

| SSN | EmpId | Name |
| --- | --- | --- |
| 027-36-4557 | 1254 | John Henry |
| 034-83-7683 | 2412 | William Johnson |
| 044-58-3429 | 1125 | Steve Russell |
| 018-45-8949 | 4257 | Ian Jordan |

$\eta_1$, $\eta_2$, $\eta_3$
$\text{Progs}[\eta_1] = \{v_1\}$
![Page 15: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/15.jpg)
$\text{GenerateStr}_t$
Strings in table rows of visited nodes: 044-58-3429, 1125, Steve Russell
$B \equiv \{\bigwedge_i C_i = \{val^{-1}(T[C_i, r])\}\}_j$
![Page 16: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/16.jpg)
$\text{GenerateStr}_t$
… Repeat until k steps or fixpoint
![Page 17: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/17.jpg)
$\text{GenerateStr}_t$
… Steve Russell
$\eta$, $\text{Progs}[\eta]$
![Page 18: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/18.jpg)
$\text{GenerateStr}_t$
• Sound and k-complete
– t: number of reachable strings
– p: number of candidate keys
– m: maximum size of a candidate key
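The reachability idea behind $\text{GenerateStr}_t$ can be sketched as a bounded breadth-first search over table rows. This is a simplified illustration, not the algorithm from the talk: it assumes single-column keys only, records a program for a string only in the step where that string is first reached, and uses a hypothetical `generate_str_t` signature.

```python
COLUMNS = ("SSN", "EmpId", "Name")
EMP_RECORD = [dict(zip(COLUMNS, row)) for row in [
    ("027-36-4557", "1254", "John Henry"),
    ("034-83-7683", "2412", "William Johnson"),
    ("044-58-3429", "1125", "Steve Russell"),
    ("018-45-8949", "4257", "Ian Jordan"),
    ("023-34-3254", "6418", "Mary Dina"),
]]


def generate_str_t(inputs, tables, k=3):
    # progs[s] = set of ways to produce string s. Each way names the string
    # it keys on instead of a fully expanded expression, so ways are shared.
    progs = {}
    for name, value in inputs.items():
        progs.setdefault(value, set()).add(("Var", name))
    frontier = set(progs)
    for _ in range(k):                      # repeat until k steps ...
        new_strings = set()
        for table_name, rows in tables.items():
            for row in rows:
                for key_col, key_val in row.items():
                    if key_val not in frontier:
                        continue
                    if sum(r[key_col] == key_val for r in rows) != 1:
                        continue            # value does not identify one row
                    for out_col, out_val in row.items():
                        if out_col == key_col:
                            continue
                        if out_val in progs and out_val not in new_strings:
                            continue        # already reached in an earlier step
                        new_strings.add(out_val)
                        progs.setdefault(out_val, set()).add(
                            ("Select", out_col, table_name, key_col, key_val))
        if not new_strings:                 # ... or a fixpoint
            break
        frontier = new_strings
    return progs


progs = generate_str_t({"v1": "044-58-3429"}, {"EmpRecord": EMP_RECORD})
print(sorted(progs))
```

For the input 044-58-3429 the search stops after the second pass, because the only row it touches contributes no new strings.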
![Page 19: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/19.jpg)
Data structure
• Maintains tree structure
– share common sub-expressions
• CNF of Boolean Conditionals
– independent column predicates
![Page 20: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/20.jpg)
$\text{Intersect}_t$: $D_{t1} \wedge D_{t2}$
![Page 21: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/21.jpg)
Synthesize Procedure
Synthesize((i1,o1), …, (in,on))
    P = GenerateStr_t(i1,o1)
    for j = 2 to n:
        P' = GenerateStr_t(ij,oj)
        P = Intersect_t(P', P)
    return P
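The loop above can be sketched in Python if the succinct data structure is swapped for explicit sets of programs. That substitution is an assumption of this sketch: it only enumerates single-step lookups `Select(out_col, table, key_col = v1)`, and intersection is plain set intersection rather than the talk's $\text{Intersect}_t$.

```python
COLUMNS = ("SSN", "EmpId", "Name")
TABLES = {"EmpRecord": [dict(zip(COLUMNS, row)) for row in [
    ("027-36-4557", "1254", "John Henry"),
    ("034-83-7683", "2412", "William Johnson"),
    ("044-58-3429", "1125", "Steve Russell"),
    ("018-45-8949", "4257", "Ian Jordan"),
    ("023-34-3254", "6418", "Mary Dina"),
]]}


def generate_str(tables, i, o):
    """All single-step lookups (out_col, table, key_col) mapping input i to output o."""
    progs = set()
    for name, rows in tables.items():
        for row in rows:
            for key_col, key_val in row.items():
                if key_val != i:
                    continue
                for out_col, out_val in row.items():
                    if out_col != key_col and out_val == o:
                        progs.add((out_col, name, key_col))
    return progs


def synthesize(tables, examples):
    """Keep only the programs consistent with every (input, output) example."""
    (i1, o1), rest = examples[0], examples[1:]
    p = generate_str(tables, i1, o1)
    for i, o in rest:
        p = p & generate_str(tables, i, o)  # explicit-set stand-in for Intersect
    return p


print(synthesize(TABLES, [("044-58-3429", "Steve Russell"),
                          ("023-34-3254", "Mary Dina")]))
```

Explicit sets are only workable for toy inputs; the talk's point is that the real program sets are far too large to enumerate, which is why a shared representation is used instead.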
![Page 22: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/22.jpg)
Semantic String Transformations
Demo
![Page 23: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/23.jpg)
Syntactic String Language [Gulwani POPL11]
![Page 24: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/24.jpg)
Combined Language
Syntactic manipulations over lookup outputs
Syntactic manipulations before indexing
![Page 25: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/25.jpg)
Synthesis Algorithm:
– Reachability based on syntactic string matches
– Boolean conditionals
![Page 26: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/26.jpg)
$\text{GenerateStr}_u$
Input: SSN: 044-58-3429

Emp Record

| SSN | EmpId | Name |
| --- | --- | --- |
| 027-36-4557 | 1254 | John Henry |
| 034-83-7683 | 2412 | William Johnson |
| 044-58-3429 | 1125 | Steve Russell |
| 018-45-8949 | 4257 | Ian Jordan |

Output: Mr. Steve Russell
![Page 27: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/27.jpg)
$\text{GenerateStr}_u$
Input: SSN: 044-58-3429

Emp Record

| SSN | EmpId | Name |
| --- | --- | --- |
| 027-36-4557 | 1254 | John Henry |
| 034-83-7683 | 2412 | William Johnson |
| 044-58-3429 | 1125 | Steve Russell |
| 018-45-8949 | 4257 | Ian Jordan |

$\text{GenerateStr}'_t$
![Page 28: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/28.jpg)
![Page 29: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/29.jpg)
$\text{GenerateStr}_u$
Set of reachable strings: { “SSN: 044-58-3429”, “044-58-3429”, “1125”, “Steve Russell” }
![Page 30: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/30.jpg)
$\text{GenerateStr}_u$
{ “SSN: 044-58-3429”, “044-58-3429”, “1125”, “Steve Russell” } → $\text{GenerateStr}_s$ → Mr. Steve Russell
and in paper
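A rough sketch of the combined step, under stated assumptions: a table cell counts as matched when it occurs as a substring of an already reachable string, and the syntactic part is replaced by a crude stand-in that only tries "constant + reachable string + constant". The function names `reachable_strings` and `explain_output` are hypothetical.

```python
COLUMNS = ("SSN", "EmpId", "Name")
EMP_RECORD = [dict(zip(COLUMNS, row)) for row in [
    ("027-36-4557", "1254", "John Henry"),
    ("034-83-7683", "2412", "William Johnson"),
    ("044-58-3429", "1125", "Steve Russell"),
    ("018-45-8949", "4257", "Ian Jordan"),
]]


def reachable_strings(input_str, table, k=3):
    """Strings reachable from the input when cells may match as substrings."""
    reached = {input_str}
    for _ in range(k):
        new = set()
        for row in table:
            # A row is visited if any of its cells occurs inside a reached string.
            if any(cell in s for cell in row.values() for s in reached):
                new.update(row.values())
        if new <= reached:  # fixpoint: nothing new was found
            break
        reached |= new
    return reached


def explain_output(output, reached):
    """Crude stand-in for the syntactic step: (prefix, reachable string, suffix)."""
    hits = []
    for s in sorted(reached):
        pos = output.find(s)
        if pos >= 0:
            hits.append((output[:pos], s, output[pos + len(s):]))
    return hits


reached = reachable_strings("SSN: 044-58-3429", EMP_RECORD)
print(explain_output("Mr. Steve Russell", reached))
```

On the slide's example this yields the same four reachable strings and explains the output as the constant "Mr. " followed by the looked-up name.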
![Page 31: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/31.jpg)
Experiments
• 50 benchmark problems
– 12 , 38
• ~$10^{20}$ consistent expressions
– Size of data structure: ~2000
• Performance: 96% in less than 1 second
• Ranking: at most 3 examples (95%: 2 examples)
![Page 32: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/32.jpg)
Related Work
• Matching strings for table joins
– Record Matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
– Schema Matching [Dhamankar et al. SIGMOD04, Warren & Tompa VLDB06]
• Query Synthesis
– from representative view [Das Sarma et al. ICDT10, Tran et al. SIGMOD09]
• Text-editing by example
– QuickCode [Gulwani POPL11]
– SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]
![Page 33: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/33.jpg)
Thanks!
End-Users
Algorithm Designers
Software Developers
Large potential
![Page 34: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/34.jpg)
Backup slides
![Page 35: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/35.jpg)
Semantic String Transformations

| Time (24 Hr) | Time (12 Hr) |
| --- | --- |
| 0930 | 9:30 AM |
| 1520 | 3:20 PM |
| 1648 | |
| 0830 | |
| 1015 | |
| 2010 | |
| 1012 | |
| 1425 | |

=TEXT(C,"00 00")+0
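One way to read this example in the talk's terms is as two table lookups plus string concatenation. The `TIME` table, its column names (`Hr24`, `Hr12`, `AmPm`) and the helper names below are assumptions for illustration; the slides do not give the table that was actually used.

```python
# Hypothetical lookup table: 24-hour value -> 12-hour value and AM/PM marker.
TIME = [
    {"Hr24": f"{h:02d}", "Hr12": str(h % 12 or 12), "AmPm": "AM" if h < 12 else "PM"}
    for h in range(24)
]


def select(column, table, key_column, key):
    """Select(column, table, key_column = key) with a uniqueness check."""
    rows = [r for r in table if r[key_column] == key]
    if len(rows) != 1:
        raise LookupError(f"{key_column} = {key!r} matches {len(rows)} rows")
    return rows[0][column]


def to_12_hour(v1):
    # Syntactic part: split "0930" into the substrings "09" and "30".
    hour, minutes = v1[:2], v1[2:]
    # Semantic part: two lookups keyed on the hour substring.
    return (select("Hr12", TIME, "Hr24", hour) + ":" + minutes + " "
            + select("AmPm", TIME, "Hr24", hour))


print(to_12_hour("0930"))  # 9:30 AM
print(to_12_hour("1520"))  # 3:20 PM
```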
![Page 36: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/36.jpg)
Semantic String Transformations

| Date | Formatted Date |
| --- | --- |
| 06-03-2008 | Jun 3rd, 2008 |
| 03-26-2010 | |
| 08-01-2009 | |
| 09-24-2007 | |
| 05-14-2010 | |
| 07-20-1998 | |
| 10-24-2004 | |
| 08-24-1972 | |
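The date example can be sketched the same way, with month abbreviations and day ordinals stored as relational tables. The `MONTH` and `DAY` tables, their column names, and the MM-DD-YYYY reading of the input are assumptions inferred from the single filled-in row.

```python
# Hypothetical lookup tables for month abbreviations and day ordinals.
MONTH = [{"MM": f"{i:02d}", "Abbr": abbr} for i, abbr in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)]


def ordinal(d):
    """Return d with its English ordinal suffix, e.g. 3 -> '3rd'."""
    if 11 <= d % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(d % 10, "th")
    return f"{d}{suffix}"


DAY = [{"DD": f"{d:02d}", "Ord": ordinal(d)} for d in range(1, 32)]


def select(column, table, key_column, key):
    rows = [r for r in table if r[key_column] == key]
    if len(rows) != 1:
        raise LookupError(f"{key_column} = {key!r} matches {len(rows)} rows")
    return rows[0][column]


def format_date(v1):
    # Syntactic split on "-", then one lookup per looked-up field.
    mm, dd, yyyy = v1.split("-")
    month = select("Abbr", MONTH, "MM", mm)
    day = select("Ord", DAY, "DD", dd)
    return f"{month} {day}, {yyyy}"


print(format_date("06-03-2008"))  # Jun 3rd, 2008
```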
![Page 37: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/37.jpg)
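Idea 1: Share sub-expressions

T1

| C1 | C2 | C3 |
| --- | --- | --- |
| s1 | s2 | s3 |

T2

| C1 | C2 | C3 |
| --- | --- | --- |
| s2 | s3 | s4 |

T3

| C1 | C2 | C3 |
| --- | --- | --- |
| s3 | s4 | s5 |

e ≡ Select(C2, T1, C1=v1), with value $s_2$
Select(C3, T2, C1=e)
Select(C2, T3, C1=Select(C2, T2, C1=e))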
![Page 38: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/38.jpg)
YouTube Videos
French, Polish, Urdu, German, Serbian, Russian
http://bit.ly/flashfill
![Page 39: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/39.jpg)
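Idea 2: CNF conditionals

T

| C1 | C2 | C3 | … | Cn | Cn+1 |
| --- | --- | --- | --- | --- | --- |
| s | s | s | … | s | t |

| v1 | v2 | … | vm | Out |
| --- | --- | --- | --- | --- |
| s | s | … | s | t |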
![Page 40: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/40.jpg)
No. of Consistent Expressions
[Chart: number of expressions (log scale, 1 to 1E+036) per benchmark. Annotation: Large number of consistent expressions.]
![Page 41: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/41.jpg)
Succinct Representation
[Chart: size of data structure (axis ticks 500 to 2,000) per benchmark.]
![Page 42: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/42.jpg)
Performance
[Chart: running time in seconds (axis 0.00 to 12.00) per benchmark.]
![Page 43: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/43.jpg)
Ranking
[Chart: Ranking Measure. Number of benchmarks (axis 0 to 40) against number of I/O examples (1, 2, 3).]
![Page 44: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/44.jpg)
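Idea 2: CNF conditionals
$\{\{\eta_1, \eta_2\}, \eta_2, \text{Progs}\}$
$\text{Progs}[\eta_1] \equiv \{v_1, v_2, \cdots, v_m\}$
$\text{Progs}[\eta_2] = \{\text{Select}(C_{n+1}, T, \bigwedge_i C_i = \{s, \eta_1\})\}$
$m+1$ choices per predicate, $\Theta((m+1)^n)$ expressions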
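The counting argument can be checked on a small instance. The sketch below assumes a simple encoding, not taken from the slides: each column predicate stores one shared list of $m+1$ alternatives (the constant `s` or one of the inputs `v1..vm`), and `expand` enumerates the explicit conjunctions that this compact form stands for.

```python
from itertools import product


def cnf_condition(n, m):
    """Compact form: every column C1..Cn points at the same m+1 alternatives."""
    choices = ["s"] + [f"v{j}" for j in range(1, m + 1)]  # shared, like node eta_1
    return {f"C{i}": choices for i in range(1, n + 1)}


def expand(cnf):
    """Enumerate every explicit conjunction represented by the compact form."""
    cols = list(cnf)
    for combo in product(*(cnf[c] for c in cols)):
        yield " ∧ ".join(f"{c} = {v}" for c, v in zip(cols, combo))


n, m = 3, 2
cnf = cnf_condition(n, m)
explicit = list(expand(cnf))
print(len(explicit), (m + 1) ** n)  # both are 27
print(explicit[0])
```

Because the predicates are independent, the compact form stores only n column entries plus one list of m+1 alternatives, while the explicit listing grows as $(m+1)^n$.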
![Page 45: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/45.jpg)
$\text{GenerateStr}_t$
$val(\eta)$: string value of node $\eta$
$\text{Progs}[\eta]$: set of lookup programs to generate $val(\eta)$
$val^{-1}(s)$: node $\eta$ such that $val(\eta) = s$
![Page 46: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/46.jpg)
Related Work
• Record Matching
– Similarity functions for matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
– Customizable similarity function [Arasu et al. VLDB09]
• Learning Schema Matches
– iMAP [Dhamankar et al. SIGMOD04]: concatenation of column strings using domain-specific knowledge
– [Warren & Tompa VLDB06]: concatenation of column substrings, single table
![Page 47: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/47.jpg)
Related Work
• Query Synthesis [Das Sarma et al. ICDT10, Tran et al. SIGMOD09]
– Infer relation from large representative example view
– no joins or projections
• Text-editing using examples
– QuickCode [Gulwani POPL11]: string transformations
– SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]: programming by demonstration
![Page 48: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/48.jpg)
General Framework
• A Domain-specific Transformation Language L
– Expressive and succinct
• Efficient Data structures for set of expressions
– Version-space algebra
• GenerateStr
– All sets of expressions from I-O example
• Intersect
– Intersect two sets of expressions
![Page 49: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/49.jpg)
Example - Lookup

EmpRecord

| SSN | EmpId | Name |
| --- | --- | --- |
| 027-36-4557 | 1254 | John Henry |
| 034-83-7683 | 2412 | William Johnson |
| 044-58-3429 | 1125 | Steve Russell |
| 018-45-8949 | 4257 | Ian Jordan |
| 023-34-3254 | 6418 | Mary Dina |

| Input v1 | Output |
| --- | --- |
| 044-58-3429 | Steve Russell |
| 023-34-3254 | |

Select(Name, EmpRecord, (SSN = v1))
![Page 50: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/50.jpg)
Example – Transitive Lookups

ItemRec

| ItemId | Item |
| --- | --- |
| ST-340 | Stroller |
| BI-567 | Bib |
| DI-328 | Diapers |
| WI-989 | Wipes |
| AS-469 | Aspirator |

PriceRec

| ItemId | Price |
| --- | --- |
| ST-340 | $145.67 |
| BI-567 | $3.56 |
| DI-328 | $21.45 |
| WI-989 | $5.12 |
| AS-469 | $2.56 |

| Input v1 | Output |
| --- | --- |
| Stroller | $145.67 |
| Bib | |
| Aspirator | |
| Wipes | |

Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))
![Page 51: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/51.jpg)
Data Structure
![Page 52: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/52.jpg)
Data structure for expressions
![Page 53: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/53.jpg)
Data structure
![Page 54: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/54.jpg)
Data structure
![Page 55: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/55.jpg)
Data structure
![Page 56: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/56.jpg)
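Example

T1

| C1 | C2 | C3 |
| --- | --- | --- |
| s1 | s2 | s3 |

T2

| C1 | C2 | C3 |
| --- | --- | --- |
| s2 | s3 | s4 |

Ti

| C1 | C2 | C3 |
| --- | --- | --- |
| si | si+1 | si+2 |

… Tm

| Input v1 | Output |
| --- | --- |
| s1 | sm |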
![Page 57: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/57.jpg)
Sub-expression Sharing

Ti-1

| C1 | C2 | C3 |
| --- | --- | --- |
| si-1 | si | si+1 |

Ti-2

| C1 | C2 | C3 |
| --- | --- | --- |
| si-2 | si-1 | si |

$s_i$
![Page 58: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/58.jpg)
Sub-expression Sharing
$s_{i-2}$, $s_{i-1}$, $s_i$
$\eta_{i-2}$, $\eta_{i-1}$, $\eta_i$
![Page 59: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/59.jpg)
Sub-expression Sharing
$\{\{\eta_1, \eta_2, \cdots, \eta_m\}, \eta_m, \text{Progs}\}$
$\text{Progs}[\eta_1] \equiv \{v_1\}$
$\text{Progs}[\eta_2] = \{\text{Select}(C_2, T_1, C_1 = \{s_1, \eta_1\})\}$
![Page 60: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/60.jpg)
Sub-expression Sharing
$N(i) = N(i-1) + N(i-2)$
$N(i) = 2^{\Theta(i)}$
$\{\{\eta_1, \eta_2, \cdots, \eta_m\}, \eta_m, \text{Progs}\}$
$\text{Progs}[\eta_1] \equiv \{v_1\}$
$\text{Progs}[\eta_2] = \{\text{Select}(C_2, T_1, C_1 = \{s_1, \eta_1\})\}$
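The effect of sharing can be sketched on the chain of tables from the example, where table Ti holds the row (si, si+1, si+2). The encoding below is an assumption for illustration: each node stores only its immediate ways of being produced, always keyed on column C1, and refers to other nodes by index. Counting then follows the recurrence without ever expanding the expressions.

```python
from functools import lru_cache


def build_progs(m):
    """progs[i] lists the ways to produce s_i; each way names the node it keys on."""
    progs = {1: [("var", "v1")]}
    for i in range(2, m + 1):
        ways = [("select", "C2", f"T{i - 1}", i - 1)]          # via T_{i-1}, key s_{i-1}
        if i >= 3:
            ways.append(("select", "C3", f"T{i - 2}", i - 2))  # via T_{i-2}, key s_{i-2}
        progs[i] = ways
    return progs


def count_expressions(progs, target):
    """Count represented expressions by memoised recursion over shared nodes."""
    @lru_cache(maxsize=None)
    def n(i):
        return sum(1 if way[0] == "var" else n(way[3]) for way in progs[i])
    return n(target)


def enumerate_expressions(progs, i):
    """Expand the shared structure into explicit expressions (exponential)."""
    out = []
    for way in progs[i]:
        if way[0] == "var":
            out.append(way[1])
        else:
            _, col, table, child = way
            out.extend(f"Select({col}, {table}, C1={e})"
                       for e in enumerate_expressions(progs, child))
    return out


progs = build_progs(10)
stored = sum(len(ways) for ways in progs.values())
print(stored, count_expressions(progs, 10))  # 18 stored entries, 55 expressions
```

The stored structure grows linearly with m, while the number of expressions it represents follows the Fibonacci-style recurrence and so grows exponentially.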
![Page 61: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/61.jpg)
$\text{Intersect}_t$: $D_{t1} \wedge D_{t2}$
![Page 62: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/62.jpg)
Current State of the Art: Help forums
![Page 63: Learning Semantic String Transformations from Examples](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816761550346895ddc373d/html5/thumbnails/63.jpg)
Observations
• Semantic string transformations
• Input-output examples based interaction
– New disambiguating inputs
• Add-in with the same interface