Claus Brabrand (ITU Copenhagen) & Jakob G. Thomsen (Aarhus University)
Typed and Unambiguous Pattern Matching on Strings using Regular Expressions
DANSAS 2010 (in Proc. of PPDP 2010)
[http://xkcd.com/208/]
Main Message
For regular expressions:
  Pattern matching
  Precise syntax-directed ambiguity analysis
  Typed mapping into a target language
Introduction & Motivation
Parsing dynamic input is a ubiquitous problem.
URLs:
  http://www.cs.au.dk/index.php?id=141&view=details
  protocol, host, path, query-string (list of key-value pairs)
Log files:
  13/02/2010 66.249.65.107 get /support.html
  20/02/2010 42.116.32.64 post /search.html
The solution is pattern matching.
Example (date):
  [0-9]{1,2} "/" [0-9]{1,2} "/" [0-9]{4}
  <day = [0-9]{1,2} > "/" <month = [0-9]{1,2} > "/" <year = [0-9]{4} >
Matching against string:
  "26/06/1992"
yields:
  day = 26  month = 06  year = 1992
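The date example can be approximated with the standard java.util.regex library, where named capturing groups play a role similar to recordings. This is only a sketch under that assumption: it is not the authors' tool, and the class name DateRecordingSketch and its match helper are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRecordingSketch {
    // Named groups stand in for the recordings <day = ...>, <month = ...>, <year = ...>.
    static final Pattern DATE =
        Pattern.compile("(?<day>[0-9]{1,2})/(?<month>[0-9]{1,2})/(?<year>[0-9]{4})");

    // Returns {day, month, year} as strings, or null if the whole input does not match.
    static String[] match(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("day"), m.group("month"), m.group("year") };
    }

    public static void main(String[] args) {
        String[] r = match("26/06/1992");
        System.out.println("day = " + r[0] + " month = " + r[1] + " year = " + r[2]);
    }
}
```

Note that every recorded value comes back as a String here, which is the type-cast burden that the type mapping part of the talk addresses.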
Example (date):
  <day = [0-9]{1,2} > "/" <month = [0-9]{1,2} > "/" <year = [0-9]{4} >
Without the "/" separators:
  <day = [0-9]{1,2} > <month = [0-9]{1,2} > <year = [0-9]{4} >
String 2082010:
  day = 2 and month = 08 (i.e., 2nd of August)
  day = 20 and month = 8 (i.e., 20th of August)
Why regular expressions?
  Expressive (enough)
  Declarative
  Decidable properties
  Well known
Outline
  Our setup
  Regular Expressions: The Recording Construction
  Ambiguity: Disambiguation
  Type Mapping
  Conclusion
Our setup
url.rex:
  <URL = [a-z]*>; ...
Compile (our tool): url.rex -> URL.java ...
Foo.java:
  import URL; class Foo { ... }
Compile (javac): URL.java, Foo.java ... -> URL.class, Foo.class ...
Regular Expressions
Syntax:
Semantics:
where:
  $L_1 L_2$ is concatenation (i.e., $\{ \omega_1 \omega_2 \mid \omega_1 \in L_1, \omega_2 \in L_2 \}$)
  $L^* = \bigcup_{i \geq 0} L^i$ where $L^0 = \{ \varepsilon \}$ and $L^i = L \, L^{i-1}$
Usual extensions:
  Any character "." as $c_1|c_2|\ldots|c_n$, $c_i \in \Sigma$
  Character ranges "[a-z]" as a|b|...|z
  Repetitions "R{2,3}" as RR|RRR
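The "usual extensions" are syntactic sugar over the core operators, and the expansion is mechanical. The sketch below shows that desugaring on plain strings. The class and method names are hypothetical, and it assumes that R is a single symbol (or already parenthesized) and that min is at least 1.

```java
public class RegexSugarSketch {
    // Expands R{min,max} into the alternation R^min | ... | R^max.
    // Assumes min >= 1 and that r needs no extra parentheses.
    static String expandRepetition(String r, int min, int max) {
        StringBuilder out = new StringBuilder();
        for (int n = min; n <= max; n++) {
            if (n > min) {
                out.append('|');
            }
            for (int i = 0; i < n; i++) {
                out.append(r);
            }
        }
        return out.toString();
    }

    // Expands a character range such as [a-e] into a|b|c|d|e.
    static String expandRange(char lo, char hi) {
        StringBuilder out = new StringBuilder();
        for (char c = lo; c <= hi; c++) {
            if (c > lo) {
                out.append('|');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandRepetition("R", 2, 3)); // RR|RRR
        System.out.println(expandRange('a', 'e'));       // a|b|c|d|e
    }
}
```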
Recording
Syntax:
  <x = R>, where "x" is a recording identifier (it "remembers" the substring it matches)
Semantics:
Example (simplified emails):
  <user = [a-z]+ > "@" <domain = [a-z]+ ("." [a-z]+)* >
Matching against string:
  "obama@whitehouse.gov"
yields:
  user = "obama" & domain = "whitehouse.gov"
Related: "x as R" in XDuce; "x::R" in CDuce; and "x@R" in Scala and HaRP
Ambiguity
Example from before:
  <day = [0-9]{1,2} > <month = [0-9]{1,2} >
matched on the string "208" gives rise to:
  day = 2 and month = 08 (i.e., 2nd of August)
  day = 20 and month = 8 (i.e., 20th of August)
Multiple ways of matching => ambiguous
Problem: Concatenation
(Figure: the string 2 0 8 divided between day and month.)
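The two readings of "208" can be reproduced by brute force: try every split point and keep the splits where both halves match [0-9]{1,2}. This is only an illustration of why concatenation causes the problem. It checks a single input string, whereas the analysis in the talk is static. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ConcatAmbiguitySketch {
    // Both recordings, day and month, use the same sub-pattern.
    static final Pattern PART = Pattern.compile("[0-9]{1,2}");

    // Returns every way to split s into day + month with both parts matching PART.
    static List<String> splits(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            String day = s.substring(0, i);
            String month = s.substring(i);
            if (PART.matcher(day).matches() && PART.matcher(month).matches()) {
                result.add("day = " + day + ", month = " + month);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String split : splits("208")) {
            System.out.println(split);
        }
    }
}
```

More than one returned split means the string can be matched in more than one way.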
Ambiguity analysis
Theorem:
  R unambiguous iff
NB: sound & complete!
Related work:
  [Brabrand+Giegerich+Møller'09]: Similar approach for context-free grammars.
  [Book+Even+Greibach+Ott'71] and [Hosoya'03] for XDuce, but indirectly via NFA, not directly (syntax-directed).
Disambiguation
1) Manual rewriting: Always possible :-) Tedious :-( Error-prone :-( Not structure-preserving :-(
   <foo = a > | <bar = a* > is rewritten to <foo = a > | <bar = ε|aaa* >
2) Restriction: R1 - R2, where L(R1 - R2) = L(R1) \ L(R2)
   <foo = a > | <bar = a* > using restriction: <foo = a > | <bar = a*-a >
3) Disambiguators: Three basic operators
   choice: '|L', '|R'  concat: 'L', 'R'  star: '*L', '*R'
   <foo = a > | <bar = a* > using disambiguators we get <foo = a > |L <bar = a* >
4) Default disambiguation: concat, choice, and star are all left-biased (by default)! (Our tool does this)
   <foo = a > | <bar = a* > no need to rewrite
Related work: [Vansummeren'06] but with global, not local disambiguation
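Restriction and left-biased choice can both be tried out with java.util.regex, as a rough analogue rather than the tool's implementation. Restriction is checked directly from its definition L(R1) \ L(R2). The left bias relies on the fact that Java's backtracking matcher tries alternatives from left to right. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DisambiguationSketch {
    // Restriction: s is in L(R1 - R2) iff s is in L(R1) and s is not in L(R2).
    static boolean inRestriction(String r1, String r2, String s) {
        return Pattern.matches(r1, s) && !Pattern.matches(r2, s);
    }

    // Left-biased choice for <foo = a> | <bar = a*>:
    // reports which recording captured the whole input.
    static String leftBiasedChoice(String s) {
        Matcher m = Pattern.compile("(?<foo>a)|(?<bar>a*)").matcher(s);
        if (!m.matches()) {
            return "no match";
        }
        return m.group("foo") != null ? "foo" : "bar";
    }

    public static void main(String[] args) {
        System.out.println(inRestriction("a*", "a", "a"));  // false: "a" is removed by the restriction
        System.out.println(inRestriction("a*", "a", "aa")); // true
        System.out.println(leftBiasedChoice("a"));          // foo: the left alternative wins
        System.out.println(leftBiasedChoice("aa"));         // bar: only the right alternative matches
    }
}
```

With the restriction a*-a the string "a" can only be recorded as foo, which is the same outcome the left-biased default gives without any rewriting.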
Type Mapping
Our date example:
  <day = [0-9]{2} > "/" <month = [0-9]{2} > "/" <year = [0-9]{4} >
Type of the recordings day, month, and year?
  Strings (=> many type casts)
  Infer the type
Type Mapping
A recording has three type components:
  a linguistic type (language of the recording; maps to String, int, float, etc.)
  a structural type (nested recordings; maps to (nested) classes)
  a type modifier (maps to lists)
Related work: Exact type inference in XDuce & CDuce (soundness + completeness proof in [Vansummeren'06]), but not for stand-alone and non-intrusive usage (Java)
Type Mapping Example
  Person = <name = [a-z]+ > " (" <age = [0-9]+ > ")"
compile (our tool):
  class Person { // auto-generated
    String name;
    int age;
    static Person match(String s) { ... }
    public String toString() { ... }
  }
Usage:
  String s = "obama (48)";
  Person p = Person.match(s);
  print(p.name + " is " + p.age + "y old");
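To make the Person example concrete, here is a hand-written stand-in for what such a generated class might do, built on java.util.regex. It is a sketch, not the tool's actual output. The name PersonSketch, the PERSON pattern, and the choice to return null on a failed match are all assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PersonSketch {
    // Hand-translated from: <name = [a-z]+ > " (" <age = [0-9]+ > ")"
    static final Pattern PERSON =
        Pattern.compile("(?<name>[a-z]+) \\((?<age>[0-9]+)\\)");

    final String name; // language [a-z]+ stays a String
    final int age;     // language [0-9]+ is mapped to int

    PersonSketch(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Returns null when s does not match (an assumption; the slide elides this).
    // Integer.parseInt would throw for digit strings too large for an int.
    static PersonSketch match(String s) {
        Matcher m = PERSON.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new PersonSketch(m.group("name"), Integer.parseInt(m.group("age")));
    }

    @Override
    public String toString() {
        return name + " (" + age + ")";
    }

    public static void main(String[] args) {
        PersonSketch p = match("obama (48)");
        System.out.println(p.name + " is " + p.age + "y old");
    }
}
```

The point of the typed mapping is visible in main: p.age is already an int, so the caller needs no cast or parse step.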
Conclusion
Regular expressions are alive and well.
This paper:
  Used for pattern matching
  Precise ambiguity analysis
  Type mapping
Future work: improve performance, subtype of recordings
"trade (excess) expressivity for safety+simplicity"
Thank you. Questions?
Abstract Syntax Trees (ASTs)
Ambiguity
Definition:
  $R$ ambiguous iff $\exists T, T' \in \mathit{AST}_R : T \neq T' \land \|T\| = \|T'\|$
  where $\|\cdot\| : \mathit{AST} \to \Sigma^*$ (the flattening) is:
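The definition says that R is ambiguous when two distinct ASTs flatten to the same string. For a star-free fragment this can be explored by counting, for one given string, how many ASTs flatten to it. The sketch below does that by brute force. It is not the syntax-directed analysis from the talk, it omits star, and all names in it are hypothetical.

```java
public class AmbiguityCountSketch {
    // count(s) = number of distinct ASTs of this expression whose flattening is s.
    interface Re {
        long count(String s);
    }

    // A single character in the range lo..hi has exactly one AST per matching character.
    static Re chr(char lo, char hi) {
        return s -> (s.length() == 1 && s.charAt(0) >= lo && s.charAt(0) <= hi) ? 1 : 0;
    }

    // Choice: an AST is either a left AST or a right AST.
    static Re alt(Re a, Re b) {
        return s -> a.count(s) + b.count(s);
    }

    // Concatenation: one AST per split point and per pair of sub-ASTs.
    static Re cat(Re a, Re b) {
        return s -> {
            long n = 0;
            for (int i = 0; i <= s.length(); i++) {
                n += a.count(s.substring(0, i)) * b.count(s.substring(i));
            }
            return n;
        };
    }

    // [0-9]{1,2} desugared to D | DD.
    static Re oneOrTwoDigits() {
        Re d = chr('0', '9');
        return alt(d, cat(d, d));
    }

    public static void main(String[] args) {
        Re noSeparator = cat(oneOrTwoDigits(), oneOrTwoDigits());
        Re withSeparator = cat(oneOrTwoDigits(), cat(chr('/', '/'), oneOrTwoDigits()));
        System.out.println(noSeparator.count("208"));    // 2 ASTs: ambiguous on this string
        System.out.println(withSeparator.count("20/8")); // 1 AST
    }
}
```

A count above 1 for any string witnesses ambiguity. Showing unambiguity this way would require checking every string, which is why a static characterization is needed.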
Characterization of Ambiguity
Theorem: R unambiguous iff
NB: sound & complete!
R* = ε | RR*
Type Inference
Type Inference:
  R : (L,S)