Bernd Fischer [email protected] RW713: Compiler and Software Language Engineering.
Bernd [email protected]
RW713: Compiler and Software Language Engineering
Parsing and Unparsing... in a Broad Sense
Vadim Zaytsev, Anya Helene Bagge: Parsing in a Broad Sense. MoDELS’14, LNCS 8767, pp. 50-67, Springer, 2014.
Program Representations
SLE tools use different program representations at different abstraction levels:
• textual: strings, tokens, ...
• structural: parse trees, ASTs, ...
• graphical: vector drawings, graphs, UML models, ...
Different representations are typically connected by pairs of bidirectional transformations:
text → AST: parsing
AST → text: unparsing
Textual Representations
• unstructured string of individual characters
    f arg = arg +1;
• flat sequence of strings (lexemes)
  – incl. spaces, line breaks, comments etc. (layout)
    f ' ' arg ' ' = ' ' arg ' ' + 1 ;
• flat sequence of typed tokens
  – with attributes and lexemes but without layout
    id(f) id(arg) = id(arg) + num(1) ;
• structured sequence of typed token groups
    id(f) id(arg) = id(arg) + num(1) ;   (grouping of the tokens not preserved in this transcript)
Structural Representations
• set of alternative parse trees (parse forest), with the alternatives collected under an “ambiguity node”
• parse tree (incl. layout)
• concrete syntax tree (without layout)
• abstract syntax tree
Graphical Representations
• rasterized picture
• vector graph (drawing)
• generic graph
• abstract graph model
All representations can be merged into one “Mega-Model”.
Tokenization
Definition: A tokenizer tok_L : Str → Tok for a lexical grammar L maps a character sequence c1,...,cn to a token sequence w1,...,wk so that their concatenations are equal (i.e., c1+...+cn = w1+...+wk). Its reverse operation is concat, which is language-independent.
tok_L and concat satisfy the following equations:
∀x ∈ Str: concat(tok_L(x)) = x
∀y ∈ Tok: tok_L(concat(y)) = y
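The two round-trip laws can be tried out with a minimal Python sketch. The lexical grammar used here (identifiers, numbers, whitespace as layout, single-character symbols) is a hypothetical stand-in and not taken from the slides, and tokens are represented by their lexemes only:

```python
import re

# Hypothetical lexical grammar L (an assumption for illustration):
# identifiers, numbers, runs of whitespace (layout), or any single character.
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+|\s+|.", re.DOTALL)

def tok(text: str) -> list[str]:
    """tok_L: split a character sequence into lexemes, keeping layout."""
    return LEXEME.findall(text)

def concat(tokens: list[str]) -> str:
    """Language-independent reverse operation: glue the lexemes together."""
    return "".join(tokens)

source = "f arg = arg +1;"
tokens = tok(source)
print(tokens)

# First law: concatenating the tokens gives back the original text.
assert concat(tok(source)) == source
# Second law, checked on a token sequence that tok itself produced.
assert tok(concat(tokens)) == tokens
```

In this sketch the second law is only checked for token sequences in the image of `tok`; an arbitrary list such as `["a", "b"]` would be re-tokenized as the single identifier `"ab"`.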
Adding/Removing Layout
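Definition: A strip operation strip : Tok → TTk removes layout information, while a format operation format_L : TTk → Tok introduces it.
strip and format_L satisfy the following equation:
∀x ∈ TTk: strip(format_L(x)) = x
What about
∀y ∈ Tok: format_L(strip(y)) = y ?
This cannot hold in general: strip is language-independent but not injective, so differently laid-out token sequences are mapped to the same stripped sequence, and format_L can return only one of them.
Strip and format operations also exist for trees and graphs.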
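A small Python sketch of why only one of the two round trips holds. It simplifies typed tokens to plain lexemes, and the formatting policy (exactly one space between tokens) is a hypothetical choice made only for this illustration:

```python
def is_layout(lexeme: str) -> bool:
    """Assumption: layout means whitespace-only lexemes."""
    return lexeme.isspace()

def strip(tokens: list[str]) -> list[str]:
    """Remove layout; many different inputs map to the same output."""
    return [t for t in tokens if not is_layout(t)]

def format_tokens(tokens: list[str]) -> list[str]:
    """Hypothetical format_L: put a single space between any two tokens."""
    out: list[str] = []
    for i, t in enumerate(tokens):
        if i > 0:
            out.append(" ")
        out.append(t)
    return out

laid_out = ["f", " ", "arg", " ", "=", " ", "arg", " ", "+", "1", ";"]
bare = strip(laid_out)

# strip(format_L(x)) = x holds: formatting only adds layout.
assert strip(format_tokens(bare)) == bare
# format_L(strip(y)) = y fails here: the original spacing is gone for good.
assert format_tokens(strip(laid_out)) != laid_out
```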
Parsing/Unparsing
[Diagram: two parsing/unparsing routes, one that keeps layout and one that ignores layout]
Imploding/Exploding ASTs
Pretty-Printing
What is Pretty-Printing?
A pretty printer takes an AST and a text width and
• converts the AST into text (unparsing)
• breaks the text to fit into the given text width (layout)
... so that the output is “pretty”
What is Pretty?
w = 128:
    var x:integer; y:char; begin x := 1; y := ’a’; end
w = 40 and w = 30: the same program broken across several lines to fit the width (the line breaks are not preserved in this transcript)
Alternative layouts (indentation not preserved):
    var x:integer; y:char; begin x := 1;
    y := ’a’; end

    var
    x:integer;
    y:char;
    begin
    x := 1;
    y := ’a’;
    end
What is Pretty-Printing?
A pretty printer takes an AST and a text width and
• converts the AST into text (unparsing)
• breaks the text to fit into the given text width (layout)
... so that
• line breaks and indentation represent logical structure
• line breaks and indentation are used consistently
• line breaks are minimized
Pretty-Printing Architecture
Bad Ideas:
• print text during AST traversal
• post-processing on raw text
Instead:
• generate (new) AST containing text and mark-up for layout hints
• interpret mark-up AST and generate raw text
(source) AST → (mark-up) AST: unparsing (syntax-directed)
(mark-up) AST → raw text: layout (language-independent)
Oppen-style Mark-up
Oppen’s (core) algorithm uses two mark-up elements:
• blanks: positions where a line can be broken
  – can denote number of indentation spaces
• groups: sequences of elements that are printed on one line, if possible; otherwise each element is printed on its own line
  – represented as pair of opening and closing brackets
  – any two elements must be separated by a blank
  – blanks can be “inconsistent”, i.e., printer tries to fit as many elements as possible on one line before breaking
Examples:
[[var blank(2) [x:integer; blank(0) y:char;]]
 blank(0) [begin blank(2) [x := 1; blank(0) y := ’a’;] blank(0) end]]
vs.
[[var x:integer; blank(4) y:char;]
 blank(0) [begin blank(0) [x := 1; blank(2) y := ’a’;] blank(0) end]]
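To make the group/blank mark-up concrete, here is a minimal Python sketch of a layout interpreter for it. It is not Oppen's algorithm: it uses a plain recursive "does the whole group fit on the rest of the line?" check and supports only consistent breaking. Two further assumptions, not stated on the slides: an unbroken blank prints as a single space, and `blank(n)` indents by `n` columns relative to the column where its group starts. The names `Blank`, `Group`, `flat` and `render` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Blank:
    indent: int   # indentation used if the line is broken here

@dataclass
class Group:
    items: list   # strings, Blanks and nested Groups

def flat(item) -> str:
    """Text of an item when printed on one line (blank = one space)."""
    if isinstance(item, str):
        return item
    if isinstance(item, Blank):
        return " "
    return "".join(flat(child) for child in item.items)

def render(doc, width: int) -> str:
    out = []   # output fragments
    col = 0    # current output column

    def go(item):
        nonlocal col
        if isinstance(item, str):
            out.append(item)
            col += len(item)
            return
        one_line = flat(item)
        if col + len(one_line) <= width:
            # The whole group fits: print it on one line.
            out.append(one_line)
            col += len(one_line)
            return
        # Otherwise break at every blank of this group (consistent breaking).
        start = col
        for child in item.items:
            if isinstance(child, Blank):
                col = start + child.indent
                out.append("\n" + " " * col)
            else:
                go(child)

    go(doc)
    return "".join(out)

# First example from the slide, with ASCII quotes.
doc = Group([
    Group(["var", Blank(2), Group(["x:integer;", Blank(0), "y:char;"])]),
    Blank(0),
    Group(["begin", Blank(2),
           Group(["x := 1;", Blank(0), "y := 'a';"]),
           Blank(0), "end"]),
])

for w in (128, 40, 15):
    print(f"w = {w}:")
    print(render(doc, w))
```

Because the fit check looks only at the group itself and not at what follows it on the same line, this sketch can overrun the width in cases that a full implementation would handle.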
Box-style Mark-up
Uses boxes (similar to Oppen’s groups)
• basic boxes
  – plain strings: “foo”
  – keywords: KW [ “foo” ]
  – subtrees: _1
• horizontal boxes: H hs=x [ B B B ]
  – the boxes B are placed side by side
• vertical boxes: V vs=y is=i [ B B B ]
  – the boxes B are stacked on top of each other
• more: HV, HOV, I, ALT
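A minimal Python sketch of how H and V boxes could be turned into lines of text. The reading of the parameters is an assumption made for this sketch and may differ from the tools behind the slides: `hs` is the number of spaces between horizontally composed boxes, `vs` is the number of empty lines between vertically composed boxes, and `is_` (named with an underscore because `is` is a Python keyword) indents every box of a V box except the first.

```python
from dataclasses import dataclass

@dataclass
class H:
    boxes: list
    hs: int = 1    # spaces between boxes (assumed meaning)

@dataclass
class V:
    boxes: list
    vs: int = 0    # empty lines between boxes (assumed meaning)
    is_: int = 0   # indentation of all boxes after the first (assumed meaning)

def render(box) -> list[str]:
    """Render a box (a plain string, an H box or a V box) to a list of lines."""
    if isinstance(box, str):
        return [box]
    if isinstance(box, H):
        lines = [""]
        for i, child in enumerate(box.boxes):
            child_lines = render(child)
            gap = " " * box.hs if i > 0 else ""
            col = len(lines[-1]) + len(gap)
            # Attach the child's first line to the current line; any further
            # lines of the child are shifted to the same column.
            lines[-1] += gap + child_lines[0]
            lines += [" " * col + line for line in child_lines[1:]]
        return lines
    if isinstance(box, V):
        lines = []
        for i, child in enumerate(box.boxes):
            if i > 0:
                lines += [""] * box.vs
            pad = " " * box.is_ if i > 0 else ""
            lines += [pad + line for line in render(child)]
        return lines
    raise TypeError(f"not a box: {box!r}")

let_box = V([
    V(["let", H(["x", "=", "1"])], is_=2),
    V(["in", "x"], is_=2),
    "end",
])
print("\n".join(render(let_box)))
```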
Pretty-print tables can be generated.
"if" Exp "then" Exp -> Exp {"IfThen"}
"let" Dec* "in" {Exp ";"}* "end" -> Exp {"Let"}

Exp.IfThen -- KW["if"] _1 KW["then"] _2,
Exp.Let -- KW["let"] _1 KW["in"] _2 KW["end"],
Exp.Let.1:iter-star -- _1,
Exp.Let.2:iter-star-sep -- _1 KW[";"]
Pretty-print tables can be modified.
"if" Exp "then" Exp -> Exp {"IfThen"}
"let" Dec* "in" {Exp ";"}* "end" -> Exp {"Let"}

Exp.Let -- V vs=1 is=0 [
  V vs=1 is=2 [KW["let"] _1]
  V vs=1 is=2 [KW["in"] _2]
  KW["end"]
]
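The idea of table-driven unparsing can be sketched in a few lines of Python. The in-memory encoding below (a dictionary from constructor names to templates, plus a separate separator table for list-valued children) is a hypothetical simplification of the pretty-print table format shown above, and the output is a single line without any box layout:

```python
# Templates: strings are keywords, integers refer to subtrees (_1, _2, ...).
TABLE = {
    "IfThen": ["if", 1, "then", 2],
    "Let": ["let", 1, "in", 2, "end"],
}
# Separators for list-valued children, keyed by (constructor, position):
# Let.1 is Dec* (no separator), Let.2 is {Exp ";"}* (separated by ";").
SEPARATORS = {("Let", 1): "", ("Let", 2): ";"}

def unparse(node) -> str:
    """Unparse an AST given as nested tuples (constructor, child, ...)."""
    if isinstance(node, str):          # leaf: already text
        return node
    cons, *args = node
    parts = []
    for entry in TABLE[cons]:
        if isinstance(entry, int):     # placeholder _n
            arg = args[entry - 1]
            if isinstance(arg, list):  # iterated child
                sep = SEPARATORS[(cons, entry)] + " "
                parts.append(sep.join(unparse(a) for a in arg))
            else:
                parts.append(unparse(arg))
        else:                          # keyword
            parts.append(entry)
    return " ".join(p for p in parts if p)

ast = ("Let", ["var x := 1"], [("IfThen", "x", "y"), "z"])
print(unparse(ast))
```

Changing the layout then means editing `TABLE` rather than the traversal code, which is the point of keeping the table separate.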
Further Reading
• Derek C. Oppen: Prettyprinting. ACM Trans. Program. Lang. Syst. 2(4):465-483 (1980).
• Philip Wadler: A prettier printer. In: The Fun of Programming. A symposium in honour of Professor Richard Bird's 60th birthday, Examination Schools, Oxford, 24-25 March 2003. http://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf
Further Reading (II)
• Mark van den Brand, Eelco Visser: Generation of Formatters for Context-Free Languages. ACM Trans. Softw. Eng. Methodol. 5(1):1-41 (1996).
• Tobi Vollebregt, Lennart C. L. Kats, Eelco Visser: Declarative specification of template-based textual editors. LDTA 2012: 8