1st year Master of Applied Computer Science Faculty of...
Master of Applied Computer Science
Theo D’Hondt
Fundamentals of Computer Science
Vrije Universiteit Brussel
Section 4
Grammars
“specifying structure”
Fundamentals of Computer Science
1st year Master of Applied Computer Science
Faculty of Engineering Sciences
Vrije Universiteit Brussel
Pico
{ QuickSort(V,Low,High):
    { Left: Low;
      Right: High;
      Pivot: V[(Left + Right) // 2];
      Save: 0;
      until(Left > Right,
        { while(V[Left] < Pivot, Left:= Left+1);
          while(V[Right] > Pivot, Right:= Right-1);
          if(Left <= Right,
            { Save:= V[Left];
              V[Left]:= V[Right];
              V[Right]:= Save;
              Left:= Left+1;
              Right:= Right-1 },
            void) });
      display(Low, eoln);
      if(Low < Right, QuickSort(V, Low, Right), void);
      if(High > Left, QuickSort(V, Left, High), void) };
  V[10000]: random();
  QuickSort(V,1,size(V));
  display(V[size(V)]) }
Scanning text:
textual representation:
fac(n): if(n>1, n*fac(n-1), 1)
tokenized representation:
{ <NAM, fac>,<LPR>,<NAM, n>,<RPR>,<COL>,<NAM, if>,
  <LPR>,<NAM, n>,<ROP, >>,<NBR, 1>,<COM>,<NAM, n>,
  <MOP, *>,<NAM, fac>,<LPR>,<NAM, n>,<AOP, ->,
  <NBR, 1>,<RPR>,<COM>,<NBR, 1>,<RPR>,<END> }
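The step from textual to tokenized representation can be tried out with a small Python sketch. It is not the table-driven Pico scanner of the following slides: it uses regular expressions, and the patterns in TOKEN_SPEC are assumptions that cover only the characters of the fac example. The token names follow the slide.

```python
import re

# Token kinds use the slide's abbreviations; the patterns are assumptions
# that cover only what the fac example needs.
TOKEN_SPEC = [
    ("NBR", r"\d+"),
    ("NAM", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("LPR", r"\("),
    ("RPR", r"\)"),
    ("COL", r":"),
    ("COM", r","),
    ("ROP", r"[<>#]"),
    ("MOP", r"[*/&\\]"),
    ("AOP", r"[-+]"),
    ("WSP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))
WITH_ATTRIBUTE = {"NBR", "NAM", "ROP", "MOP", "AOP"}


def tokenize(text):
    """Return a list of tuples: (kind,) or (kind, attribute), ending with ('END',)."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind != "WSP":  # white space separates tokens but is not a token
            if kind in WITH_ATTRIBUTE:
                value = int(match.group()) if kind == "NBR" else match.group()
                tokens.append((kind, value))
            else:
                tokens.append((kind,))
        pos = match.end()
    tokens.append(("END",))
    return tokens


tokens = tokenize("fac(n): if(n>1, n*fac(n-1), 1)")
print(tokens[:6])
```

The first six tokens printed correspond to <NAM, fac>,<LPR>,<NAM, n>,<RPR>,<COL>,<NAM, if> on the slide.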
Scanning text (cont'd):
{ AOP_token: 1; CAT_token: 2; CEQ_token: 3; COL_token: 4; COM_token: 5;
  END_token: 6; FRC_token: 7; LBC_token: 8; LBR_token: 9; LPR_token: 10;
  MOP_token: 11; NAM_token: 12; NBR_token: 13; RBC_token: 14; RBR_token: 15;
  ROP_token: 16; RPR_token: 17; SMC_token: 18; TXT_token: 19; XOP_token: 20;
  scan_data: void;
  scan(): ...
token values (the constants AOP_token to XOP_token)
token attribute (string, number or fraction), held in scan_data
Scanning text (cont'd): transcript
init_scan('fac(n): if(n>1, n*fac(n-1), 1)'): <void>
scan(): 12
scan_data: fac
scan(): 10
scan(): 12
scan_data: n
scan(): 17
scan(): 4
scan(): 12
scan_data: if
Scanning text (cont'd):
aop: 1; apo: 2; bkq: 3; cat: 4; col: 5;
com: 6; dgt: 7; eol: 8; eql: 9; exp: 10;
ill: 11; lbc: 12; lbr: 13; lpr: 14; ltr: 15;
mns: 16; mop: 17; per: 18; pls: 19; quo: 20;
rbc: 21; rbr: 22; rop: 23; rpr: 24; smc: 25;
wsp: 26; xop: 27;
ch_tab: [`end` wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, eol, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, xop, quo, rop, aop, aop, mop, apo, lpr, rpr, mop, pls, com, mns, per, mop, dgt, dgt, dgt, dgt, dgt, dgt, dgt, dgt, dgt, dgt, col, smc, rop, eql, rop, xop, cat, ltr, ltr, ltr, ltr, exp, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, lbr, mop, rbr, xop, ltr, ill, ltr, ltr, ltr, ltr, exp, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, lbc, aop, rbc, aop, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill,
ill, ill, ill, ill, ill, ill, ill, ill ];
character categories
category of each ASCII character (except the first, with value 0)
Scanning text (cont'd):
wsp ⇒ white space
eol ⇒ end of line
ltr ⇒ letter + {_} – {e E}
dgt ⇒ digit
exp ⇒ {e E}
aop ⇒ {$ % | ~}
rop ⇒ {# < >}
mop ⇒ {* & / \}
xop ⇒ {! ? ^}
pls ⇒ {+}
mns ⇒ {-}
apo ⇒ {'}
quo ⇒ {"}
bkq ⇒ {`}
com ⇒ {,}
per ⇒ {.}
col ⇒ {:}
smc ⇒ {;}
eql ⇒ {=}
cat ⇒ {@}
lpr ⇒ {(}
rpr ⇒ {)}
lbr ⇒ {[}
rbr ⇒ {]}
lbc ⇒ {{}
rbc ⇒ {}}
ill ⇒ illegal
meaning of the character categories
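A table such as ch_tab can be generated from the category list instead of being typed out entry by entry. The Python sketch below is a transliteration, not the slides' Pico code: build_ch_tab and category are hypothetical names, and the table is indexed by ASCII code, so index 0 is the unused first character. It follows ch_tab where the two slides differ, so code 13 is eol, code 10 is wsp, and the backquote is ill.

```python
# (category, characters) pairs taken from the list of character categories.
PUNCTUATION = [
    ("aop", "$%|~"), ("rop", "#<>"), ("mop", "*&/\\"), ("xop", "!?^"),
    ("pls", "+"), ("mns", "-"), ("apo", "'"), ("quo", '"'),
    ("com", ","), ("per", "."), ("col", ":"), ("smc", ";"),
    ("eql", "="), ("cat", "@"), ("lpr", "("), ("rpr", ")"),
    ("lbr", "["), ("rbr", "]"), ("lbc", "{"), ("rbc", "}"),
]


def build_ch_tab():
    """Return a 128-entry list mapping an ASCII code to a category name."""
    tab = ["ill"] * 128
    for code in range(1, 33):      # control characters and the space
        tab[code] = "wsp"
    tab[13] = "eol"                # carriage return ends a line in ch_tab
    for code in range(ord("0"), ord("9") + 1):
        tab[code] = "dgt"
    for first in ("a", "A"):
        for code in range(ord(first), ord(first) + 26):
            tab[code] = "ltr"
    tab[ord("_")] = "ltr"
    tab[ord("e")] = tab[ord("E")] = "exp"   # e and E can start an exponent
    for name, chars in PUNCTUATION:
        for ch in chars:
            tab[ord(ch)] = name
    return tab


CH_TAB = build_ch_tab()


def category(ch):
    """Category of a single character; anything outside ASCII is illegal."""
    code = ord(ch)
    return CH_TAB[code] if code < 128 else "ill"


print([category(ch) for ch in "n>1e"])
```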
Scanning text (cont'd):
check(allowed): if(ch = 0, false, allowed[ch_tab[ch]]);
uncheck(allowed): if(ch = 0, false, not(allowed[ch_tab[ch]]));
mask@list: { msk[siz]: false;
             for(k: 1, k:= k+1, not(k > size(list)),
               msk[list[k]]:= true);
             msk };
apo_allowed: mask(apo);
apx_allowed: mask(apo,eol);
bkq_allowed: mask(bkq,eol);
dgt_allowed: mask(dgt);
eql_allowed: mask(eql);
exp_allowed: mask(exp);
nam_allowed: mask(dgt,exp,ltr);
opr_allowed: mask(aop,eql,mns,mop,pls,rop,xop);
per_allowed: mask(per);
quo_allowed: mask(quo);
qux_allowed: mask(eol,quo);
sgn_allowed: mask(pls,mns);
wsp_allowed: mask(wsp,eol);
masks are vectors with true/false values for each character category
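The mask idea can be sketched in Python as follows. This is a transliteration under assumptions: categories are looked up by name through INDEX instead of by numeric constant, category_of is a tiny hypothetical stand-in for ch_tab that knows only a few categories, and scan_while is an illustrative helper that does not appear on the slides.

```python
CATEGORIES = [
    "aop", "apo", "bkq", "cat", "col", "com", "dgt", "eol", "eql", "exp",
    "ill", "lbc", "lbr", "lpr", "ltr", "mns", "mop", "per", "pls", "quo",
    "rbc", "rbr", "rop", "rpr", "smc", "wsp", "xop",
]
INDEX = {name: position for position, name in enumerate(CATEGORIES)}


def mask(*names):
    """Boolean vector with one entry per category, True for the listed ones."""
    msk = [False] * len(CATEGORIES)
    for name in names:
        msk[INDEX[name]] = True
    return msk


nam_allowed = mask("dgt", "exp", "ltr")
sgn_allowed = mask("pls", "mns")


def category_of(ch):
    """Hypothetical stand-in for ch_tab: only the categories used below."""
    if ch in ("e", "E"):
        return "exp"
    if ch.isalpha() or ch == "_":
        return "ltr"
    if ch in "0123456789":
        return "dgt"
    if ch.isspace():
        return "wsp"
    return "ill"


def scan_while(text, pos, allowed):
    """Advance pos while the character's category is set in the mask."""
    while pos < len(text) and allowed[INDEX[category_of(text[pos])]]:
        pos += 1
    return pos


end = scan_while("fac1(n)", 0, nam_allowed)
print("fac1(n)"[:end])
```

Testing membership is a single vector lookup, whatever the number of categories in the mask.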
Scanning text (cont'd):
aop_fun(): { ... };
apo_fun(): { ... };
wsp_fun(): { ... };
xop_fun(): { ... };
fun_tab: [ aop_fun, apo_fun, bkq_fun, cat_fun, col_fun, com_fun, dgt_fun,
           wsp_fun, rop_fun, ltr_fun, ill_fun, lbc_fun, lbr_fun, lpr_fun,
           ltr_fun, aop_fun, mop_fun, ill_fun, aop_fun, quo_fun, rbc_fun,
           rbr_fun, rop_fun, rpr_fun, smc_fun, wsp_fun, xop_fun ];
scan(): if(ch = 0,
           END_token,
           { fun: fun_tab[ch_tab[ch]];
             fun() });
a function for each character category
a dispatch vector for each character category
selecting the correct function for a given character
Scanning text (cont'd): illustration
col_fun(): { skip_ch();
             if(check(eql_allowed),
                next_ch(CEQ_token),
                COL_token) };
com_fun(): next_ch(COM_token);
dgt_fun(): { freeze();
             until(uncheck(dgt_allowed), skip_ch());
             if(check(per_allowed),
                fraction(),
                if(check(exp_allowed),
                   exponent(),
                   capture_number(NBR_token))) };
start and finish capture of the characters representing a number
call auxiliary functions
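The dispatch scheme of the last two slides can be sketched in Python. This is a reduced transliteration and not the Pico scanner itself: the class Scanner, its attribute names and the restriction to four categories (col, com, dgt, wsp) are assumptions made to keep the example short, and fractions and exponents are left out.

```python
DIGITS = "0123456789"


class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.data = None  # plays the role of scan_data
        # dispatch table: one handler per character category
        self.dispatch = {
            "col": self.col_fun,
            "com": self.com_fun,
            "dgt": self.dgt_fun,
            "wsp": self.wsp_fun,
        }

    def category(self):
        """Reduced stand-in for ch_tab, limited to the categories handled here."""
        ch = self.text[self.pos]
        if ch == ":":
            return "col"
        if ch == ",":
            return "com"
        if ch in DIGITS:
            return "dgt"
        if ch.isspace():
            return "wsp"
        return "ill"

    def col_fun(self):
        self.pos += 1  # skip ':' and look one character ahead for '='
        if self.pos < len(self.text) and self.text[self.pos] == "=":
            self.pos += 1
            return "CEQ"
        return "COL"

    def com_fun(self):
        self.pos += 1
        return "COM"

    def dgt_fun(self):
        start = self.pos  # start of the captured characters
        while self.pos < len(self.text) and self.text[self.pos] in DIGITS:
            self.pos += 1
        self.data = int(self.text[start:self.pos])
        return "NBR"

    def wsp_fun(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        return self.scan()  # white space yields no token: scan again

    def ill_fun(self):
        raise ValueError(f"illegal character {self.text[self.pos]!r}")

    def scan(self):
        if self.pos >= len(self.text):
            return "END"
        handler = self.dispatch.get(self.category(), self.ill_fun)
        return handler()


scanner = Scanner("12 := 3, :")
kinds = []
while True:
    kind = scanner.scan()
    kinds.append(kind)
    if kind == "END":
        break
print(kinds)
```

Adding a category means adding one handler and one table entry; scan itself stays unchanged.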
Parsing tokens:
tokenized representation:
{ <NAM, fac>,<LPR>,<NAM, n>,<RPR>,<COL>,<NAM, if>,
  <LPR>,<NAM, n>,<ROP, >>,<NBR, 1>,<COM>,<NAM, n>,
  <MOP, *>,<NAM, fac>,<LPR>,<NAM, n>,<AOP, ->,
  <NBR, 1>,<RPR>,<COM>,<NBR, 1>,<RPR>,<END> }
abstract representation:
[10, fac, [5, [[8, n]]], [11, if, [5, [[11, >, [5, [[8, n], [1, 1]]]], [11, *, [5, [[8, n], [11, fac, [5, [[11, -, [5, [[8, n], [1, 1]]]]]]]]]], [1, 1]]]]]
Parsing tokens (cont'd):
[Diagram: character stream -> Scanner -> token stream -> Parser; labels: concrete representation (concrete grammar), abstract representation (abstract grammar)]
Parsing tokens (cont'd):
this is a concrete grammar for Pico
<program> ::= <expression>
<expression> ::= <invocation>
<expression> ::= <invocation> : <expression>
<expression> ::= <invocation> := <expression>
<invocation> ::= <comparand>
<invocation> ::= <invocation> <comparator> <comparand>
<comparand> ::= <term>
<comparand> ::= <comparand> <adder> <term>
<term> ::= <factor>
<term> ::= <term> <multiplier> <factor>
<factor> ::= <reference>
<factor> ::= <factor> <power> <reference>
<reference> ::= <number>
<reference> ::= <fraction>
<reference> ::= <text>
<reference> ::= <variable>
<reference> ::= <prefix>
<reference> ::= <application>
<reference> ::= <apply>
<reference> ::= <tabulation>
<reference> ::= <subexpression>
<reference> ::= <sequence>
<reference> ::= <table>
Parsing tokens (cont'd):
<prefix> ::= <operator> <reference>
<application> ::= <variable> ( )
<application> ::= <variable> ( <commalist> )
<apply> ::= <variable> @ <invocation>
<tabulation> ::= <name> [ <expression> ]
<subexpression> ::= ( <expression> )
<sequence> ::= { <semicolonlist> }
<table> ::= [ ]
<table> ::= [ <commalist> ]
<commalist> ::= <expression>
<commalist> ::= <expression> , <commalist>
<semicolonlist> ::= <expression>
<semicolonlist> ::= <expression> ; <semicolonlist>
<variable> ::= <name>
<variable> ::= <operator>
<operator> ::= <power>
<operator> ::= <multiplier>
<operator> ::= <adder>
<operator> ::= <comparator>
Parsing tokens (cont'd):
<scale> ::= #exponent# + <number>
<scale> ::= #exponent# - <number>
<scale> ::= #exponent# <number>
<number> ::= #digit#
<number> ::= #digit# <number>
<fraction> ::= <number> . <number> <scale>
<fraction> ::= <number> . <number>
<fraction> ::= <number> <scale>
<comparator> ::= #comparator# <operator>
<adder> ::= #adder# <operator>
<multiplier> ::= #multiplier# <operator>
<power> ::= #power# <operator>
<operator> ::= #operator#
<operator> ::= #operator# <operator>
<name> ::= #letter# <rest>
<rest> ::=
<rest> ::= #digit# <rest>
<rest> ::= #letter# <rest>
#letter# = { a ,..., z , A ,..., Z , _ }
#digit# = { 0 ,..., 9 }
#exponent# = { e , E }
#comparator# = { < , = , > }
#adder# = { + , - , | }
#multiplier# = { * , / , \ , & }
#power# = { ^ }
#operator# = #comparator# + #adder# + #multiplier# + #power#
Parsing tokens (cont'd):
this is an abstract grammar for Pico
<expression> ::= <number>
<expression> ::= <fraction>
<expression> ::= <text>
<expression> ::= <table>
<expression> ::= <function>
<expression> ::= <native>
<expression> ::= <variable>
<expression> ::= <application>
<expression> ::= <tabulation>
<expression> ::= <definition>
<expression> ::= <assignment>
<expression> ::= <void>
<number> ::= NBR <number>
<fraction> ::= FRC <fraction>
<text> ::= TXT <text>
<table> ::= TAB <table>
<function> ::= FUN <identifier> <arguments> <expression> <dictionary>
<native> ::= NAT <identifier> <function>
<variable> ::= VAR <identifier>
<application> ::= APL <identifier> <arguments>
<tabulation> ::= TBL <identifier> <expression>
<definition> ::= DEF <invocation> <expression>
<assignment> ::= SET <invocation> <expression>
<dictionary> ::= DCT <identifier> <expression> <dictionary>
<void> ::= VOI
<identifier> ::= <text>
<arguments> ::= <table>
<arguments> ::= <invocation>
<invocation> ::= <variable>
<invocation> ::= <application>
<invocation> ::= <tabulation>
Parsing tokens (cont'd):
NBR_tag: 1; NBR(Val): [ NBR_tag, Val ]; NBR_VAL_idx: 2;
FRC_tag: 2; FRC(Val): [ FRC_tag, Val ]; FRC_VAL_idx: 2;
TXT_tag: 3; TXT(Val): [ TXT_tag, Val ]; TXT_VAL_idx: 2;
TAB_tag: 4; TAB(Tab): [ TAB_tag, Tab ]; TAB_TAB_idx: 2;
FUN_tag: 5; FUN(Nam, Par, Bod, Dct): [ FUN_tag, Nam, Par, Bod, Dct ]; FUN_NAM_idx: 2; FUN_PAR_idx: 3; FUN_EXP_idx: 4; FUN_DCT_idx: 5;
NAT_tag: 6; NAT(Nam, Nat): [ NAT_tag, Nam, Nat ]; NAT_NAM_idx: 2; NAT_NAT_idx: 3;
VAR_tag: 7; VAR(Nam): [ VAR_tag, Nam ]; VAR_NAM_idx: 2;
every abstract expression is tagged
every abstract expression is composed of indexed parts
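Tagged expressions with indexed parts can be mimicked in Python with plain lists. The sketch below is a transliteration under assumptions: only four of the tags are reproduced, the index constants are 0-based where the slides count from 1, and describe is a hypothetical helper added to show how a consumer inspects the tag.

```python
# Tags as on the slides.
NBR_tag, TAB_tag, VAR_tag, APL_tag = 1, 4, 7, 8

# Python lists are 0-based, so every index is one less than on the slides.
TAG_idx = 0
NBR_VAL_idx = 1
TAB_TAB_idx = 1
VAR_NAM_idx = 1
APL_NAM_idx, APL_ARG_idx = 1, 2


def NBR(val):
    return [NBR_tag, val]


def TAB(tab):
    return [TAB_tag, tab]


def VAR(nam):
    return [VAR_tag, nam]


def APL(nam, arg):
    return [APL_tag, nam, arg]


def describe(exp):
    """Hypothetical consumer: pick a rendering by looking at the tag."""
    tag = exp[TAG_idx]
    if tag == NBR_tag:
        return str(exp[NBR_VAL_idx])
    if tag == VAR_tag:
        return exp[VAR_NAM_idx]
    if tag == TAB_tag:
        return ", ".join(describe(item) for item in exp[TAB_TAB_idx])
    if tag == APL_tag:
        return f"{exp[APL_NAM_idx]}({describe(exp[APL_ARG_idx])})"
    raise ValueError(f"unknown tag {tag}")


call = APL("abc", TAB([NBR(1), NBR(2), NBR(3)]))
print(call)
print(describe(call))
```

The list built for abc(1,2,3) has the same shape as the abstract representation the parser produces for that call.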
Parsing tokens (cont'd):
APL_tag: 8; APL(Nam, Arg): [ APL_tag, Nam, Arg ]; APL_NAM_idx: 2; APL_ARG_idx: 3;
TBL_tag: 9; TBL(Nam, Idx): [ TBL_tag, Nam, Idx ]; TBL_NAM_idx: 2; TBL_IDX_idx: 3;
DEF_tag: 10; DEF(Inv, Exp): [ DEF_tag, Inv, Exp ]; DEF_INV_idx: 2; DEF_EXP_idx: 3;
SET_tag: 11; SET(Inv, Exp): [ SET_tag, Inv, Exp ]; SET_INV_idx: 2; SET_EXP_idx: 3;
DCT_tag: 12; DCT(Nam, Val, Dct): [ DCT_tag, Nam, Val, Dct ]; DCT_NAM_idx: 2; DCT_VAL_idx: 3; DCT_DCT_idx: 4;
VOI_tag: 13; VOI(): [ VOI_tag ];
Parsing tokens (cont'd): transcript
a number:
read('123'): [1, 123]
a variable:
read('abc'): [7, abc]
a call:
read('abc(1,2,3)'): [8, abc, [4, [[1, 1], [1, 2], [1, 3]]]]
a definition:
read('f(x): x'): [10, [8, f, [4, [[7, x]]]], [7, x]]
a table:
read('t[360]: sin(h:= h+Pi/180)'): [10, [9, t, [1, 360]], [8, sin, [4, [[11, [7, h], [8, +, [4, [[7, h], [8, /, [4, [[7, Pi], [1, 180]]]]]]]]]]]]
Parsing tokens (cont'd):
we will need a case statement!
{ tag => fun: [tag, fun];
  else: 0;
  case@clauses:
    { default: void;
      siz: size(clauses);
      max: 0;
      for(k: 1, k:= k+1, not(k > siz),
        { clause: clauses[k];
          if(clause[1] = else,
             default:= clause[2],
             if(clause[1] > max, max:= clause[1], void)) });
      tbl[max]: default;
      for(k: 1, k:= k+1, not(k > siz),
        { clause: clauses[k];
          if(clause[1] = else,
             void,
             tbl[clause[1]]:= clause[2]) });
      select(tag): if(tag > max, default, tbl[tag]) }
build a table tbl of sufficient size to accept the case tags as indexes
store the clauses, and the default if there is one, in tbl
return a function that looks up a tag in tbl and returns the corresponding clause
Parsing tokens (cont'd): example
vowel_test(ch):
  { vowel(ch): display(ch, ' is a vowel', eoln);
    consonant(ch): display(ch, ' is a consonant', eoln);
    char_fun: case(ord('a') => vowel, ord('A') => vowel,
                   ord('e') => vowel, ord('E') => vowel,
                   ord('i') => vowel, ord('I') => vowel,
                   ord('o') => vowel, ord('O') => vowel,
                   ord('u') => vowel, ord('U') => vowel,
                   else => consonant);
    vowel_test(ch):= { fun: char_fun(ord(ch)); fun(ch) };
    vowel_test(ch) }
transcript:
(the definition above): <function vowel_test>
vowel_test('Z'): Z is a consonant
vowel_test('u'): u is a vowel
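The table-based case construct and its vowel example translate almost line by line into Python. In this sketch, which is a transliteration and not the Pico code, clauses are (tag, function) pairs where the slide writes tag => fun, ELSE stands for the else tag 0, tags are assumed to be positive integers, and the functions return strings where the slide calls display.

```python
ELSE = 0  # reserved tag for the default clause


def case(*clauses):
    """Build a lookup table from (tag, function) pairs and return a selector."""
    default = None
    max_tag = 0
    for tag, fun in clauses:              # first pass: default and largest tag
        if tag == ELSE:
            default = fun
        elif tag > max_tag:
            max_tag = tag
    tbl = [default] * (max_tag + 1)       # every slot starts as the default
    for tag, fun in clauses:              # second pass: store the clauses
        if tag != ELSE:
            tbl[tag] = fun

    def select(tag):
        return default if tag > max_tag else tbl[tag]

    return select


def vowel(ch):
    return f"{ch} is a vowel"


def consonant(ch):
    return f"{ch} is a consonant"


char_fun = case(*[(ord(c), vowel) for c in "aAeEiIoOuU"], (ELSE, consonant))


def vowel_test(ch):
    fun = char_fun(ord(ch))
    return fun(ch)


print(vowel_test("Z"))
print(vowel_test("u"))
```

The table is built once, when case is called; each later selection is a single indexed access.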
Parsing tokens (cont'd):
token: void;
skip(): token:= scan()
next(Dat): { skip(); Dat }
identity(Inv): Inv;
definition(Inv): DEF(next(Inv), expression());
assignment(Inv): SET(next(Inv), expression());
exp_case: case(COL_token => definition,
               CEQ_token => assignment,
               else => identity);
expression(): { inv: invocation();
                cas: exp_case(token);
                cas(inv) };
read(Str): { init_scan(Str);
             token := scan();
             expression() }
Parsing tokens (cont'd):
operation(Opr, Tkn):
  { opd: Opr();
    while(token = Tkn,
      { opr: next(scan_data);
        arg: [ opd, Opr() ];
        opd:= APL(opr, TAB(arg)) });
    opd };
factor(): operation(reference, XOP_token);
term(): operation(factor, MOP_token);
comparand(): operation(term, AOP_token);
invocation(): operation(comparand, ROP_token)
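The way operation layers the four precedence levels can be tried out with a Python sketch. It is a transliteration under assumptions: the class Parser is hypothetical, tokens are supplied as ready-made (kind, attribute) pairs in place of calls to a scanner, and reference handles only numbers and names.

```python
NBR_tag, TAB_tag, VAR_tag, APL_tag = 1, 4, 7, 8


def NBR(val):
    return [NBR_tag, val]


def TAB(tab):
    return [TAB_tag, tab]


def VAR(nam):
    return [VAR_tag, nam]


def APL(nam, arg):
    return [APL_tag, nam, arg]


class Parser:
    def __init__(self, tokens):
        # tokens: (kind, attribute) pairs; an END token is appended here
        self.tokens = list(tokens) + [("END", None)]
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos][0]

    def next(self):
        """Return the attribute of the current token and move to the next one."""
        data = self.tokens[self.pos][1]
        self.pos += 1
        return data

    def operation(self, operand, kind):
        """Parse operand (kind operand)* into left-nested applications."""
        opd = operand()
        while self.token == kind:
            opr = self.next()
            opd = APL(opr, TAB([opd, operand()]))
        return opd

    def reference(self):
        if self.token == "NBR":
            return NBR(self.next())
        if self.token == "NAM":
            return VAR(self.next())
        raise SyntaxError(f"unexpected {self.token}")

    # one level per operator class, tightest binding first
    def factor(self):
        return self.operation(self.reference, "XOP")

    def term(self):
        return self.operation(self.factor, "MOP")

    def comparand(self):
        return self.operation(self.term, "AOP")

    def invocation(self):
        return self.operation(self.comparand, "ROP")


# h + Pi / 180
tree = Parser([("NAM", "h"), ("AOP", "+"), ("NAM", "Pi"),
               ("MOP", "/"), ("NBR", 180)]).invocation()
print(tree)
```

Because each level takes the next tighter level as its operand parser, / binds more tightly than +, and the while loop makes every level left-associative.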
Parsing tokens (cont'd):
number(): NBR(next(scan_data));
fraction(): FRC(next(scan_data));
text(): TXT(next(scan_data));
ref_case: case(NBR_token => number,
               FRC_token => fraction,
               TXT_token => text,
               NAM_token => name,
               ROP_token => operator,
               AOP_token => operator,
               MOP_token => operator,
               XOP_token => operator,
               LPR_token => parentheses,
               LBC_token => braces,
               LBR_token => brackets,
               else => message);
reference(): { cas: ref_case(token); cas() }
Parsing tokens (cont'd):
tab_str: 'tab'
begin_str: 'begin'
var_case: case(LPR_token => application, LBR_token => tabulation, CAT_token => apply, else => variable);
name(): { var: next(scan_data); cas: var_case(token); cas(var) };
parentheses(): { skip(); exp: expression(); if(token = RPR_token, skip(), message()); exp };
braces(): { skip(); APL(begin_str, list(SMC_token, RBC_token)) };
brackets(): { skip(); if(token = RBR_token, APL(tab_str, next(Empty)), APL(tab_str, list(COM_token, RBR_token))) }
Parsing tokens (cont'd):
prefix(Var): { arg: [ reference() ]; APL(Var, TAB(arg)) };
opr_case: case(NBR_token => prefix, FRC_token => prefix, TXT_token => prefix, NAM_token => prefix, ROP_token => prefix, AOP_token => prefix, MOP_token => prefix, XOP_token => prefix, LPR_token => application, CAT_token => apply, LBR_token => tabulation, else => variable);
operator(): { opr: next(scan_data); cas: opr_case(token); cas(opr) }
Parsing tokens (cont'd):
application(Var): { skip(); if(token = RPR_token, APL(Var, next(Empty)), APL(Var, list(COM_token, RPR_token))) };
apply(Var): { skip(); ref: reference(); APL(Var, ref) };
tabulation(Var): { skip(); idx: expression(); if(token = RBR_token, skip(), message()); TBL(Var, idx) };
variable(Var): VAR(Var)
Parsing tokens (cont'd):
list(Sep, Trm):
  { loop(count):
      { exp: expression();
        if(token = Sep,
           { skip();
             tab: loop(count+1);
             tab[count]:= exp },
           if(token = Trm,
              { skip();
                tab[count]: void;
                tab[count]:= exp },
              message())) };
    TAB(loop(1)) }
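The recursion in list counts the elements on the way down and creates a table of exactly the right size only when the terminator is reached. A Python sketch of that technique, under assumptions: read_list is a hypothetical name, the input is a flat list of strings in which every item stands for an already parsed expression, and the result is a plain list without the surrounding TAB tag.

```python
def read_list(tokens, sep, trm):
    """Collect items separated by sep and closed by trm into a list of exact size."""
    pos = 0

    def expression():
        # stand-in for the real expression parser: consume one item
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of text")
        exp = tokens[pos]
        pos += 1
        return exp

    def loop(count):
        nonlocal pos
        exp = expression()
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of text")
        delimiter = tokens[pos]
        if delimiter == sep:
            pos += 1
            tab = loop(count + 1)      # the table is created further down
        elif delimiter == trm:
            pos += 1
            tab = [None] * count       # created once, with the final size
        else:
            raise SyntaxError(f"unexpected {delimiter!r}")
        tab[count - 1] = exp           # each level fills in its own slot
        return tab

    return loop(1)


print(read_list(["a", ",", "b", ",", "c", ")"], ",", ")"))
```

No intermediate table is resized or copied: the elements wait on the call stack until the size is known.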
msg_tab: [ 'additive operator', 'application', 'assignment', 'definition',
           'comma', 'end of text', 'fraction', 'left brace', 'left bracket',
           'left parenthesis', 'multiplicative operator', 'name', 'number',
           'right brace', 'right bracket', 'relational operator',
           'right parenthesis', 'semicolon', 'text', 'exponentiation operator' ];
message@any: error('Unexpected ', msg_tab[token])
Parsing tokens (cont'd):