CS 332 Programming Language Concepts (mercury.pr.erau.edu/~siewerts/cs332/documents/Lectures/...)
January 20, 2020 © Sam Siewert
CS 332
Programming Language Concepts
Lecture 3 – Programming Language Syntax
Syntax and Semantics
Syntax [PLP, Scott - Companion Materials]
– Informal – set of rules that defines the combinations of symbols (lexicon) that are considered to be a correctly structured document or fragment in that language [Wikipedia – Syntax (programming languages)]
– Formal – Figure 1.3/1.4, pp 26-32: Parsing with a context-free grammar, which is a set of potentially recursive rules that are used to form a parse tree (constructs including statements, expressions, subroutines, …)
LL – Left-to-Right, Left-Most Derivation: Top-Down, Predictive [Intuitive]
LR – Left-to-Right, Right-Most Derivation: Bottom-Up, Match and Reduce From Tail to Head
Comparison on Page 71 [69] in PLP – A, B, C;
Phases of Compilation
Front End, Back End
Interpreter – Common Front End, AST Tree-walk Execution (P. 27)
Regular Expressions
A regular expression is one of the following:
– A character (from lexicon)
– The empty string, denoted by ε (epsilon)
– Two regular expressions concatenated
– Two regular expressions separated by | (i.e., or)
– A regular expression followed by the Kleene star (concatenation of zero or more strings)
Use, for example, to Define Simple Mathematical Expressions Allowed in a Language
Or Strings and String Operators as an Alternative Example
Regular Expressions
Numerical Literals
E.g.
– 0123456789
– 0123456789.0123456789
– 0123456789.0123456789E+0123456789
Note that semantics, such as how many digits of precision are implemented, are not defined!
What Happens in C?
% gcc numbers.c
numbers.c:5:19: error: invalid digit "9" in octal constant
numbers.c:6:13: error: invalid digit "9" in octal constant
numbers.c: In function 'main':
numbers.c:6: warning: floating constant exceeds range of 'double'
numbers.c:7: warning: floating constant exceeds range of 'double'
A violation of C rules for literals: a leading 0 marks an octal constant, so the digit 9 is invalid. Delete leading 0’s:
% gcc numbers.c -o numbers
numbers.c: In function 'main':
numbers.c:6: warning: floating constant exceeds range of 'double'
numbers.c:7: warning: floating constant exceeds range of 'double'
% ./numbers
x1=123456789, y1=123456789, z1=342391
x2=123456789.000000, y2=12345678.012346, z2=inf
x3=123456789.000000, y3=123456789.012346, z3=inf
What Happens in C Continued
% gcc numbers2.c -o numbers2
numbers2.c: In function 'main':
numbers2.c:5: warning: large integer implicitly truncated to unsigned type
numbers2.c:5: warning: large integer implicitly truncated to unsigned type
numbers2.c:6: warning: floating constant exceeds range of 'double'
numbers2.c:7: warning: floating constant exceeds range of 'double'
% ./numbers2
x1=123456789, y1=3755744318, z1=1912277059
x2=1234567.125000, y2=1234.567017, z2=inf
x3=123456789.123457, y3=123456789.123456, z3=inf
C is not a STRONGLY TYPED language, so the compiler generates code that might not be what we intended
Most often, float has 7 digits of precision and double has 15 digits of precision
Most often, unsigned int is 0 … (2^32-1) = 0 … 4,294,967,295
Grammars and Ambiguity
Ambiguity – More than one Evaluation (Semantic Result)
P. 50-51 [48-49] (Order of operators – disambiguate)
Recall Context Free Grammar
CFG Productions
– Expression grammar with precedence and associativity
Ambiguous CFG:
expr ➔ id | number | - expr | ( expr ) | expr op expr
op ➔ + | - | * | /
Line Equation ??
Right-most derivation (groups as (slope * x) + intercept):
expr ➔ expr op expr
➔ expr op id
➔ expr + id
➔ expr op expr + id
➔ expr op id + id
➔ expr * id + id
➔ id(slope) * id(x) + id(intercept)
Left-most derivation (groups as slope * (x + intercept)):
expr ➔ expr op expr
➔ id op expr
➔ id * expr
➔ id * expr op expr
➔ id * id op expr
➔ id * id + expr
➔ id(slope) * id(x) + id(intercept)
Recall Example 1 (Fig. 2.3, p. 53 [50])
Parse tree for expression grammar (with precedence) for 3 + 4 * 5
Grammar with operator precedence disambiguates
Parentheses could disambiguate, e.g. 3 + (4 * 5)
Common Rules of Mathematics
Associative – valid rules of replacement for expressions
– With same operator, order in which operations are performed does not matter as long as sequence of operands unchanged
– Addition and Multiplication are Associative
– NOT Subtraction, Division and Exponentiation
Commutative – if change in order of operands does not change result
– Addition and Multiplication are Commutative 3 + 4 = 4 + 3
– NOT Subtraction, Division and Exponentiation 3 - 4 ≠ 4 - 3
Distributive – valid rules of replacement
– Multiplication Distributes over Addition
– 2 * (1 + 3) = (2 * 1) + (2 * 3)
Example 2.4 (p. 53 [50])
Subtraction is not Mathematically Associative
Parse tree for expression grammar (with left associativity) for 10 - 4 - 3 = (10 - 4) - 3 = 6 - 3 = 3
Example 2.4 (p. 53 [50])
Parse tree for expression grammar (with right associativity) for 10 - 4 - 3
10 - (4 - 3) = 9
expr
  term
    factor
      number(10)
  add_op “-”
  expr
    term
      factor
        number(4)
    add_op “-”
    expr
      term
        factor
          number(3)
Same grammar? -- NO
Scanning
Recall scanner is responsible for
– tokenizing source
– removing comments
– (often) dealing with pragmas (i.e., significant comments)
– saving text of identifiers, numbers, strings
– saving source locations (file, line, column) for error messages
FSA (Finite State Automata)
Deterministic Finite Automata (a Non-Deterministic automaton, where the same input can cause multiple state transitions, can be rewritten as a DFA)
Markov model uses probability to choose the transition
Recall Parsing
Context Free Grammar
– Symbols, tokens, non-terminals
– Productions (rules that chain)
– Builds Parse Trees
Parsing Recognizes Valid Language
A CFG is a generator for the CFL (Context Free Language – E.g. All C Programs)
Any CFG has a Parser that is O(n^3) (in terms of tokens)
O(n^3) is too slow for lengthy programs
Linear LL (Left-to-right, leftmost derivation)
Linear LR (Left-to-right, rightmost derivation)
LALR (Look-Ahead LR)
From Book Example for LL (P. 74)
1. program → stmt_list $$
2. stmt_list → stmt stmt_list
3. | ε
4. stmt → id := expr
5. | read id
6. | write expr
7. expr → term term_tail
8. term_tail → add_op term term_tail | ε
9. term → factor fact_tail
10. fact_tail → mult_op factor fact_tail | ε
12. factor → ( expr ) | id | number
13. add_op → + | -
14. mult_op → * | /
Compare LL expression rules to LR expression rules on P. 52 [50] – Which version do you find more intuitive?
expr -> term | expr add_op term
term -> factor | term mult_op factor
factor -> id | number | - factor | ( expr )
add_op -> + | -
mult_op -> * | /
Corresponding Example Program
read A
read B
sum := A + B
write sum
write sum / 2
Automation Classes
Combinational Logic – And, Or, Not
– Use to compose more complex logic
– XOR, 1’s Complement, 2’s Complement, Mux & Demux, etc.
– Feed forward binary inputs, output binary
– Latch inputs and outputs
Finite State machine – clocked or event driven logic states and transitions
– State held with Flip-flops [e.g. JK, SR]
– Simple processing and control
– Discrete, deterministic or non-deterministic (more than one transition into or out of a state for the same input)
PDA = Stack + state machine
Turing Machine (general limit of computation)
[Figure: NFA and DFA state diagrams for Odd Binary String Input; NFA states P and Q, DFA states P and {P, Q}, transitions labeled 0, 1, 0 | 1, and ε]
LL and LR Parsers are PDAs
A PDA can be specified with a state diagram and a stack
– We need stack for symbol memory
LL is a PDA with One State and Accept with push/pop
LALR and LR are PDAs with Multiple States
– Builds Parse Tree From the Bottom Up
– Recognizer
Simple Languages like C are Typically LALR, Using Look-Ahead Feature with LR
Example from Book (LR Parsing, P. 91 [88])
1. program → stmt_list $$
2. stmt_list → stmt_list stmt
3. stmt_list → stmt
4. stmt → id := expr
5. | read id
6. | write expr
7. expr → term
8. | expr add_op term
9. term → factor
10. | term mult_op factor
11. factor → ( expr ) | id | number
12. add_op → + | -
13. mult_op → * | /
Compare LR to LL (P. 74 [72] for LL and P. 91 [88] for LR) for Calculator Language. For LR Grammars, Shift and Reduce Table Driven Parsers can be Built
– Roots of Partially Completed Sub-trees are Kept and Matched
Names and Binding (Scope)
Not Reserved Words (Programmer Symbol)
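#include <stdio.h>

int x=0;                /* global x */

void foo(void);

int main(void)
{
    /* only the global x is in scope here */
    printf("x in main before local declaration = %d\n", x);
    int x=1;            /* local x hides the global x from here to the end of main */
    printf("x in main after local declaration = %d\n", x);
    foo();
    return 0;
}

void foo(void)
{
    x=2;                /* main's local x is not visible here, so this assigns the global x */
    printf("x in foo = %d\n", x);
}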
Time of Binding
Programming
Compile (By Name, By Type Signature – Overloading)
Linking
Load and Run
Static – Before you Run (Static Link Libraries)
Dynamic – While you Run (Dynamic Link Libraries)
Static (file global)
Stack (function local or parameter)
Heap (malloc)
Extern (global to program)
Stack Frame (Remember EABI in Assembly?)
Instantiation
Macros (C, C++) – Preprocessor, before compile
Ada Generics and C++ Templates
– Same Basic Algorithm, Instantiated Multiple Times (Copies) that Apply Algorithm to Different Types
Far Different than Late Binding
– OO: Method to Call is Determined at Run Time
– Based on Inheritance
– Methods Defined
– Current Instantiation of Object for Class
– Overrides
– Virtual Functions
– Pure Virtual Functions
C++ Examples
Compile with gcc or g++ to a.out and run
– http://mercury.pr.erau.edu/~siewerts/cs332/code/cs332_code/ch3/
g++ shifty.cpp – What Scoping Rules are Used Here?
g++ objex.cpp – What Scoping and Binding Features?
Limiting Scope
Point of Many Files => One Program
Use static in C / C++ To Limit Scope of Globals
Don’t Use Globals
Beware of Heap Allocation and Pointers