CSE450 Translation of Programming Languages
Lecture 11: Semantic Analysis: Types & Type Checking
Structure of a Compiler (figure): Source Language → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Int. Code Generator → Code Optimizer → Target Code Generator → Target Language. The Front End and Back End are connected by Intermediate Code. Today: the Semantic Analyzer (Projects 1, 2, and 3 ✔).
Importance of Semantic Analysis
Parsing cannot catch all possible errors. Parsing assumes that we are working with a context-free grammar, so it cannot check rules that depend on context elsewhere in the program.
Example language constructs that require context:
Have variables been declared?
Is a variable available in the current scope?
Are the operands of an expression valid types?
Is an assignment using legal types?
Are the arguments to a function of the correct type?
Types
Why do we need to worry about type checking?
Consider the TubeIC fragment:
add s12 s20 s34
What types are s12, s20, and s34?
They can be anything! Likewise, processors treat registers generically. This makes their operations flexible and reusable, but not type safe.
Types and Operations
Legal operations can vary depending on the type of a value.
In C++, it typically does not make sense to add a function pointer to an integer, but it does make sense to add two integers.
Both of these operations can potentially have the same implementation in assembly: as far as the processor is concerned, an integer and a pointer look the same.
Type Systems
A language's type system specifies which types are available and which operations can be used on those types.
The goal of type checking is to ensure that only "sensible" operations are performed.
Type checking also makes it possible to perform different operations depending on the types involved.
Three basic kinds of Type Checking
There are three basic kinds of type checking systems:
Statically typed: Almost all type checking happens at compile time. Each variable is limited to a single type. Language examples include C/C++, Java, and Tubular.
Dynamically typed: Almost all type checking occurs at runtime. Variables can typically contain any type of value. Most scripting languages do this (JavaScript, Python, Ruby, Scheme, etc.).
Untyped: No checking is done, as in assembly.
Static vs. Dynamic Typing
Static typing: Many errors can be caught at compile time. Optimizations can be easier to perform. The runtime environment can be faster, because type decisions have already been made.
Dynamic typing: Less restrictive, so operations are easier to express and development is faster. Programs can be more modular, extensible, and adaptive. More runtime machinery is required, so execution can be slower.
In practice, most languages use some statically typed and some dynamically typed elements.
Languages also provide escape mechanisms (such as casting) so that statically typed elements can be used as needed.
Types in Tubular
We will have two basic types in Project 4:
val - a floating-point quantity (already implemented)
char - a single ASCII character
And one meta type will be added for Project 5:
array - a consecutive grouping of a basic type; array(char) can also be referred to as string
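As a sketch only, a compiler written in Python could represent these types as small immutable values. The names TubularType, VAL, CHAR, STRING, and array_of are hypothetical. The check that array elements must be a basic type follows the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TubularType:
    """Hypothetical type representation: a basic type or array(<basic type>)."""
    name: str                                # "val", "char", or "array"
    element: Optional["TubularType"] = None  # element type; only used by arrays

    def __str__(self) -> str:
        if self.name != "array":
            return self.name
        # array(char) is also referred to as string
        if self.element == CHAR:
            return "string"
        return f"array({self.element})"


VAL = TubularType("val")
CHAR = TubularType("char")


def array_of(element: TubularType) -> TubularType:
    """Build an array type; an array groups a single basic type."""
    if element.name == "array":
        raise TypeError("array elements must be a basic type")
    return TubularType("array", element)


STRING = array_of(CHAR)
```

The dataclass is frozen, so two separately built array(char) types compare equal with ==. This keeps type comparisons in the checker simple.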
Tubular Type Checking
For Tubular, we will be using static typing, which makes the runtime environment simpler to implement.
There are four basic scenarios where types will need to be checked, plus function calls later on:
Variable assignments: the type of the RHS must match the variable.
Mathematical operations: types must be val for + - * / % && || and !
Comparison operators: types must both be val or both be char.
Generic commands, like print: any type is accepted.
Function calls (coming in Project 7): argument types must match.
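The assignment, math, and comparison rules can be sketched as small checking functions. This is a hypothetical Python sketch, since the course does not prescribe an implementation language, and types are plain strings. One choice is our own assumption: a comparison yields a val, because conditions must be vals.

```python
class TubularTypeError(Exception):
    """Raised when an operation is applied to values of illegal types."""


MATH_OPS = {"+", "-", "*", "/", "%", "&&", "||"}
COMPARISON_OPS = {"==", "!=", "<", "<=", ">", ">="}


def check_assignment(var_type: str, rhs_type: str) -> str:
    """Variable assignment: the RHS type must match the variable's type."""
    if var_type != rhs_type:
        raise TubularTypeError(f"cannot assign {rhs_type} to a {var_type} variable")
    return var_type


def check_not(operand_type: str) -> str:
    """Unary '!': the operand must be a val."""
    if operand_type != "val":
        raise TubularTypeError("'!' requires a val operand")
    return "val"


def check_binary(op: str, left: str, right: str) -> str:
    """Return the result type of a binary operator, or raise TubularTypeError."""
    if op in MATH_OPS:
        # Math and boolean operators only accept val operands.
        if left != "val" or right != "val":
            raise TubularTypeError(f"'{op}' requires val operands, got {left} and {right}")
        return "val"
    if op in COMPARISON_OPS:
        # Both operands must be val or both char; the result is assumed to be a val.
        if left != right or left not in ("val", "char"):
            raise TubularTypeError(f"cannot compare {left} with {right}")
        return "val"
    raise ValueError(f"unknown operator: {op}")
```

Generic commands such as print need no rule here, because they accept every type.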
Variable Assignments
assignment: var_any '=' expression
Each assignment becomes an '=' AST node whose children are the variable and the expression.
val x;
char y;
x = 1;     # ✔
y = 'b';   # ✔
x = 'a';   # ✖
y = 2;     # ✖
Mathematical Operations
expr: expr '+' expr
val x = 1;
char y;
x = x + 2;      # ✔  '+' node with children x and 2
y = 'c';        # ✔
x = y + 3;      # ✖  '+' node with children y and 3
y = x;          # ✖
y = 'a' + 'b';  # ✖  '+' node with children 'a' and 'b'
Functions and Commands
The print command can take any type. Type information is used to determine which operation to perform:
If the type of the argument is val, use out_val.
If the type of the argument is char, use out_char.
Starting with Project 5, if the type of the argument is an array, print each element of that array using its element type.
Other commands and functions may have particular type requirements, depending on argument position.
Type Checking: Char versus Val
The ‘char’ Type
Like val variables, variables can be declared with type char.
Char literals are single characters between single quotes.
The symbol table must keep track of type to ensure that no illegal operations take place.
val x = 0;
char y = 'a';
Escape Characters
There are four escape characters, each preceded by a backslash: \n (newline), \t (tab), \' (single quote), and \\ (backslash).
No other escape characters should be implemented.
char a = '\n';
char b = '\t';
char c = '\'';
char d = '\\';
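A lexer could recognize char literals with a single regular expression that accepts exactly these four escapes and nothing else. The Python sketch below uses the standard re module. The names CHAR_LITERAL and lex_char_literal are hypothetical; a real project would plug the pattern into whatever lexer it already uses.

```python
import re

# A char literal is a quote, then either one of the four escapes
# (\n, \t, \', \\) or any single character other than a quote,
# backslash, or newline, then a closing quote.
CHAR_LITERAL = re.compile(r"'(\\[nt'\\]|[^'\\\n])'")


def lex_char_literal(text: str, pos: int = 0):
    """Return (literal_text, end_pos) if a char literal starts at pos, else None."""
    match = CHAR_LITERAL.match(text, pos)
    if match is None:
        return None
    return match.group(0), match.end()


print(lex_char_literal("x = 'b';", 4))
```

Unsupported escapes such as '\q' fail to match, so the lexer can report them as errors instead of silently accepting them.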
Special Note - # is a normal character
The comment character '#' is allowed between single quotes and doesn't denote a comment there.
char a = '#';
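One way to honor this rule is to skip over char literals before looking for '#'. The hypothetical strip_comment helper below assumes each char literal on the line is well formed, as already checked by the lexer. Under that assumption, a literal is three characters long, or four when a backslash follows the opening quote.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing '#' comment, ignoring '#' inside char literals."""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "'":
            # Skip the whole literal: '#' is 3 characters, '\'' or '\\' is 4.
            if line.startswith("\\", i + 1):
                i += 4
            else:
                i += 3
            continue
        if ch == "#":
            return line[:i]  # everything from '#' onward is a comment
        i += 1
    return line


print(strip_comment("char a = '#'; # a real comment"))
```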
Type Checking - Assignment
You cannot assign a value of one type to a variable of another type.
With static type checking, we know the type of every variable at compile time and can ensure correctness.
char a = 'x';
val b;
b = a; # ERROR
Type Checking - Relational Operators
You can compare (==, !=, >, >=, <, <=) two values of the same type.
But you cannot compare values of two different types.
char a = 'x';
char b = 'y';
b > a;
val c = 0;
a != c; # ERROR
Type Checking - Mathematical Operators
The char type cannot be used with math operators (+, +=, -, -=, *, *=, /, /=),
nor with boolean operators (&&, ||, !).
char a = 'x';
char b = 'y';
a + b; # ERROR
a && b; # ERROR
Type Checking - Boolean Evaluation
The char type cannot be used where a boolean result is needed (conditions for if and while statements).
char a = 'x';
if (a) { # ERROR
  a = 'b';
}
Type Checking - Type Specific Commands
The random command only accepts type val; giving it anything else is an error.
The print command happily accepts type char as an argument.
char a = 'x';
random(a); # ERROR
print(a);
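Per-command requirements like these can be kept in a small table keyed by argument position. The Python sketch below is hypothetical: the COMMAND_SIGNATURES table and check_command function are illustrative names. The assumption that random takes exactly one argument is ours. print is treated as accepting any number of arguments of any type.

```python
from typing import Dict, List, Optional, Set


class TubularTypeError(Exception):
    """Raised when a command receives an argument of the wrong type."""


# Hypothetical signature table: one set of accepted types per argument position.
# None means "any number of arguments of any type", as with print.
COMMAND_SIGNATURES: Dict[str, Optional[List[Set[str]]]] = {
    "print": None,
    "random": [{"val"}],  # assumption: random takes exactly one val argument
}


def check_command(name: str, arg_types: List[str]) -> None:
    """Raise TubularTypeError if the argument types don't fit the command."""
    signature = COMMAND_SIGNATURES[name]
    if signature is None:
        return
    if len(arg_types) != len(signature):
        raise TubularTypeError(f"{name} expects {len(signature)} argument(s)")
    for position, (actual, allowed) in enumerate(zip(arg_types, signature), start=1):
        if actual not in allowed:
            raise TubularTypeError(f"{name}: argument {position} cannot be {actual}")
```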
Hold up the colors that are legal.
#1
val x = 1;
x = 'a';
#2
char x = 'a';
char y = 'b';
char z = x + y;
#3
char x = 'a';
char y = 'b';
x != y;
#4
char x = 'a';
if (x == 'b') {
  x = 'b';
}
'Char' Implementation - val_copy
Tube Intermediate Code handles chars just like vals.
Escape characters are written exactly as in the original Tubular source.
char a = '\n';
becomes
val_copy '\n' s1
val_copy s1 s2
'Char' Implementation - other ops
The other TubeIC operators treat chars just like vals.
'a' > 'b';
becomes
val_copy 'a' s1
val_copy 'b' s2
test_gtr s1 s2 s3
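Because chars and vals share the same instructions, a code generator needs no special case for chars here. It loads each literal into a fresh temporary and then emits the comparison. The Python class below is a hypothetical sketch. Its name and methods are illustrative, and only test_gtr is mapped, since it is the only comparison instruction shown in these notes.

```python
class TubeICGen:
    """Hypothetical code generator that allocates temporaries and emits TubeIC lines."""

    # Only test_gtr appears in these notes; other comparisons are omitted.
    COMPARE_INSTRUCTIONS = {">": "test_gtr"}

    def __init__(self):
        self.lines = []
        self.next_temp = 1

    def new_temp(self) -> str:
        name = f"s{self.next_temp}"
        self.next_temp += 1
        return name

    def load_literal(self, text: str) -> str:
        # val and char literals are both loaded with val_copy; the literal
        # text (including escapes) is copied through unchanged.
        temp = self.new_temp()
        self.lines.append(f"val_copy {text} {temp}")
        return temp

    def compare(self, op: str, left: str, right: str) -> str:
        result = self.new_temp()
        self.lines.append(f"{self.COMPARE_INSTRUCTIONS[op]} {left} {right} {result}")
        return result


# Generate code for: 'a' > 'b';
gen = TubeICGen()
a = gen.load_literal("'a'")
b = gen.load_literal("'b'")
gen.compare(">", a, b)
print("\n".join(gen.lines))
```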
'Char' Implementation - out_char
You've already been using the one char-specific TubeIC instruction: out_char.
print(1, 'a');
becomes
val_copy 1 s1
val_copy 'a' s2
out_val s1
out_char s2
out_char '\n'
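The translation above can be sketched as a function that picks out_val or out_char from each argument's type. This is a simplified, hypothetical Python sketch. It assumes every argument is a literal, and it numbers temporaries from s1 on each call instead of drawing them from a shared counter.

```python
def gen_print(args):
    """Emit TubeIC for print(...) from (literal_text, type) pairs."""
    lines = []
    loaded = []
    # First load every argument into its own temporary.
    for number, (text, arg_type) in enumerate(args, start=1):
        temp = f"s{number}"
        lines.append(f"val_copy {text} {temp}")
        loaded.append((temp, arg_type))
    # Then choose the output instruction based on each argument's type.
    for temp, arg_type in loaded:
        if arg_type == "val":
            lines.append(f"out_val {temp}")
        elif arg_type == "char":
            lines.append(f"out_char {temp}")
        else:
            # Arrays (Project 5) would print each element using its element type.
            raise NotImplementedError(f"printing {arg_type} is not handled here")
    lines.append("out_char '\\n'")  # end the printed line with a newline
    return lines


print("\n".join(gen_print([("1", "val"), ("'a'", "char")])))
```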
How to keep track of TYPE
Every variable (temporary or named) needs to know its type.
You can use the symbol table to store this information.
For this class, there will only be a finite number of types (val, char, and a few others introduced in future projects).
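A minimal sketch of this idea, assuming a Python implementation: every entry records a type, including compiler-generated temporaries. The Symbol and SymbolTable names, and the scheme that assigns TubeIC variables s1, s2, ... in creation order, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str    # source-level name, or "" for a compiler temporary
    type: str    # "val" or "char" (more types arrive in later projects)
    var_id: str  # TubeIC variable that holds the value, e.g. "s3"


class SymbolTable:
    """Hypothetical flat symbol table that records a type for every variable."""

    def __init__(self):
        self.symbols = []  # every symbol, named or temporary, in creation order
        self.by_name = {}  # named variables only

    def _new_symbol(self, name: str, type_: str) -> Symbol:
        symbol = Symbol(name, type_, f"s{len(self.symbols) + 1}")
        self.symbols.append(symbol)
        return symbol

    def declare(self, name: str, type_: str) -> Symbol:
        if name in self.by_name:
            raise NameError(f"{name} is already declared")
        symbol = self._new_symbol(name, type_)
        self.by_name[name] = symbol
        return symbol

    def new_temp(self, type_: str) -> Symbol:
        # Temporaries carry a type too, so later checks can use it.
        return self._new_symbol("", type_)

    def lookup(self, name: str) -> Symbol:
        if name not in self.by_name:
            raise NameError(f"{name} is not declared")
        return self.by_name[name]
```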
Implementing the 'char' type
1. Make the lexer recognize char literals, including escape characters.
2. Make the parser allow type char in variable declarations.
3. Make the symbol table store the type of every variable used.
4. Make the abstract syntax tree include a node for literal char values.
5. For each node in the AST, make sure that the types of its children are legal, or raise an error if not. This can be done when the node is created.
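Step 5 can be sketched by putting the check in each node's constructor, so an illegal tree can never be built. This hypothetical Python sketch shows only a few node kinds: Literal, Compare, and If. It covers two rules: comparison operands must match, and a condition must be a val. The assumption that a comparison produces a val is ours.

```python
class TubularTypeError(Exception):
    """Raised when an AST node is built from children of illegal types."""


class Literal:
    def __init__(self, text: str, type_: str):
        self.text = text
        self.type = type_  # "val" or "char"


class Compare:
    """Comparison node: its operands are checked as soon as it is constructed."""

    def __init__(self, op: str, left, right):
        if left.type != right.type:
            raise TubularTypeError(f"cannot compare {left.type} with {right.type}")
        self.op, self.left, self.right = op, left, right
        self.type = "val"  # assumption: comparisons produce a val


class If:
    """If statement: the condition must be a val (a char cannot act as a boolean)."""

    def __init__(self, condition, body):
        if condition.type != "val":
            raise TubularTypeError("if condition must be a val")
        self.condition = condition
        self.body = body
```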
Scope Refresher: Symbol Tables and Decrementing Scope
Scoping can be implemented right within your symbol table(s).
When a variable is declared:
Check that it has not been previously defined within this scope (a declaration with the same name in an enclosing, lower scope is allowed).
Add it to the table, recording its name, type, etc., along with the scope in which it was created.
When leaving a scope, simply deactivate symbols that are no longer accessible. They can't be used again in the source program (but you will need to reference them when outputting your intermediate code!).
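Here is a sketch of this single-table approach, assuming a Python implementation. Each entry records its scope depth and an active flag, and leaving a scope deactivates that scope's entries instead of deleting them, so they remain available for intermediate code output. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    type: str
    scope: int          # scope depth at declaration (0 = outermost)
    var_id: str         # TubeIC variable, e.g. "s3"
    active: bool = True


class ScopedSymbolTable:
    """Hypothetical single table that records each symbol's scope."""

    def __init__(self):
        self.entries = []  # every symbol ever declared (kept for IC output)
        self.scope = 0

    def enter_scope(self) -> None:
        self.scope += 1

    def exit_scope(self) -> None:
        # Deactivate, but keep, every symbol declared in the scope being left.
        for entry in self.entries:
            if entry.active and entry.scope == self.scope:
                entry.active = False
        self.scope -= 1

    def declare(self, name: str, type_: str) -> Entry:
        # Only the current scope matters; enclosing scopes may reuse the name.
        for entry in self.entries:
            if entry.active and entry.name == name and entry.scope == self.scope:
                raise NameError(f"{name} is already declared in this scope")
        entry = Entry(name, type_, self.scope, f"s{len(self.entries) + 1}")
        self.entries.append(entry)
        return entry

    def lookup(self, name: str) -> Entry:
        # Search newest first, so the innermost active declaration wins.
        for entry in reversed(self.entries):
            if entry.active and entry.name == name:
                return entry
        raise NameError(f"{name} is not declared")
```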
Implementing Scoping
Given:
val a = 123;
val b = 44;
if (a == 123) {
  char a = 'x';
  print(a);
}
print(a);
Stack of SymbolTables:
SymbolTable[0]: val a, val b
SymbolTable[1]: char a