Lexical and Syntax Analysis Chapter 4
![Page 1: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/1.jpg)
Lexical and Syntax Analysis
Chapter 4
![Page 2: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/2.jpg)
Compilation
• Translating from high-level language to machine code is organized into several phases or passes.
• In the early days passes communicated through files, but this is no longer necessary.
![Page 3: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/3.jpg)
Language Specification
• We must first describe the language in question by giving its specification.
• Syntax:
  • Defines symbols (vocabulary)
  • Defines programs (sentences)
• Semantics:
  • Gives meaning to sentences.
• The formal specifications are often the input to tools that build translators automatically.
![Page 4: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/4.jpg)
Compiler passes
[Figure: pipeline of compiler passes. Stages: Lexical Analyzer, Parser, Semantic Analyzer, Source-to-source optimizer, Translator, Optimizer, Final Assembly. Data passed between stages: string of characters, string of tokens, abstract syntax tree, medium-level intermediate code, low-level intermediate code, executable/object code.]
![Page 5: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/5.jpg)
Compiler passes
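[Figure: the source program enters the front end and the target program leaves the back end. Components shown: lexical scanner, parser, semantic analyzer, translator, optimizer, final assembly, with the symbol table manager and the error handler alongside.]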
![Page 6: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/6.jpg)
Lexical analyzer
• Also called a scanner or tokenizer
• Converts stream of characters into a stream of tokens
• Tokens are:
  • Keywords such as for, while, and class.
  • Special characters such as +, -, (, and <
  • Variable name occurrences
  • Constant occurrences such as 1, 0, true.
![Page 7: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/7.jpg)
Comparison with Lexical Analysis
Phase    Input                    Output
Lexer    Sequence of characters   Sequence of tokens
Parser   Sequence of tokens       Parse tree
![Page 8: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/8.jpg)
Lexical analyzer
• The lexical analyzer is usually a subroutine of the parser.
• Each token is a single entity. A numerical code is usually assigned to each type of token.
![Page 9: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/9.jpg)
Lexical analyzer
• Lexical analyzers perform:
  • Line reconstruction
    • delete comments
    • delete white space
    • perform text substitution
  • Lexical translation: translation of lexemes -> tokens
    • Often additional information is associated with a token.
![Page 10: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/10.jpg)
Parser
• Performs syntax analysis
• Imposes syntactic structure on a sentence.
• Parse trees are used to expose the structure.
  • These trees are often not explicitly built
  • Simpler representations of them are often used
• The parser accepts a string of tokens and builds a parse tree representing the program
![Page 11: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/11.jpg)
Parser
• The collection of all the programs in a given language is usually specified using a list of rules known as a context free grammar.
![Page 12: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/12.jpg)
Parser
• A grammar has four components:
  • A set of tokens, known as terminal symbols
  • A set of variables, or non-terminals
  • A set of productions, where each production consists of a non-terminal, an arrow, and a sequence of tokens and/or non-terminals
  • A designation of one of the non-terminals as the start symbol.
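The four components can be written down directly as data. The sketch below is only an illustration: the `Grammar` class, its `check` method, and the toy expression grammar are assumptions made for this example, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # tokens
    nonterminals: frozenset   # variables
    productions: tuple        # pairs (head, body); body is a tuple of symbols
    start: str                # designated start symbol

    def check(self):
        """Return True if the four components are mutually consistent."""
        if self.start not in self.nonterminals:
            return False
        if self.terminals & self.nonterminals:
            return False
        symbols = self.terminals | self.nonterminals
        return all(
            head in self.nonterminals and all(sym in symbols for sym in body)
            for head, body in self.productions
        )


# Hypothetical toy grammar: expr -> expr + term | term ; term -> id | num
toy = Grammar(
    terminals=frozenset({"+", "id", "num"}),
    nonterminals=frozenset({"expr", "term"}),
    productions=(
        ("expr", ("expr", "+", "term")),
        ("expr", ("term",)),
        ("term", ("id",)),
        ("term", ("num",)),
    ),
    start="expr",
)
```

Here `toy.check()` succeeds because every production head is a non-terminal and every symbol in a body belongs to one of the two symbol sets.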
![Page 13: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/13.jpg)
Symbol Table Management
• The symbol table is a data structure used by all phases of the compiler to keep track of user defined symbols and keywords.
• During early phases (lexical and syntax analysis) symbols are discovered and put into the symbol table
• During later phases symbols are looked up to validate their usage.
![Page 14: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/14.jpg)
Symbol Table Management
• Typical symbol table activities:
  • add a new name
  • add information for a name
  • access information for a name
  • determine if a name is present in the table
  • remove a name
  • revert to a previous usage for a name (close a scope).
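One possible way to support these activities is a stack of dictionaries, one per open scope. This is a minimal sketch under that assumption; the class and method names are invented for illustration and a production compiler may organize its table quite differently.

```python
class SymbolTable:
    """A stack of dictionaries, one per open scope (innermost last)."""

    def __init__(self):
        self.scopes = [{}]

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        # Dropping the innermost scope reverts names to their previous usage.
        if len(self.scopes) == 1:
            raise RuntimeError("cannot close the outermost scope")
        self.scopes.pop()

    def add(self, name, **info):
        # Add a new name to the innermost scope.
        self.scopes[-1][name] = dict(info)

    def add_info(self, name, **info):
        # Add information for a name that is already present.
        self.lookup(name).update(info)

    def lookup(self, name):
        # Access information for a name, searching innermost scope first.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(name)

    def contains(self, name):
        return any(name in scope for scope in self.scopes)

    def remove(self, name):
        # Remove a name from the innermost scope.
        del self.scopes[-1][name]
```

Because lookups walk the stack from the innermost scope outward, an inner declaration of `x` hides an outer one until `close_scope` is called.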
![Page 15: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/15.jpg)
Symbol Table Management
• Many possible Implementations:
  • linear list
  • sorted list
  • hash table
  • tree structure
![Page 16: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/16.jpg)
Symbol Table Management
• Typical information fields:
  • print value
  • kind (e.g. reserved, typeid, varid, funcid, etc.)
  • block number/level number
  • type
  • initial value
  • base address
  • etc.
![Page 17: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/17.jpg)
Abstract Syntax Tree
• The parse tree is used to recognize the components of the program and to check that the syntax is correct.
• As the parser applies productions, it usually generates the component of a simpler tree (known as Abstract Syntax Tree).
• The meaning of the component is derived out of the way the statement is organized in a subtree.
![Page 18: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/18.jpg)
Semantic Analyzer
• The semantic analyzer completes the symbol table with information on the characteristics of each identifier.
• The symbol table is usually initialized during parsing.
• One entry is created for each identifier and constant.
• Scope is taken into account. Two different variables with the same name will have different entries in the symbol table.
• The semantic analyzer completes the table using information from declarations.
![Page 19: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/19.jpg)
Semantic Analyzer
• The semantic analyzer does
  • Type checking
  • Flow of control checks
  • Uniqueness checks (identifiers, case labels, etc.)
• One objective is to identify semantic errors statically. For example:
  • Undeclared identifiers
  • Unreachable statements
  • Identifiers used in the wrong context.
  • Methods called with the wrong number of parameters or with parameters of the wrong type.
![Page 20: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/20.jpg)
Semantic Analyzer
• Some semantic errors have to be detected at run time. The reason is that the information may not be available at compile time.
  • Array subscript is out of bounds.
  • Variables are not initialized.
  • Divide by zero.
![Page 21: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/21.jpg)
Error Management
• Errors can occur at all phases in the compiler
• Invalid input characters, syntax errors, semantic errors, etc.
• Good compilers will attempt to recover from errors and continue.
![Page 22: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/22.jpg)
Translator
• The lexical scanner, parser, and semantic analyzer are collectively known as the front end of the compiler.
• The second part, or back end starts by generating low level code from the (possibly optimized) AST.
![Page 23: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/23.jpg)
Translator
• Rather than generate code for a specific architecture, most compilers generate intermediate language
• Three address code is popular.
  • Really a flattened tree representation.
  • Simple.
  • Flexible (captures the essence of many target architectures).
  • Can be interpreted.
![Page 24: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/24.jpg)
Translator
• One way of performing intermediate code generation:
  • Attach meaning to each node of the AST.
  • The meaning of the sentence = the "meaning" attached to the root of the tree.
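As a rough illustration of "attaching meaning to each node", the sketch below walks an expression tree and emits three-address code, treating the name that holds a node's value as that node's meaning. The tuple encoding of the AST, the temporary names `t1`, `t2`, ..., and the functions `gen_tac` and `translate` are assumptions made for this example.

```python
import itertools


def gen_tac(node, code, temps):
    """Append three-address instructions for `node` to `code` and return
    the name that holds the node's value."""
    if isinstance(node, (int, str)):       # leaf: constant or variable name
        return str(node)
    op, left, right = node                 # interior node: (operator, lhs, rhs)
    left_name = gen_tac(left, code, temps)
    right_name = gen_tac(right, code, temps)
    result = f"t{next(temps)}"             # fresh temporary for this node
    code.append(f"{result} = {left_name} {op} {right_name}")
    return result


def translate(target, expr):
    """Return the instruction list for the assignment `target = expr`."""
    code = []
    value = gen_tac(expr, code, itertools.count(1))
    code.append(f"{target} = {value}")
    return code


# x = a + b * 2
for line in translate("x", ("+", "a", ("*", "b", 2))):
    print(line)
```

Each instruction has at most one operator and three addresses, which is why the result reads like a flattened version of the tree.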
![Page 25: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/25.jpg)
XIL
• An example of Medium level intermediate language is XIL. XIL is used by IBM to compile FORTRAN, C, C++, and Pascal for RS/6000.
• Compilers for Fortran 90 and C++ have been developed using XIL for other machines such as Intel 386, Sparc, and S/370.
![Page 26: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/26.jpg)
Optimizers
• Intermediate code is examined and improved.
• Can be simple:
  • changing "a:=a+1" to "increment a"
  • changing "3*5" to "15"
• Can be complicated:
  • reorganizing data and data accesses for cache efficiency
• Optimization can improve running time by orders of magnitude, often also decreasing program size.
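The "3*5" to "15" case is usually called constant folding. A minimal sketch, assuming expressions are encoded as nested `(operator, left, right)` tuples (an encoding chosen here for illustration only):

```python
import operator

# Operators this sketch is willing to evaluate at compile time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return an equivalent expression tree with constant subtrees evaluated."""
    if not isinstance(node, tuple):
        return node                        # leaf: constant or variable name
    op, left, right = node
    left, right = fold(left), fold(right)  # fold children first
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)
    return (op, left, right)
```

Subtrees that mention a variable are left alone, so `("+", "a", ("*", 3, 5))` becomes `("+", "a", 15)` rather than being evaluated further.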
![Page 27: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/27.jpg)
Code Generation
• Generation of “real executable code” for a particular target machine.
• It is completed by the Final Assembly phase
• Final output can either be
  • assembly language for the target machine
  • object code ready for linking
• The “target machine” can be a virtual machine (such as the Java Virtual Machine, JVM), and the “real executable code” is “virtual code” (such as Java Bytecode).
![Page 28: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/28.jpg)
Compiler Overview
Source Program
    IF (a<b) THEN c=1*d;
-> Lexical Analyzer -> Token Sequence
    IF ( ID "a" < ID "b" THEN ID "c" = CONST "1" * ID "d"
-> Syntax Analyzer -> Syntax Tree
    IF_stmt
        cond_expr: < with operands a and b
        list: assign_stmt with lhs c and rhs * (operands 1 and d)
-> Semantic Analyzer -> 3-Address Code
    GE a, b, L1
    MULT 1, d, c
    L1:
-> Code Optimizer -> Optimized 3-Addr. Code
    GE a, b, L1
    MOV d, c
    L1:
-> Code Generation -> Assembly Code
    loadi R1,a
    cmpi R1,b
    jge L1
    loadi R1,d
    storei R1,c
    L1:
![Page 29: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/29.jpg)
Lexical Analysis
![Page 30: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/30.jpg)
What is Lexical Analysis?
- The lexical analyzer deals with small-scale language constructs, such as names and numeric literals. The syntax analyzer deals with the large-scale constructs, such as expressions, statements, and program units.
- The syntax analysis portion consists of two parts:
1. A low-level part called a lexical analyzer (essentially a pattern matcher).
2. A high-level part called a syntax analyzer, or parser.
The lexical analyzer collects characters into logical groupings and assigns internal codes to the groupings according to their structure.
![Page 31: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/31.jpg)
Lexical Analyzer in Perspective
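[Figure: the parser sends "get next token" requests to the lexical analyzer, which reads the source program and returns a token. Both the lexical analyzer and the parser are connected to the symbol table.]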
![Page 32: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/32.jpg)
Lexical Analyzer in Perspective
• LEXICAL ANALYZER
• Scan Input
• Remove white space, …
• Identify Tokens
• Create Symbol Table
• Insert Tokens into Symbol Table
• Generate Errors
• Send Tokens to Parser
• PARSER
• Perform Syntax Analysis
• Actions Dictated by Token Order
• Update Symbol Table Entries
• Create Abstract Rep. of Source
• Generate Errors
![Page 33: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/33.jpg)
Lexical analyzers extract lexemes from a given input string and produce the corresponding tokens.
sum = oldsum - value / 100;
Token          Lexeme
IDENT          sum
ASSIGN_OP      =
IDENT          oldsum
SUBTRACT_OP    -
IDENT          value
DIVISION_OP    /
INT_LIT        100
SEMICOLON      ;
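The table can be reproduced with a small regular-expression scanner. This is a sketch only: the token names come from the table, while the patterns, the `WS` class, and the `tokenize` function are assumptions made for illustration.

```python
import re

# Token names follow the table; the regular expressions are assumptions.
TOKEN_SPEC = [
    ("INT_LIT", r"\d+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN_OP", r"="),
    ("SUBTRACT_OP", r"-"),
    ("DIVISION_OP", r"/"),
    ("SEMICOLON", r";"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs, skipping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"invalid character {text[pos]!r} at position {pos}")
        if match.lastgroup != "WS":        # white space is scanned away
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token, lexeme in tokenize("sum = oldsum - value / 100;"):
    print(token, lexeme)
```

The named group that matched gives the token, and the matched text is the lexeme.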
![Page 34: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/34.jpg)
Basic Terminology
• What are Major Terms for Lexical Analysis?
  • TOKEN
    • A classification for a common set of strings
    • Examples Include <Identifier>, <number>, etc.
  • PATTERN
    • The rules which characterize the set of strings for a token
  • LEXEME
    • Actual sequence of characters that matches pattern and is classified by a token
    • Identifiers: x, count, name, etc…
![Page 35: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/35.jpg)
Basic Terminology
Token      Sample Lexemes          Informal Description of Pattern
const      const                   const
if         if                      if
relation   <, <=, =, <>, >, >=     < or <= or = or <> or >= or >
id         pi, count, D2           letter followed by letters and digits
num        3.1416, 0, 6.02E23      any numeric constant
literal    "core dumped"           any characters between " and " except "
The token classifies the pattern.
Actual values are critical. Info is:
1. Stored in symbol table
2. Returned to parser
![Page 36: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/36.jpg)
Token Definitions
Suppose: s is the string banana
Prefix : ban, banana
Suffix : ana, banana
Substring : nan, ban, ana, banana
Subsequence: bnan, nn
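The four notions differ in whether the characters must be contiguous and where they must sit. A small sketch that checks the banana examples; the function names are invented here.

```python
def is_prefix(t, s):
    return s.startswith(t)


def is_suffix(t, s):
    return s.endswith(t)


def is_substring(t, s):
    # Contiguous run of characters anywhere in s.
    return t in s


def is_subsequence(t, s):
    """True if t can be obtained by deleting zero or more characters of s."""
    remaining = iter(s)
    # Each `in` test consumes the iterator up to the matching character,
    # so the characters of t must appear in s in the same order.
    return all(ch in remaining for ch in t)
```

For instance, `bnan` is a subsequence of `banana` but not a substring, because its characters appear in order without being contiguous.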
![Page 37: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/37.jpg)
Token Definitions
letter -> A | B | C | … | Z | a | b | … | z
digit -> 0 | 1 | 2 | … | 9
id -> letter ( letter | digit )*
Shorthand Notation:
"+" : one or more      r* = r+ | ε      r+ = r r*
"?" : zero or one      r? = r | ε
[range] : set range of characters (replaces "|")
    [A-Z] = A | B | C | … | Z
id -> [A-Za-z][A-Za-z0-9]*
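Python's `re` module uses the same shorthand, so the identities can be spot-checked on a few sample strings. This is a sanity check on samples, not a proof of equivalence; `same_language` and `is_id` are helper names assumed for the example.

```python
import re

ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_id(lexeme):
    """True if the whole lexeme matches the id pattern."""
    return ID.fullmatch(lexeme) is not None


def same_language(p, q, samples):
    """True if patterns p and q accept exactly the same sample strings."""
    return all(
        (re.fullmatch(p, s) is None) == (re.fullmatch(q, s) is None)
        for s in samples
    )


samples = ["", "a", "aa", "aaa", "b", "ab"]
print(same_language(r"a+", r"aa*", samples))      # r+ = r r*
print(same_language(r"a?", r"(?:a|)", samples))   # r? = r | ε
print(same_language(r"a*", r"(?:a+|)", samples))  # r* = r+ | ε
```

The empty alternative in `(?:a|)` plays the role of ε.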
![Page 38: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/38.jpg)
Token Recognition
Assume Following Tokens:
if, then, else, relop, id, num
What language construct are they used for ?
Given Tokens, What are Patterns ?
if -> if
then -> then
else -> else
relop -> < | <= | > | >= | = | <>
id -> letter ( letter | digit )*
num -> digit+ (. digit+)? ( E (+ | -)? digit+ )?
What does this represent ?
Grammar:
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | ε
expr -> term relop term
      | term
term -> id | num
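The patterns translate almost symbol for symbol into Python regular expressions. A hedged sketch: the `classify` helper and the decision to test keywords before `id` are assumptions of this example, not something the slides prescribe.

```python
import re

RELOP = re.compile(r"<=|<>|>=|<|>|=")
# digit+ (. digit+)? ( E (+ | -)? digit+ )?
NUM = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?")
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def classify(lexeme):
    """Return the token name for a lexeme, or None if no pattern matches."""
    if lexeme in ("if", "then", "else"):   # keywords take priority over id
        return lexeme
    if RELOP.fullmatch(lexeme):
        return "relop"
    if NUM.fullmatch(lexeme):
        return "num"
    if ID.fullmatch(lexeme):
        return "id"
    return None
```

Note that `3.` is rejected: the fractional part is optional, but once the dot is present at least one digit must follow it.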
![Page 39: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/39.jpg)
What Else Does Lexical Analyzer Do?
Scan away blanks, newlines, tabs
Can we Define Tokens For These?
blank -> b
tab -> ^T
newline -> ^M
delim -> blank | tab | newline
ws -> delim+
![Page 40: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/40.jpg)
Symbol Tables
Regular Expression   Token   Attribute-Value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE
Note: Each token has a unique token identifier to define category of lexemes
![Page 41: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/41.jpg)
Building a Lexical Analyzer
There are three approaches to building a lexical analyzer:
1. Write a formal description of the token patterns of the language using a descriptive language, and let a tool generate the analyzer from it. On UNIX systems this tool is called lex.
2. Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram.
3. Design a state transition diagram and hand-construct a table-driven implementation of the state diagram.
![Page 42: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/42.jpg)
Diagrams for Tokens
• Transition Diagrams (TD) are used to represent the tokens
• Each Transition Diagram has:
• States : Represented by Circles
• Actions : Represented by Arrows between states
• Start State : Beginning of a pattern (Arrowhead)
• Final State(s) : End of pattern (Concentric Circles)
• Deterministic - No need to choose between 2 different actions
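A deterministic transition diagram maps naturally onto a lookup table keyed by state and input class. The sketch below is an illustrative table-driven scanner for identifiers and integer literals; the state names, the `char_class` helper, and the `scan` function are assumptions made for this example.

```python
def char_class(ch):
    """Map a character to the input class used on the diagram's arrows."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# (state, input class) -> next state; a missing entry means "no transition".
# Deterministic: each key has exactly one next state.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_int",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_int", "digit"): "in_int",
}
FINAL = {"in_id": "id", "in_int": "int_lit"}   # final state -> token


def scan(text, pos=0):
    """Return (token, lexeme) for the longest match starting at pos, or None."""
    state, end = "start", pos
    while end < len(text):
        nxt = TRANSITIONS.get((state, char_class(text[end])))
        if nxt is None:
            break                          # no arrow for this input: stop
        state, end = nxt, end + 1
    if state in FINAL:
        return FINAL[state], text[pos:end]
    return None
```

Changing the language of tokens means editing the two tables, while the driver loop in `scan` stays the same.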
![Page 43: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/43.jpg)
Example : Transition Diagrams
[Figure: three transition diagrams for numbers. States 12-19 recognize digit+ . digit+ E (+ | -)? digit+; states 20-24 recognize digit+ . digit+; states 25-27 recognize digit+. Each diagram reaches its last state on "other", and the final states are marked *.]
![Page 44: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/44.jpg)
State diagram to recognize names, reserved words, and integer literals
![Page 45: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/45.jpg)
Reasons to use BNF to Describe Syntax
• Provides a clear syntax description
• The parser can be based directly on the BNF
• Parsers based on BNF are easy to maintain
![Page 46: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/46.jpg)
Reasons to Separate Lexical and Syntax Analysis
Simplicity - less complex approaches can be used for lexical analysis; separating them simplifies the parser
Efficiency - separation allows optimization of the lexical analyzer
Portability - parts of the lexical analyzer may not be portable, but the parser always is portable
![Page 47: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/47.jpg)
Summary of Lexical Analysis
• A lexical analyzer is a pattern matcher for character strings
• A lexical analyzer is a “front-end” for the parser
• Identifies substrings of the source program that belong together - lexemes
• Lexemes match a character pattern, which is associated with a lexical category called a token
- sum is a lexeme; its token may be IDENT
![Page 48: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/48.jpg)
Semantic Analysis
Intro to Type Checking
![Page 49: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/49.jpg)
The Compiler So Far
• Lexical analysis
  • Detects inputs with illegal tokens
• Parsing
  • Detects inputs with ill-formed parse trees
• Semantic analysis
  • The last "front end" phase
  • Catches more errors
![Page 50: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/50.jpg)
What’s Wrong?
• Example 1
int in x;
• Example 2
int i = 12.34;
![Page 51: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/51.jpg)
Why a Separate Semantic Analysis?
• Parsing cannot catch some errors
• Some language constructs are not context-free
• Example: All used variables must have been declared (i.e. scoping)
• Example: A method must be invoked with arguments of proper type (i.e. typing)
![Page 52: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/52.jpg)
What Does Semantic Analysis Do?
• Checks of many kinds:
1. All identifiers are declared
2. Types
3. Inheritance relationships
4. Classes defined only once
5. Methods in a class defined only once
6. Reserved identifiers are not misused
And others . . .
• The requirements depend on the language
![Page 53: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/53.jpg)
Scope
• Matching identifier declarations with uses
• Important semantic analysis step in most languages
![Page 54: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/54.jpg)
Scope (Cont.)
• The scope of an identifier is the portion of a program in which that identifier is accessible
• The same identifier may refer to different things in different parts of the program
  • Different scopes for same name don't overlap
• An identifier may have restricted scope
![Page 55: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/55.jpg)
Static vs. Dynamic Scope
• Most languages have static scope
  • Scope depends only on the program text, not run-time behavior
  • C has static scope
• A few languages are dynamically scoped
  • Lisp, SNOBOL
  • Current Lisp has changed to mostly static scoping
  • Scope depends on execution of the program
![Page 56: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/56.jpg)
Class Definitions
• Class names can be used before being defined
• We can't check this property
  • using a symbol table
  • or even in one pass
• Solution
  • Pass 1: Gather all class names
  • Pass 2: Do the checking
• Semantic analysis requires multiple passes
  • Probably more than two
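The two-pass solution can be sketched in a few lines. The program representation used here, a list of `(class name, names it refers to)` pairs, and the function `undefined_classes` are assumptions made purely for illustration.

```python
def undefined_classes(classes):
    """`classes` is a list of (name, names_used) pairs in source order.
    Return the (class, used name) pairs whose used name is never defined."""
    # Pass 1: gather all class names.
    defined = {name for name, _ in classes}
    # Pass 2: check every use against the complete set.
    return [
        (name, used)
        for name, uses in classes
        for used in uses
        if used not in defined
    ]


# A uses B before B is defined, which is legal; C uses D, which never appears.
program = [("A", ["B"]), ("B", ["A"]), ("C", ["D"])]
print(undefined_classes(program))
```

A single pass would wrongly reject the use of `B` inside `A`, because `B` has not been seen yet at that point.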
![Page 57: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/57.jpg)
Types
• What is a type?
  • The notion varies from language to language
• Consensus
  • A set of values
  • A set of operations on those values
• Classes are one instantiation of the modern notion of type
![Page 58: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/58.jpg)
Why Do We Need Type Systems?
Consider the assembly language fragment
addi $r1, $r2, $r3
What are the types of $r1, $r2, $r3?
![Page 59: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/59.jpg)
Types and Operations
• Certain operations are legal for values of each type
• It doesn’t make sense to add a function pointer and an integer in C
• It does make sense to add two integers
• But both have the same assembly language implementation!
![Page 60: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/60.jpg)
Type Systems
• A language’s type system specifies which operations are valid for which types
• The goal of type checking is to ensure that operations are used with the correct types
  • Enforces intended interpretation of values, because nothing else will!
• Type systems provide a concise formalization of the semantic checking rules
![Page 61: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/61.jpg)
What Can Types do For Us?
• Can detect certain kinds of errors
• Memory errors:
  • Reading from an invalid pointer, etc.
• Violation of abstraction boundaries:
class FileSystem {
open(x : String) : File {
…
}
…
}
class Client {
  f(fs : FileSystem) {
    File fdesc <- fs.open("foo")
    …
  } -- f cannot see inside fdesc !
}
![Page 62: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/62.jpg)
Type Checking Overview
• Three kinds of languages:
• Statically typed: All or almost all checking of types is done as part of compilation (C and Java)
• Dynamically typed: Almost all checking of types is done as part of program execution (Scheme)
• Untyped: No type checking (machine code)
![Page 63: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/63.jpg)
The Type Wars
• Competing views on static vs. dynamic typing
• Static typing proponents say:
  • Static checking catches many programming errors at compile time
  • Avoids overhead of runtime type checks
• Dynamic typing proponents say:
  • Static type systems are restrictive
  • Rapid prototyping easier in a dynamic type system
![Page 64: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/64.jpg)
The Type Wars (Cont.)
• In practice, most code is written in statically typed languages with an "escape" mechanism
  • Unsafe casts in C, Java
• It’s debatable whether this compromise represents the best or worst of both worlds
![Page 65: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/65.jpg)
Type Checking and Type Inference
• Type Checking is the process of verifying fully typed programs
• Type Inference is the process of filling in missing type information
• The two are different, but the terms are often used interchangeably
![Page 66: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/66.jpg)
Rules of Inference
• We have seen two examples of formal notation specifying parts of a compiler
  • Regular expressions (for the lexer)
  • Context-free grammars (for the parser)
• The appropriate formalism for type checking is logical rules of inference
![Page 67: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/67.jpg)
Why Rules of Inference?
• Inference rules have the form
  If Hypothesis is true, then Conclusion is true
• Type checking computes via reasoning
If E1 and E2 have certain types, then E3 has a certain type
• Rules of inference are a compact notation for “If-Then” statements
![Page 68: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/68.jpg)
From English to an Inference Rule
• The notation is easy to read (with practice)
• Start with a simplified system and gradually add features
• Building blocks
  • Symbol $\land$ is "and"
  • Symbol $\Rightarrow$ is "if-then"
  • x:T is "x has type T"
![Page 69: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/69.jpg)
From English to an Inference Rule (2)
If e1 has type Int and e2 has type Int, then e1 + e2 has type Int
(e1 has type Int $\land$ e2 has type Int) $\Rightarrow$ e1 + e2 has type Int
$(e_1 : \mathrm{Int} \land e_2 : \mathrm{Int}) \Rightarrow e_1 + e_2 : \mathrm{Int}$
![Page 70: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/70.jpg)
From English to an Inference Rule (3)
The statement
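$(e_1 : \mathrm{Int} \land e_2 : \mathrm{Int}) \Rightarrow e_1 + e_2 : \mathrm{Int}$
is a special case of
$(\text{Hypothesis}_1 \land \dots \land \text{Hypothesis}_n) \Rightarrow \text{Conclusion}$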
This is an inference rule
![Page 71: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/71.jpg)
Notation for Inference Rules
• By tradition inference rules are written

$$\frac{\vdash \text{Hypothesis}_1 \quad \dots \quad \vdash \text{Hypothesis}_n}{\vdash \text{Conclusion}}$$

• Type rules can also have hypotheses and conclusions of the form $\vdash e : T$
  • $\vdash$ means "it is provable that . . ."
![Page 72: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/72.jpg)
Two Rules
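$$\frac{i \text{ is an integer}}{\vdash i : \mathrm{Int}} \; [\text{Int}]$$

$$\frac{\vdash e_1 : \mathrm{Int} \quad \vdash e_2 : \mathrm{Int}}{\vdash e_1 + e_2 : \mathrm{Int}} \; [\text{Add}]$$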
![Page 73: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/73.jpg)
Two Rules (Cont.)
• These rules give templates describing how to type integers and + expressions
• By filling in the templates, we can produce complete typings for expressions
![Page 74: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/74.jpg)
Example: 1 + 2
$$\frac{\dfrac{1 \text{ is an integer}}{\vdash 1 : \mathrm{Int}} \quad \dfrac{2 \text{ is an integer}}{\vdash 2 : \mathrm{Int}}}{\vdash 1 + 2 : \mathrm{Int}}$$
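Filling in the templates mechanically amounts to a recursive function over the expression. The following is a minimal sketch of the [Int] and [Add] rules only; the tuple encoding `("+", e1, e2)` and the name `type_of` are assumptions made for this example.

```python
def type_of(expr):
    """Return "Int" if expr is well typed under [Int] and [Add];
    raise TypeError if no rule applies."""
    # [Int]: an integer literal has type Int. bool is excluded because
    # Python treats True and False as integers.
    if isinstance(expr, int) and not isinstance(expr, bool):
        return "Int"
    # [Add]: both hypotheses must be proved before the conclusion is drawn.
    if isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "+":
        _, e1, e2 = expr
        if type_of(e1) == "Int" and type_of(e2) == "Int":
            return "Int"
    raise TypeError(f"no type rule applies to {expr!r}")


print(type_of(("+", 1, 2)))
```

The recursion mirrors the derivation: the two calls on the operands correspond to the two sub-proofs above the line.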
![Page 75: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/75.jpg)
Soundness
• A type system is sound if
  • Whenever $\vdash e : T$
  • Then e evaluates to a value of type T
• We only want sound rules
  • But some sound rules are better than others:

$$\frac{i \text{ is an integer}}{\vdash i : \mathrm{Object}}$$
![Page 76: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/76.jpg)
Type Checking Proofs
• Type checking proves facts $e : T$
  • Proof is on the structure of the AST
  • Proof has the shape of the AST
  • One type rule is used for each kind of AST node
• In the type rule used for a node e:
  • Hypotheses are the proofs of types of e's sub-expressions
  • Conclusion is the proof of type of e
• Types are computed in a bottom-up pass over the AST
![Page 77: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/77.jpg)
Rules for Constants
$$\frac{}{\vdash \text{false} : \mathrm{Bool}} \; [\text{Bool}]$$

$$\frac{s \text{ is a string constant}}{\vdash s : \mathrm{String}} \; [\text{String}]$$
![Page 78: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/78.jpg)
Two More Rules
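$$\frac{\vdash e : \mathrm{Bool}}{\vdash \text{not } e : \mathrm{Bool}} \; [\text{Not}]$$

$$\frac{\vdash e_1 : \mathrm{Bool} \quad \vdash e_2 : T}{\vdash \text{while } e_1 \text{ loop } e_2 \text{ pool} : \mathrm{Object}} \; [\text{Loop}]$$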
![Page 79: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/79.jpg)
A Problem
• What is the type of a variable reference?
• The local, structural rule does not carry enough information to give x a type.
$$\frac{x \text{ is an identifier}}{\vdash x : \; ?} \; [\text{Var}]$$
![Page 80: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/80.jpg)
Notes
• The type environment gives types to the free identifiers in the current scope
• The type environment is passed down the AST from the root towards the leaves
• Types are computed up the AST from the leaves towards the root
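The two directions of information flow can be seen in a small checker: the environment is an argument passed down into recursive calls, and the type is the value returned back up. This is a sketch under assumptions: the tuple encoding of expressions, the `check` function, and the `let` form (included only to show the environment being extended) are not taken from the slides.

```python
def check(expr, env):
    """Return the type of expr; env maps free identifiers to their types."""
    if isinstance(expr, bool):
        return "Bool"
    if isinstance(expr, int):
        return "Int"
    if isinstance(expr, str):              # variable reference
        if expr not in env:
            raise TypeError(f"undeclared identifier {expr!r}")
        return env[expr]                   # the environment answers "x : ?"
    tag = expr[0]
    if tag == "+":
        _, e1, e2 = expr
        if check(e1, env) == "Int" and check(e2, env) == "Int":
            return "Int"
        raise TypeError("+ expects two Int operands")
    if tag == "not":
        if check(expr[1], env) == "Bool":
            return "Bool"
        raise TypeError("not expects a Bool operand")
    if tag == "let":                       # hypothetical ("let", name, type, body)
        _, name, declared, body = expr
        # Pass an extended environment down to the body.
        return check(body, {**env, name: declared})
    raise TypeError(f"no type rule applies to {expr!r}")
```

Because `let` builds a new dictionary rather than mutating `env`, the outer environment is unchanged once the body has been checked.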
![Page 81: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/81.jpg)
Expressiveness of Static Type Systems
• A static type system enables a compiler to detect many common programming errors
• The cost is that some correct programs are disallowed
  • Some argue for dynamic type checking instead
  • Others argue for more expressive static type checking
• But more expressive type systems are also more complex
![Page 82: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/82.jpg)
Dynamic And Static Types
• The dynamic type of an object is the class C that is used in the "new C" expression that creates the object
  • A run-time notion
  • Even languages that are not statically typed have the notion of dynamic type
• The static type of an expression is a notation that captures all possible dynamic types the expression could take
  • A compile-time notion
![Page 83: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/83.jpg)
Dynamic And Static Types
• The typing rules use very concise notation
• They are very carefully constructed
• Virtually any change in a rule either:
  • Makes the type system unsound (bad programs are accepted as well typed)
  • Or, makes the type system less usable (perfectly good programs are rejected)
• But some good programs will be rejected anyway
  • The notion of a good program is undecidable
![Page 84: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/84.jpg)
Type Systems
• Type rules are defined on the structure of expressions
• Types of variables are modeled by an environment
• Types are a trade-off between flexibility and safety
![Page 85: Lexical and Syntax Analysis Chapter 4](https://reader035.fdocuments.us/reader035/viewer/2022062305/56814e99550346895dbc4252/html5/thumbnails/85.jpg)
End of Lecture 6