Automatic compilation — Student name: Eldad Uzman, Student ID: 036544062, Lecturer: Dr. Itzhak Aviv.
Introduction to compilation
A compiler is a program that translates one programming language into another. Its input is the source language; its output is the target language. The best-known form of compilation reads a program in one programming language, such as C, C++, Java, C#, or VB, and translates it into equivalent assembly or binary code for the machine to execute. Beyond building a single program, these tools are needed in many other fields of software engineering.
Why are compilation tools needed?
A common mistake among many programmers is the attitude that if the code compiles successfully, it can be released. Clearly, for more complex software that involves many external tools and has heavy integration needs, more is required than just compiling the code.
System engineering life cycle
System concept
System design
Functional specifications
Component design
Program and unit tests
Integration and system tests
Conversion and installation
Operation and maintenance
Software release life cycle
Release Management is the relatively new but rapidly growing discipline within software engineering of managing software releases.
As software systems, software development processes, and resources become more distributed, they invariably become more specialized and complex. Furthermore, software products are typically in an ongoing cycle of development, testing, and release. Add to this the evolution and growing complexity of the platforms on which these systems run, and it becomes clear there are a lot of moving pieces that must fit together seamlessly to guarantee the success and long-term value of a product or project.
Release management
My final project
[Diagram: a version of my engine is built from JS files and C++ code into the software binary code.]
I don’t want to compile ALL of the code, only selected fragments of it.
I need an external tool that lets me choose exactly the fragments needed for a specific version.
How does a compiler work?
There are countless source languages and target languages, and many kinds of compilers as well. Despite this apparent complexity, the fundamental tasks that any compiler must implement are virtually the same. There are two parts to compilation:
1) Analysis – breaking the source program into constituent pieces and creating an intermediate representation (IR) of it.
2) Synthesis – constructing the desired target code from the IR.
Analysis
To break up the structure and understand the meaning of the program, the compiler executes three phases in the analysis stage:
1) Lexical analysis – breaking the input into words, or tokens.
2) Syntax analysis – parsing the phrase structure of the program.
3) Semantic analysis – calculating the meaning of the program.
Intermediate representation (IR)
An intermediate representation is a data structure that is constructed from the program's input data, and from which parts of the program's output data are constructed in turn.
Lexical analysis
Also called linear analysis or scanning. In this phase we break the source code into tokens, reading it left to right.
Example:
position = initial + rate*60;
would be grouped as:
1) The identifier position.
2) The assignment symbol.
3) The identifier initial.
4) The plus sign.
5) The identifier rate.
6) The times sign.
7) The number 60.
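The grouping above can be sketched as a tiny hand-written scanner. This is an illustrative sketch, not the project's actual lexer: it splits an input string into identifier, number, and single-character operator tokens in a single left-to-right pass.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Minimal scanner sketch: writes each token into toks[] as a string
   and returns the number of tokens found. */
int tokenize(const char *src, char toks[][32])
{
    int n = 0;
    while (*src) {
        if (isspace((unsigned char)*src)) { src++; continue; }
        int len = 0;
        if (isalpha((unsigned char)*src)) {          /* identifier */
            while (isalnum((unsigned char)*src)) toks[n][len++] = *src++;
        } else if (isdigit((unsigned char)*src)) {   /* number */
            while (isdigit((unsigned char)*src)) toks[n][len++] = *src++;
        } else {                                     /* one-char operator */
            toks[n][len++] = *src++;
        }
        toks[n][len] = '\0';
        n++;
    }
    return n;
}
```

On the slide's example, tokenize("position = initial + rate*60;", toks) produces the eight tokens position, =, initial, +, rate, *, 60, and ;.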
Syntax analysis
Also called hierarchical analysis or parsing. In this phase we group the tokens into grammatical phrases and build a syntax tree.
Note: the division of work between syntax analysis and lexical analysis is debatable.
typedef char *string;
typedef struct A_stm_ *A_stm;
typedef struct A_exp_ *A_exp;
typedef struct A_expList_ *A_expList;
typedef enum { A_plus, A_minus, A_times, A_div } A_binop;

//statement struct
struct A_stm_ {
    enum { A_compoundStm, A_assignStm, A_printStm } kind;
    union {
        struct { A_stm stm1, stm2; } compound;
        struct { string id; A_exp exp; } assign;
        struct { A_expList exps; } print;
    } u;
};

//statement constructors
A_stm A_CompoundStm(A_stm stm1, A_stm stm2)
{
    A_stm s = malloc(sizeof(*s));
    if (s == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    s->kind = A_compoundStm;
    s->u.compound.stm1 = stm1;
    s->u.compound.stm2 = stm2;
    return s;
}

A_stm A_AssignStm(string id, A_exp exp)
{
    A_stm s = malloc(sizeof(*s));
    if (s == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    s->kind = A_assignStm;
    s->u.assign.id = id;
    s->u.assign.exp = exp;
    return s;
}

A_stm A_PrintStm(A_expList exps)
{
    A_stm s = malloc(sizeof(*s));
    if (s == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    s->kind = A_printStm;
    s->u.print.exps = exps;
    return s;
}

//expression struct
struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        string id;
        int num;
        struct { A_exp left; A_binop oper; A_exp right; } op;
        struct { A_stm stm; A_exp exp; } eseq;
    } u;
};

//expression constructors
A_exp A_IdExp(string id)
{
    A_exp e = malloc(sizeof(*e));
    if (e == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    e->kind = A_idExp;
    e->u.id = id;
    return e;
}

A_exp A_NumExp(int num)
{
    A_exp e = malloc(sizeof(*e));
    if (e == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    e->kind = A_numExp;
    e->u.num = num;
    return e;
}

A_exp A_OpExp(A_exp left, A_binop oper, A_exp right)
{
    A_exp e = malloc(sizeof(*e));
    if (e == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    e->kind = A_opExp;
    e->u.op.left = left;
    e->u.op.oper = oper;
    e->u.op.right = right;
    return e;
}

A_exp A_EseqExp(A_stm stm, A_exp exp)
{
    A_exp e = malloc(sizeof(*e));
    if (e == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    e->kind = A_eseqExp;
    e->u.eseq.stm = stm;
    e->u.eseq.exp = exp;
    return e;
}

//expression list struct
struct A_expList_ {
    enum { A_pairExpList, A_lastExpList } kind;
    union {
        struct { A_exp head; A_expList tail; } pair;
        A_exp last;
    } u;
};

//expression list constructors
A_expList A_PairExpList(A_exp head, A_expList tail)
{
    A_expList el = malloc(sizeof(*el));
    if (el == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    el->kind = A_pairExpList;
    el->u.pair.head = head;
    el->u.pair.tail = tail;
    return el;
}

A_expList A_LastExpList(A_exp last)
{
    A_expList el = malloc(sizeof(*el));
    if (el == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    el->kind = A_lastExpList;
    el->u.last = last;
    return el;
}
Syntax analysis example
The following builds the syntax tree for the program
a := 5 + 3; b := (print(a, a - 1), 10 * a); print(b)

A_stm prog =
  A_CompoundStm(A_AssignStm("a",
    A_OpExp(A_NumExp(5), A_plus, A_NumExp(3))),
  A_CompoundStm(A_AssignStm("b",
    A_EseqExp(A_PrintStm(A_PairExpList(A_IdExp("a"),
      A_LastExpList(A_OpExp(A_IdExp("a"), A_minus,
        A_NumExp(1))))),
      A_OpExp(A_NumExp(10), A_times, A_IdExp("a")))),
  A_PrintStm(A_LastExpList(A_IdExp("b")))));
Syntax analysis tree
[Diagram: a compoundStm whose left child is the assignment a := 5 + 3 (assignStm over an opExp of two numExps joined by plus), and whose right child is a compoundStm holding the assignment to b (with its eseqExp, printStm, and opExp subtrees) and the final printStm over idExp b.]
Semantic analysis
In this phase we check the source program for semantic errors and gather type information for the code generation phase. Semantic analysis includes many checks, but the most important one is type checking. Type checking verifies that each operator receives the number and types of operands it permits. To do that, we record all our identifiers in a special data structure called the symbol table.
Symbol table
The basic data structure behind the symbol table is the hash table, which allows us to find each key in constant expected time.
[Diagram: keys mapped by a hash function into buckets of values.]
struct bucket { string key; void *binding; struct bucket *next; };

#define SIZE 109
struct bucket *table[SIZE];

unsigned int hash(char *s0)
{
    unsigned int h = 0;
    char *s;
    for (s = s0; *s; s++)
        h = h * 65599 + *s;
    return h;
}

struct bucket *Bucket(string key, void *binding, struct bucket *next)
{
    struct bucket *b = malloc(sizeof(*b));
    if (b == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    b->key = key;
    b->binding = binding;
    b->next = next;
    return b;
}

void insert(string key, void *binding)
{
    int index = hash(key) % SIZE;
    table[index] = Bucket(key, binding, table[index]);
}

void *lookup(string key)
{
    int index = hash(key) % SIZE;
    struct bucket *b;
    for (b = table[index]; b; b = b->next)
        if (strcmp(b->key, key) == 0)
            return b->binding;
    return NULL;
}
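To show how the table behaves in use, here is a self-contained sketch of the same chained hash table (it repeats the structure above so it compiles on its own; the identifier names and bindings in the demo are illustrative). Note that inserting a key a second time pushes a new bucket onto the front of its chain, so the newest binding shadows older ones:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 109

struct bucket { const char *key; void *binding; struct bucket *next; };
static struct bucket *table[SIZE];

static unsigned int hash(const char *s)
{
    unsigned int h = 0;
    for (; *s; s++)
        h = h * 65599 + (unsigned char)*s;
    return h;
}

static void insert(const char *key, void *binding)
{
    int i = hash(key) % SIZE;
    struct bucket *b = malloc(sizeof *b);
    b->key = key; b->binding = binding; b->next = table[i];
    table[i] = b;   /* new bindings shadow older ones for the same key */
}

static void *lookup(const char *key)
{
    for (struct bucket *b = table[hash(key) % SIZE]; b; b = b->next)
        if (strcmp(b->key, key) == 0)
            return b->binding;
    return NULL;    /* undeclared identifier */
}
```

A lookup that returns NULL corresponds to an undeclared identifier, which is exactly the kind of semantic error this phase reports.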
Implementation of the symbol type on top of the hash table:
typedef struct S_symbol_ *S_symbol;
struct S_symbol_ { string name; S_symbol next; };
Types module
typedef struct TY_ty_ *TY_ty;
typedef struct TY_tyList_ *TY_tyList;
typedef struct TY_field_ *TY_field;
typedef struct TY_fieldList_ *TY_fieldList;

struct TY_ty_ {
    enum { Ty_record, Ty_nil, Ty_int, Ty_string, Ty_array, Ty_name, Ty_void } kind;
    union {
        TY_fieldList record;
        TY_ty array;
        struct { S_symbol sym; TY_ty ty; } name;
    } u;
};

TY_ty TY_Nil(void)
{
    TY_ty ty = malloc(sizeof(*ty));
    if (ty == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    ty->kind = Ty_nil;
    return ty;
}
TY_ty TY_Int(void);
TY_ty TY_String(void);
TY_ty TY_Void(void);

struct TY_tyList_ { TY_ty head; TY_tyList tail; };
TY_tyList TY_TyList(TY_ty head, TY_tyList tail)
{
    TY_tyList tyl = malloc(sizeof(*tyl));
    if (tyl == NULL) { fprintf(stderr, "error allocating memory\n"); exit(1); }
    tyl->head = head;
    tyl->tail = tail;
    return tyl;
}

struct TY_field_ { S_symbol name; TY_ty ty; };
TY_field TY_Field(S_symbol name, TY_ty ty); /* ... */

struct TY_fieldList_ { TY_field head; TY_fieldList tail; };
TY_fieldList TY_FieldList(TY_field head, TY_fieldList tail);
Type checking
Now it is all simple: all we need to do is make a left-to-right scan over the syntax tree produced by syntax analysis, and each time we find an operator, check the descendant nodes of that operator.
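That walk can be sketched on a cut-down expression tree. This is a hedged sketch under simplifying assumptions, not the project's checker: the types and names here (Ty, Exp, typeOf) are illustrative, and the only rule enforced is that an arithmetic operator requires both operands to have type int.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { Ty_int, Ty_string, Ty_error } Ty;
typedef enum { E_num, E_str, E_op } ExpKind;

typedef struct Exp {
    ExpKind kind;
    struct Exp *left, *right;   /* used only when kind == E_op */
} Exp;

static Exp *mk(ExpKind k, Exp *l, Exp *r)
{
    Exp *e = malloc(sizeof *e);
    e->kind = k; e->left = l; e->right = r;
    return e;
}

/* Left-to-right recursive walk: compute each operand's type first,
   then check that the operator receives the operand types it permits. */
static Ty typeOf(const Exp *e)
{
    switch (e->kind) {
    case E_num: return Ty_int;
    case E_str: return Ty_string;
    case E_op: {
        Ty l = typeOf(e->left), r = typeOf(e->right);
        return (l == Ty_int && r == Ty_int) ? Ty_int : Ty_error;
    }
    }
    return Ty_error;
}
```

For example, 5 + 3 checks as Ty_int, while applying the same operator to a number and a string yields Ty_error.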
[Diagram: the syntax tree of the example program again, now scanned during type checking.]
Now that analysis is complete, we know the meaning of the source code and that the code is correct, so we can generate the intermediate code.
Intermediate code
typedef struct T_stm_ *T_stm;
struct T_stm_ {
    enum { T_SEQ, T_LABEL, T_JUMP, T_CJUMP, T_MOVE, T_EXP } kind;
    union {
        struct { T_stm left, right; } SEQ;
        Label LABEL;
        struct { T_exp dst; LabelList labels; } JUMP;
        struct { T_relOp op; T_exp left, right; Label true, false; } CJUMP;
        struct { T_exp dst, src; } MOVE;
        struct { T_exp exp; } EXP;
    } u;
};
/* T_stm constructors ... */

typedef struct T_exp_ *T_exp;
struct T_exp_ {
    enum { T_BINOP, T_MEM, T_TEMP, T_ESEQ, T_NAME, T_CONST, T_CALL } kind;
    union {
        struct { T_binOp op; T_exp left, right; } BINOP;
        T_exp MEM;
        Temp_temp TEMP;
        struct { T_stm stm; T_exp exp; } ESEQ;
        Label NAME;
        int CONST;
        struct { T_exp exp; T_expList expList; } CALL;
    } u;
};
/* T_exp constructors ... */
Intermediate code (cont)
typedef struct T_expList_ *T_expList;
struct T_expList_ { T_exp head; T_expList tail; };
T_expList T_ExpList(T_exp head, T_expList tail);

typedef struct T_stmList_ *T_stmList;
struct T_stmList_ { T_stm head; T_stmList tail; };
T_stmList T_StmList(T_stm head, T_stmList tail);

typedef enum { T_plus, T_minus, T_mul, T_div, T_and, T_or,
               T_lshift, T_rshift, T_arshift, T_xor } T_binOp;
typedef enum { T_eq, T_ne, T_lt, T_gt, T_le, T_ge,
               T_ult, T_ule, T_ugt, T_uge } T_relOp;
So far we have dealt with expressions that compute a value; we must extend this to constructs that do not compute values, namely void functions (procedures), while statements, and Boolean conditions that may jump to true or false labels.
Intermediate code (cont)
//translation:
typedef struct Tr_exp_ *Tr_exp;
struct Cx { patchList trues, falses; T_stm stm; };
struct Tr_exp_ {
    enum { Tr_ex, Tr_nx, Tr_cx } kind;
    union { T_exp ex; T_stm nx; struct Cx cx; } u;
};
/* constructors ... */

//patch list:
typedef struct patchList_ *patchList;
struct patchList_ { Label *head; patchList tail; };
patchList PatchList(Label *head, patchList tail);

Tr_ex – stands for expressions.
Tr_nx – stands for "no result".
Tr_cx – stands for conditionals; the statement may jump to one of the true or false labels in the two given lists.
What do we get so far?
position = initial + rate * 60;
Lexical analysis:
id1, EQ, id2, PL, id3, MUL, number(60), endl
Syntax analysis:
[Tree: an assignStm whose left child is id1 and whose right child is an opExp adding id2 to the opExp multiplying id3 by numExp 60.]
Semantic analysis:
The intermediate representation has been generated; we are ready for the synthesis phase, where we can generate the machine code.
What do we get so far? (cont)
Intermediate code generator:
temp1 = number(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
Synthesis
Now that we have the intermediate representation, we can generate the machine code. In fact, the intermediate code is code for an abstract machine, so all we need to take care of in the synthesis phase is:
1) Instruction selection – finding the appropriate machine instructions to implement a given intermediate tree.
2) Register allocation – assigning variables to machine registers.
Instruction selection
Unlike the other phases, where we performed a left-to-right scan over the tree, this time the scan is a DFS. Our intention is to find tree patterns.

Name   Effect
TEMP   r[i]
ADD    r[i] <- r[j] + r[k]
MUL    r[i] <- r[j] * r[k]
DIV    r[i] <- r[j] / r[k]
ADDC   r[i] <- r[j] + c
LOAD   r[i] <- M[r[j] + c]
STORE  M[r[j] + c] <- r[i]

[The table's third column shows each instruction's tree pattern, built from TEMP, +, *, /, CONST, MEM, and MOVE nodes.]
Maximum munch
To generate target code with a minimal number of machine instructions, we need to find an optimal tiling of the tree. An optimal tiling is one in which no two adjacent tiles can be combined into a single tile. The algorithm that finds an optimal tiling of a tree is maximum munch, also known as largest match. Maximum munch is a greedy algorithm: once it has found the largest match at a node, it never revisits that choice.
Maximum munch (cont)
void munchStm(T_stm s)
{
    switch (s->kind) {
    case T_SEQ:
        munchStm(s->u.SEQ.left);
        munchStm(s->u.SEQ.right);
        break;
    case T_MOVE: {
        T_exp dst = s->u.MOVE.dst, src = s->u.MOVE.src;
        if (dst->kind == T_MEM) {
            T_exp addr = dst->u.MEM;
            if (addr->kind == T_BINOP && addr->u.BINOP.op == T_plus
                    && addr->u.BINOP.right->kind == T_CONST) {
                /* MOVE(MEM(BINOP(PLUS, e1, CONST(c))), e2) */
                munchExp(addr->u.BINOP.left); munchExp(src); emit("STORE");
            } else if (addr->kind == T_BINOP && addr->u.BINOP.op == T_plus
                    && addr->u.BINOP.left->kind == T_CONST) {
                /* MOVE(MEM(BINOP(PLUS, CONST(c), e1)), e2) */
                munchExp(addr->u.BINOP.right); munchExp(src); emit("STORE");
            } else if (src->kind == T_MEM) {
                /* MOVE(MEM(e1), MEM(e2)) */
                munchExp(addr); munchExp(src->u.MEM); emit("MOVEM");
            } else {
                /* MOVE(MEM(e1), e2) */
                munchExp(addr); munchExp(src); emit("STORE");
            }
        } else if (dst->kind == T_TEMP) {
            /* MOVE(TEMP(t), e2) */
            munchExp(src); emit("ADD");
        }
        break;
    }
    }
}

void munchExp(T_exp e)
{
    switch (e->kind) {
    case T_ESEQ:
        munchStm(e->u.ESEQ.stm);
        munchExp(e->u.ESEQ.exp);
        break;
    case T_MEM: {
        T_exp addr = e->u.MEM;
        if (addr->kind == T_BINOP && addr->u.BINOP.op == T_plus
                && addr->u.BINOP.right->kind == T_CONST) {
            munchExp(addr->u.BINOP.left); emit("LOAD");   /* MEM(e1 + CONST) */
        } else if (addr->kind == T_BINOP && addr->u.BINOP.op == T_plus
                && addr->u.BINOP.left->kind == T_CONST) {
            munchExp(addr->u.BINOP.right); emit("LOAD");  /* MEM(CONST + e1) */
        } else if (addr->kind == T_CONST) {
            emit("LOAD");                                 /* MEM(CONST) */
        } else {
            munchExp(addr); emit("LOAD");                 /* MEM(e1) */
        }
        break;
    }
    case T_BINOP:
        if (e->u.BINOP.op == T_plus && e->u.BINOP.right->kind == T_CONST) {
            munchExp(e->u.BINOP.left); emit("ADDC");      /* e1 + CONST */
        } else if (e->u.BINOP.op == T_plus && e->u.BINOP.left->kind == T_CONST) {
            munchExp(e->u.BINOP.right); emit("ADDC");     /* CONST + e1 */
        } else if (e->u.BINOP.op == T_minus && e->u.BINOP.right->kind == T_CONST) {
            munchExp(e->u.BINOP.left); emit("SUBC");      /* e1 - CONST */
        } else if (e->u.BINOP.op == T_plus) {
            munchExp(e->u.BINOP.left); munchExp(e->u.BINOP.right); emit("ADD");
        } else if (e->u.BINOP.op == T_minus) {
            munchExp(e->u.BINOP.left); munchExp(e->u.BINOP.right); emit("SUB");
        } else if (e->u.BINOP.op == T_mul) {
            munchExp(e->u.BINOP.left); munchExp(e->u.BINOP.right); emit("MUL");
        } else if (e->u.BINOP.op == T_div) {
            munchExp(e->u.BINOP.left); munchExp(e->u.BINOP.right); emit("DIV");
        }
        break;
    case T_CONST:
        emit("ADDC");                                     /* CONST c is r0 + c */
        break;
    }
}
Register allocation
All the phases we discussed assume that there is an unlimited number of registers. We know that in reality this number is limited, so we need a method to deal with it. Two temporaries can share the same register if they are not "in use" at the same time, so the compiler needs to analyze the intermediate program to determine which temporaries are in use at the same time. This phase is called liveness analysis.
Control flow graph
To solve the problem, we use a control flow graph, whose nodes stand for the statements. If statement x can be followed by statement y, the edge (x, y) exists in the graph.
Example program:
1:      a <- 0
2: L1:  b <- a + 1
3:      c <- c + b
4:      a <- b * 2
5:      if a < N goto L1
6:      return c
[Control flow graph: 1 -> 2 -> 3 -> 4 -> 5; node 5 has edges back to node 2 and on to node 6.]
Liveness analysis – terminology
- A flow graph has out-edges that lead to a node's successors (succ).
- A flow graph has in-edges that come from a node's predecessors (pred).
- An assignment to a variable or temporary defines it (def).
- An occurrence of a variable or temporary on the right-hand side of an assignment uses it (use).
Liveness of a variable – definition
A variable is live on an edge if there is a path from that edge to a use of the variable that does not pass through any of its defs.
A variable is live-in at a node if it is live on any of the node's in-edges.
A variable is live-out at a node if it is live on any of the node's out-edges.

Equations:
in[n] = use[n] ∪ (out[n] − def[n])
out[n] = ⋃ { in[s] : s ∈ succ[n] }
Liveness of a variable (cont) – algorithm

for each n:
    in[n] <- {}; out[n] <- {}
repeat
    for each n:
        in'[n] <- in[n]; out'[n] <- out[n]
        in[n] <- use[n] ∪ (out[n] − def[n])
        out[n] <- ⋃ { in[s] : s ∈ succ[n] }
until in'[n] = in[n] and out'[n] = out[n] for all n
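The algorithm above can be run concretely on the six-statement example program from the control-flow-graph slide. This is a self-contained sketch with illustrative names: the variables a, b, c are represented as bits of a small set, and each node carries its use, def, and successor information.

```c
#include <assert.h>

enum { VA = 1, VB = 2, VC = 4 };   /* variables a, b, c as set bits */
#define NNODES 6

/* 1: a<-0   2: b<-a+1   3: c<-c+b   4: a<-b*2   5: if a<N goto 2   6: return c */
static const unsigned use_[NNODES] = { 0,  VA, VB | VC, VB, VA, VC };
static const unsigned def_[NNODES] = { VA, VB, VC,      VA, 0,  0  };
/* succ_[n][k] lists the successors of node n; -1 terminates the list */
static const int succ_[NNODES][3] = {
    { 1, -1, -1 }, { 2, -1, -1 }, { 3, -1, -1 },
    { 4, -1, -1 }, { 1,  5, -1 }, { -1, -1, -1 }
};

static unsigned live_in[NNODES], live_out[NNODES];

static void liveness(void)
{
    int changed = 1;
    while (changed) {                  /* repeat until a fixed point */
        changed = 0;
        for (int n = NNODES - 1; n >= 0; n--) {
            unsigned o = 0;            /* out[n] = union of in[s], s in succ[n] */
            for (int k = 0; succ_[n][k] >= 0; k++)
                o |= live_in[succ_[n][k]];
            unsigned i = use_[n] | (o & ~def_[n]);   /* in[n] */
            if (i != live_in[n] || o != live_out[n]) {
                live_in[n] = i;
                live_out[n] = o;
                changed = 1;
            }
        }
    }
}
```

At the fixed point, b <- a + 1 (node 2) has live-in {a, c}, and only c is live into the first statement: c is used in the loop before any def, so it must already hold a value at entry.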
Liveness of a variable (cont) – run-time complexity
For a control flow graph with N nodes:
The initialization foreach performs N iterations.
Then comes the nested loop: each iteration of the inner foreach performs a union operation, and in the worst case (a complete control flow graph) that union takes up to N steps, so one pass of the inner foreach costs O(N²).
Each pass of the repeat loop must change at least one in- or out-set for the loop to continue, and there are O(N²) such changes possible, so the repeat loop runs O(N²) times.
Liveness of a variable (cont) – run-time complexity (cont)
The worst-case complexity of the algorithm is therefore O(N⁴). In practice, however, it runs in time between O(N) and O(N²).