Chapter 3

06/26/22 · Programming Languages: Language Translation Issues (Programming Language Syntax, Stages in Translation, Formal Translation Models)

Transcript of Chapter 3

Page 1: Chapter 3

04/08/23 1

Programming Languages

Language Translation Issues

Programming Language Syntax

Stages in Translation

Formal Translation Models

Page 2: Chapter 3


Programming Language Syntax

• Syntax is defined as the arrangement of words as elements in a sentence to show their relationship.

X=Y + Z

2+3x4

X=2.45 + 3.67

Semantics covers Declarations, Operations, Sequence Control & Referencing Environments.
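The examples above are only strings of characters until the syntax fixes how they group. As a small sketch, using Python's standard `ast` module as a stand-in parser and `*` in place of the slide's `x`, the hypothetical helper `describe` prints the grouping that Python's syntax assigns to an expression:

```python
import ast

# Printable symbols for the binary operators used in the slide's examples.
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def describe(expr: str) -> str:
    """Return a fully parenthesized rendering of how Python's parser groups expr."""
    tree = ast.parse(expr, mode="eval")

    def render(node: ast.AST) -> str:
        if isinstance(node, ast.BinOp):
            op = SYMBOLS[type(node.op)]
            return f"({render(node.left)} {op} {render(node.right)})"
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.Name):
            return node.id
        raise ValueError(f"unsupported node: {type(node).__name__}")

    return render(tree.body)


print(describe("2+3*4"))   # (2 + (3 * 4))
print(eval("2+3*4"))       # 14: the meaning (value) follows the syntactic grouping
```

The grouping is a syntactic fact; the value 14 is its semantics.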

Page 3: Chapter 3


General Syntactic Criteria

o The main purpose of syntax is to provide a notation for communication between the Programmer & the Programming Language Processor.

o Different Representation Criteria for a Data Type

Syntax Design should provide the following

Readability

Writeability

Ease of Verifiability

Ease of Translation

Lack of Ambiguity

Page 4: Chapter 3


• Readability

– Underlying Structure of the algorithm and data representation by the program is apparent from an inspection of the program text.

– Should be Self-Documenting.

– Supported by natural statement formats, structured statements, liberal use of keywords & noise words, embedded comments, identifiers of unrestricted length, etc.

– A good language must still be used with good programming practice to produce readable programs.

– Syntactic differences should reflect underlying semantic differences, so that program constructs that do similar things look similar and program constructs that do radically different things look different.

Page 5: Chapter 3


• Writeability

– Features that make a program easy to read often conflict with features that make it easy to write.

– Implicit syntactic conventions that allow declarations and operations to be left unspecified make programs shorter and easier to write but harder to read.

– Support Natural Statement Formats, Structured Statements, Liberal use of Keywords & noise words, embedded comments, identifiers etc

– When is a syntax redundant?

• When is redundancy helpful, and where does it degrade performance?

Page 6: Chapter 3


• Ease of Verification

• Ease of Translation

– Regularity of structure / complexity: should the syntax be easy on the translator or on the user?

• Lack of Ambiguity

– An ambiguous statement allows more than one interpretation.

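Example: 1) if Boolean expression then statement1 else statement2

2) if Boolean expression then statement1

Combining the two forms gives the “dangling else”:

if Boolean expression1 then if Boolean expression2 then statement1 else statement2

Here the else may belong to either if, so the statement has two interpretations.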
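One common resolution of the dangling else, used for example by C and Pascal, is to attach each else to the nearest unmatched if. The sketch below is a hypothetical recursive parser over whitespace-separated tokens (E1, S1, etc. stand for Boolean expressions and simple statements); it gets that behavior simply by letting the innermost if consume the else first.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1  # a simple statement such as S1
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then'")
    condition = tokens[pos + 1]
    then_part, pos = parse_statement(tokens, pos + 3)
    # Greedy choice: an 'else' is taken by the innermost 'if' still being parsed.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_statement(tokens, pos + 1)
        return ("if", condition, then_part, else_part), pos
    return ("if", condition, then_part), pos


def parse(text):
    tokens = text.split()
    tree, pos = parse_statement(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("if E1 then if E2 then S1 else S2"))
# ('if', 'E1', ('if', 'E2', 'S1', 'S2'))  -> the else belongs to the inner if
```

A language could equally choose the other reading, or avoid the problem by requiring an explicit terminator such as end if.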

Page 7: Chapter 3


Syntactic Elements of a Language

Character Set

Identifiers

Operator Symbol

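Keywords & Reserved Words – A keyword is an identifier used as a fixed part of the syntax of a statement; a reserved word is a keyword that cannot be used as a programmer-chosen identifier.

Noise Words – e.g. the optional TO in COBOL's GO TO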

Comments / Blanks (Spaces)

Delimiters and Brackets – Brackets are paired delimiters, e.g. ( ) or begin ... end.

Free & Fixed Field Formats

Expression

Statements (Structured /Simple)
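Python illustrates the keyword / reserved-word distinction. `if` is reserved and can never name a variable, while `match` (a soft keyword since Python 3.10) acts as a keyword only inside a match statement and may still be used as an identifier. A minimal check, where `can_be_identifier` is a hypothetical helper that just tries to compile an assignment:

```python
import keyword


def can_be_identifier(name: str) -> bool:
    """True if `name = 1` compiles, i.e. the word may be used as a variable name."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
    except SyntaxError:
        return False
    return True


for word in ["if", "while", "match", "count"]:
    # keyword.iskeyword is True only for Python's reserved words.
    print(word, keyword.iskeyword(word), can_be_identifier(word))
```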

Page 8: Chapter 3


Overall Program-Subprogram Structure

Separate Subprogram Definitions – Each subprogram definition is treated as a separate syntactic unit. Each subprogram is compiled separately, and the pieces are linked at load time.

Separate Data Definitions – Group together all operations that manipulate a given data object, e.g. classes in C++ or Java.

Nested Subprogram Definitions – Help in a modular approach, but the concept is disappearing with the advent of object-oriented programming.

Separate Interface Definitions – To pass data between two separately compiled components, additional information about their interfaces is needed; this is handled by a separate specification component.

e.g. In C, “.h” header files form the specification component and the “.c” source files form the implementation component.

Data description separated from executable statements

Unseparated subprogram definitions
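As a sketch of separate data definitions, assuming a Python class as the grouping mechanism, the hypothetical `Stack` class below keeps one data object (a list) together with every operation allowed to manipulate it:

```python
class Stack:
    """All operations on the stack's data are defined in this one unit."""

    def __init__(self):
        self._items = []  # the data object, reached only through the methods below

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items


s = Stack()
s.push(1)
s.push(2)
print(s.pop(), s.is_empty())  # 2 False
```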

Page 9: Chapter 3


Stages in Translation

What is Translation?

Logically, we may divide translation into two major parts – Analysis of the input source program & synthesis of the executable object program.

Translators are generally grouped according to the number of passes they make over the source program.

A single pass gives fast compilation; multiple passes allow more thorough analysis and optimization.

Analysis of a Source Program

Lexical Analysis (Scanning) – Groups the sequence of input characters into lexemes.

Creates each lexeme and attaches a type tag to it.

The model used is the finite-state automaton.

Often a time-consuming phase.

Lexeme types – Number / Identifier / Delimiter / Operator
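A minimal scanner sketch, assuming the four lexeme types listed above and hypothetical regular expressions for each. Python's `re` engine is a backtracking matcher rather than a literal finite-state automaton, but every pattern here is regular, so an FSA could recognize each one.

```python
import re

# Hypothetical token classes; each pair is (type tag, pattern).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+(?:\.[0-9]+)?"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("OPERATOR", r"[+\-*/=]"),
    ("DELIMITER", r"[();,]"),
    ("SKIP", r"\s+"),  # blanks separate lexemes but produce no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Group the characters of text into (type tag, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(scan("X = 2.45 + 3.67"))
```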

Page 10: Chapter 3


Structure of a Compiler (figure)

Source Program → Lexical Analysis → (lexical tokens) → Syntactic Analysis → (parse tree) → Semantic Analysis → (intermediate code) → Optimization → (optimized intermediate code) → Code Generation → Linking → (executable code)

Lexical, syntactic and semantic analysis form the source program recognition phases. The phases share a Symbol Table and other tables.

Page 11: Chapter 3


Stages in Translation

Syntactic Analysis (Parsing).

Identifies the larger program structures (statements, declarations, expressions, etc.) from the lexical tokens.

Semantic Analysis

It is the central phase of translation.

The structure of the executable object code begins to take shape.

Output is some internal form of the final executable program which is then manipulated by the optimization stage of the translator before executable code is actually generated.

Some common functions of the semantic analyzer:

Symbol-Table Maintenance

Insertion of Implicit Information

Error Detection

Macro Processing
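A sketch of two of these functions, symbol-table maintenance and insertion of implicit information, assuming a hypothetical `SymbolTable` class. The implicit rule shown is FORTRAN's classic implicit typing (undeclared names beginning with I through N are INTEGER, all others REAL), used here only as an illustration.

```python
class SymbolTable:
    """Sketch of a semantic analyzer's symbol table: identifier -> type."""

    def __init__(self):
        self.entries = {}

    def declare(self, name, type_name):
        # Error detection: a second declaration of the same name is rejected.
        if name in self.entries:
            raise NameError(f"{name} is already declared")
        self.entries[name] = type_name

    def lookup(self, name):
        # Insertion of implicit information (FORTRAN-style implicit typing):
        # an undeclared name starting with I..N becomes INTEGER, any other REAL.
        if name not in self.entries:
            self.entries[name] = "INTEGER" if name[0].upper() in "IJKLMN" else "REAL"
        return self.entries[name]


table = SymbolTable()
table.declare("X", "INTEGER")
print(table.lookup("X"), table.lookup("COUNT"), table.lookup("INDEX"))  # INTEGER REAL INTEGER
```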

Page 12: Chapter 3


Stages in Translation

Synthesis of the Object Program

Optimization: Works on the intermediate code received from the semantic analyzer. This code is typically a string of operators and operands, or a table of operator/operand sequences, from which the code generator can produce properly formatted object code.

A=B+C+D

- may generate intermediate code such as:

-(a) Temp1= B+C

-(b) Temp2=Temp1 + D

-(c) A=Temp2

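Straightforward code generation from (a) to (c) then gives:

1) Load register with B (from (a))
2) Add C to register
3) Store register in Temp1
4) Load register with Temp1 (from (b))
5) Add D to register
6) Store register in Temp2
7) Load register with Temp2 (from (c))
8) Store register in A

Steps 4 and 7 reload a value that is already in the register, so an optimizer can delete them; if Temp1 and Temp2 are not used elsewhere, steps 3 and 6 can be deleted as well, leaving four instructions.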
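The two steps above can be sketched for a hypothetical single-register machine with LOAD, ADD and STORE instructions (assumed mnemonics, not any real instruction set). `generate` reproduces the eight-instruction listing from (a) to (c), and `optimize` is a simple peephole pass that removes the redundant loads and the dead stores into temporaries.

```python
def generate(three_address):
    """Naive code generation: LOAD the first operand, ADD the rest, STORE the target."""
    code = []
    for target, operands in three_address:
        code.append(("LOAD", operands[0]))
        code.extend(("ADD", operand) for operand in operands[1:])
        code.append(("STORE", target))
    return code


def optimize(code, temporaries):
    """Peephole pass over straight-line code for a single-register machine."""
    # 1. A LOAD of the location that was just stored is redundant:
    #    the register already holds that value.
    kept = []
    for instr in code:
        if instr[0] == "LOAD" and kept and kept[-1] == ("STORE", instr[1]):
            continue
        kept.append(instr)
    # 2. A STORE into a temporary that no remaining instruction reads is dead.
    read = {operand for op, operand in kept if op in ("LOAD", "ADD")}
    return [
        instr for instr in kept
        if not (instr[0] == "STORE" and instr[1] in temporaries and instr[1] not in read)
    ]


# A = B + C + D as the intermediate code (a) to (c).
intermediate = [("Temp1", ["B", "C"]), ("Temp2", ["Temp1", "D"]), ("A", ["Temp2"])]
naive = generate(intermediate)
for instr in optimize(naive, {"Temp1", "Temp2"}):
    print(*instr)
```

The dead-store check is deliberately conservative: a temporary that is read anywhere keeps all of its stores.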

Page 13: Chapter 3


Stages in Translation

Code Generation

From the optimized code we must form the assembly language statements, machine code or other object program form that is the output of the translation.

Linking & Loading

Pieces of code from Separate Translations of subprograms are coalesced into the final executable program.

Bootstrapping

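Often the translator for a new language is written in that language itself. A translator for a small subset of the language is first implemented by other means and then used to compile the full translator.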

Diagnostic Compilers

Designed especially for rapid turnaround, i.e. short compilation time.

Page 14: Chapter 3


Closing Quiz

Main Purpose of Syntax is to provide a notation for communication between the _________ & Programming Language Processor

Write one point illustrating the conflict between writeability & readability.

What are the syntactic elements of a language?

Lexical Analysis produces ______________.

Write the various stages in Translation.

Logically, we may divide translation into two major parts – _________ of the input source program & synthesis of the __________ object program.

Page 15: Chapter 3


Formal Translation Models

The syntactic recognition parts of compiler theory are generally based on the theory of context-free languages.

The formal definition of the syntax of a programming language is usually called a grammar.

A grammar consists of a set of rules that specify the sequences of characters that form allowable programs in the language being defined.

A formal grammar is just a grammar specified using a strictly defined notation.

The two classes of grammars useful in compiler technology are:

BNF Grammar

Regular Grammar

Page 16: Chapter 3


Formal Translation Models

BNF Grammars (Backus-Naur Form)

Comparison with English

The girl / ran / home

A BNF grammar is a context-free grammar.

A syntactically correct program need not make sense semantically.

A Language is any set of (Finite Length) character strings with characters chosen from some fixed set of symbols.

<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The term <digit> is called a syntactic category or nonterminal.

<conditional statement> ::= if <Boolean expression> then <statement> else <statement> | if <Boolean expression> then <statement>

Page 17: Chapter 3


BNF Grammar (Cont.)

<unsigned integer> ::= <digit> | <unsigned integer> <digit>

Examples, assuming that <identifier> and <number> have already been defined:

<assignment statement> ::= <variable> = <arithmetic expression>

<arithmetic expression> ::= <term> | <arithmetic expression> + <term> | <arithmetic expression> - <term>

<term> ::= <primary> | <term> x <primary> | <term> / <primary>

<primary> ::= <variable> | <number> | ( <arithmetic expression> )

<variable> ::= <identifier> | <identifier> [ <subscript list> ]

<subscript list> ::= <arithmetic expression> | <subscript list> , <arithmetic expression>
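A recursive-descent sketch of the <arithmetic expression>, <term> and <primary> rules, with one method per nonterminal. Assumptions: `*` replaces the slide's `x`, numbers are unsigned integers, variables are plain identifiers looked up in a dictionary (subscripts are omitted), and the class and function names are hypothetical. Recursive descent cannot use left-recursive rules directly, so each left-recursive rule is rewritten as a loop, which also keeps + - x / left-associative.

```python
import re

# One token: an unsigned integer, an identifier, an operator, or a parenthesis.
TOKEN = re.compile(r"\s*([0-9]+|[A-Za-z][A-Za-z0-9]*|[-+*/()])")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent evaluator: one method per nonterminal of the grammar."""

    def __init__(self, text, env):
        self.tokens = tokenize(text)
        self.pos = 0
        self.env = env  # values for <variable>s (hypothetical; no subscripts)

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def arithmetic_expression(self):
        # <term> { (+|-) <term> }: the left recursion rewritten as a loop.
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # <primary> { (*|/) <primary> }
        value = self.primary()
        while self.peek() in ("*", "/"):
            op = self.advance()
            right = self.primary()
            value = value * right if op == "*" else value / right
        return value

    def primary(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.arithmetic_expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token[0].isalpha():
            return self.env[token]
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text, env=None):
    parser = Parser(text, env or {})
    value = parser.arithmetic_expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2+3*4"), evaluate("(2+3)*4"), evaluate("B+C+D", {"B": 1, "C": 2, "D": 3}))
```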

Page 18: Chapter 3


Parse Trees

We can use the grammar rules as replacement rules, rewriting one nonterminal at a time, to generate strings in our language.

S -> SS | (S) | ( )

S=>(S)=>(SS)=>(( )S)=>(( )( ))

Each string in the derivation is called a sentential form.

The use of a Formal Grammar to define the syntax of a programming language is important both for the Language User & Language Implementer.

To determine if a given string represents a syntactically valid program in the Language, we must use the grammar rules to construct a syntactic analysis or parse of the String. If the String can be successfully parsed, then it is in the language.
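A sketch for the grammar S -> SS | (S) | ( ), written without the blanks. `derive` replays a leftmost derivation by replacing the leftmost S at each step, and `parse` builds one parse tree as nested tuples (a hypothetical representation) or raises SyntaxError when the string is not in the language. Because S -> SS makes this grammar ambiguous for three or more adjacent groups, the parser simply picks the tree that combines groups from left to right.

```python
RULES = ("SS", "(S)", "()")  # right-hand sides of S -> SS | (S) | ( )


def derive(choices):
    """Apply each chosen right-hand side to the leftmost S; return all sentential forms."""
    forms = ["S"]
    for rhs in choices:
        if rhs not in RULES:
            raise ValueError(f"{rhs!r} is not a right-hand side of S")
        form = forms[-1]
        i = form.index("S")  # leftmost nonterminal (ValueError if none is left)
        forms.append(form[:i] + rhs + form[i + 1:])
    return forms


def parse(text):
    """Build one parse tree for text, or raise SyntaxError if it is not in the language."""
    tree, pos = _parse_s(text, 0)
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return tree


def _parse_s(text, pos):
    # S -> S S: consecutive groups are combined left to right.
    tree, pos = _parse_group(text, pos)
    while pos < len(text) and text[pos] == "(":
        right, pos = _parse_group(text, pos)
        tree = ("S", tree, right)
    return tree, pos


def _parse_group(text, pos):
    if pos >= len(text) or text[pos] != "(":
        raise SyntaxError(f"expected '(' at position {pos}")
    if pos + 1 < len(text) and text[pos + 1] == ")":
        return ("S", "(", ")"), pos + 2  # S -> ( )
    inner, pos = _parse_s(text, pos + 1)  # S -> ( S )
    if pos >= len(text) or text[pos] != ")":
        raise SyntaxError(f"expected ')' at position {pos}")
    return ("S", "(", inner, ")"), pos + 1


print(derive(["(S)", "SS", "()", "()"]))  # ['S', '(S)', '(SS)', '(()S)', '(()())']
print(parse("(()())"))
```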

Page 19: Chapter 3


BNF Grammar (Cont.)

Some syntactic restrictions cannot be expressed by a BNF grammar, for example:

The same identifier may not be declared twice in the same block.

Every identifier must be declared in some block enclosing the point of its use.

An array declared to have two dimensions cannot be referenced with three subscripts.
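Because BNF cannot express these restrictions, a translator usually checks them during semantic analysis using a symbol table per block. A minimal sketch, assuming a hypothetical program representation: a block is a list of ("declare", name, dimensions), ("use", name, subscripts) and ("block", nested_list) items, with 0 dimensions meaning a simple variable.

```python
def check_block(block, scopes=(), errors=None):
    """Check one block; `scopes` holds the symbol tables of the enclosing blocks."""
    errors = [] if errors is None else errors
    scopes = list(scopes) + [{}]  # open a new innermost scope (name -> dimensions)
    for item in block:
        kind = item[0]
        if kind == "declare":
            _, name, dims = item
            if name in scopes[-1]:
                errors.append(f"{name} declared twice in the same block")
            else:
                scopes[-1][name] = dims
        elif kind == "use":
            _, name, subscripts = item
            # Search from the innermost scope outward.
            dims = next((scope[name] for scope in reversed(scopes) if name in scope), None)
            if dims is None:
                errors.append(f"{name} is not declared in any enclosing block")
            elif subscripts != dims:
                errors.append(f"{name} has {dims} dimension(s) but is used with {subscripts} subscript(s)")
        elif kind == "block":
            check_block(item[1], scopes, errors)  # the nested block sees outer scopes
    return errors


program = [
    ("declare", "A", 2),
    ("declare", "X", 0),
    ("declare", "X", 0),  # same identifier declared twice in one block
    ("use", "A", 3),      # two-dimensional array used with three subscripts
    ("block", [("declare", "Y", 0), ("use", "X", 0)]),
    ("use", "Y", 0),      # Y is local to the inner block
]
for message in check_block(program):
    print(message)
```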

Ambiguity

They / are / flying planes.

They / are flying /planes.

Page 20: Chapter 3


BNF Grammar (Cont.)

Ambiguity

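G1: S -> SS | 0 | 1

G2: T -> 0T | 1T | 0 | 1

G1 is ambiguous: the string 010 has two different parse trees, grouping as (01)0 or 0(10). G2 generates the same language (all non-empty strings of 0s and 1s) and is unambiguous.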
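Ambiguity can be checked mechanically by counting parse trees. The sketch below (hypothetical function names) counts them for both grammars: for G1 with a small dynamic program over the split point of S -> SS, and for G2 by noting that every rule consumes exactly the first symbol. Any count above 1 for some string shows the grammar is ambiguous.

```python
from functools import lru_cache


def count_trees_g1(s):
    """Number of parse trees for s under G1: S -> SS | 0 | 1."""
    @lru_cache(maxsize=None)
    def count(i, j):  # ways S can derive s[i:j]
        total = 1 if j - i == 1 and s[i] in "01" else 0
        for k in range(i + 1, j):  # S -> S S, splitting s[i:j] at k
            total += count(i, k) * count(k, j)
        return total

    return count(0, len(s)) if s else 0


def count_trees_g2(s):
    """Number of parse trees for s under G2: T -> 0T | 1T | 0 | 1."""
    # Every rule consumes exactly the first symbol, so there is never a choice.
    if not s or s[0] not in "01":
        return 0
    return 1 if len(s) == 1 else count_trees_g2(s[1:])


for s in ["010", "0101"]:
    print(s, count_trees_g1(s), count_trees_g2(s))  # 010 2 1, then 0101 5 1
```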

Page 21: Chapter 3


Extension of BNF Notation

The main reason for extending BNF is that plain BNF forces a rather unnatural representation of the common syntactic constructs of optional elements, alternative elements and repeated elements within a grammar rule.

Example : A signed integer is a sequence of digits preceded by an optional plus or minus.

<signed integer> ::= + <integer> | - <integer> | <integer>

<integer> ::= <digit> | <integer> <digit>

In Extended BNF it would be written as such :

Signed integer: <signed integer> ::= [ + | - ] <digit> { <digit> }*

Identifier: <identifier> ::= <letter> { <letter> | <digit> }*
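These two EBNF rules contain no recursion, so each maps directly onto a regular expression: [ ] becomes `?` and { }* becomes `*`. A small sketch using Python's `re` module, with `[0-9]` and `[A-Za-z]` assumed as the meaning of <digit> and <letter>:

```python
import re

# [ + | - ] <digit> { <digit> }*   : optional sign, then one or more digits
SIGNED_INTEGER = re.compile(r"[+-]?[0-9][0-9]*")
# <letter> { <letter> | <digit> }* : a letter, then any mix of letters and digits
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def matches(pattern, text):
    """True if the whole of text is generated by the rule encoded in pattern."""
    return pattern.fullmatch(text) is not None


for text in ["+42", "-7", "42", "+-4"]:
    print(text, matches(SIGNED_INTEGER, text))
```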

Page 22: Chapter 3


Extension of BNF Notation

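Syntax Charts (also called Railroad Diagrams)

[Figure: syntax charts for an Assignment Statement (Variable, then =, then Arithmetic Expression) and for an Arithmetic Expression (a Term, with paths looping back through + or - to another Term)]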

Page 23: Chapter 3


Finite State Automata

Tokens for a Programming Language have simple structures.

An identifier begins with a letter; each following letter or digit becomes part of the identifier's name.

The reserved word “if” is just the letter i followed by the letter f.

This simple model is called a finite-state automaton (FSA) or state machine.

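[Figure: two-state FSA with start state A and final state B; reading 1 switches between A and B, reading 0 leaves the state unchanged]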

Any string that takes the machine from the initial state to a final state through a series of transitions is accepted by the machine.

Page 24: Chapter 3


Input      Current State   Accept String?
(empty)    A               No
1          B               Yes
10         B               Yes
100        B               Yes
1001       A               No
10010      A               No
100101     B               Yes
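The table can be reproduced by simulating the machine. In this sketch the transition table is inferred from the rows above (an assumption, since the original diagram is not available): on 0 the machine stays in its current state, on 1 it switches between A and B, A is the start state and B the only final state. The machine therefore accepts exactly the strings with an odd number of 1s.

```python
# Inferred transition table: on 0 stay, on 1 switch between A and B.
TRANSITIONS = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "B", ("B", "1"): "A",
}
START, FINAL = "A", {"B"}


def run(inputs):
    """Return the state reached after reading every symbol of inputs."""
    state = START
    for symbol in inputs:
        state = TRANSITIONS[(state, symbol)]
    return state


def accepts(inputs):
    return run(inputs) in FINAL


for s in ["", "1", "10", "100", "1001", "10010", "100101"]:
    print(repr(s), run(s), "Yes" if accepts(s) else "No")
```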


Page 25: Chapter 3


Deterministic & Non-Deterministic Finite Automata

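Deterministic – For each state of the FSA and each input symbol, there is a unique transition to the same or a different state. If there are n states and k symbols, the FSA has n x k transitions.

Non-Deterministic – An FSA with:

– a set of states
– a start state
– a set of final states
– an input alphabet
– a set of arcs from nodes to nodes, each labeled by an element of the input alphabet.

Several arcs leaving the same state may carry the same label, and a state may have no arc for some symbol, so the next state is not unique. A string is accepted if at least one path labeled by it leads from the start state to a final state.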
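A nondeterministic FSA can be simulated by tracking the set of all states it could be in. The machine below is a hypothetical example over {0, 1} that accepts strings ending in "01"; state q0 has two arcs labeled 0, which is what makes it nondeterministic.

```python
# Hypothetical NFA: arcs map (state, symbol) to the set of possible next states.
ARCS = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
START, FINAL = "q0", {"q2"}


def accepts(inputs):
    """Accept if some path labeled by inputs leads from START to a final state."""
    current = {START}
    for symbol in inputs:
        # Follow every arc labeled `symbol` from every state we might be in.
        current = set().union(*(ARCS.get((q, symbol), set()) for q in current))
    return bool(current & FINAL)


for s in ["01", "101", "110", "1001"]:
    print(s, accepts(s))
```

Tracking sets of states is also the idea behind the subset construction, which turns any nondeterministic FSA into an equivalent deterministic one.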

Page 26: Chapter 3


Computational Power of an FSA

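An FSA has a fixed, finite set of states. The language a^n b^n (n a's followed by n b's, for any n >= 1) cannot be recognized by an FSA, because the machine would have to remember an unbounded count of a's.

If the information needed is finite, for example n <= k for some fixed k, an FSA can recognize the language.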
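A sketch of the bounded case: `build_dfa(k)`, a hypothetical helper, constructs a deterministic FSA for { a^n b^n : 1 <= n <= k }. States ("A", i) remember how many a's have been read and states ("B", r) how many b's are still needed; a missing transition means reject. The machine has 2k + 1 states, so the state count grows with k, and no single finite machine covers every n.

```python
def build_dfa(k):
    """DFA for { a^n b^n : 1 <= n <= k }; missing transitions mean reject."""
    delta = {}
    for i in range(k + 1):
        if i < k:
            delta[(("A", i), "a")] = ("A", i + 1)  # count one more a
        if i >= 1:
            delta[(("A", i), "b")] = ("B", i - 1)  # first b: i - 1 b's still needed
    for r in range(1, k):
        delta[(("B", r), "b")] = ("B", r - 1)
    states = {("A", i) for i in range(k + 1)} | {("B", r) for r in range(k)}
    return states, delta, ("A", 0), {("B", 0)}


def accepts(dfa, text):
    states, delta, state, final = dfa
    for ch in text:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in final


for k in (1, 2, 3):
    print(k, len(build_dfa(k)[0]))  # number of states: 3, 5, 7
```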

Page 27: Chapter 3


Closing Quiz

<term>::=<______> |<term> x <_______> | <term> / <________>

Ambiguity occurs when ______________________.

Syntax charts are also called ________________.

What is the meaning of this symbol in terms of FSA

FSA stands for ______________.

Difference between Non-deterministic FSA & Deterministic FSA