Chpt 2 Language & Syntax Description From: Chapter 2, Book by Qin.

Posted on 02-Jan-2016


Input of a Compiler

• What is a source program?

• What is a language?

• Which kind of program is a correct program?

2.1 Alphabet & String

1. Alphabet: a non-empty finite set of symbols, usually written Σ, V, or another upper-case Greek letter. E.g., the English alphabet {a, b, c, …, z}.

2. Symbol (character): an element of the alphabet and the smallest unit of a language, just like a character in English.

3. String: a finite sequence of symbols from the alphabet. Notes: the null string is the string containing no symbol, written ε.

4. Sentence: a string over the alphabet that is constructed according to certain construction rules (the syntax).

5. Language: a set of sentences over the alphabet.

Notes: by convention, a symbol is written a, b, c, …; a string is written α, β, γ, …; and a set of strings is written A, B, C, ….

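6. Operations on sets of strings

1) Concatenation (product): let $A = \{\alpha_1, \alpha_2, \ldots\}$ and $B = \{\beta_1, \beta_2, \ldots\}$ be sets of strings; then $AB = \{\alpha\beta \mid \alpha \in A, \beta \in B\}$.

Notes: 1) The product of a string set with itself is called a power of the set: $A^n = AA \cdots A$ (n factors). 2) $A^0 = \{\varepsilon\}$. 3) The n-th power of an alphabet A is the set of all strings of length n over A.

2) Closure and positive closure

a) Closure: $A^* = A^0 \cup A^1 \cup A^2 \cup \cdots$, i.e. all strings over the alphabet A, including ε.

b) Positive closure: $A^+ = A^1 \cup A^2 \cup \cdots = A^* - \{\varepsilon\}$.

Notes: a language over an alphabet Σ is a subset of $\Sigma^*$ (a subset of the positive closure $\Sigma^+$ when it does not contain ε).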
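The set operations above can be tried out directly. This is a minimal sketch, assuming strings are Python `str` values and ε is the empty string `""`; the helper names are hypothetical. Since $A^*$ is infinite, `closure_up_to` only builds the union of the powers up to a chosen bound.

```python
def concat(A, B):
    """Product AB = {xy | x in A, y in B}; "" plays the role of epsilon."""
    return {x + y for x in A for y in B}


def power(A, n):
    """A^n: the product of A with itself n times, with A^0 = {""}."""
    result = {""}
    for _ in range(n):
        result = concat(result, A)
    return result


def closure_up_to(A, max_n, positive=False):
    """Finite slice of A* (or A+ when positive=True): union of A^k for k <= max_n.

    A* itself is infinite, so a bound on the power is unavoidable here.
    """
    first = 1 if positive else 0
    result = set()
    for k in range(first, max_n + 1):
        result |= power(A, k)
    return result


if __name__ == "__main__":
    print(sorted(concat({"a", "b"}, {"c"})))               # ['ac', 'bc']
    print(sorted(power({"0", "1"}, 2)))                    # ['00', '01', '10', '11']
    print(sorted(closure_up_to({"a"}, 3)))                 # ['', 'a', 'aa', 'aaa']
    print(sorted(closure_up_to({"a"}, 3, positive=True)))  # ['a', 'aa', 'aaa']
```

The only difference between the closure and the positive closure in this sketch is whether the power $A^0 = \{\varepsilon\}$ is included.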

2.2 Grammar & Language

1. Basic concepts

1) A grammar is a set of formal production rules that describes how syntax elements are constructed.

– Example 2.1: “lists of digits separated by plus or minus signs, e.g. 9-5+2, 3-1, and 7.”

list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

(1) Syntax elements include sentences and the words in sentences; a language is composed of sentences.

(2) A production rule has the form left-side → right-side, which can be read as “left-side is defined as right-side”, “left-side derives right-side”, or “left-side produces right-side”; it expresses the relation between the two sides.

2) Non-terminal symbol: a symbol that appears on the left side of a rule; it is usually enclosed in < > and denotes a syntax concept. The set of non-terminal symbols is written VN.

3) Terminal symbol: a string of the language that cannot be decomposed further (including single-character strings). The set of terminal symbols is written VT. Notes: terminal symbols are the basic elements of a sentence.

Example 2.2: English sentences

<sentence> → <Subject><Predicate>
<Subject> → <adjective><noun>
<Predicate> → <verb><object>
<object> → <adjective><noun>
<adjective> → young | pop
<noun> → men | music
<verb> → like

Non-terminal symbols: {<sentence>, <Subject>, <Predicate>, <verb>, <object>, <adjective>, <noun>}

Terminal symbols: {young, pop, men, music, like}

4) Start symbol: a special non-terminal symbol that represents the core concept of the defined syntax and from which every derivation begins, like <sentence> in Example 2.2.

The start symbol is also called the “distinguished symbol”.

5) Production: a rule that defines a relation between strings, of the form A → α (read “A produces α”).

E.g. <sentence> → <Subject><Predicate>

6) Derivation and reduction: derivation is the process that starts from the start symbol and derives a sentence by repeatedly replacing the left side of a production rule with its right side.

Reduction is the inverse process of derivation: starting from a given sentence of the language, we repeatedly replace the right side of a production rule with its left side until we finally arrive at the start symbol.

(Continued) Example 2.2: English sentences

<sentence> ⇒ <Subject><Predicate>
⇒ <adjective><noun><Predicate>
⇒ young <noun><Predicate>
⇒ young men <Predicate>
⇒ young men <verb><object>
⇒ young men like <object>
⇒ young men like <adjective><noun>
⇒ young men like pop <noun>
⇒ young men like pop music

Read from top to bottom, these steps form a leftmost derivation; read from bottom to top, they form a rightmost reduction.

Leftmost (rightmost) derivation: each step applies exactly one production rule and replaces the leftmost (rightmost) non-terminal symbol with the right side of that rule.

The rightmost derivation is called the canonical derivation.

Leftmost (rightmost) reduction is the inverse process of rightmost (leftmost) derivation. The leftmost reduction, being the inverse of the canonical derivation, is called the canonical reduction.
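A leftmost derivation can be replayed mechanically. The sketch below encodes Example 2.2 as a Python dictionary and, at each step, rewrites the leftmost non-terminal. The `choices` list (which alternative to apply at each step) is a hypothetical encoding introduced only for this illustration.

```python
# Example 2.2 as a dictionary: non-terminal -> list of alternatives.
GRAMMAR = {
    "<sentence>": [["<Subject>", "<Predicate>"]],
    "<Subject>": [["<adjective>", "<noun>"]],
    "<Predicate>": [["<verb>", "<object>"]],
    "<object>": [["<adjective>", "<noun>"]],
    "<adjective>": [["young"], ["pop"]],
    "<noun>": [["men"], ["music"]],
    "<verb>": [["like"]],
}


def leftmost_derivation(start, choices):
    """Return every sentential form of a leftmost derivation.

    choices[k] is the index of the alternative applied at step k
    (a hypothetical encoding used only in this sketch).
    """
    form = [start]
    forms = [form]
    for choice in choices:
        # Find the leftmost non-terminal; None means the form is already a sentence.
        i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
        if i is None:
            raise ValueError("no non-terminal left to rewrite")
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return forms


if __name__ == "__main__":
    for f in leftmost_derivation("<sentence>", [0, 0, 0, 0, 0, 0, 0, 1, 1]):
        print(" ".join(f))
```

With the choices `[0, 0, 0, 0, 0, 0, 0, 1, 1]` the sketch reproduces the nine steps of the derivation of “young men like pop music” in Example 2.2; only the last two steps pick the second alternatives (pop, music).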

7) Sentential form, sentence & language

(1) Sentential form: a string derived from the start symbol in zero or more steps, written $S \Rightarrow^* \alpha$, where $\alpha \in (V_N \cup V_T)^*$.

(2) Sentence: a sentential form that contains only terminal symbols.

(3) Language: the set of sentences derived from S in one or more steps, written L(G): $L(G) = \{\alpha \mid S \Rightarrow^+ \alpha, \alpha \in V_T^*\}$.

8) Recursive definition of grammar rules

A rule is recursive when a non-terminal symbol appears in its own definition, i.e. on the right side of one of its own productions.

Example 2.1: list appears on the right side of its own productions.

list → list + digit | list - digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Notes: be careful when defining a grammar recursively. You must give an exit case of the recursion (a non-recursive alternative such as list → digit); otherwise no derivation can ever end in a sentence.

9) Extended notations for grammar rules: extended BNF (Backus-Naur Form)

( ): factoring. E.g. U → ax | ay | az can be rewritten as U → a(x | y | z).

{ }: repetition, with the allowed number of repetitions written as bounds. E.g. <Identifier> → <Letter>{<Letter> | <Digit>}$_0^5$, i.e. a letter followed by 0 to 5 letters or digits.

[ ]: optional part. E.g. <Integer> → [+ | -]<Digit>{<Digit>}

10) Meta-language symbols: the symbols used to describe the relations among grammar symbols, e.g. “→” and “|”, are called meta-language symbols.

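2. Formal definition

1) Grammar definition: a grammar G is defined as a quadruple (VN, VT, P, S), where VN is the set of non-terminal symbols, VT the set of terminal symbols (VN ∩ VT = ∅), P the set of productions, and S ∈ VN the start symbol.

Exercise: what is the formal definition of Example 2.2 (English sentences)?

<sentence> → <Subject><Predicate>
<Subject> → <adjective><noun>
<Predicate> → <verb><object>
<object> → <adjective><noun>
<adjective> → young | pop
<noun> → men | music
<verb> → like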
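One possible answer, written as data. This is a minimal sketch, assuming productions are stored as pairs of symbol tuples; the `Grammar` class and the `is_well_formed` check are hypothetical helpers that test the conditions of the quadruple definition (VN and VT disjoint, S ∈ VN, every symbol in V = VN ∪ VT, and every left side containing a non-terminal).

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    VN: frozenset  # non-terminal symbols
    VT: frozenset  # terminal symbols
    P: tuple       # productions: (left, right) pairs of symbol tuples
    S: str         # start symbol


def is_well_formed(g: Grammar) -> bool:
    """Check the basic conditions of the quadruple definition."""
    V = g.VN | g.VT
    if g.VN & g.VT or g.S not in g.VN:
        return False
    for left, right in g.P:
        if not (set(left) | set(right)) <= V:    # every symbol must be in V
            return False
        if not any(sym in g.VN for sym in left):  # left side needs a non-terminal
            return False
    return True


EXAMPLE_2_2 = Grammar(
    VN=frozenset({"<sentence>", "<Subject>", "<Predicate>", "<verb>",
                  "<object>", "<adjective>", "<noun>"}),
    VT=frozenset({"young", "pop", "men", "music", "like"}),
    P=(
        (("<sentence>",), ("<Subject>", "<Predicate>")),
        (("<Subject>",), ("<adjective>", "<noun>")),
        (("<Predicate>",), ("<verb>", "<object>")),
        (("<object>",), ("<adjective>", "<noun>")),
        (("<adjective>",), ("young",)),
        (("<adjective>",), ("pop",)),
        (("<noun>",), ("men",)),
        (("<noun>",), ("music",)),
        (("<verb>",), ("like",)),
    ),
    S="<sentence>",
)

if __name__ == "__main__":
    print(is_well_formed(EXAMPLE_2_2))  # True
```

Each alternative written with “|” becomes a separate pair in P, which is why the seven rule lines yield nine productions.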

2) Classification of grammars: according to the restrictions placed on the production rules, Chomsky classifies grammars into four types: 0-type, 1-type, 2-type, and 3-type grammars.

(1) 0-type grammar (phrase-structure grammar, or unrestricted grammar)

Every production α → β in P has $\alpha \in V^+$ and $\beta \in V^*$ (where $V = V_N \cup V_T$), and α contains at least one non-terminal symbol.

Notes: the automaton that recognizes 0-type languages is the Turing machine.

A Turing machine's control device embodies the program. It uses a set of rules, its notion of “states”, and the content of the tape to determine how to process the input symbols. The state the machine is in when the computation begins is called the “initial state”. Each rule has the form: “In state n, if the head is reading symbol x, write symbol y, move one cell left or right on the tape, and change to state m.”

Here is a set of rules for a Turing machine (TM) with five states, the initial state being 1.

Present state  Present symbol  Write  Move   New state
1              0               0      Right  2
2              0               0      Right  3
2              1               1      Right  2
3              0               Blank  Left   5
3              1               0      Left   4
4              0               1      Right  2

This set of rules performs integer addition. Suppose the input represents two numbers in unary notation that you want to add. To represent a pair of integers (x, k) in unary notation, start with a 0, followed by x 1's, then a separating 0, then k 1's, and end with a 0. So the pair (2, 3) has the unary notation 01101110. Using 01101110 as input with the rules in the table above, the machine outputs 0111110, i.e. five 1's between two 0's, the unary representation of 5. Here are the steps:

Tape (before step)  State  Movement
01101110            1      Write a 0, go to state 2.
01101110            2      Scan past 1's.
01101110            2      End of first string, go to next.
01101110            3      Change 1 to 0, go back.
01100110            4      Copy the 1, return to second string.
01110110            2      End of first string, go to next.
01110110            3      Change 1 to 0, go back.
01110010            4      Copy the 1, return to second string.
01111010            2      End of first string, go to next.
01111010            3      Change 1 to 0, go back.
01111000            4      Copy the 1, return to second string.
01111100            2      End of first string, go to next.
01111100            3      No second string. Erase last 0.
0111110             5      Halt: no further move possible.
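The rule table can be executed directly to check the trace. This is a minimal simulator sketch, assuming the tape is stored sparsely as a dictionary and that `"_"` stands for Blank; `encode` and `run` are hypothetical helper names, and `max_steps` is only a safety bound.

```python
BLANK = "_"  # stand-in for the Blank symbol (assumption of this sketch)

# (present state, present symbol) -> (write, move, new state); +1 = Right, -1 = Left
RULES = {
    (1, "0"): ("0", +1, 2),
    (2, "0"): ("0", +1, 3),
    (2, "1"): ("1", +1, 2),
    (3, "0"): (BLANK, -1, 5),
    (3, "1"): ("0", -1, 4),
    (4, "0"): ("1", +1, 2),
}


def encode(x, k):
    """Unary encoding of the pair (x, k): 0, x 1's, 0, k 1's, 0."""
    return "0" + "1" * x + "0" + "1" * k + "0"


def run(tape_str, state=1, max_steps=10_000):
    """Apply rules until none matches; return (tape without end blanks, state, steps)."""
    tape = dict(enumerate(tape_str))  # sparse tape: cell index -> symbol
    head, steps = 0, 0
    while steps < max_steps:
        key = (state, tape.get(head, BLANK))
        if key not in RULES:
            break  # no rule applies: the machine halts
        write, move, state = RULES[key]
        tape[head] = write
        head += move
        steps += 1
    cells = "".join(tape[i] for i in sorted(tape))
    return cells.strip(BLANK), state, steps


if __name__ == "__main__":
    print(run(encode(2, 3)))  # ('0111110', 5, 14)
```

The 14 steps correspond to the trace rows above, with the “Scan past 1's” row covering two moves over the two 1's of the first number.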

The Church-Turing hypothesis asserts, in effect:

Suppose P is some decision problem. There is no Turing machine program that decides P if and only if there is no C program (or program in any other general-purpose programming language) that solves P.

A 0-type grammar places the fewest restrictions on its productions; the other types of grammar are obtained by restricting the form of the productions of a 0-type grammar.

(2) 1-type grammar (context-sensitive grammar, or length-increasing grammar)

Every production α → β in P satisfies |β| ≥ |α|, except possibly S → ε. If S → ε is in P, then S must not appear on the right side of any production.

Alternatively, every production in P, except possibly S → ε, has the form $\alpha A \beta \to \alpha \gamma \beta$, where $\alpha, \beta \in V^*$, $A \in V_N$, $\gamma \in V^+$.

Notes: the automaton that recognizes 1-type languages is the linear bounded automaton (LBA).

In a 1-type grammar, the context of a non-terminal must be considered when it is replaced: A can be replaced by γ only when it appears between α and β. Moreover, no non-terminal can be replaced by ε, except that the start symbol may produce ε.

(3) 2-type grammar (context-free grammar)

Every production in P has the form A → β, where $A \in V_N$ and $\beta \in V^*$.

Notes: the left side of each production is a single non-terminal symbol; the right side may be any string of symbols from VN and VT, or ε.

The automaton that recognizes 2-type languages is the pushdown automaton (PDA).

[Figure: Pushdown automaton]

(4) 3-type grammar (regular grammar: right-linear grammar or left-linear grammar)

Every production in P has the form A → αB or A → α (right-linear), or the form A → Bα or A → α (left-linear), where $A, B \in V_N$ and $\alpha \in V_T^*$.

Notes: the productions of a 3-type grammar are either all right-linear or all left-linear; left-linear and right-linear productions cannot be mixed in one grammar. If all the productions are left-linear, the grammar is called a left-linear grammar; if all are right-linear, it is called a right-linear grammar.

The automaton that recognizes 3-type languages is the finite automaton.

Hierarchy  Alias                      Production form                                         Automaton
0-type     Unrestricted grammar       α → β, α ∈ V+ (containing a non-terminal), β ∈ V*      Turing machine
1-type     Context-sensitive grammar  αAβ → αγβ, A ∈ VN, γ ∈ V+                               Linear bounded automaton
2-type     Context-free grammar       A → β, A ∈ VN, β ∈ V*                                    Pushdown automaton
3-type     Regular grammar            A → αB, A → α (or A → Bα, A → α), A, B ∈ VN, α ∈ VT*     Finite automaton
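The table can be turned into a small classifier that reports the most restrictive type whose production form every rule satisfies. This is a sketch under stated assumptions: productions are `(left, right)` strings, upper-case letters are non-terminals, `""` on a right side is ε, the start symbol is a single letter, and every left side already contains a non-terminal (the 0-type condition). `chomsky_type` is a hypothetical name.

```python
def is_nonterminal(sym):
    # Assumption of this sketch: upper-case letters are non-terminals,
    # every other character is a terminal, and "" on a right side is epsilon.
    return sym.isupper()


def chomsky_type(productions, start="S"):
    """Return the largest i in (3, 2, 1, 0) whose restrictions every production meets."""
    def single_nonterminal(left):
        return len(left) == 1 and is_nonterminal(left)

    def right_linear(left, right):  # A -> wB or A -> w, w in VT*
        return single_nonterminal(left) and not any(map(is_nonterminal, right[:-1]))

    def left_linear(left, right):   # A -> Bw or A -> w, w in VT*
        return single_nonterminal(left) and not any(map(is_nonterminal, right[1:]))

    # 3-type: all right-linear or all left-linear (no mixing)
    if all(right_linear(l, r) for l, r in productions) or \
            all(left_linear(l, r) for l, r in productions):
        return 3
    # 2-type: every left side is a single non-terminal
    if all(single_nonterminal(l) for l, _ in productions):
        return 2
    # 1-type: |right| >= |left|, except S -> epsilon when S is never on a right side
    start_on_right = any(start in r for _, r in productions)
    if all(len(r) >= len(l) or (l == start and r == "" and not start_on_right)
           for l, r in productions):
        return 1
    return 0


if __name__ == "__main__":
    print(chomsky_type([("S", "aS"), ("S", "a"), ("S", "b")]))  # 3
    print(chomsky_type([("S", "aSb"), ("S", "ab")]))            # 2
```

A grammar that mixes right-linear and left-linear rules, such as S → aA, A → Sb, A → b, falls through to 2-type, matching the no-mixing note for 3-type grammars.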

A language generated by an i-type grammar G is called an i-type language, written L(G): $L(G) = \{\alpha \mid \alpha \in V_T^*, S \Rightarrow^+ \alpha\}$.

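Example: let G1 = ({S}, {a, b}, P, S), where P includes:

(0) S → aS
(1) S → a
(2) S → b

$L(G_1) = \{a^i (a \mid b) \mid i \ge 0\}$; G1 is a 3-type (right-linear) grammar.

Example: let G2 = ({S}, {a, b}, P, S), where P includes:

(0) S → aSb
(1) S → ab

$L(G_2) = \{a^n b^n \mid n \ge 1\}$; G2 is a 2-type (context-free) grammar.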
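The languages claimed for G1 and G2 can be checked on short strings by enumerating sentences breadth-first. A minimal sketch, assuming productions are `(left, right)` strings with upper-case letters as non-terminals and that the grammar has no ε-productions, so a sentential form never gets shorter and any form longer than `max_len` can be discarded; `sentences` is a hypothetical helper.

```python
from collections import deque


def sentences(productions, start, max_len):
    """All sentences of length <= max_len, found by breadth-first leftmost derivation.

    Assumes no epsilon-productions, so sentential forms never shrink.
    """
    rules = {}
    for left, right in productions:
        rules.setdefault(left, []).append(right)
    found, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal (upper-case letter), or None.
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            found.add(form)  # only terminals left: a sentence
            continue
        for right in rules.get(form[i], []):
            new = form[:i] + right + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda s: (len(s), s))


G1 = [("S", "aS"), ("S", "a"), ("S", "b")]
G2 = [("S", "aSb"), ("S", "ab")]

if __name__ == "__main__":
    print(sentences(G1, "S", 3))  # ['a', 'b', 'aa', 'ab', 'aaa', 'aab']
    print(sentences(G2, "S", 6))  # ['ab', 'aabb', 'aaabbb']
```

The same function can be applied to the grammar S → aaSb | aab constructed for $a^{2n}b^n$ in Section 2.3; enumeration only samples a language, it does not prove the grammar correct.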

Notes: grammars used in lexical analysis and syntax analysis are subject to the following restrictions:
– There is no production of the form P → P, because such a production is useless and only leads to ambiguity.
– Every non-terminal symbol P must be reachable and must be able to derive a terminal string:
• starting from the start symbol S, there is a derivation $S \Rightarrow^* \alpha P \beta$ for some α, β;
• P can derive a terminal string, that is, $P \Rightarrow^+ \gamma$ for some $\gamma \in V_T^*$.

2.3 Grammar construction and simplification

(1) Constructing a grammar from a language

• Example: let $L_1 = \{a^{2n} b^n \mid n \ge 1\}$, where a, b ∈ VT. Try to construct a grammar G1 for L1.

– n = 1: the sentence is aab
– n = 2: the sentence is aaaabb
– n = 3: the sentence is aaaaaabbb
– …
– So we have: S → aaSb | aab

Example: let $L_2 = \{a^i b^j c^k \mid i, j, k \ge 1\}$, where a, b, c ∈ VT. Try to construct a grammar G2 for L2.

S → aS | aB
B → bB | bC
C → cC | c

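Example: let L3 = {ω | ω ∈ {a, b}* and ω contains as many a's as b's}. Try to construct a grammar G3 for L3.

S → ε
S → bB | aA
A → bS | b | aAA
B → aS | a | bBB

Here A derives strings with one more b than a, and B derives strings with one more a than b.

Alternatively: S → ε | aSbS | bSaS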

Example: let L4 = {ω | ω ∈ {0, 1}* and the number of 1's in ω is even}. Try to construct a grammar G4 for L4.

S → ε
S → 0S | 1A
A → 0A | 1S
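G4 is right-linear, and for each pair (non-terminal, input symbol) it has exactly one production X → aY, so a derivation can be followed symbol by symbol like a finite automaton: S means “an even number of 1's so far”, A means “odd”. The sketch below hard-codes G4 (the table and function names are hypothetical) and accepts a word when the derivation can end with an ε-production.

```python
# G4's productions X -> aY as a table: (non-terminal, symbol) -> next non-terminal.
G4_MOVES = {("S", "0"): "S", ("S", "1"): "A",
            ("A", "0"): "A", ("A", "1"): "S"}
G4_NULLABLE = {"S"}  # S -> epsilon is the only epsilon-production


def derives(word, start="S"):
    """Follow the unique production for each symbol; accept if the last X has X -> epsilon."""
    state = start
    for ch in word:
        if (state, ch) not in G4_MOVES:
            return False  # symbol outside {0, 1}: no production applies
        state = G4_MOVES[(state, ch)]
    return state in G4_NULLABLE


if __name__ == "__main__":
    for w in ["", "0110", "1", "10101"]:
        print(repr(w), derives(w))
```

This direct correspondence between right-linear productions and automaton transitions is why 3-type languages are exactly those recognized by finite automata.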

(2) Grammar simplification

a. Because a language can be described by different grammars, we should select the grammar that has the fewest productions and best suits the properties of the language.

b. A grammar may contain redundant productions that are useless in derivations. These should be deleted:
– productions of the form P → P;
– productions that can never derive a terminal string;
– productions whose left-side non-terminal can never be reached from the start symbol (for example, one that does not appear on the right side of any production).

c. Steps of simplification:
– Look for the productions of the form P → P and delete them;
– If a production can never be used in a derivation from the start symbol, delete it;
– If a production cannot derive a terminal string, delete it;
– Arrange the remaining productions.

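Example: simplify the following grammar.

(0) S → Be   (1) S → Ec   (2) A → Ae   (3) A → e
(4) A → A    (5) B → Ce   (6) B → Af   (7) C → Cf
(8) D → f

Production (4) has the form P → P; E has no productions and C only has C → Cf, so (1), (5), and (7) can never derive a terminal string; D is never reached from S, so (8) is useless.

Result:

(0) S → Be   (1) A → Ae   (2) A → e   (3) B → Af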
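The simplification can be sketched as three passes. This assumes productions are `(left, right)` strings with single upper-case letters as non-terminals; `simplify` is a hypothetical helper. One deliberate choice: the sketch removes non-productive productions before unreachable ones, the order that guarantees no useless symbol survives in general; on this example the order does not change the result.

```python
def simplify(productions, start):
    """Delete useless productions; non-terminals are upper-case letters (assumption)."""
    def uses_only(symbols, allowed):
        # True if every non-terminal in `symbols` belongs to `allowed`
        return all(not c.isupper() or c in allowed for c in symbols)

    # Step 1: delete productions of the form P -> P
    prods = [(l, r) for l, r in productions if l != r]

    # Step 2: productive non-terminals (can derive a terminal string), by fixed point
    productive, changed = set(), True
    while changed:
        changed = False
        for l, r in prods:
            if l not in productive and uses_only(r, productive):
                productive.add(l)
                changed = True
    prods = [(l, r) for l, r in prods if l in productive and uses_only(r, productive)]

    # Step 3: keep only productions whose left side is reachable from the start symbol
    reachable, todo = {start}, [start]
    while todo:
        x = todo.pop()
        for l, r in prods:
            if l == x:
                for c in r:
                    if c.isupper() and c not in reachable:
                        reachable.add(c)
                        todo.append(c)
    return [(l, r) for l, r in prods if l in reachable]


EXAMPLE = [("S", "Be"), ("S", "Ec"), ("A", "Ae"), ("A", "e"), ("A", "A"),
           ("B", "Ce"), ("B", "Af"), ("C", "Cf"), ("D", "f")]

if __name__ == "__main__":
    print(simplify(EXAMPLE, "S"))
    # [('S', 'Be'), ('A', 'Ae'), ('A', 'e'), ('B', 'Af')]
```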

(3) Constructing a context-free grammar without ε-productions

a. A context-free grammar without ε-productions must satisfy the following conditions:
– if P contains the production S → ε, then S must not appear on the right side of any production, where S is the start symbol of the grammar;
– there are no other ε-productions in P.

b. The algorithm to construct a context-free grammar without ε-productions: given G = (VN, VT, P, S), construct G' = (V'N, V'T, P', S').

(1) Find all non-terminal symbols that can derive ε in one or more steps, and put them into the set V0.

(2) Construct the set P' of productions of G' as follows:

(A) If symbols of V0 appear on the right side of a production, then for each such occurrence produce one version of the production that keeps the symbol and one that replaces it by ε (k occurrences give up to 2^k productions); put the resulting productions into P', except any that become ε-productions.

(B) Otherwise, put the production into P' unchanged, except for ε-productions.

(C) If P contains the production S → ε, add the productions S' → ε | S to P', let S' be the start symbol of G', and let V'N = VN ∪ {S'}.

Example: let G1 = ({S}, {a, b}, P, S), where P: (0) S → ε (1) S → aSbS (2) S → bSaS

(1) V0 = {S}

(2) P':
(1) S → abS | aSbS | aSb | ab
(2) S → baS | bSaS | bSa | ba
(0) S' → ε | S

So G1' = ({S', S}, {a, b}, P', S'), where

P': (0) S' → ε | S
(1) S → abS | aSbS | aSb | ab
(2) S → baS | bSaS | bSa | ba
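Steps (1) and (2) can be sketched with `itertools.product`, which enumerates every keep-or-delete choice for the occurrences of V0 symbols. Assumptions of this sketch: productions are `(left, right)` strings, upper-case letters are non-terminals, `""` is ε, and the new start symbol is written `S'`; `remove_epsilon` is a hypothetical name. For step (C) the sketch tests whether the start symbol is in V0, which also covers a start symbol that derives ε indirectly; for this example it agrees with the “S → ε in P” test of the text.

```python
from itertools import product


def remove_epsilon(productions, start):
    """Return (V0, P') following steps (1) and (2) of the algorithm."""
    # (1) V0: non-terminals that derive epsilon in one or more steps (fixed point)
    v0, changed = set(), True
    while changed:
        changed = False
        for l, r in productions:
            if l not in v0 and all(c in v0 for c in r):
                v0.add(l)
                changed = True

    # (2)(A)/(B): keep or delete each occurrence of a V0 symbol; drop epsilon results
    new = []
    for l, r in productions:
        options = [(c, "") if c in v0 else (c,) for c in r]
        for pick in product(*options):
            rhs = "".join(pick)
            if rhs and (l, rhs) not in new:
                new.append((l, rhs))

    # (2)(C): the start symbol can derive epsilon, so add S' -> epsilon | S
    if start in v0:
        new = [(start + "'", ""), (start + "'", start)] + new
    return v0, new


if __name__ == "__main__":
    v0, p = remove_epsilon([("S", ""), ("S", "aSbS"), ("S", "bSaS")], "S")
    print(v0)
    for left, right in p:
        print(left, "->", right or "ε")
```

Run on the example, it yields V0 = {S} and the same ten productions as P' above, listed in a different order.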

2.4 Syntax tree and ambiguity of a grammar

(1) Syntax tree

a. Definition: a tree that expresses the structure of a sentence of a language.

b. Function:
– presents the syntax analysis process visually and directly;
– makes it easy to show that a grammar is ambiguous.

[Figure: An example of a syntax tree]

c. Basic terms in a syntax tree

(1) Sub-tree: a node of the syntax tree (other than a leaf) together with all of its descendant nodes.

(2) Pruning a sub-tree: removing all the children of the root of a sub-tree.

(3) Sentential form: the sequence of all leaves, read from left to right, in a snapshot of the growing syntax tree.

(4) Phrase

The sequence of leaves of a sub-tree, read from left to right, is called a phrase with respect to the root of the sub-tree.
– Simple phrase (direct phrase): if a phrase is derived from the root of its sub-tree in one step, it is called a simple phrase with respect to that root.
– Phrase of a sentential form: a phrase of some sub-tree of the syntax tree for that sentential form.


(5) Handle

The leftmost simple phrase of a sentential form.

Notes: in leftmost (canonical) reduction, the core task at each step is finding the handle.

[Figure: Handles in a syntax tree]

(2) Ambiguity of a grammar

a. Ambiguity of a sentence

If a sentence of a grammar has two or more distinct syntax trees, the sentence is ambiguous.

b. Ambiguity of a grammar

If the language of a grammar contains an ambiguous sentence, the grammar is ambiguous.

Example: G = ({E}, {+, *, (, ), i}, P, E), where P: E → E+E | E*E | (E) | i

The sentence (i*i+i) has two different leftmost derivations, and therefore two syntax trees.

(1) E ⇒ (E) ⇒ (E+E) ⇒ (E*E+E) ⇒ (i*E+E) ⇒ (i*i+E) ⇒ (i*i+i)

(2) E ⇒ (E) ⇒ (E*E) ⇒ (i*E) ⇒ (i*E+E) ⇒ (i*i+E) ⇒ (i*i+i)

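Syntax trees (X[…] denotes a node X with the listed children, left to right):

Tree for derivation (1): E[ ( E[ E[ E[i] * E[i] ] + E[i] ] ) ]

Tree for derivation (2): E[ ( E[ E[i] * E[ E[i] + E[i] ] ] ) ]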

Notes: (1) Ambiguity makes syntax analysis uncertain: the parser cannot tell which structure is intended.

(2) Ambiguity of a grammar is undecidable, that is, there is no algorithm that can determine in a finite number of steps whether an arbitrary grammar is ambiguous.

(3) To prove that a grammar is ambiguous, it suffices to give a counterexample: one sentence with two syntax trees.

(4) If the ambiguity of a grammar can be controlled by additional conditions (for example, operator precedence and associativity rules), an ambiguous grammar is not necessarily a problem.
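For one specific sentence, a counterexample can be found by counting its syntax trees. The sketch below counts trees for the example grammar E → E+E | E*E | (E) | i by trying every production at the root and every operator position as the split point; `count_trees` is a hypothetical helper and the input is assumed to contain no spaces. It checks individual sentences only and does not decide ambiguity of a grammar in general, which is undecidable.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(s):
    """Number of syntax trees that derive s from E in E -> E+E | E*E | (E) | i."""
    count = 1 if s == "i" else 0                    # E -> i
    if len(s) >= 2 and s[0] == "(" and s[-1] == ")":
        count += count_trees(s[1:-1])               # E -> (E)
    for k, c in enumerate(s):
        if c in "+*":                               # E -> E+E or E -> E*E, operator at k
            count += count_trees(s[:k]) * count_trees(s[k + 1:])
    return count


if __name__ == "__main__":
    print(count_trees("(i*i+i)"))  # 2: the sentence is ambiguous
    print(count_trees("(i)*i"))    # 1
```

A count greater than 1 for any sentence is exactly the counterexample described in note (3).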

END