Chpt 2 Language & Syntax Description From: Chapter 2, Book by Qin.


Input of a Compiler

• What is a source program?

• What is a language?

• Which kind of program is a correct program?

2.1 Alphabet & String

1、 Alphabet: A non-empty set of symbols, usually written as Σ, V, or another upper-case Greek letter. E.g., the English alphabet {a, b, c, ……, z}.

2、 Symbol (Character): An element of the alphabet, the smallest element of a language. E.g., the characters of English.

3、 String: A finite sequence of symbols from the alphabet. Notes: The null string is the string without any symbol, written as ε.

4、 Sentence: A string of symbols from the alphabet that is built according to certain construction rules (syntax).

5、 Language: A set of sentences over the alphabet.

Notes: By convention, a symbol is written as a, b, c, …; a string is written as α, β, γ, …; a set of strings is written as A, B, C, ….

6、 Operations on sets of strings

1)、 Concatenation (product): Let the string sets A = {α1, α2, …} and B = {β1, β2, …}; then AB = {αβ | α ∈ A and β ∈ B}.

Notes:
1) The product of a string set with itself is called a power of the string set.
2) A^0 = {ε}.
3) The n-th power of an alphabet A is the set of all strings of length n.
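A minimal Python sketch of the product and power operations (not part of the original slides). Strings are Python strings, ε is the empty string "", and the function names `product` and `power` are chosen here for illustration.

```python
def product(a, b):
    """AB: every string of A followed by every string of B."""
    return {x + y for x in a for y in b}


def power(a, n):
    """A^n: the product of A with itself n times, with A^0 = {ε}."""
    result = {""}  # A^0 contains only the empty string ε
    for _ in range(n):
        result = product(result, a)
    return result


if __name__ == "__main__":
    print(sorted(product({"a", "ab"}, {"b", ""})))
    print(sorted(power({"0", "1"}, 2)))  # all strings of length 2 over {0, 1}
```

When A is an alphabet of single characters, `power(A, n)` is exactly the set of strings of length n described in note 3).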

2)、 Closure and positive closure

a) Closure: A* = A^0 ∪ A^1 ∪ A^2 ∪ …, that is, all strings over the alphabet A (including ε).

b) Positive closure: A+ = A^1 ∪ A^2 ∪ … = A* − {ε}.

Notes: A language is a subset of the closure A* of the alphabet; a language that does not contain ε is a subset of the positive closure A+.
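The closure is an infinite set, so a program can only list it up to some length bound. The sketch below (an illustration, not from the slides) enumerates the strings of A* or A+ whose length does not exceed `max_len`; the name `closure_up_to` and its parameters are assumptions of this example.

```python
from itertools import product


def closure_up_to(alphabet, max_len, positive=False):
    """Strings of A* (or A+ when positive=True) with length <= max_len."""
    shortest = 1 if positive else 0  # A+ leaves out A^0 = {ε}
    return {
        "".join(symbols)
        for n in range(shortest, max_len + 1)
        for symbols in product(sorted(alphabet), repeat=n)
    }


if __name__ == "__main__":
    print(sorted(closure_up_to({"a", "b"}, 2), key=lambda s: (len(s), s)))
```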

2.2 Grammar & Language

1、 Basic concepts

1) A grammar is a set of formal production rules that describe how syntax elements are constructed.

– Example 2.1: "lists of digits separated by plus or minus signs", e.g. 9-5+2, 3-1, and 7.

list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

(1) Syntax elements include sentences and the words in sentences; a language is composed of sentences.

(2) A production rule has the following form: left-side → right-side. It can be read as "left-side is defined as right-side", "left-side derives right-side", or "left-side produces right-side", and it expresses the relation between the two sides.

2) Non-terminal symbol: A symbol that appears on the left side of a rule. It is usually bracketed in <> and expresses a syntax concept. The set of non-terminal symbols is written as VN.

3) Terminal symbol: A string in the language that cannot be decomposed further (including single-character strings). The set of terminal symbols is written as VT. Notes: Terminal symbols are the basic elements of a sentence.

Example 2.2: English sentences

<sentence> → <Subject><Predicate>
<Subject> → <adjective><noun>
<Predicate> → <verb><object>
<object> → <adjective><noun>
<adjective> → young | pop
<noun> → men | music
<verb> → like

Non-terminal symbols: {<sentence>, <Subject>, <Predicate>, <verb>, <object>, <adjective>, <noun>}

Terminal symbols: {young, pop, men, music, like}

4) Start symbol: A special non-terminal symbol that is the core of the defined syntax, like <sentence> in Example 2.2.

The start symbol is also called the "identified symbol".

5) Production: A rule that defines the relation between strings. The form: A → α (A produces α)

E.g. <sentence> → <Subject><Predicate>

6) Derivation and reduction

Derivation is the process that starts from the start symbol and derives a sentence by repeatedly replacing the left side of a production rule with its right side.

Reduction is the inverse process of derivation: starting from a given sentence of a language, it repeatedly replaces the right side of a production rule with its left side until it finally arrives at the start symbol.

(Continued) Example 2.2: English sentences

<sentence> ⇒ <Subject><Predicate>
⇒ <adjective><noun><Predicate>
⇒ young <noun><Predicate>
⇒ young men <Predicate>
⇒ young men <verb><object>
⇒ young men like <object>
⇒ young men like <adjective><noun>
⇒ young men like pop <noun>
⇒ young men like pop music

Read from top to bottom, this is a leftmost derivation; read from bottom to top, it is a rightmost reduction.
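The leftmost derivation of Example 2.2 can be replayed mechanically. In this sketch (an illustration, not from the slides) the grammar is a Python dictionary, and `choices` is a hypothetical list that says which alternative to use at each step; the leftmost non-terminal symbol is always the one that gets replaced.

```python
GRAMMAR = {
    "<sentence>": [["<Subject>", "<Predicate>"]],
    "<Subject>": [["<adjective>", "<noun>"]],
    "<Predicate>": [["<verb>", "<object>"]],
    "<object>": [["<adjective>", "<noun>"]],
    "<adjective>": [["young"], ["pop"]],
    "<noun>": [["men"], ["music"]],
    "<verb>": [["like"]],
}


def leftmost_derivation(start, choices):
    """Return every sentential form produced by the chosen alternatives."""
    form = [start]
    steps = [form]
    for choice in choices:
        # position of the leftmost non-terminal symbol
        index = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:index] + GRAMMAR[form[index]][choice] + form[index + 1:]
        steps.append(form)
    return steps


if __name__ == "__main__":
    for step in leftmost_derivation("<sentence>", [0, 0, 0, 0, 0, 0, 0, 1, 1]):
        print(" ".join(step))
```

Reading the printed forms in reverse order gives the corresponding reduction back to the start symbol.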

Leftmost (rightmost) derivation: Each step uses exactly one production rule and replaces the leftmost (rightmost) non-terminal symbol with the right side of that rule.

A rightmost derivation is called a canonical derivation.

Leftmost (rightmost) reduction is the inverse process of rightmost (leftmost) derivation. A leftmost reduction, being the inverse of a canonical derivation, is called a canonical reduction.

7) Sentential form, Sentence & Language

(1) Sentential form: A string that is produced from the start symbol by any number of derivation steps (including zero steps). Written as S ⇒* α, α ∈ (VN ∪ VT)*

(2) Sentence: A sentential form that includes only terminal symbols.

(3) Language: The set of sentences (strings) that are produced from S by one or more derivation steps. Written as L(G), L(G) = {α | S ⇒+ α, and α ∈ VT*}

8) Recursive definition of grammar rules

A non-terminal symbol appears in its own definition.

Example 2.1: "lists of digits separated by plus or minus signs", e.g. 9-5+2, 3-1, and 7.

list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Notes: Be careful when you define a grammar recursively. You must give the exit rule (the special, non-recursive case) of the recursion; otherwise no derivation can ever end in a sentence.
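The role of the exit rule shows up directly in a recursive recognizer for Example 2.1. This is a sketch under the assumption that the input is written without spaces; the function name `is_list` is made up for the illustration. The branch for `list → digit` is the exit of the recursion, and the other branch mirrors `list → list + digit | list - digit`.

```python
DIGITS = set("0123456789")


def is_list(text):
    """True if text is derivable from `list` in Example 2.1 (no spaces)."""
    # exit rule: list -> digit
    if len(text) == 1:
        return text in DIGITS
    # recursive rules: list -> list + digit | list - digit
    return (
        len(text) >= 3
        and text[-1] in DIGITS
        and text[-2] in "+-"
        and is_list(text[:-2])
    )


if __name__ == "__main__":
    for sample in ["9-5+2", "3-1", "7", "9-", "95"]:
        print(sample, is_list(sample))
```

Without the first branch, every call would recurse on a shorter prefix until the text ran out and would never accept anything.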

9) Extended notations of grammar rules

Use extended BNF (Backus-Naur Form) notations:

( ): Factoring out a common part. E.g. U → ax | ay | az is rewritten as U → a(x|y|z)

{ }: Repetition with a given count range. E.g. <Identifier> → <Letter>{<Letter>|<Digit>}, with lower bound 0 and upper bound 5 attached to the braces, meaning that the braced part is repeated 0 to 5 times

[ ]: Optional symbol. E.g. <Integer> → [+|-]<Digit>{<Digit>}
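The two extended rules map closely onto regular expressions. The sketch below uses Python's `re` module and assumes that <Letter> means an ASCII letter and <Digit> means 0 to 9; the slides do not define either, so both are assumptions of this example.

```python
import re

# <Identifier> -> <Letter>{<Letter>|<Digit>} with the braces repeated 0..5 times
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]{0,5}")

# <Integer> -> [+|-]<Digit>{<Digit>}
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")


def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None


def is_integer(text):
    return INTEGER.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_identifier("x1"), is_identifier("abcdefg"))
    print(is_integer("-42"), is_integer("+"))
```

The bounded braces become `{0,5}`, the unbounded braces become `*`, and the square brackets become `?`.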

10) Meta-language symbol: A symbol that is used to describe the relations between grammar symbols. E.g. "→" and "|" are meta-language symbols.

2、 Formal definition

1) Grammar definition: A grammar G is defined as a quadruple (VN, VT, P, S), where VN is the set of non-terminal symbols, VT is the set of terminal symbols, P is the set of productions, and S is the start symbol.

What is the formal definition of Example 2.2: English sentences?

<sentence> → <Subject><Predicate>
<Subject> → <adjective><noun>
<Predicate> → <verb><object>
<object> → <adjective><noun>
<adjective> → young | pop
<noun> → men | music
<verb> → like
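One way to answer the question is to write the productions down and read the other three components off them. The following sketch does this in Python; the variable names follow the quadruple (VN, VT, P, S), and treating every right-side symbol that never occurs on a left side as a terminal symbol is an assumption that holds for this example.

```python
P = {
    "<sentence>": [["<Subject>", "<Predicate>"]],
    "<Subject>": [["<adjective>", "<noun>"]],
    "<Predicate>": [["<verb>", "<object>"]],
    "<object>": [["<adjective>", "<noun>"]],
    "<adjective>": [["young"], ["pop"]],
    "<noun>": [["men"], ["music"]],
    "<verb>": [["like"]],
}
S = "<sentence>"

# non-terminal symbols are exactly the left sides of the productions
VN = set(P)
# terminal symbols are the right-side symbols that are not non-terminals
VT = {sym for alts in P.values() for rhs in alts for sym in rhs if sym not in VN}

G = (VN, VT, P, S)

if __name__ == "__main__":
    print(sorted(VN))
    print(sorted(VT))
```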

2) Classification of grammars: According to the restrictions placed on the production rules of a grammar, Chomsky classifies grammars into 4 types: 0-type grammar, 1-type grammar, 2-type grammar and 3-type grammar.

(1) 0-type grammar (phrase-structure grammar or unrestricted grammar)

For any production α → β in P, where α ∈ V+ and β ∈ V* (V = VN ∪ VT), there is at least one non-terminal symbol in α.

Notes: The automaton that recognizes a 0-type language is called a Turing machine.

The Turing machine's control device embodies the program. It uses a set of rules, its notion of "states", and the content of the tape to determine how to process the input symbols. The state that the machine regards itself as being in when the computation begins is called the "initial state". Each rule is stated in the form: "In state n, if the head is reading symbol x, write symbol y, move left or right one cell on the tape, and change the state to m."

Here is a set of rules for a Turing Machine (TM) with five states, the initial state being 1.

Present State   Present Symbol   Write   Move    New State
1               0                0       Right   2
2               0                0       Right   3
2               1                1       Right   2
3               0                Blank   Left    5
3               1                0       Left    4
4               0                1       Right   2

This series of rules actually accomplishes integer addition. Say the input represents two numbers in unary notation that you want to add. To represent a pair of integers (x, k) in unary notation, start with a 0, follow it with x 1's, then another 0 as separator, then k 1's, and end with a 0. So the pair (2, 3) has the unary notation 01101110. Using 01101110 as input and the rules for the TM from the table above, the output is 0111110. Here are the steps:

Number     State   Movement
01101110   1       Write a zero, go to state two.
01101110   2       Scan past 1's.
01101110   2       End of first string, go to next.
01101110   3       Change 1 to 0, go back.
01100110   4       Copy the 1, return to second string.
01110110   2       End of first string, go to next.
01110110   3       Change 1 to 0, go back.
01110010   4       Copy the 1, return to second string.
01111010   2       End of first string, go to next.
01111010   3       Change 1 to 0, go back.
01111000   4       Copy the 1, return to second string.
01111100   3       No second string. Erase last 0.
0111110    5       Halt; no further move is possible.
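The rule table can be checked with a few lines of Python. This is a sketch, not a general Turing machine library: the tape is a finite list, a blank is represented by a space, and the machine halts as soon as no rule matches the current state and symbol. The names `RULES`, `run` and `unary_add` are chosen for this illustration.

```python
BLANK = " "

# (state, symbol read) -> (symbol to write, head movement, new state)
RULES = {
    (1, "0"): ("0", +1, 2),
    (2, "0"): ("0", +1, 3),
    (2, "1"): ("1", +1, 2),
    (3, "0"): (BLANK, -1, 5),
    (3, "1"): ("0", -1, 4),
    (4, "0"): ("1", +1, 2),
}


def run(tape, state=1, head=0, max_steps=1000):
    """Apply RULES until no rule matches; return (final tape, final state)."""
    cells = list(tape)
    for _ in range(max_steps):
        symbol = cells[head] if 0 <= head < len(cells) else BLANK
        rule = RULES.get((state, symbol))
        if rule is None:  # no applicable rule: the machine halts
            break
        write, move, state = rule
        cells[head] = write
        head += move
    return "".join(cells).strip(), state


def unary_add(x, k):
    """Encode (x, k) in unary notation, run the machine, return the tape."""
    return run("0" + "1" * x + "0" + "1" * k + "0")[0]


if __name__ == "__main__":
    print(run("01101110"))  # the pair (2, 3) from the text
```

For the input 01101110 the machine stops in state 5 with 0111110 on the tape, which is the unary notation of 5.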

The Church-Turing hypothesis asserts:

Suppose P is some decision problem. There is no Turing machine program that computes P IF AND ONLY IF there is no C (or any other computer language) program that solves P.

The 0-type grammar is the grammar with the fewest restrictions on its productions;

We can get the other types of grammar by restricting the form of the productions of a 0-type grammar.

(2) 1-type grammar (context-sensitive grammar or length-increasing grammar)

For any production α → β in P, the restriction |β| >= |α| holds, except for S → ε. If S → ε is in P, S cannot appear on the right side of any production.

Or, any production in P has the form αAβ → αγβ (where α, β ∈ V*, A ∈ VN, γ ∈ V+), except for S → ε.

Notes: The automaton that recognizes a 1-type language is called a linear bounded automaton (LBA).

In a 1-type grammar, the context of a non-terminal symbol must be considered when the symbol is replaced. A non-terminal symbol cannot be replaced by ε, except that the start symbol can produce ε.

(3) 2-type grammar (context-free grammar)

Every production in P is of the form A → β, where A ∈ VN, β ∈ V*.

Notes: The left side of each production must be a single non-terminal symbol; the right side may consist of symbols from VN and VT, or be ε.

The automaton that recognizes a 2-type language is called a push-down automaton (PDA).


PUSH DOWN AUTOMATA

(4) 3-type grammar (regular grammar, right-linear grammar or left-linear grammar)

Every production in P is of the form A → αB, A → α (right-linear), or of the form A → Bα, A → α (left-linear), where A, B ∈ VN, α ∈ VT*.

Notes: The productions of a 3-type grammar are either all right-linear or all left-linear; the two kinds cannot be mixed in one grammar. If all the productions of a 3-type grammar are left-linear, the grammar is called a left-linear grammar. If all the productions of a 3-type grammar are right-linear, the grammar is called a right-linear grammar.

The automaton that recognizes a 3-type language is called a finite state automaton.

Hierarchy   Alias                       Production form                             Automaton name
0-type      Unrestricted grammar        α → β, α ∈ V+                               Turing machine
1-type      Context-sensitive grammar   αAβ → αγβ, A ∈ VN                           Linear bounded automaton
2-type      Context-free grammar        A → β, A ∈ VN                               Push-down automaton
3-type      Regular grammar             A → αB, A → α, where A, B ∈ VN, α ∈ VT*     Finite automaton
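The hierarchy can be turned into a small classifier that reports the most restrictive type a grammar satisfies. The sketch below rests on several assumptions that are not in the slides: every symbol is a single character, upper-case letters are non-terminal symbols and everything else is a terminal symbol, a production is a pair of strings (left side, right side), and ε is the empty string. For the 1-type test it uses the length condition |β| >= |α| rather than the αAβ → αγβ form, and it does not check that every left side contains a non-terminal symbol.

```python
def is_right_linear(lhs, rhs):
    """A -> wB or A -> w, where w is a (possibly empty) terminal string."""
    if not (len(lhs) == 1 and lhs.isupper()):
        return False
    body = rhs[:-1] if rhs[-1:].isupper() else rhs
    return not any(c.isupper() for c in body)


def is_left_linear(lhs, rhs):
    """A -> Bw or A -> w, where w is a (possibly empty) terminal string."""
    if not (len(lhs) == 1 and lhs.isupper()):
        return False
    body = rhs[1:] if rhs[:1].isupper() else rhs
    return not any(c.isupper() for c in body)


def chomsky_type(productions, start="S"):
    """Return 3, 2, 1 or 0: the most restrictive type the grammar fits."""
    if all(is_right_linear(l, r) for l, r in productions):
        return 3
    if all(is_left_linear(l, r) for l, r in productions):
        return 3
    if all(len(l) == 1 and l.isupper() for l, _ in productions):
        return 2
    start_on_right = any(start in r for _, r in productions)

    def length_ok(lhs, rhs):
        if lhs == start and rhs == "":  # S -> ε is the only allowed exception
            return not start_on_right
        return len(rhs) >= len(lhs)

    if all(length_ok(l, r) for l, r in productions):
        return 1
    return 0


if __name__ == "__main__":
    print(chomsky_type([("S", "aS"), ("S", "a"), ("S", "b")]))
    print(chomsky_type([("S", "aSb"), ("S", "ab")]))
```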

The language generated by an i-type grammar G is called an i-type language and is written as L(G): L(G) = {α | α ∈ VT*, and S ⇒+ α}

Example: Let G1 = ({S}, {a, b}, P, S)

where P includes:
(0) S → aS
(1) S → a
(2) S → b

L(G1) = {a^i (a|b) | i >= 0}

Example: Let G2 = ({S}, {a, b}, P, S)

where P includes:
(0) S → aSb
(1) S → ab

L(G2) = {a^n b^n | n >= 1}
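L(G1) and L(G2) are infinite, but their short sentences can be listed by exploring derivations breadth-first. The sketch assumes single-character symbols with upper-case non-terminal symbols, and grammars without ε-productions, so that a sentential form never gets shorter and forms longer than `max_len` can be discarded. The function name `sentences_up_to` is made up for this illustration.

```python
from collections import deque


def sentences_up_to(productions, start, max_len):
    """All sentences of length <= max_len derivable from the start symbol."""
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        form = queue.popleft()
        # leftmost non-terminal symbol, or None if the form is a sentence
        index = next((i for i, c in enumerate(form) if c.isupper()), None)
        if index is None:
            sentences.add(form)
            continue
        for rhs in productions[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(sentences, key=lambda s: (len(s), s))


G1 = {"S": ["aS", "a", "b"]}
G2 = {"S": ["aSb", "ab"]}

if __name__ == "__main__":
    print(sentences_up_to(G1, "S", 3))
    print(sentences_up_to(G2, "S", 6))
```

The results agree with the set descriptions: every listed sentence of G1 is a run of a's followed by one a or b, and every listed sentence of G2 has the shape a^n b^n.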

Notes: The productions of grammars used by lexical analysis and syntax analysis are restricted as follows:
– There is no production of the form P → P, because this kind of production is useless and only leads to ambiguity.
– Every non-terminal symbol P must be reachable and must be able to derive a terminal string:
• Starting from the start symbol S, there exists a derivation S ⇒* αPβ.
• P must be able to derive a terminal string, that is, P ⇒+ γ, γ ∈ VT*.


2.3 Grammar construction and simplification

(1) Constructing a grammar from a language

• Example: Let L1 = {a^(2n) b^n | n >= 1 and a, b ∈ VT}

Try to construct the grammar G1 from L1

– n=1: aab
– n=2: aaaabb
– n=3: aaaaaabbb
– ……
– So we have: S → aaSb
– S → aab

Example: Let L2 = {a^i b^j c^k | i, j, k >= 1 and a, b, c ∈ VT}

Try to construct the grammar G2 from L2

S → aS
S → aB
B → bB
B → bC
C → cC | c

Example: Let L3 = {ω | ω ∈ {a, b}* and there are as many a's as b's in ω}

Try to construct the grammar G3 from L3

S → ε
S → bB, S → aA
A → bS | b, A → aAA
B → aS | a | bBB

Another grammar for L3:

S → ε
S → aSbS
S → bSaS

Example: Let L4 = {ω | ω ∈ {0, 1}* and the number of 1's in ω is even}

Try to construct the grammar G4 from L4

S → ε
S → 0S, S → 1A
A → 0A, A → 1S
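G4 is right-linear, so it can be read as a finite automaton: each non-terminal symbol becomes a state, each production X → cY becomes a transition from X to Y on the symbol c, and S is accepting because of S → ε. The sketch below simulates that automaton; the names `TRANSITIONS`, `ACCEPTING` and `in_l4` are chosen for this illustration.

```python
# X -> cY in the grammar becomes the transition (X, c) -> Y
TRANSITIONS = {
    ("S", "0"): "S",
    ("S", "1"): "A",
    ("A", "0"): "A",
    ("A", "1"): "S",
}
ACCEPTING = {"S"}  # S -> ε lets a derivation stop in S


def in_l4(word):
    """True if word is derivable from S in G4."""
    state = "S"
    for symbol in word:
        if (state, symbol) not in TRANSITIONS:
            return False  # symbol outside the alphabet {0, 1}
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING


if __name__ == "__main__":
    for word in ["", "0110", "1", "10110"]:
        print(repr(word), in_l4(word))
```

State S means "an even number of 1's so far" and state A means "an odd number of 1's so far", which is why the grammar generates exactly L4.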

(2) Grammar Simplification

a、 Because a language can be described by different grammars, we should select the grammar that has the fewest productions and best fits the properties of the language.

b、 In a grammar, there may be some redundant productions that are useless for derivation. We should delete these productions:

– A production of the form P → P
– A production that can never derive a terminal string
– A production whose left-side non-terminal symbol does not appear on the right side of any production (the start symbol excepted)

c、 Steps of simplification:
– Look for the productions of the form P → P, and delete them;
– If a production can never be used in a derivation, delete it;
– If a production cannot derive a terminal string, delete it;
– Arrange the remaining productions.

Example: Simplify the following grammar

(0) S → Be   (1) S → Ec   (2) A → Ae   (3) A → e
(4) A → A    (5) B → Ce   (6) B → Af   (7) C → Cf
(8) D → f

Result:

(0) S → Be   (1) A → Ae   (2) A → e   (3) B → Af
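The simplification can be sketched in Python as follows. The representation is an assumption of this example: productions are pairs of strings with single-character symbols, and upper-case letters are non-terminal symbols. The sketch removes productions that cannot derive a terminal string before it removes unreachable ones, because deleting the former can make further symbols unreachable.

```python
def simplify(productions, start):
    """Remove P -> P, non-terminating and unreachable productions."""
    # Step 1: delete productions of the form P -> P.
    prods = [(lhs, rhs) for lhs, rhs in productions if lhs != rhs]

    # Step 2: find the non-terminals that can derive a terminal string.
    def usable(rhs, good):
        return all(c in good or not c.isupper() for c in rhs)

    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            if lhs not in productive and usable(rhs, productive):
                productive.add(lhs)
                changed = True
    prods = [(l, r) for l, r in prods if l in productive and usable(r, productive)]

    # Step 3: keep only what can be reached from the start symbol.
    reachable = {start}
    pending = [start]
    while pending:
        current = pending.pop()
        for lhs, rhs in prods:
            if lhs == current:
                for c in rhs:
                    if c.isupper() and c not in reachable:
                        reachable.add(c)
                        pending.append(c)
    return [(l, r) for l, r in prods if l in reachable]


EXAMPLE = [
    ("S", "Be"), ("S", "Ec"), ("A", "Ae"), ("A", "e"), ("A", "A"),
    ("B", "Ce"), ("B", "Af"), ("C", "Cf"), ("D", "f"),
]

if __name__ == "__main__":
    for lhs, rhs in simplify(EXAMPLE, "S"):
        print(lhs, "->", rhs)
```

On the example grammar this leaves S → Be, A → Ae, A → e and B → Af, the same four productions as in the result above.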

(3) Constructing a context-free grammar without ε-productions

a、 A context-free grammar without ε-productions satisfies the following conditions:
– If P contains the production S → ε, S does not appear on the right side of any production, where S is the start symbol of the grammar;
– There are no other ε-productions in P.

b、 The algorithm to construct a context-free grammar without ε-productions: from G = (VN, VT, P, S), construct G' = (V'N, V'T, P', S')

(1) Find all non-terminal symbols that can derive ε after some steps, and put them into the set V0;

(2) Construct the production set P' of G' by the following steps:

(A) If a symbol in V0 appears on the right side of a production, change the production into two productions: one in which the symbol is replaced by ε and one in which the symbol itself is kept; put the new productions into P'.

(B) Otherwise, put the productions relating to the symbol into P', except for the ε-production relating to the symbol.

(C) If P contains the production S → ε, change it into S' → ε | S and put it into P'; let S' be the start symbol of G', and let V'N = VN ∪ {S'}.

Example: Let G1 = ({S}, {a, b}, P, S), where P: (0) S → ε (1) S → aSbS (2) S → bSaS

(1) V0 = {S}

(2) P': (1) S → abS | aSbS | aSb | ab
(2) S → baS | bSaS | bSa | ba
(0) S' → ε | S

So: G1' = ({S', S}, {a, b}, P', S'), where

P': (0) S' → ε | S
(1) S → abS | aSbS | aSb | ab
(2) S → baS | bSaS | bSa | ba
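A compact Python sketch of this construction, under the assumption that symbols are single characters and ε is the empty string; the function name `remove_epsilon` and the default new start symbol `S'` are choices made for the illustration. Step (A) is applied to every occurrence of a V0 symbol at once by enumerating all keep-or-drop combinations. The sketch does not remove productions of the form P → P that the expansion might create in other grammars.

```python
from itertools import product


def remove_epsilon(productions, start, new_start="S'"):
    """Return (new productions, new start symbol) without ε-productions."""
    # Step (1): V0, the non-terminals that can derive ε.
    v0 = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in productions.items():
            if lhs not in v0 and any(all(c in v0 for c in rhs) for rhs in alternatives):
                v0.add(lhs)
                changed = True

    # Step (2)(A)/(B): each V0 symbol on a right side is either kept or dropped.
    new_productions = {}
    for lhs, alternatives in productions.items():
        expanded = set()
        for rhs in alternatives:
            options = [(c, "") if c in v0 else (c,) for c in rhs]
            for combination in product(*options):
                candidate = "".join(combination)
                if candidate:  # never add an ε-production here
                    expanded.add(candidate)
        new_productions[lhs] = expanded

    # Step (2)(C): a fresh start symbol takes over the only ε-production.
    if start in v0:
        new_productions[new_start] = {"", start}
        start = new_start
    return new_productions, start


if __name__ == "__main__":
    result, new_start = remove_epsilon({"S": ["", "aSbS", "bSaS"]}, "S")
    for lhs, alternatives in result.items():
        print(lhs, "->", sorted(alternatives))
```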

2.4 Syntax tree and ambiguity of a grammar

(1) Syntax tree

a、 Definition
– A tree used to express the structure of a sentence in a language

b、 Function
– Presents the syntax analysis process visually and directly
– Can be used to decide the ambiguity of a grammar easily

Figure: An example of a syntax tree (node labels by level, from the root down: S; a B; a B B; b S; b A; a; b)

c、 Basic terms in a syntax tree

(1) Sub-tree: A tree composed of a node (other than a leaf) and all its descendant nodes in a syntax tree

(2) Pruning a sub-tree: Pruning all the children of the root of a sub-tree

(3) Sentential form: The sequence of all leaves in a snapshot of the growing syntax tree

(4) Phrase

The sequence of leaf symbols of a sub-tree, read from left to right, is called a phrase relative to the root of the sub-tree.
– Simple phrase (direct phrase): If a phrase is derived in one step from the root of a sub-tree, the phrase is called a simple phrase relative to the root of the sub-tree.
– Phrase of a sentential form: A phrase of a sub-tree of the syntax tree of that sentential form

(5) Handle

The leftmost simple phrase in a sentential form.

Notes: In the process of leftmost reduction, the core work is finding the handle.

Figure: Handles in a syntax tree (the example syntax tree with node labels S; a B; a B B; b S; b A; a; b, annotated with the numbers 1 to 6)

(2) Ambiguity of a grammar

a、 Ambiguity of a sentence

If a sentence of a grammar has two or more different syntax trees, the sentence is ambiguous.

b、 Ambiguity of a grammar

If the language of a grammar contains an ambiguous sentence, the grammar is ambiguous.

Example: G = ({E}, {+, *, (, ), i}, P, E), where P: E → E+E | E*E | (E) | i

For the sentence (i*i+i), there are two leftmost derivations, thus there are two syntax trees for the sentence.

(1) E ⇒ (E) ⇒ (E+E) ⇒ (E*E+E) ⇒ (i*E+E) ⇒ (i*i+E) ⇒ (i*i+i)

(2) E ⇒ (E) ⇒ (E*E) ⇒ (i*E) ⇒ (i*E+E) ⇒ (i*i+E) ⇒ (i*i+i)

Figure: The two syntax trees of the sentence (i*i+i). In the first tree the bracketed E is expanded to E+E, whose left operand is E*E; in the second tree it is expanded to E*E, whose right operand is E+E.
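The number of syntax trees of a sentence can be counted by a small program, which makes the ambiguity of this grammar easy to check. The sketch below counts, with memoization, the ways each production can cover each part of the sentence. It assumes single-character symbols and a grammar without ε-productions and without productions whose right side is a single non-terminal symbol (true for E → E+E | E*E | (E) | i); the name `count_trees` is made up for this illustration.

```python
from functools import lru_cache

PRODUCTIONS = {"E": ["E+E", "E*E", "(E)", "i"]}


def count_trees(sentence, start="E"):
    """Number of distinct syntax trees of `sentence` rooted at `start`."""

    @lru_cache(maxsize=None)
    def derive(symbols, i, j):
        # ways in which the symbol string can derive sentence[i:j]
        if not symbols:
            return 1 if i == j else 0
        head, rest = symbols[0], symbols[1:]
        if head not in PRODUCTIONS:  # terminal symbol: must match exactly
            if i < j and sentence[i] == head:
                return derive(rest, i + 1, j)
            return 0
        total = 0
        # every symbol covers at least one character (no ε-productions)
        for k in range(i + 1, j - len(rest) + 1):
            left = trees(head, i, k)
            if left:
                total += left * derive(rest, k, j)
        return total

    @lru_cache(maxsize=None)
    def trees(nonterminal, i, j):
        return sum(derive(rhs, i, j) for rhs in PRODUCTIONS[nonterminal])

    return trees(start, 0, len(sentence))


if __name__ == "__main__":
    print(count_trees("(i*i+i)"))
```

A count greater than 1 for any sentence is enough to show that the grammar is ambiguous.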

Notes:
(1) Ambiguity brings uncertainty into syntax analysis.

(2) The ambiguity of a grammar is undecidable, that is, there is no algorithm that can determine in a finite number of steps whether an arbitrary grammar is ambiguous.

(3) To prove that a grammar is ambiguous, it is enough to give one sentence that has two syntax trees.

(4) If we can control the ambiguity of a grammar, that is, resolve it with additional conditions, the existence of ambiguity is not so bad.


END