03-60-214 Syntax analysisjlu.myweb.cs.uwindsor.ca/214/214Grammar2019.pdf · 17-01-24 5 How to...

49
1 03-60-214 Syntax analysis Jianguo Lu School of Computer Science University of Windsor

Transcript of 03-60-214 Syntax analysisjlu.myweb.cs.uwindsor.ca/214/214Grammar2019.pdf · 17-01-24 5 How to...

1

03-60-214Syntaxanalysis

JianguoLuSchoolofComputerScience

UniversityofWindsor

2

Lexical analyzer

Parser

id+id

Expr

assignment

:=id

Total := price + tax ;

T o t a l : = p r i c e + t a x ;

Grammars

17-01-24 4

FormaldefiniKonoflanguage

•  Alanguageisasetofstrings–  Englishlanguage{“thebrowndoglikesagoodcar”,……}

{sentence|sentencewriYeninEnglish}–  Javalanguage{program|programwriYeninJava}–  HTMLlanguage{document|documentwriYeninHTML}

•  Howdoyoudefinealanguage?•  Itisunlikelythatyoucanenumerateallthesentences,

programs,ordocuments

17-01-24 5

Howtodefinealanguage•  HowtodefineEnglish

–  Asetofwords,suchasbrown,dog,like–  Asetofrules

•  Asentenceconsistsofasubject,averb,andanobject;•  ThesubjectconsistsofanopKonalarKcle,followedbyanopKonaladjecKve,and

followedbyanoun;•  ……

–  Moreformally:•  Words={a,the,brown,friendly,good,book,refrigerator,dog,car,sings,eats,likes}•  Rules:

1)  SENTENCEàSUBJECTVERBOBJECT2)  SUBJECTàARTICLEADJECTIVENOUN3)  OBEJCTàARTICLEADJECTIVENOUN4)  ARTICLEàa|the|EMPTY5)  ADJECTIVEàbrown|friendly|good|EMPTY6)  NOUNàbook|refrigerator|dog|car7)  VERBàsings|eats|likes

17-01-24 6

DerivaKonofasentence

•  Rules:1)  SENTENCEàSUBJECTVERBOBJECT2)  SUBJECTàARTICLEADJECTIVENOUN3)  OBEJCTàARTICLEADJECTIVENOUN4)  ARTICLEàa|the|EMPTY5)  ADJECTIVEàbrown|friendly|good|EMPTY6)  NOUNàbook|refrigerator|dog|car7)  VERBàsings|eats|likes

•  DerivaKonofasentence“thebrowndoglikesagoodcar”SENTENCEàSUBJECTVERBOBJECTàARTICLEADJECTIVENOUNVERBOBJECTàthebrowndogVERBOBJECTàthebrowndoglikesARTICLEADJECTIVENOUNàthebrowndoglikesagoodcar

17-01-24 7

Theparsetreeofthesentence

The

VERB SUBJECT OBJECT

SENTENCE

ARTICLE ADJ NOUN ARTICLE ADJ NOUN

brown dog likes a good car

Parse the sentence: “the brown dog likes a good car” The top-down approach

17-01-24 8

TopdownandboYomupparsing

The

VERB SUBJECT OBJECT

SENTENCE

ARTICLE ADJ NOUN ARTICLE ADJ NOUN

brown dog likes a good car

17-01-24 9

Typesofparsers

•  Topdown–  Repeatedlyrewritethestartsymbol–  Findthelem-mostderivaKonoftheinputstring–  Easytoimplement

•  BoYomup–  Startwiththetokensandcombinethemtoforminteriornodesofthe

parsetree–  Findaright-mostderivaKonoftheinputstring–  Acceptwhenthestartsymbolisreached

•  BoYomupismoreprevalent

17-01-24 10

FormaldefiniKonofgrammar

•  Agrammarisa4-tupleG=(Σ,N,P,S)–  Σisafinitesetofterminalsymbols;–  Nisafinitesetofnonterminalsymbols;–  PisasetofproducKons;–  S(fromN)isthestartsymbol.

•  TheEnglishsentenceexample–  Σ={a,the,brown,friendly,good,book,refrigerator,dog,car,sings,

eats,likes}–  N={SENTENCE,SUBJECT,VERB,NOUN,OBJECT,ADJECTIVE,ARTICLE}–  S={SENTENCE}–  P={rule1)torule7)}

17-01-24 11

RecursivedefiniKon

•  Numberofsentencecanbegenerated:ARTICLE ADJ NOUN VERB ARTICLE ADJ NOUN sentences

3* 4* 4* 3* 3* 4* 4* =6912

•  Howcanwedefineaninfinitelanguagewithafinitesetofwordsandfinitesetofrules?

•  Usingrecursiverules:–  SUBJECT/OBJECTcanhavemorethanoneadjecKves:

1)  SUBJECTàARTICLEADJECTIVESNOUN2)  OBEJCTàARTICLEADJECTIVESNOUN3)  ADJECTIVESàADJECTIVE|ADJECTIVESADJETIVE

–  Examplesentence:“thegoodbrowndoglikesagoodfriendlybook”

17-01-24 12

Chomskyhierarchy

•  NoamChomskyhierarchyisbasedontheformofproducKonrules•  Generalform

α1 α2 α3 …αn à β1 β2 β3 … βm Whereαandβarefromterminalsandnonterminals,orempty.

•  Level3:Regulargrammar–  Oftheformα à β or α à β1 β2 –  n=1,andαisanonterminal.–  β iseitheraterminaloraterminalfollowedbyanonterminal–  RHScontainsatmostonenon-terminalattherightend.

•  Level2:Contextfreegrammar–  Oftheformα àβ1β2β3…βm

–  α isnonterminal. •  Level1:ContextsensiKvegrammar

–  n<m.Thenumberofsymbolsonthelhsmustnotexceedthenumberofsymbolsontherhs

•  Level0:unrestrictedgrammar

17-01-24 13

ContextsensiKvegrammar

•  CalledcontextsensiKvebecauseyoucanconstructthegrammaroftheform –  AαB à A β B –  AαC à A γ B

•  ThesubsKtuKonofα dependingonthesurroundingcontextAandBorAandC.

17-01-24 14

Chomskyhierarchy

•  regular⊆Context-free⊆ Context-sensiKve⊆ unrestricted

Chomsky level 3 (Regular

grammar/expression)

Chomsky level 2 (Context free grammar)

Chomsky level 1 (Context sensitive grammar)

Chomsky level 0 (unrestricted)

•  Themorepowerfulthegrammar,themorecomplextheprogramrequiredtorecognizethelegalinputs.

17-01-24 15

Grammarexamples

•  Regulargrammar–  L={w|wconsistsofarbitrarynumberof‘a’sand‘b’s}–  Grammar:Sàa|b|aS|bS–  Examplesentence:“abb”isalegalsentenceinthislanguage–  DerivaKon:

SàaSàabSàabb

17-01-24 16

Contextfreegrammarexample

•  Contextfreegrammarexample–  LanguageL={anbn}–  Grammar:Sàab|aSb–  NoKcethedifferencewithregulargrammar.–  Examplesentence:“aaabbb”isalegalsentenceinthislanguage–  DerivaKon:

SàaSbàaaSbbàaaabbb

17-01-24 17

Characterizedifferenttypesofgrammars

•  Regulargrammar–  Beingabletocountoneitem

•  Contextfreegrammar–  beingabletocountpairsofitems–  anbn

•  ContextsensiKvegrammar–  Beingabletocountarbitrarily;–  anbncn

17-01-24 18

ImplicaKonsofdifferentgrammarsinapplicaKons

•  Regulargrammar–  Recognizewords

•  Contextfreegrammar–  Recognizepairsofparenthesis

((a+b)*c)/2

–  Recognizeblocks{ statement1; { statement2; statement3; } }

•  ContextsensiKvegrammar

19

Contextfreegrammarsandlanguages

•  Manylanguagesarenotregular.Thusweneedtoconsiderlargerclassesoflanguages;

•  ContextFreeLanguage(CFL)playedacentralroleinnaturallanguagessince1950’s(Chomsky)andincompilerssince1960’s(Backus);

•  ContextFreeGrammar(CFG)isthebasisofBNFsyntax;•  CFGisincreasinglyimportantforXMLandDTD(XMLSchema).

20

InformalExampleofCFG

•  Palindrome:–  Madam,I’madam.–  Aman,aplan,acanal,panama!

•  ConsiderLpal={w|wisapalindromeonsymbols0and1}•  Example:1001,11arepalindromes•  Howtorepresentpalindromeofanylength?•  Basis:ε,0and1arepalindromes;

1.  Pàε2.  Pà03.  Pà1

•  InducKon:ifwispalindrome,soare0w0and1w1.nothingelseisapalindrome.1.  Pà0P02.  Pà1P1

21

Theinformalexample(cont.)

•  CFGisaformalmechanismfordefiniKonssuchastheoneforLpal1.  Pàε2.  Pà03.  Pà14.  Pà0P05.  Pà1P1

–  0and1areterminals–  Pisavariable(ornonterminal,orsyntacKccategory)–  Pisalsothestartsymbol.–  1-5areproducKons(orrules)

22

FormaldefiniKonofCFG•  ACFGisa4-tupleG=(Σ,N,P,S)where

1.  Σisafinitesetofterminals;2.  Nisfinitesetofvariables(non-terminals);3.  PisafinitesetofproducKonsoftheformAàα, whereAisa

nonterminalandαconsistsofsymbolsfromΣandN;1.  AiscalledtheheadoftheproducKon;2.  αiscalledthebodyoftheproducKon;

4.  Sisadesignatednon-terminalcalledthestartsymbol.

•  Example:–  Gpal=({0,1},{P},A,P),whereA={Pàε,Pà0,Pà1,Pà0P0,Pà1P1}

•  SomeKmeswegroupproducKonswiththesamehead.–  e.g.,A={Pàε|0|1|0P0|1P1}.

23

AnotherCFGexample:arithmeKcexpression

•  G=({+,*,(,),a,b,0,1},{E,R,S},P,E)1.  EàR2.  EàE+E3.  EàE*E4.  Eà(E)

5.  RàaS6.  RàbS7.  Sàε8.  SàaS9.  SàbS10. Sà0S11. Sà1S

24

TwoleveldescripKons

•  Context-freesyntaxofarithmeKcexpressions1.  EàR2.  EàE+E3.  EàE*E4.  Eà(E)

•  LexicalsyntaxofarithmeKcexpressions1.  RàaS2.  RàbS3.  Sàε4.  SàaS5.  SàbS6.  Sà0S7.  Sà1SRà(a|b)(a|b|0|1)*

•  Whytwolevels–  Wethinkthatway(sentence,word,character).–  besides,…(seenextslide)

25

Whyuseregularexpression

•  Everyregularsetisacontextfreelanguage.

•  SincewecanuseCFGtodescriberegularlanguage,whyusingregularexpressiontodefinelexicalsyntaxofalanguage?WhynotusingCFGtodescribeeverything?–  Lexicalrulesaresimpler;–  Regularexpressionsareconciseand

easiertounderstand;–  Moreefficientlexicalanalyzerscanbe

constructedfromregularexpression;–  Itisgoodtomodularizecompilerinto

twoseparatecomponents.

Chomsky level 3 (Regular expression)

Chomsky level 2 (Context free grammar)

Chomsky level 1 (Context sensitive grammar)

Chomsky level 0 (unrestricted)

26

BNFandEBNF•  BNFandEBNFarecommonlyacceptedwaystoexpressproducKonsofa

context-freegrammar.•  BNF

–  IntroducedbyJohnBackus,firstusedtodescribeAlgol60.JohnwonTuringawardin1977.

–  "BackusNormalForm”,or"BackusNaurForm".–  EBNFstandsforExtendedBNF.

•  BNFformat–  lhs::=rhs–  Quoteterminalsornon-terminals

•  <>todelimitnon-terminals,•  boldorquotesforterminals,or‘asis’

–  verKcalbarforalternaKves–  TherearemanydifferentnotaKons,hereisanexample

opt-stats::=stats-list|EMPTY.stats-list::=statement|statement‘;’stats-list.

27

EBNF•  AnextensionofBNF,useregular-expression-likeconstructsontheright-

hand-sideoftherules:–  write[A]tosaythatAisopKonal–  Write{A}orA*todenote0ormorerepeKKonsofA(ie.theKleeneclosure

ofA).

•  UsingEBNFhastheadvantageofsimplifyingthegrammarbyreducingtheneedtouserecursiverules.

•  BothBNFandEBNFareequivalenttoCFG

•  Example:

BNF EBNF

block::=‘{'opt-stats‘}’opt-stats::=stats-list|EMPTYstats-list::=statement|statement‘;’stats-list

block::=‘{‘[stats-list]‘}’stats-list::=statement(‘;’statement)*

•  Javamethod_declara/on•  method_declaraKon::={modifier}typeidenKfier

"("[parameter_list]")"{"[""]"}(statement_block|";")

28

29

Languages,Grammars,andautomata

LanguageClass Grammar Automaton

3 Regular NFAorDFA

2 Context-Free Push-DownAutomaton

1 Context-SensiKve Linear-BoundedAutomaton

0 Unrestricted(orFree) TuringMachine

Closer to machine

More expressive

30

AnotherarithmeKcexpressionexample

(p1)expàexp+digit(p2)expàexp-digit(p3)expàdigit(p4)digità0|1|2|3|4|5|6|7|8|9

•  the“|”meansOR•  Sotherulescanbesimplifiedas:(P1-3)Expàexp+digit|exp-digit|digit(p4)digità0|1|2|3|4|5|6|7|8|9

31

DerivaKon•  OnestepderivaKon(⇒relaKon)

–  IfAàγ is a production, α A β is a string of terminals and variables, then

αAβ ⇒ αγβ –  Example:

9-digit+digit⇒9-digit+2usingruleP4:digit→2

•  ZeroormorestepsofderivaKon(⇒*)–  Basis:α ⇒* α–  InducKon:ifα⇒*β ,β⇒γ,thenα⇒*γ.–  Example:

•  9-digit+digit⇒*9-digit+digit•  9-digit+digit⇒*9-5+2•  9-digit+digit⇒9-5+digit⇒9-5+2

•  OneormorestepsofderivaKon(⇒+)–  Example:9-digit+digit⇒+9-5+2

32

ExampleofderivaKon

•  Wecanderivethestring9-5+2asfollows:(P1:exp→exp+digit)exp⇒exp+digit (P2:exp→exp–digit)⇒exp-digit+digit(P3:exp→digit)

⇒digit-digit+digit(P4:digit→9)⇒9-digit+digit(P4:digit→5)⇒9-5+digit(P4:digit→2)⇒9-5+2exp⇒+9-5+2exp⇒*9-5+2

33

LemmostandrightmostderivaKonsLeBmostderivaEon:

exp⇒lmexp+digit ⇒lmexp-digit+digit

⇒lmdigit-digit+digit⇒lm9-digit+digit⇒lm9-5+digit⇒lm9-5+2exp⇒+

lm9-5+2

exp⇒*lm9-5+2

RightmostderivaEon:exp⇒rmexp+digit ⇒rmexp+2

⇒rmexp-digit+2⇒rmexp-5+2⇒rmdigit-5+2⇒rm9-5+2exp⇒+

rm9-5+2

exp⇒*rm9-5+2

(p1)expàexp+digit(p2)expàexp-digit(p3)expàdigit(p4)digità0|1|2|3|4|5|6|7|8|9

34

Parsetree

•  ThisderivaKoncanalsoberepresentedviaaParseTree.

digit

exp-digit

Exp

Exp

9-5+2

+ digit

2

9

5

(p1)expàexp+digit(p2)expàexp-digit(p3)expàdigit(p4)digità0|1|2|3|4|5|6|7|

8|9

35

Treeterminologies

•  TreesarecollecKonsofnodes,withaparent-childrelaKonship.–  Anodehasatmostoneparent,drawnabovethenode;–  Anodehaszeroormorechildren,drawnbelowthenode.

•  Thereisonenode,theroot,thathasnoparent.Thisnodeappearsatthetopofthetree

•  Nodeswithnochildrenarecalledleaves.•  Nodesthatarenotleavesareinteriornodes.

36

FormaldefiniKonofparsetree

•  ParsetreeshowsthederivaKonofastringusingagrammar.

•  ProperKesofaparsetree:–  Therootislabeledbythestartsymbol;–  Eachleafislabeledbyaterminalorε; –  Eachinteriornodeislabeledbya

nonterminal;–  IfAisthenonterminalnodeandX1,…,Xn

arethechildrennodesofA,thenAàX1…XnisaproducKon.

•  Yieldofaparsetree–  lookattheleavesofaparsetree,fromlem

toright,andconcatenatethem–  Example:9-5+2

digit

exp-digit

Exp

Exp

+ digit

2

9

5

37

Thelanguageofagrammar

•  IfGisagrammar,thelanguageofthegrammar,denotedasL(G),isthesetofterminalstringsthathavederivaKonsfromthestartsymbol.

•  IfalanguageListhelanguageofsomecontextfreegrammar,thenLissaidtobeacontextfreelanguage.

•  Example:thesetofpalindromesisacontextfreelanguage.

38

DerivaKonandtheparsetree

•  Thefollowingsareequivalent:–  A⇒*w;–  A⇒*lmw;–  A⇒*rmw;–  ThereisaparsetreewithrootAandyieldw.

39

ApplicaKonsofCFG

•  Parserandparsergenerator;•  Markuplanguages.

40

AmbiguityofGrammar

•  Whatisambiguousgrammar;•  Howtoremoveambiguity;

41

Exampleofambiguousgrammar

•  Ambiguoussentence:–  Fruitflieslikeabanana

•  Consideraslightlymodifiedgrammarforexpressionsexpràexpr+expr|expr*expr|digit

•  DerivaKonsof9+5*2expr⇒expr+expr⇒ expr+expr*expr⇒  +9+5*2expr⇒ expr*expr⇒ expr+expr*expr⇒  +9+5*2

•  TherearedifferentderivaKons

42

Ambiguityofagrammar

•  Ambiguousgrammar:producemorethanoneparsetreeexpràexpr+expr|exp*expr|digit

9  +5*2

exprexprexpr

expr

expr

9  +5*2

exprexprexpr

expr

expr

•  Problemsofambiguousgrammar–  onesentencehasdifferentinterpretaKons

9+(5*2)(9+5)*2

43

SeveralderivaKonscannotdecidewhetherthegrammarisambiguous•  Usingtheexprgrammar,3+4hasmany

derivaKons:E⇒  E+E⇒  D+E⇒  3+E⇒  3+D⇒  3+4

•  Basedontheexistenceoftwo

derivaKons,wecannotdeducethatthegrammarisambiguous;

•  ItisnotthemulKplicityofderivaKonsthatcausesambiguity;

•  Itistheexistenceofmorethanoneparsetree.

•  Inthisexample,thetwoderivaKonswillproducethesametree

E⇒E+E⇒E+D⇒E+4⇒D+4⇒3+4

E

D

E E

43

D

+

44

DerivaKonandparsetree•  Inanunambiguousgrammar,lemmostderivaKonwillbe

unique;andrightmostderivaKonwillbeunique;•  Howaboutambiguousgrammar?

E⇒ lmE+E⇒ lmD+E⇒ lm9+E⇒ lm9+E*E⇒ lm9+D*E⇒ lm9+5*E⇒ lm9+5*D⇒ lm9+5*2

E⇒ lmE*E⇒ lmE+E*E⇒ lmD+E*E⇒ lm9+E*E⇒ lm9+D*E⇒ lm9+5*E⇒ lm9+5*D⇒ lm9+5*2

•  AstringhastwoparsertreesiffithastwodisKnctlemmost

derivaKons(ortworightmostderivaKons).

45

Removeambiguity•  SometheoreKcalresults(badnews)

–  IsthereanalgorithmtoremovetheambiguityinCFG?•  theanswerisno

–  IsthereanalgorithmtotelluswhetheraCFGisambiguous?•  Theanswerisalsono.

–  ThereareCFLsthathavenothingbutambiguousCFGs.•  Thatkindoflanguageiscalledambiguouslanguage;•  Ifalanguagehasoneunambiguousgrammar,thenitiscalledunambiguous

language.

•  InpracKce,therearewell-knowntechniquestoremoveambiguity•  Twocausesoftheambiguityintheexprgrammar

–  theprecedenceofoperatorisnotrespected.“*”shouldbegroupedbefore“+”;

–  asequenceofidenKcaloperatorcanbegroupedeitherfromlemorfromright.3+4+5canbegroupedeitheras(3+4)+5or3+(4+5).

46

Removeambiguity•  Enforcingprecedencebyintroducingseveraldifferentvariables,each

representsthoseexpressionsthatsharealevelofbindingstrength–  factor:digitisafactor–  term:factor*factor*factor....isaterm–  expression:term+term+term...isanexpression

•  Sowehaveanewgrammar:EàT|E+TTàF|T*FFàD

•  Comparetheoriginalgrammar:EàE+EEàE*EEàD

•  TheparsertreeforD+D*Dis:

E

T

E T

F+

T F

D*D

F

D

47

AnotherambiguousgrammarStmtàifexprthenstmt|ifexprthenstmtelsestmt|otherIfE1thenifE2thenS1elseS2

stmt

stmt

expr if then stmt

expr if then else stmt

stmt

expr if then stmt

expr if then stmt

else stmt

48

Removetheambiguity

•  Match“else”withclosestpreviousunmatched“then”•  Howtoincorporatethisruleintothegrammar?

Stmtàifexprthenstmt|ifexprthenstmtelsestmt|otherstmtàmatched_stmt|unmatched_stmtmatched_stmtàifexprthenmatched_stmtelsematched_stmt|otherunmatched_stmtàifexprthenstmt|ifexprthenmatched_stmtelseunmatched_stmt

49

Theparsetreeofif-stmtexample

unmatched_stmt

expr if then

matched_stmt

E2 if then else matched_stmt

S1

matched_stmt

S2

stmt

stmt