03-60-214 Syntax analysisjlu.myweb.cs.uwindsor.ca/214/214Grammar2019.pdf · 17-01-24 5 How to...
Transcript of 03-60-214 Syntax analysisjlu.myweb.cs.uwindsor.ca/214/214Grammar2019.pdf · 17-01-24 5 How to...
2
Lexical analyzer
Parser
id+id
Expr
assignment
:=id
Total := price + tax ;
T o t a l : = p r i c e + t a x ;
17-01-24 4
FormaldefiniKonoflanguage
• Alanguageisasetofstrings– Englishlanguage{“thebrowndoglikesagoodcar”,……}
{sentence|sentencewriYeninEnglish}– Javalanguage{program|programwriYeninJava}– HTMLlanguage{document|documentwriYeninHTML}
• Howdoyoudefinealanguage?• Itisunlikelythatyoucanenumerateallthesentences,
programs,ordocuments
17-01-24 5
Howtodefinealanguage• HowtodefineEnglish
– Asetofwords,suchasbrown,dog,like– Asetofrules
• Asentenceconsistsofasubject,averb,andanobject;• ThesubjectconsistsofanopKonalarKcle,followedbyanopKonaladjecKve,and
followedbyanoun;• ……
– Moreformally:• Words={a,the,brown,friendly,good,book,refrigerator,dog,car,sings,eats,likes}• Rules:
1) SENTENCEàSUBJECTVERBOBJECT2) SUBJECTàARTICLEADJECTIVENOUN3) OBEJCTàARTICLEADJECTIVENOUN4) ARTICLEàa|the|EMPTY5) ADJECTIVEàbrown|friendly|good|EMPTY6) NOUNàbook|refrigerator|dog|car7) VERBàsings|eats|likes
17-01-24 6
DerivaKonofasentence
• Rules:1) SENTENCEàSUBJECTVERBOBJECT2) SUBJECTàARTICLEADJECTIVENOUN3) OBEJCTàARTICLEADJECTIVENOUN4) ARTICLEàa|the|EMPTY5) ADJECTIVEàbrown|friendly|good|EMPTY6) NOUNàbook|refrigerator|dog|car7) VERBàsings|eats|likes
• DerivaKonofasentence“thebrowndoglikesagoodcar”SENTENCEàSUBJECTVERBOBJECTàARTICLEADJECTIVENOUNVERBOBJECTàthebrowndogVERBOBJECTàthebrowndoglikesARTICLEADJECTIVENOUNàthebrowndoglikesagoodcar
17-01-24 7
Theparsetreeofthesentence
The
VERB SUBJECT OBJECT
SENTENCE
ARTICLE ADJ NOUN ARTICLE ADJ NOUN
brown dog likes a good car
Parse the sentence: “the brown dog likes a good car” The top-down approach
17-01-24 8
TopdownandboYomupparsing
The
VERB SUBJECT OBJECT
SENTENCE
ARTICLE ADJ NOUN ARTICLE ADJ NOUN
brown dog likes a good car
17-01-24 9
Typesofparsers
• Topdown– Repeatedlyrewritethestartsymbol– Findthelem-mostderivaKonoftheinputstring– Easytoimplement
• BoYomup– Startwiththetokensandcombinethemtoforminteriornodesofthe
parsetree– Findaright-mostderivaKonoftheinputstring– Acceptwhenthestartsymbolisreached
• BoYomupismoreprevalent
17-01-24 10
FormaldefiniKonofgrammar
• Agrammarisa4-tupleG=(Σ,N,P,S)– Σisafinitesetofterminalsymbols;– Nisafinitesetofnonterminalsymbols;– PisasetofproducKons;– S(fromN)isthestartsymbol.
• TheEnglishsentenceexample– Σ={a,the,brown,friendly,good,book,refrigerator,dog,car,sings,
eats,likes}– N={SENTENCE,SUBJECT,VERB,NOUN,OBJECT,ADJECTIVE,ARTICLE}– S={SENTENCE}– P={rule1)torule7)}
17-01-24 11
RecursivedefiniKon
• Numberofsentencecanbegenerated:ARTICLE ADJ NOUN VERB ARTICLE ADJ NOUN sentences
3* 4* 4* 3* 3* 4* 4* =6912
• Howcanwedefineaninfinitelanguagewithafinitesetofwordsandfinitesetofrules?
• Usingrecursiverules:– SUBJECT/OBJECTcanhavemorethanoneadjecKves:
1) SUBJECTàARTICLEADJECTIVESNOUN2) OBEJCTàARTICLEADJECTIVESNOUN3) ADJECTIVESàADJECTIVE|ADJECTIVESADJETIVE
– Examplesentence:“thegoodbrowndoglikesagoodfriendlybook”
17-01-24 12
Chomskyhierarchy
• NoamChomskyhierarchyisbasedontheformofproducKonrules• Generalform
α1 α2 α3 …αn à β1 β2 β3 … βm Whereαandβarefromterminalsandnonterminals,orempty.
• Level3:Regulargrammar– Oftheformα à β or α à β1 β2 – n=1,andαisanonterminal.– β iseitheraterminaloraterminalfollowedbyanonterminal– RHScontainsatmostonenon-terminalattherightend.
• Level2:Contextfreegrammar– Oftheformα àβ1β2β3…βm
– α isnonterminal. • Level1:ContextsensiKvegrammar
– n<m.Thenumberofsymbolsonthelhsmustnotexceedthenumberofsymbolsontherhs
• Level0:unrestrictedgrammar
17-01-24 13
ContextsensiKvegrammar
• CalledcontextsensiKvebecauseyoucanconstructthegrammaroftheform – AαB à A β B – AαC à A γ B
• ThesubsKtuKonofα dependingonthesurroundingcontextAandBorAandC.
17-01-24 14
Chomskyhierarchy
• regular⊆Context-free⊆ Context-sensiKve⊆ unrestricted
Chomsky level 3 (Regular
grammar/expression)
Chomsky level 2 (Context free grammar)
Chomsky level 1 (Context sensitive grammar)
Chomsky level 0 (unrestricted)
• Themorepowerfulthegrammar,themorecomplextheprogramrequiredtorecognizethelegalinputs.
17-01-24 15
Grammarexamples
• Regulargrammar– L={w|wconsistsofarbitrarynumberof‘a’sand‘b’s}– Grammar:Sàa|b|aS|bS– Examplesentence:“abb”isalegalsentenceinthislanguage– DerivaKon:
SàaSàabSàabb
17-01-24 16
Contextfreegrammarexample
• Contextfreegrammarexample– LanguageL={anbn}– Grammar:Sàab|aSb– NoKcethedifferencewithregulargrammar.– Examplesentence:“aaabbb”isalegalsentenceinthislanguage– DerivaKon:
SàaSbàaaSbbàaaabbb
17-01-24 17
Characterizedifferenttypesofgrammars
• Regulargrammar– Beingabletocountoneitem
• Contextfreegrammar– beingabletocountpairsofitems– anbn
• ContextsensiKvegrammar– Beingabletocountarbitrarily;– anbncn
17-01-24 18
ImplicaKonsofdifferentgrammarsinapplicaKons
• Regulargrammar– Recognizewords
• Contextfreegrammar– Recognizepairsofparenthesis
((a+b)*c)/2
– Recognizeblocks{ statement1; { statement2; statement3; } }
• ContextsensiKvegrammar
19
Contextfreegrammarsandlanguages
• Manylanguagesarenotregular.Thusweneedtoconsiderlargerclassesoflanguages;
• ContextFreeLanguage(CFL)playedacentralroleinnaturallanguagessince1950’s(Chomsky)andincompilerssince1960’s(Backus);
• ContextFreeGrammar(CFG)isthebasisofBNFsyntax;• CFGisincreasinglyimportantforXMLandDTD(XMLSchema).
20
InformalExampleofCFG
• Palindrome:– Madam,I’madam.– Aman,aplan,acanal,panama!
• ConsiderLpal={w|wisapalindromeonsymbols0and1}• Example:1001,11arepalindromes• Howtorepresentpalindromeofanylength?• Basis:ε,0and1arepalindromes;
1. Pàε2. Pà03. Pà1
• InducKon:ifwispalindrome,soare0w0and1w1.nothingelseisapalindrome.1. Pà0P02. Pà1P1
21
Theinformalexample(cont.)
• CFGisaformalmechanismfordefiniKonssuchastheoneforLpal1. Pàε2. Pà03. Pà14. Pà0P05. Pà1P1
– 0and1areterminals– Pisavariable(ornonterminal,orsyntacKccategory)– Pisalsothestartsymbol.– 1-5areproducKons(orrules)
22
FormaldefiniKonofCFG• ACFGisa4-tupleG=(Σ,N,P,S)where
1. Σisafinitesetofterminals;2. Nisfinitesetofvariables(non-terminals);3. PisafinitesetofproducKonsoftheformAàα, whereAisa
nonterminalandαconsistsofsymbolsfromΣandN;1. AiscalledtheheadoftheproducKon;2. αiscalledthebodyoftheproducKon;
4. Sisadesignatednon-terminalcalledthestartsymbol.
• Example:– Gpal=({0,1},{P},A,P),whereA={Pàε,Pà0,Pà1,Pà0P0,Pà1P1}
• SomeKmeswegroupproducKonswiththesamehead.– e.g.,A={Pàε|0|1|0P0|1P1}.
23
AnotherCFGexample:arithmeKcexpression
• G=({+,*,(,),a,b,0,1},{E,R,S},P,E)1. EàR2. EàE+E3. EàE*E4. Eà(E)
5. RàaS6. RàbS7. Sàε8. SàaS9. SàbS10. Sà0S11. Sà1S
24
TwoleveldescripKons
• Context-freesyntaxofarithmeKcexpressions1. EàR2. EàE+E3. EàE*E4. Eà(E)
• LexicalsyntaxofarithmeKcexpressions1. RàaS2. RàbS3. Sàε4. SàaS5. SàbS6. Sà0S7. Sà1SRà(a|b)(a|b|0|1)*
• Whytwolevels– Wethinkthatway(sentence,word,character).– besides,…(seenextslide)
25
Whyuseregularexpression
• Everyregularsetisacontextfreelanguage.
• SincewecanuseCFGtodescriberegularlanguage,whyusingregularexpressiontodefinelexicalsyntaxofalanguage?WhynotusingCFGtodescribeeverything?– Lexicalrulesaresimpler;– Regularexpressionsareconciseand
easiertounderstand;– Moreefficientlexicalanalyzerscanbe
constructedfromregularexpression;– Itisgoodtomodularizecompilerinto
twoseparatecomponents.
Chomsky level 3 (Regular expression)
Chomsky level 2 (Context free grammar)
Chomsky level 1 (Context sensitive grammar)
Chomsky level 0 (unrestricted)
26
BNFandEBNF• BNFandEBNFarecommonlyacceptedwaystoexpressproducKonsofa
context-freegrammar.• BNF
– IntroducedbyJohnBackus,firstusedtodescribeAlgol60.JohnwonTuringawardin1977.
– "BackusNormalForm”,or"BackusNaurForm".– EBNFstandsforExtendedBNF.
• BNFformat– lhs::=rhs– Quoteterminalsornon-terminals
• <>todelimitnon-terminals,• boldorquotesforterminals,or‘asis’
– verKcalbarforalternaKves– TherearemanydifferentnotaKons,hereisanexample
opt-stats::=stats-list|EMPTY.stats-list::=statement|statement‘;’stats-list.
27
EBNF• AnextensionofBNF,useregular-expression-likeconstructsontheright-
hand-sideoftherules:– write[A]tosaythatAisopKonal– Write{A}orA*todenote0ormorerepeKKonsofA(ie.theKleeneclosure
ofA).
• UsingEBNFhastheadvantageofsimplifyingthegrammarbyreducingtheneedtouserecursiverules.
• BothBNFandEBNFareequivalenttoCFG
• Example:
BNF EBNF
block::=‘{'opt-stats‘}’opt-stats::=stats-list|EMPTYstats-list::=statement|statement‘;’stats-list
block::=‘{‘[stats-list]‘}’stats-list::=statement(‘;’statement)*
• Javamethod_declara/on• method_declaraKon::={modifier}typeidenKfier
"("[parameter_list]")"{"[""]"}(statement_block|";")
28
29
Languages,Grammars,andautomata
LanguageClass Grammar Automaton
3 Regular NFAorDFA
2 Context-Free Push-DownAutomaton
1 Context-SensiKve Linear-BoundedAutomaton
0 Unrestricted(orFree) TuringMachine
Closer to machine
More expressive
30
AnotherarithmeKcexpressionexample
(p1)expàexp+digit(p2)expàexp-digit(p3)expàdigit(p4)digità0|1|2|3|4|5|6|7|8|9
• the“|”meansOR• Sotherulescanbesimplifiedas:(P1-3)Expàexp+digit|exp-digit|digit(p4)digità0|1|2|3|4|5|6|7|8|9
31
DerivaKon• OnestepderivaKon(⇒relaKon)
– IfAàγ is a production, α A β is a string of terminals and variables, then
αAβ ⇒ αγβ – Example:
9-digit+digit⇒9-digit+2usingruleP4:digit→2
• ZeroormorestepsofderivaKon(⇒*)– Basis:α ⇒* α– InducKon:ifα⇒*β ,β⇒γ,thenα⇒*γ.– Example:
• 9-digit+digit⇒*9-digit+digit• 9-digit+digit⇒*9-5+2• 9-digit+digit⇒9-5+digit⇒9-5+2
• OneormorestepsofderivaKon(⇒+)– Example:9-digit+digit⇒+9-5+2
32
ExampleofderivaKon
• Wecanderivethestring9-5+2asfollows:(P1:exp→exp+digit)exp⇒exp+digit (P2:exp→exp–digit)⇒exp-digit+digit(P3:exp→digit)
⇒digit-digit+digit(P4:digit→9)⇒9-digit+digit(P4:digit→5)⇒9-5+digit(P4:digit→2)⇒9-5+2exp⇒+9-5+2exp⇒*9-5+2
33
LemmostandrightmostderivaKonsLeBmostderivaEon:
exp⇒lmexp+digit ⇒lmexp-digit+digit
⇒lmdigit-digit+digit⇒lm9-digit+digit⇒lm9-5+digit⇒lm9-5+2exp⇒+
lm9-5+2
exp⇒*lm9-5+2
RightmostderivaEon:exp⇒rmexp+digit ⇒rmexp+2
⇒rmexp-digit+2⇒rmexp-5+2⇒rmdigit-5+2⇒rm9-5+2exp⇒+
rm9-5+2
exp⇒*rm9-5+2
(p1)expàexp+digit(p2)expàexp-digit(p3)expàdigit(p4)digità0|1|2|3|4|5|6|7|8|9
34
Parsetree
• ThisderivaKoncanalsoberepresentedviaaParseTree.
digit
exp-digit
Exp
Exp
9-5+2
+ digit
2
9
5
(p1)expàexp+digit(p2)expàexp-digit(p3)expàdigit(p4)digità0|1|2|3|4|5|6|7|
8|9
35
Treeterminologies
• TreesarecollecKonsofnodes,withaparent-childrelaKonship.– Anodehasatmostoneparent,drawnabovethenode;– Anodehaszeroormorechildren,drawnbelowthenode.
• Thereisonenode,theroot,thathasnoparent.Thisnodeappearsatthetopofthetree
• Nodeswithnochildrenarecalledleaves.• Nodesthatarenotleavesareinteriornodes.
36
FormaldefiniKonofparsetree
• ParsetreeshowsthederivaKonofastringusingagrammar.
• ProperKesofaparsetree:– Therootislabeledbythestartsymbol;– Eachleafislabeledbyaterminalorε; – Eachinteriornodeislabeledbya
nonterminal;– IfAisthenonterminalnodeandX1,…,Xn
arethechildrennodesofA,thenAàX1…XnisaproducKon.
• Yieldofaparsetree– lookattheleavesofaparsetree,fromlem
toright,andconcatenatethem– Example:9-5+2
digit
exp-digit
Exp
Exp
+ digit
2
9
5
37
Thelanguageofagrammar
• IfGisagrammar,thelanguageofthegrammar,denotedasL(G),isthesetofterminalstringsthathavederivaKonsfromthestartsymbol.
• IfalanguageListhelanguageofsomecontextfreegrammar,thenLissaidtobeacontextfreelanguage.
• Example:thesetofpalindromesisacontextfreelanguage.
38
DerivaKonandtheparsetree
• Thefollowingsareequivalent:– A⇒*w;– A⇒*lmw;– A⇒*rmw;– ThereisaparsetreewithrootAandyieldw.
41
Exampleofambiguousgrammar
• Ambiguoussentence:– Fruitflieslikeabanana
• Consideraslightlymodifiedgrammarforexpressionsexpràexpr+expr|expr*expr|digit
• DerivaKonsof9+5*2expr⇒expr+expr⇒ expr+expr*expr⇒ +9+5*2expr⇒ expr*expr⇒ expr+expr*expr⇒ +9+5*2
• TherearedifferentderivaKons
42
Ambiguityofagrammar
• Ambiguousgrammar:producemorethanoneparsetreeexpràexpr+expr|exp*expr|digit
9 +5*2
exprexprexpr
expr
expr
9 +5*2
exprexprexpr
expr
expr
• Problemsofambiguousgrammar– onesentencehasdifferentinterpretaKons
9+(5*2)(9+5)*2
43
SeveralderivaKonscannotdecidewhetherthegrammarisambiguous• Usingtheexprgrammar,3+4hasmany
derivaKons:E⇒ E+E⇒ D+E⇒ 3+E⇒ 3+D⇒ 3+4
• Basedontheexistenceoftwo
derivaKons,wecannotdeducethatthegrammarisambiguous;
• ItisnotthemulKplicityofderivaKonsthatcausesambiguity;
• Itistheexistenceofmorethanoneparsetree.
• Inthisexample,thetwoderivaKonswillproducethesametree
E⇒E+E⇒E+D⇒E+4⇒D+4⇒3+4
E
D
E E
43
D
+
44
DerivaKonandparsetree• Inanunambiguousgrammar,lemmostderivaKonwillbe
unique;andrightmostderivaKonwillbeunique;• Howaboutambiguousgrammar?
E⇒ lmE+E⇒ lmD+E⇒ lm9+E⇒ lm9+E*E⇒ lm9+D*E⇒ lm9+5*E⇒ lm9+5*D⇒ lm9+5*2
E⇒ lmE*E⇒ lmE+E*E⇒ lmD+E*E⇒ lm9+E*E⇒ lm9+D*E⇒ lm9+5*E⇒ lm9+5*D⇒ lm9+5*2
• AstringhastwoparsertreesiffithastwodisKnctlemmost
derivaKons(ortworightmostderivaKons).
45
Removeambiguity• SometheoreKcalresults(badnews)
– IsthereanalgorithmtoremovetheambiguityinCFG?• theanswerisno
– IsthereanalgorithmtotelluswhetheraCFGisambiguous?• Theanswerisalsono.
– ThereareCFLsthathavenothingbutambiguousCFGs.• Thatkindoflanguageiscalledambiguouslanguage;• Ifalanguagehasoneunambiguousgrammar,thenitiscalledunambiguous
language.
• InpracKce,therearewell-knowntechniquestoremoveambiguity• Twocausesoftheambiguityintheexprgrammar
– theprecedenceofoperatorisnotrespected.“*”shouldbegroupedbefore“+”;
– asequenceofidenKcaloperatorcanbegroupedeitherfromlemorfromright.3+4+5canbegroupedeitheras(3+4)+5or3+(4+5).
46
Removeambiguity• Enforcingprecedencebyintroducingseveraldifferentvariables,each
representsthoseexpressionsthatsharealevelofbindingstrength– factor:digitisafactor– term:factor*factor*factor....isaterm– expression:term+term+term...isanexpression
• Sowehaveanewgrammar:EàT|E+TTàF|T*FFàD
• Comparetheoriginalgrammar:EàE+EEàE*EEàD
• TheparsertreeforD+D*Dis:
E
T
E T
F+
T F
D*D
F
D
47
AnotherambiguousgrammarStmtàifexprthenstmt|ifexprthenstmtelsestmt|otherIfE1thenifE2thenS1elseS2
stmt
stmt
expr if then stmt
expr if then else stmt
stmt
expr if then stmt
expr if then stmt
else stmt
48
Removetheambiguity
• Match“else”withclosestpreviousunmatched“then”• Howtoincorporatethisruleintothegrammar?
Stmtàifexprthenstmt|ifexprthenstmtelsestmt|otherstmtàmatched_stmt|unmatched_stmtmatched_stmtàifexprthenmatched_stmtelsematched_stmt|otherunmatched_stmtàifexprthenstmt|ifexprthenmatched_stmtelseunmatched_stmt