yacc seminar report.doc
-
Upload
manojkumar-putti -
Category
Documents
-
view
225 -
download
0
Transcript of yacc seminar report.doc
-
7/23/2019 yacc seminar report.doc
1/26
-
7/23/2019 yacc seminar report.doc
2/26
Yacc
A
Seminar Report
On
YACC
In partial fulfilment of requirements for the degree of
Bachelor of Technology In
Computer Science& Engineering
BY
PUTTI MANO !UMAR
"#$%#A&'($)
%epartment of
Computer S*ien*e + ,ngineering
,luru College of ,ngineering + Te*hnolog-
%uggirala.,luru.A/P/
0'10/
pg. 2
-
7/23/2019 yacc seminar report.doc
3/26
Yacc
C,RTI3ICAT,
Certified that seminar work entitled YACC CO!I"E#is a $onafide work
carried out in the third semester $y %PUTTI MANO!UMAR
"#$%#A&'($)4 in partial fulfilment for the award of Bachelor of
Technology in Computer Science and Engineering from Eluru College of
Engineering& Technology during the academic year '()*'(+,
Signature of In1Charge 5O%
pg. 3
-
7/23/2019 yacc seminar report.doc
4/26
Yacc
ABSTRACT
Computer program input generall- has some stru*ture6 in fa*t. e7er- *omputer
program that does input *an 8e thought of as defining an 99input language:: ;hi*h it
a**epts/ An input language ma- 8e as *omple< as a programming language. or as
simple as a sequen*e of num8ers/ Unfortunatel-. usual input fa*ilities are limited.
diffi*ult to use. and often are la< a8out *he*=ing their inputs for 7alidit-/
Ya** pro7ides a general tool for des*ri8ing the input to a *omputer program/ The
Ya** user spe*ifies the stru*tures of his input. together ;ith *ode to 8e in7o=ed as
ea*h su*h stru*ture is re*ogni>ed/ Ya** turns su*h a spe*ifi*ation into a su8routine
that handles the input pro*ess6 fre quentl-. it is *on7enient and appropriate to ha7e
most of the flo; of *ontrol in the user:s appli*ation handled 8- this su8routine/
The input su8routine produ*ed 8- Ya** *alls a user1supplied routine to return the
ne
-
7/23/2019 yacc seminar report.doc
5/26
Yacc
AC!NO@,%,M,NT
The satisfaction that accompanies the successful completion of this pro-ect
would $e in complete without the mention of the people who made it
possi$le. without whose constant guidance and encouragement would ha/e
made efforts go in /ain, I consider myself pri/ileged to e0press gratitude and
respect towards all those who guided us through the completion of this
pro-ect,
I con/ey thanks to my pro-ect guide Prof/N/RAMA !IS5NA MT,C5
Computer Science and Engineering 1epartment for pro/iding
encouragement. constant support and guidance which was of a great help to
complete this pro-ect successfully,
I would also like to e0press my gratitude to %r/ P/BA@A!RIS5NA
PRASA%. !rincipal. Eluru college of Engineering and Technology,
pg. 5
-
7/23/2019 yacc seminar report.doc
6/26
-
7/23/2019 yacc seminar report.doc
7/26
Yacc
Introdu*tion to -a** and 8ison
-a**is a parser generator, It is to parsers what le
-
7/23/2019 yacc seminar report.doc
8/26
Yacc
parser< By default. the parser reads from stdinand writes to stdout. -ust like a
lex*generated scanner does,
% yacc myFile.y
creates y.tab.cof C code for parser
% gcc -c y.tab.c
compiles parser code
% gcc o parse y.tab.o lex.yy.o ll -ly
link parser. scanner. li$raries
% parse
in/okes parser. reads from stdin
The akefiles we pro/ide for the pro-ects will e0ecute the a$o/e compilation
steps for you. $ut it is worthwhile to understand the steps re=uired,
yaccFile Format
Your input file is organi9ed as follows 7note the intentional similarities to lex8>
%{
Declarations
%}
Definitions
%%
Productions
%%
User subroutines
pg. 8
-
7/23/2019 yacc seminar report.doc
9/26
Yacc
The optional 1eclarations and ?ser su$routines sections are used for ordinary C
code that you want copied /er$atim to the generated C file. declarations are copied
to the top of the file. user su$routines to the $ottom, The optional 1efinitions
section is where you configure /arious parser features such as defining token
codes. esta$lishing operator precedence and associati/ity. and setting up the glo$al
/aria$les used to communicate $etween the scanner and parser, The re=uired
!roductions section is where you specify the grammar rules, As in lex. you can
associate an action with each pattern 7this time a production8. which allows you to
do whate/er processing is needed as you reduce using that production,
,
-
7/23/2019 yacc seminar report.doc
10/26
Yacc
+ , + /n { printf01 %d/n02 &op$$ }
3
, 4 { Pus!Pop$ 4 Pop$$ }
3 - { int op5 1 Pop$ Pus!Pop$ - op5$ }
3 6 { Pus!Pop$ 6 Pop$$ }
3 7 { int op5 1 Pop$ Pus!Pop$ 7 op5$ }
3 &)*nt { Pus!yyl'al$ }
%% static int
stac(89::;2 count 1 :
static int Pop$ {
assertcount " :$
return stac(8--
count;
}
static int &op$ {
assertcount " :$
return stac(8count-
9;
}
static 'oid Pus!int 'al$ {
assertcount si
-
7/23/2019 yacc seminar report.doc
11/26
Yacc
yyparse$
}
A few things worth pointing out in the a$o/e e0ample>
All token types returned from the scanner must $e defined using %token in the
definitions section, This esta$lishes the numeric codes that will $e used $y the
scanner to tell the parser a$out each token scanned, In addition. the glo$al
/aria$le yylvalis used to store additional attri$ute information a$out the le0eme
itself,
6or each rule. a colon is used in place of the arrow. a /ertical $ar separates the
/arious productions. and a semicolon terminates the rule, ?nlike lex. yaccpays
no attention to line $oundaries in the rules section. so you;re free to use lots of
space to make the grammar look pretty,
ithin the $races for the action associated with a production is -ust ordinary C
code, If no action is present. the parser will take no action upon reducing that
production,
The first rule in the file is assumed to identify the start sym$ol for the grammar,
yyparseis the function generated $y yacc, It reads input from stdin. attempting
to work its way $ack from the input through a series of reductions $ack to the start
sym$ol, The return code from the function is ' if the parse was successful and (
otherwise, If it encounters an error 7i,e, the ne0t token in the input stream cannot
$e shifted8. it calls the routine yyerror2which $y default prints the generic :parse
error: message and =uits,
pg. 11
-
7/23/2019 yacc seminar report.doc
12/26
Yacc
In order to try out our parser. we need to create the scanner for it, 3ere is the lex
file we used>
%{
#include 0y.tab.!0
%}
%%
8:-=;4 { yyl'al 1 atoiyytext$ return &)*nt}
8-467/n; { return yytext8:;}
. { 76 ignore e'eryt!ing else 67 }
i/en the a$o/e specification. yylexwill return the ASCII representation of the
calculator operators. recogni9e integers . and ignore all other characters, hen it
assem$les a series of digits into an integer. it con/erts to a numeric /alue and
stores in yylval7the glo$al reser/ed for passing attri$utes from the scanner to
the parser8, The token type T_Intis returned,
In order to tie this all together. we first run yaccbison on the grammar
specification to generate the y.tab.cand y.tab.h files. and then run lexflex
on the scanner specification to generate the lex.yy.cfile, Compile the two .c
files and link them together. and /oilaD a calculator is $orn< 3ere is the
Makefilewe could use>
calc, lex.yy.o y.tab.o
gcc -o calc lex.yy.o y.tab.o -ly -ll
lex.yy.c, calc.l
y.tab.c flex calc.l
pg. 12
-
7/23/2019 yacc seminar report.doc
13/26
Yacc
y.tab.c, calc.y
bison -'dty calc.y
0) To=ens. Produ*tions. and A*tions
By default. lexand yaccagree to use the ASCII character codes for all single*char
tokens without re=uiring e0plicit definitions, 6or all multi*char tokens. there
needs to $e an agreed*upon num$ering system. and all of these tokens need to $e
specifically defined, The %tokendirecti/e esta$lishes token names and assigns
num$ers,
%to(en &)*nt &)Double &)+tring &)>!ile
The a$o/e line would $e included in the definitions section, It is translated $y
yaccinto
C code as a se=uence of #defines for T_Int. T_Double. and so on. using the
num$ers 25 and higher, These are the codes returned from yylex as each
multicharacter token is scanned and identified, The C declarations for the token
codes are e0ported in the generated y.tab.hheader file, #includethat file in
other modules 7in the scanner. for e0ample8 to stay in sync with the parser*
generated definitions,
!roductions and their accompanying actions are specified using the following
format>
pg. 13
-
7/23/2019 yacc seminar report.doc
14/26
Yacc
left)side , rig!t)side9 {
action9 } 3 rig!t)side5
{ action5 }
3 rig!t)side? { action? }
The left side is the non*terminal $eing descri$ed, on*terminals are named like
typical identifiers> using a se=uence of letters andor digits. starting with a letter,
onterminals do not ha/e to $e defined $efore use. $ut they do need to $e
defined e/entually, The first non*terminal defined in the file is assumed to the
start sym$ol for the grammar,
Each right side is a /alid e0pansion for the left side non*terminal, Fertical $ars
separate the alternati/es. and the entire list is punctuated $y a semicolon, Each
right side is a list of grammar sym$ols. separated $y white space, The sym$ols
can either $e nonterminals or terminals, Terminal sym$ols can either $e
indi/idual character constants. e,g, !a! or token codes defined using %token
such as T_"hile,
The C code enclosed in curly $races after each right side is the associated action,
hene/er the parser recogni9es that production and is ready to reduce. it e0ecutes
the action to process it, hen the entire right side is assem$led on top of the parse
stack and the lookahead indicates a reduce is appropriate. the associated action is
e0ecuted right $efore popping the handle off the stack and following the goto for
the left side non*terminal, The code you include in the actions depends on what
processing you need, The action might $e to $uild a section of the parse tree.pg. 14
-
7/23/2019 yacc seminar report.doc
15/26
Yacc
e/aluate an arithmetic e0pression. declare a /aria$le. or generate code to e0ecute
an assignment statement,
Although it is most common for actions to appear at the end of the right side. it is
also possi$le to place actions in*$etween grammar sym$ols, Those actions will $e
e0ecuted at that point when the sym$ols to the left are on the stack and the sym$ols
to the right are coming up, These em$edded actions can $e a little tricky $ecause
they re=uire the parser to commit to the current production early. more like a
predicti/e parser. and can introduce conflicts if there are still open alternati/es at
that point in the parse,
$) S-m8ol Attri8utes
The parser allows you to associate attri$utes with each grammar sym$ol. $oth
terminals and non*terminals, 6or terminals. the glo$al /aria$le yylvalis used to
communicate the particulars a$out the token -ust scanned from the scanner to the
parser, 6or nonterminals. you can e0plicitly access and set their attri$utes using
the attri$ute stack,
By default. $T& 7the attri$ute type8 is -ust an integer, ?sually you want to
different information for /arious sym$ol types. so a union type can $e used
instead, You indicate what fields you want in the union /ia the %uniondirecti/e,
%union { int
int@alue
double
double@aluepg. 15
-
7/23/2019 yacc seminar report.doc
16/26
Yacc
c!ar
6string@alue
}
The a$o/e line would $e included in the definitions section, It is translated $y
yacc into C code as a $T&typedeffor a new union type with the a$o/e
fields, The glo$al /aria$le yylvalis of this type. and parser stores /aria$les of
this type on the parser stack. one for each sym$ol currently on the stack,
hen defining each token. you can identify which field of the union is applica$le
to this token type $y preceding the token name with the fieldname enclosed in
angle $rackets, The fieldname is optional 7for e0ample. it is not rele/ant for tokens
without attri$utes8,
%to(en int@alue"&)*nt double@alue"&)Double &)>!ile &)Aeturn
To set the attri$ute for a non*terminal. use the %type directi/e. also in the
definitions section, This esta$lishes which field of the union is applica$le to
instances of the nonterminal>
%typeint@alue" *nteger *ntxpression
To access a grammar sym$ol@s attri$ute from the parse stack. there are special
/aria$les a/aila$le for the C code within an action, 'nis the attri$ute for the nth
sym$ol of the current right side. counting from (for the first sym$ol, The attri$ute
for the nonterminal on the left side is denoted $y '', If you ha/e set the type of the
token or nonterminal. then it is clear which field of the attri$utes union you are
accessing, If you ha/e not set the type 7or you want to o/errule the defined field8.pg. 16
-
7/23/2019 yacc seminar report.doc
17/26
Yacc
you can specify with the notation ')fieldna*e+n, A typical use of attri$utes in an
action might $e to gather the attri$utes of the /arious sym$ols on the right side and
use that information to set the attri$ute for the non*terminal on the left side,
A similar mechanism is used to o$tain information a$out sym$ol locations, 6or
each sym$ol on the stack. the parser maintains a /aria$le of type ,T&. which is
a structure containing four mem$ers> first line. first column. last line. and last
column, To o$tain the location of a grammar sym$ol on the right side. you simply
use the notation Bn. completely parallel to 'n, The location of a terminal sym$ol is
furnished $y the le0ical analy9er /ia the glo$al /aria$le yylloc, 1uring a
reduction. the location of the non*terminal on the left side is automatically set
using the com$ined location of all sym$ols in the handle that is $eing reduced,
) Confli*t Resolution
hat happens when you feed yacca grammar that is not "A"#7(8G yaccreports
any conflicts when trying to fill in the ta$le. $ut rather than -ust throwing up its
hands. it has automatic rules for resol/ing the conflicts and $uilding a ta$le
anyway, 6or a shiftreduce conflict. yacc will choose the shift, In a
reducereduce conflict. it will reduce using the rule declared first in the file,
These heuristics can change the language that is accepted and may not $e what
you want, E/en if it happens to work out. it is not recommended to let yaccpick
for you, You should control what happens $y e0plicitly declaring precedence and
associati/ity for your operators,
pg. 17
-
7/23/2019 yacc seminar report.doc
18/26
Yacc
6or e0ample. ask yacc to generate a parser for this am$iguous e0pression
grammar that includes addition. multiplication. and e0ponentiation 7using !-!8,
%to(en &)*nt
%%
, 4
3 6
3 C
3 &)*nt
hen you run yacc on this file. it reports>
conflicts, = s!ift7reduce
In the generated y.output. it tells you more a$out the issue>
...
+tate contains ? s!ift7reduce conflicts.
+tate E contains ? s!ift7reduce conflicts.
+tate contains ? s!ift7reduce
conflicts. ...
If you look through the human*reada$le y.outputfile. you will see it contains the
family of configurating sets and the transitions $etween them, hen you look at
states +. 2. and H. you will see the place we are in the parse and the details of the
conflict, ?nderstanding all that "#7(8 construction stuff -ust might $e useful after
all< #ather than re*writing the grammar to implicitly control the precedence and
associati/ity with a $unch of intermediate non*terminals. we can directly indicate
pg. 18
-
7/23/2019 yacc seminar report.doc
19/26
Yacc
the precedence so that yaccwill know how to $reak ties, In the definitions section.
we can add any num$er of precedence le/els. one per line. from lowest to highest.
and indicate the associati/ity 7either left. right. or nonassoc8, Se/eral terminals can
$e on the same line to assign them e=ual precedence,
%to(en &)*nt
%left 4
%left 6
%rig!t C
%%
, 4
3 6
3 C
3 &)*nt
The a$o/e file says that addition has the lowest precedence and it associates left to
right, ultiplication is higher. and is also left associati/e, E0ponentiation is
highest precedence and it associates right, These directi/es disam$iguate the
conflicts, hen we feed yacc the changed file. it uses the precedence and
associati/ity as specified to $reak ties, 6or a shiftreduce conflict. if the
precedence of the token to $e shifted is higher than that of the rule to reduce. it
chooses the shift and /ice /ersa, The precedence of a rule is determined $y the
precedence of the rightmost terminal on the right*hand side 7or can $e e0plicitly set
with the %precdirecti/e8, So if a /5is on the stack and 0is coming up. the 0
has higher precedence than the /5. so it shifts, If 05is on the stack and /ispg. 19
-
7/23/2019 yacc seminar report.doc
20/26
Yacc
coming up. it reduces, If /5is on the stack and /is coming up. the associati/ity
$reaks the tie. a left*to*right associati/ity would reduce the rule and then go on. a
right*to*left would shift and postpone the reduction,
Another way to set precedence is $y using the %precdirecti/e, hen placed at
the end of a production with a terminal sym$ol as its argument. it e0plicitly sets
the precedence of the production to the same precedence as that terminal, This
can $e used when the right side has no terminals or when you want to o/errule the
precedence gi/en $y the rightmost terminal,
E/en though it doesn;t seem like a precedence pro$lem. the dangling else
am$iguity can $e resol/ed using precedence rules, Think carefully a$out what the
conflict is> Identify what the token is that could $e shifted and the alternate
production that could $e reduced, hat would $e the effect of choosing the shiftG
hat is the effect of choosing to reduceG hich is the one we wantG
?sing yacc@s precedence rules. you can force the choice you want $y setting the
precedence of the token $eing shifted /ersus the precedence of the rule $eing
reduced, hiche/er precedence is higher wins out, The precedence of the token
is set using the ordinary %left. %ri1ht. or %nonassoc directi/es, The
precedence of the rule $eing reduced is determined $y the precedence of the
rightmost terminal 7set the same way8 or /ia an e0plicit %precdirecti/e on the
production,
pg. 20
-
7/23/2019 yacc seminar report.doc
21/26
Yacc
') ,rror handling
hen a yacc*generated parser encounters an error 7i,e, the ne0t input token
cannot $e shifted gi/en the se=uence of things so far on the stack8. it calls the
default yyerror routine to print a generic :parse error: message and halt
parsing, 3owe/er. =uitting in response to the /ery first error is not particularly
helpful
@ar , Godifiers &ype *dentifierHist
3 error
pg. 21
-
7/23/2019 yacc seminar report.doc
22/26
Yacc
The second production allows for handling errors encountered when trying to
recogni9ear$y accepting the alternate se=uence errorfollowed $y a semicolon,
hat happens if the parser in the in the middle of processing the Identifier,ist
when it encounters an errorG The error reco/ery rule. interpreted strictly. applies
to the precise se=uence of an error and a semicolon, If an error occurs in the
middle of an Identifier,ist. there areModifiersand a Typeand what not on
the stack. which doesn@t seem to fit the pattern, 3owe/er. yacc can force the
situation to fit the rule. $y discarding the partially processed rules 7i,e, popping
states from the stack8 until it gets $ack to a state in which the error token is
accepta$le 7i,e, all the way $ack to the state at which it started $efore matching the
Modifiers8, At this point the errortoken can $e shifted, Then. if the lookahead
token couldn;t possi$ly $e shifted. the parser reads tokens. discarding them until it
finds a token that is accepta$le, In this e0ample. yacc reads and discards input
until the ne0t semi*colon so that the error rule can apply, It then reduces that to
ar. and continues on from there, This $asically allowed for a mangled /aria$le
declaration to $e ignored up to the terminating semicolon. which was a reasona$le
attempt at getting the parser $ack on track, ote that if a specific token follows the
error sym$ol it will discard until it finds that token. otherwise it will discard until
it finds any token in the lookahead set for the nonterminal 7for e0ample. anything
that can followarin the e0ample a$o/e8,
pg. 22
-
7/23/2019 yacc seminar report.doc
23/26
Yacc
In our postfi0 calculator from the $eginning of the handout. we can add an error
production to reco/er from a malformed e0pression $y discarding the rest of the
line up to the newline and allow the ne0t line to $e parsed normally>
+ , + /n { printf01 %d/n02 &op$$ }
3 error /n { printf0rrorI Discarding entire line./n0$ }
3
"ike any other yacc rule. one that contains an error can ha/e an associated
action, It would $e typical at this point to clean up after the error and other
necessary housekeeping so that the parse can resume,
here should one put error productionsG It;s more of an art than a science,
!utting error tokens fairly high up tends to protect you from all sorts of errors
$y ensuring there is always a rule to which the parser can reco/er, On the other
hand. you usually want to discard as little input as possi$le $efore reco/ering.
and error tokens at lower le/el rules can help minimi9e the num$er of partially
matched rules that are discarded during reco/ery,
One reasona$le strategy is to add error tokens fairly high up and use punctuation as
the synchroni9ing tokens, 6or e0ample. in a programming language e0pecting a
list of declarations. it might make sense to allow error as one of the alternati/es for
a list entry, If punctuation separates elements of a list. you can use that punctuation
in error rules to help find synchroni9ation pointsDskipping ahead to the ne0t
parameter or the ne0t statement in a list, Trying to add error actions at too low a
pg. 23
-
7/23/2019 yacc seminar report.doc
24/26
Yacc
le/el 7say in e0pression handling8 tends to $e more difficult $ecause of the lack of
structure that allows you to determine when the e0pression ends and where to pick
$ack up, Sometimes adding error actions into more than one le/el may introduce
conflicts into the grammar $ecause more than one error action is a possi$le
reco/ery,
Another mechanism> deli$erately add incorrect productions into the grammar
specification. allow the parser to help you recogni9e these illegal forms. and then
use the action to handle the error manually, 6or e0ample. let@s say you@re writing a
a/a compiler and you know some poor C programmer is going to forget that you
can@t specify the si9e in a a/a array declaration, You could add an alternate array
declaration that allows for a si9e. $ut knowing it was illegal. the action for this
production reports an error and discards the erroneous si9e token,
ost compilers tend to focus their efforts on trying to reco/er from the most
common error situations 7forgetting a semicolon. mistyping a keyword8. $ut don@t
put a lot of effort into dealing with more wacky inputs, Error reco/ery is largely a
trial and error process, 6or fun. you can e0periment with your fa/orite compilers
to see how good a -o$ they do at recogni9ing errors 7they are e0pected to hit this
one perfectly8. reporting errors clearly 7less than perfect8. and gracefully reco/ering
7far from perfect8,
pg. 24
-
7/23/2019 yacc seminar report.doc
25/26
Yacc
2) Other 3eatures
This handout doesn@t detail all the great features a/aila$le in yacc and $ison. please
refer to the man pages and online references for more details on tokens and types.
conflict resolution. sym$ol attri$utes. em$edded actions. error handling. and so on,
pg. 25
-
7/23/2019 yacc seminar report.doc
26/26
Yacc
Bi8liograph-
, "e/ine. T, ason. and 1, Brown. "e0 and Yacc, Se$astopol. CA> O;#eilly &
Associates. (JJ,
!yster. Compiler 1esign and Construction, ew York. Y> Fan ostrand
#einhold. (JHH,
pg. 26