yacc seminar report.doc

  • 7/23/2019 yacc seminar report.doc

    Yacc

    A
    Seminar Report
    On
    YACC

    In partial fulfilment of requirements for the degree of
    Bachelor of Technology in
    Computer Science & Engineering

    BY
    PUTTI MANOJ KUMAR

    "#$%#A&'($)

    Department of
    Computer Science & Engineering
    Eluru College of Engineering & Technology
    Duggirala, Eluru, A.P.

    0'10/

    CERTIFICATE

    Certified that the seminar work entitled YACC COMPILER is a bonafide work
    carried out in the third semester by PUTTI MANOJ KUMAR
    "#$%#A&'($) in partial fulfilment for the award of Bachelor of
    Technology in Computer Science and Engineering from Eluru College of
    Engineering & Technology during the academic year '()*'(+,

    Signature of In-Charge                                        HOD

    ABSTRACT

    Computer program input generally has some structure; in fact, every computer
    program that does input can be thought of as defining an "input language" which it
    accepts. An input language may be as complex as a programming language, or as
    simple as a sequence of numbers. Unfortunately, usual input facilities are limited,
    difficult to use, and often are lax about checking their inputs for validity.

    Yacc provides a general tool for describing the input to a computer program. The
    Yacc user specifies the structures of his input, together with code to be invoked as
    each such structure is recognized. Yacc turns such a specification into a subroutine
    that handles the input process; frequently, it is convenient and appropriate to have
    most of the flow of control in the user's application handled by this subroutine.

    The input subroutine produced by Yacc calls a user-supplied routine to return the

    ne

    ACKNOWLEDGEMENT

    The satisfaction that accompanies the successful completion of this project
    would be incomplete without the mention of the people who made it
    possible, and without whose constant guidance and encouragement my efforts
    would have gone in vain. I consider myself privileged to express gratitude and
    respect towards all those who guided me through the completion of this
    project.

    I convey thanks to my project guide Prof. N. RAMA KISHNA, M.Tech,
    Computer Science and Engineering Department, for providing
    encouragement, constant support and guidance, which was of great help in
    completing this project successfully.

    I would also like to express my gratitude to Dr. P. BALAKRISHNA
    PRASAD, Principal, Eluru College of Engineering and Technology.

    Introduction to yacc and bison

    yacc is a parser generator. It is to parsers what le

    parser! By default, the parser reads from stdin and writes to stdout, just like a
    lex-generated scanner does.

    % yacc myFile.y                            creates y.tab.c of C code for parser
    % gcc -c y.tab.c                           compiles parser code
    % gcc -o parse y.tab.o lex.yy.o -ll -ly    link parser, scanner, libraries
    % parse                                    invokes parser, reads from stdin

    The Makefiles we provide for the projects will execute the above compilation
    steps for you, but it is worthwhile to understand the steps required.

    yacc File Format

    Your input file is organized as follows (note the intentional similarities to lex):

    %{

    Declarations

    %}

    Definitions

    %%

    Productions

    %%

    User subroutines

    The optional Declarations and User subroutines sections are used for ordinary C
    code that you want copied verbatim to the generated C file: declarations are copied
    to the top of the file, user subroutines to the bottom. The optional Definitions
    section is where you configure various parser features such as defining token
    codes, establishing operator precedence and associativity, and setting up the global
    variables used to communicate between the scanner and parser. The required
    Productions section is where you specify the grammar rules. As in lex, you can
    associate an action with each pattern (this time a production), which allows you to
    do whatever processing is needed as you reduce using that production.

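    %{
    #include <stdio.h>
    #include <assert.h>
    int yylex(void);              /* supplied by the lex-generated scanner */
    static int Pop(void);
    static int Top(void);
    static void Push(int val);
    %}
    %token T_Int
    %%
    S : S E '\n'  { printf("= %d\n", Top()); }
      |
      ;
    E : E E '+'   { Push(Pop() + Pop()); }
      | E E '-'   { int op2 = Pop(); Push(Pop() - op2); }
      | E E '*'   { Push(Pop() * Pop()); }
      | E E '/'   { int op2 = Pop(); Push(Pop() / op2); }
      | T_Int     { Push(yylval); }
      ;
    %%
    static int stack[100], count = 0;

    /* Remove and return the value on top of the stack. */
    static int Pop(void) {
        assert(count > 0);
        return stack[--count];
    }

    /* Return the value on top of the stack without removing it. */
    static int Top(void) {
        assert(count > 0);
        return stack[count - 1];
    }

    static void Push(int val) {
        assert(count < (int)(sizeof(stack) / sizeof(*stack)));
        stack[count++] = val;
    }

    int main(void) {
        return yyparse();
    }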

    A few things worth pointing out in the above example:

    All token types returned from the scanner must be defined using %token in the
    definitions section. This establishes the numeric codes that will be used by the
    scanner to tell the parser about each token scanned. In addition, the global
    variable yylval is used to store additional attribute information about the lexeme
    itself.

    For each rule, a colon is used in place of the arrow, a vertical bar separates the
    various productions, and a semicolon terminates the rule. Unlike lex, yacc pays
    no attention to line boundaries in the rules section, so you're free to use lots of
    space to make the grammar look pretty.

    Within the braces for the action associated with a production is just ordinary C
    code. If no action is present, the parser will take no action upon reducing that
    production.

    The first rule in the file is assumed to identify the start symbol for the grammar.

    yyparse is the function generated by yacc. It reads input from stdin, attempting
    to work its way back from the input through a series of reductions back to the start
    symbol. The return code from the function is 0 if the parse was successful and 1
    otherwise. If it encounters an error (i.e. the next token in the input stream cannot
    be shifted), it calls the routine yyerror, which by default prints the generic "parse
    error" message and quits.
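    To see why the subtraction and division actions pop the right operand into
    op2 first, it can help to run the same stack discipline by hand. The sketch
    below is not generated by yacc; eval_postfix is a hypothetical helper that
    walks a postfix string and uses Push and Pop in the same way the actions do.
    It assumes well-formed input and does not guard against division by zero.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Value stack, mirroring the Push/Pop helpers in the calculator example. */
static int stack[100];
static int count = 0;

static void Push(int val) {
    assert(count < (int)(sizeof(stack) / sizeof(*stack)));
    stack[count++] = val;
}

static int Pop(void) {
    assert(count > 0);
    return stack[--count];
}

/* Hypothetical helper (not part of yacc): evaluates one postfix line. */
static int eval_postfix(const char *s) {
    count = 0;
    while (*s != '\0') {
        if (isdigit((unsigned char)*s)) {
            char *end;
            Push((int)strtol(s, &end, 10));   /* like the T_Int action */
            s = end;
        } else if (*s == '+' || *s == '-' || *s == '*' || *s == '/') {
            int op2 = Pop();   /* the right operand was pushed last */
            int op1 = Pop();
            switch (*s) {
                case '+': Push(op1 + op2); break;
                case '-': Push(op1 - op2); break;
                case '*': Push(op1 * op2); break;
                default:  Push(op1 / op2); break;
            }
            s++;
        } else {
            s++;               /* skip blanks and anything unrecognized */
        }
    }
    return Pop();
}
```

    Calling eval_postfix("10 4 -") leaves 6 on the stack rather than -6, because
    the 4 is popped first and used as the right operand.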

    In order to try out our parser, we need to create the scanner for it. Here is the lex
    file we used:

    %{
    #include <stdlib.h>    /* atoi */
    #include "y.tab.h"
    %}
    %%
    [0-9]+      { yylval = atoi(yytext); return T_Int; }
    [-+*/\n]    { return yytext[0]; }
    .           { /* ignore everything else */ }

    Given the above specification, yylex will return the ASCII representation of the
    calculator operators, recognize integers, and ignore all other characters. When it
    assembles a series of digits into an integer, it converts it to a numeric value and
    stores it in yylval (the global reserved for passing attributes from the scanner to
    the parser). The token type T_Int is returned.
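    For comparison, here is a minimal hand-written sketch of what such a scanner
    does. It is not the code lex generates. The names next_token and scan_from,
    the in-memory input buffer, and the value 257 for T_Int are all assumptions
    made for illustration; a real build takes T_Int from y.tab.h.

```c
#include <assert.h>
#include <ctype.h>

#define T_Int 257   /* assumed token code; the real value comes from y.tab.h */

static int yylval;                /* attribute of the most recent token */
static const char *input = "";    /* hypothetical in-memory input buffer */

static void scan_from(const char *text) { input = text; }

/* Returns T_Int for a run of digits, the character itself for an operator
   or newline, and 0 at end of input. Every other character is skipped. */
static int next_token(void) {
    for (;;) {
        unsigned char c = (unsigned char)*input;
        if (c == '\0') return 0;
        if (isdigit(c)) {
            int value = 0;
            while (isdigit((unsigned char)*input)) {
                value = value * 10 + (*input - '0');
                input++;
            }
            yylval = value;       /* pass the number to the parser */
            return T_Int;
        }
        input++;
        if (c == '-' || c == '+' || c == '*' || c == '/' || c == '\n') {
            return c;             /* single-char tokens use their ASCII code */
        }
    }
}
```

    After scan_from("12 7+\n"), successive calls yield T_Int (with yylval 12),
    T_Int (with yylval 7), '+', '\n', and finally 0.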

    In order to tie this all together, we first run yacc/bison on the grammar
    specification to generate the y.tab.c and y.tab.h files, and then run lex/flex
    on the scanner specification to generate the lex.yy.c file. Compile the two .c
    files and link them together, and voila, a calculator is born! Here is the
    Makefile we could use:

    calc: lex.yy.o y.tab.o
            gcc -o calc lex.yy.o y.tab.o -ly -ll

    lex.yy.c: calc.l y.tab.c
            flex calc.l

    y.tab.c: calc.y
            bison -vdty calc.y

    0) Tokens, Productions, and Actions

    By default, lex and yacc agree to use the ASCII character codes for all single-char
    tokens without requiring explicit definitions. For all multi-char tokens, there
    needs to be an agreed-upon numbering system, and all of these tokens need to be
    specifically defined. The %token directive establishes token names and assigns
    numbers.

    %token T_Int T_Double T_String T_While

    The above line would be included in the definitions section. It is translated by
    yacc into C code as a sequence of #defines for T_Int, T_Double, and so on, using
    numbers above 255 so that they cannot collide with the single-character codes.
    These are the codes returned from yylex as each multi-character token is scanned
    and identified. The C declarations for the token codes are exported in the
    generated y.tab.h header file. #include that file in other modules (in the
    scanner, for example) to stay in sync with the parser-generated definitions.
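    The split between character codes and named token codes can be sketched in
    C as follows. The numeric values here are assumptions for illustration only;
    the generator chooses the real numbers and writes them to y.tab.h, and the
    helper names is_char_token and token_name are hypothetical.

```c
#include <assert.h>

/* Assumed values for illustration; the real ones come from y.tab.h. */
enum { T_Int = 257, T_Double, T_String, T_While };

/* Single-character tokens are represented by their character code. */
static int is_char_token(int code) {
    return code > 0 && code < 256;
}

/* Map a token code to a printable name, e.g. for debugging output. */
static const char *token_name(int code) {
    switch (code) {
        case T_Int:    return "T_Int";
        case T_Double: return "T_Double";
        case T_String: return "T_String";
        case T_While:  return "T_While";
        default:       return is_char_token(code) ? "character" : "unknown";
    }
}
```

    Because every named token sits above the character range, a scanner can
    return '+' and T_While through the same int return value without ambiguity.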

    Productions and their accompanying actions are specified using the following
    format:

    left_side : right_side1 { action1 }
              | right_side2 { action2 }
              | right_side3 { action3 }
              ;

    The left side is the non-terminal being described. Non-terminals are named like
    typical identifiers: using a sequence of letters and/or digits, starting with a letter.
    Nonterminals do not have to be defined before use, but they do need to be
    defined eventually. The first non-terminal defined in the file is assumed to be the
    start symbol for the grammar.

    Each right side is a valid expansion for the left side non-terminal. Vertical bars
    separate the alternatives, and the entire list is punctuated by a semicolon. Each
    right side is a list of grammar symbols, separated by white space. The symbols
    can either be nonterminals or terminals. Terminal symbols can either be
    individual character constants, e.g. 'a', or token codes defined using %token,
    such as T_While.

    The C code enclosed in curly braces after each right side is the associated action.
    Whenever the parser recognizes that production and is ready to reduce, it executes
    the action to process it. When the entire right side is assembled on top of the parse
    stack and the lookahead indicates a reduce is appropriate, the associated action is
    executed right before popping the handle off the stack and following the goto for
    the left side non-terminal. The code you include in the actions depends on what
    processing you need. The action might be to build a section of the parse tree,
    evaluate an arithmetic expression, declare a variable, or generate code to execute
    an assignment statement.

    Although it is most common for actions to appear at the end of the right side, it is
    also possible to place actions in between grammar symbols. Those actions will be
    executed at that point when the symbols to the left are on the stack and the symbols
    to the right are coming up. These embedded actions can be a little tricky because
    they require the parser to commit to the current production early, more like a
    predictive parser, and can introduce conflicts if there are still open alternatives at
    that point in the parse.

    $) Symbol Attributes

    The parser allows you to associate attributes with each grammar symbol, both
    terminals and non-terminals. For terminals, the global variable yylval is used to
    communicate the particulars about the token just scanned from the scanner to the
    parser. For nonterminals, you can explicitly access and set their attributes using
    the attribute stack.

    By default, YYSTYPE (the attribute type) is just an integer. Usually you want to
    store different information for various symbol types, so a union type can be used
    instead. You indicate what fields you want in the union via the %union directive.

    %union {
        int intValue;
        double doubleValue;
        char *stringValue;
    }

    The above declaration would be included in the definitions section. It is translated
    by yacc into C code as a YYSTYPE typedef for a new union type with the above
    fields. The global variable yylval is of this type, and the parser stores variables of
    this type on the parser stack, one for each symbol currently on the stack.

    When defining each token, you can identify which field of the union is applicable
    to this token type by preceding the token name with the fieldname enclosed in
    angle brackets. The fieldname is optional (for example, it is not relevant for tokens
    without attributes).

    %token <intValue>T_Int <doubleValue>T_Double T_While T_Return

    To set the attribute for a non-terminal, use the %type directive, also in the
    definitions section. This establishes which field of the union is applicable to
    instances of the nonterminal:

    %type <intValue> Integer IntExpression

    To access a grammar symbol's attribute from the parse stack, there are special
    variables available for the C code within an action. $n is the attribute for the nth
    symbol of the current right side, counting from 1 for the first symbol. The attribute
    for the nonterminal on the left side is denoted by $$. If you have set the type of the
    token or nonterminal, then it is clear which field of the attributes union you are
    accessing. If you have not set the type (or you want to overrule the defined field),
    you can specify with the notation $<fieldname>n. A typical use of attributes in an
    action might be to gather the attributes of the various symbols on the right side and
    use that information to set the attribute for the non-terminal on the left side.
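    The way $1, $3 and $$ relate to the attribute stack can be sketched in plain
    C. This is a simplified model, not the code yacc emits: the type Attr stands
    in for YYSTYPE, and values, push_int and reduce_add are hypothetical names.
    The production being modelled, IntExpression : IntExpression '+' Integer,
    is also an assumption chosen to match the %type line above.

```c
#include <assert.h>

/* Plays the role of YYSTYPE as declared by the %union directive. */
typedef union {
    int intValue;
    double doubleValue;
    char *stringValue;
} Attr;

static Attr values[100];   /* one attribute per symbol on the parse stack */
static int top = 0;

/* Push a symbol whose attribute lives in the intValue field. */
static void push_int(int v) {
    assert(top < (int)(sizeof(values) / sizeof(*values)));
    values[top++].intValue = v;
}

/* Model of reducing IntExpression : IntExpression '+' Integer
   with the action { $$ = $1 + $3; }. */
static void reduce_add(void) {
    assert(top >= 3);
    Attr *rhs = &values[top - 3];   /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    Attr result;
    result.intValue = rhs[0].intValue + rhs[2].intValue;
    top -= 3;                       /* pop the three symbols of the handle */
    values[top++] = result;         /* push the attribute of the left side */
}
```

    After push_int(2), push_int(0) for the '+' placeholder, and push_int(5), a
    call to reduce_add() leaves a single entry whose intValue is 7.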

    A similar mechanism is used to obtain information about symbol locations. For
    each symbol on the stack, the parser maintains a variable of type YYLTYPE, which is
    a structure containing four members: first line, first column, last line, and last
    column. To obtain the location of a grammar symbol on the right side, you simply
    use the notation @n, completely parallel to $n. The location of a terminal symbol is
    furnished by the lexical analyzer via the global variable yylloc. During a
    reduction, the location of the non-terminal on the left side is automatically set
    using the combined location of all symbols in the handle that is being reduced.
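    A minimal sketch of how the combined location can be computed, assuming the
    handle is non-empty: the left side starts where the first symbol starts and
    ends where the last symbol ends. The type name Location and the function
    merge_locations are hypothetical; only the four members come from the
    description above.

```c
#include <assert.h>

/* Same four members as described for YYLTYPE. */
typedef struct {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} Location;

/* Combine the locations of the n right-side symbols into one span. */
static Location merge_locations(const Location *rhs, int n) {
    Location lhs;
    assert(n > 0);
    lhs.first_line   = rhs[0].first_line;
    lhs.first_column = rhs[0].first_column;
    lhs.last_line    = rhs[n - 1].last_line;
    lhs.last_column  = rhs[n - 1].last_column;
    return lhs;
}
```

    For a handle whose first symbol starts at line 3, column 1 and whose last
    symbol ends at line 4, column 9, the merged location spans exactly that range.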

    ) Conflict Resolution

    What happens when you feed yacc a grammar that is not LALR(1)? yacc reports
    any conflicts when trying to fill in the table, but rather than just throwing up its
    hands, it has automatic rules for resolving the conflicts and building a table
    anyway. For a shift/reduce conflict, yacc will choose the shift. In a
    reduce/reduce conflict, it will reduce using the rule declared first in the file.
    These heuristics can change the language that is accepted and may not be what
    you want. Even if it happens to work out, it is not recommended to let yacc pick
    for you. You should control what happens by explicitly declaring precedence and
    associativity for your operators.

    For example, ask yacc to generate a parser for this ambiguous expression
    grammar that includes addition, multiplication, and exponentiation (using '^').

    %token T_Int
    %%
    E : E '+' E
      | E '*' E
      | E '^' E
      | T_Int
      ;

    When you run yacc on this file, it reports:

    conflicts: 9 shift/reduce

    In the generated y.output, it tells you more about the issue:

    ...
    State 6 contains 3 shift/reduce conflicts.
    State 7 contains 3 shift/reduce conflicts.
    State 8 contains 3 shift/reduce conflicts.
    ...

    If you look through the human-readable y.output file, you will see it contains the
    family of configurating sets and the transitions between them. When you look at
    states 6, 7, and 8, you will see the place we are in the parse and the details of the
    conflict. Understanding all that LR(1) construction stuff just might be useful after
    all! Rather than re-writing the grammar to implicitly control the precedence and
    associativity with a bunch of intermediate non-terminals, we can directly indicate
    the precedence so that yacc will know how to break ties. In the definitions section,
    we can add any number of precedence levels, one per line, from lowest to highest,
    and indicate the associativity (either left, right, or nonassoc). Several terminals can
    be on the same line to assign them equal precedence.

    %token T_Int
    %left '+'
    %left '*'
    %right '^'
    %%
    E : E '+' E
      | E '*' E
      | E '^' E
      | T_Int
      ;

    The above file says that addition has the lowest precedence and it associates left to
    right. Multiplication is higher, and is also left associative. Exponentiation is
    highest precedence and it associates right. These directives disambiguate the
    conflicts. When we feed yacc the changed file, it uses the precedence and
    associativity as specified to break ties. For a shift/reduce conflict, if the
    precedence of the token to be shifted is higher than that of the rule to reduce, it
    chooses the shift, and vice versa. The precedence of a rule is determined by the
    precedence of the rightmost terminal on the right-hand side (or can be explicitly set
    with the %prec directive). So if an E + E is on the stack and * is coming up, the *
    has higher precedence than the E + E, so it shifts. If E * E is on the stack and + is
    coming up, it reduces. If E + E is on the stack and + is coming up, the associativity
    breaks the tie: a left-to-right associativity would reduce the rule and then go on, a
    right-to-left would shift and postpone the reduction.
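    The tie-breaking rule can be written down as a small decision function. This
    is a sketch of the rule as described, not code taken from yacc; the names
    resolve, Assoc and Decision are hypothetical, and precedence levels are
    assumed to be plain integers where a larger number binds tighter. The
    nonassoc case, which the text above does not spell out, is treated here as a
    syntax error.

```c
#include <assert.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE } Assoc;
typedef enum { DO_SHIFT, DO_REDUCE, DO_ERROR } Decision;

/* rule_prec:   precedence of the rule that could be reduced
   token_prec:  precedence of the lookahead token that could be shifted
   token_assoc: associativity declared for that precedence level */
static Decision resolve(int rule_prec, int token_prec, Assoc token_assoc) {
    if (token_prec > rule_prec) return DO_SHIFT;    /* token binds tighter */
    if (token_prec < rule_prec) return DO_REDUCE;   /* rule binds tighter */
    switch (token_assoc) {                          /* equal: use associativity */
        case ASSOC_LEFT:  return DO_REDUCE;
        case ASSOC_RIGHT: return DO_SHIFT;
        default:          return DO_ERROR;
    }
}
```

    With '+' at level 1, '*' at level 2 and '^' at level 3, resolve(1, 2,
    ASSOC_LEFT) shifts the '*' after E + E, while resolve(1, 1, ASSOC_LEFT)
    reduces E + E before a second '+'.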

    Another way to set precedence is by using the %prec directive. When placed at
    the end of a production with a terminal symbol as its argument, it explicitly sets
    the precedence of the production to the same precedence as that terminal. This
    can be used when the right side has no terminals or when you want to overrule the
    precedence given by the rightmost terminal.

    Even though it doesn't seem like a precedence problem, the dangling else
    ambiguity can be resolved using precedence rules. Think carefully about what the
    conflict is: identify what the token is that could be shifted and the alternate
    production that could be reduced. What would be the effect of choosing the shift?
    What is the effect of choosing to reduce? Which is the one we want?

    Using yacc's precedence rules, you can force the choice you want by setting the
    precedence of the token being shifted versus the precedence of the rule being
    reduced. Whichever precedence is higher wins out. The precedence of the token
    is set using the ordinary %left, %right, or %nonassoc directives. The
    precedence of the rule being reduced is determined by the precedence of the
    rightmost terminal (set the same way) or via an explicit %prec directive on the
    production.

    ') Error handling

    When a yacc-generated parser encounters an error (i.e. the next input token
    cannot be shifted given the sequence of things so far on the stack), it calls the
    default yyerror routine to print a generic "parse error" message and halt
    parsing. However, quitting in response to the very first error is not particularly
    helpful.

    Var : Modifiers Type IdentifierList ';'
        | error ';'
        ;

    The second production allows for handling errors encountered when trying to
    recognize Var by accepting the alternate sequence error followed by a semicolon.
    What happens if the parser is in the middle of processing the IdentifierList
    when it encounters an error? The error recovery rule, interpreted strictly, applies
    to the precise sequence of an error and a semicolon. If an error occurs in the
    middle of an IdentifierList, there are Modifiers and a Type and what not on
    the stack, which doesn't seem to fit the pattern. However, yacc can force the
    situation to fit the rule by discarding the partially processed rules (i.e. popping
    states from the stack) until it gets back to a state in which the error token is
    acceptable (i.e. all the way back to the state at which it started before matching the
    Modifiers). At this point the error token can be shifted. Then, if the lookahead
    token couldn't possibly be shifted, the parser reads tokens, discarding them until it
    finds a token that is acceptable. In this example, yacc reads and discards input
    until the next semicolon so that the error rule can apply. It then reduces that to
    Var, and continues on from there. This basically allows a mangled variable
    declaration to be ignored up to the terminating semicolon, which is a reasonable
    attempt at getting the parser back on track. Note that if a specific token follows the
    error symbol, it will discard until it finds that token; otherwise it will discard until
    it finds any token in the lookahead set for the nonterminal (for example, anything
    that can follow Var in the example above).
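    The discarding step can be pictured with a small sketch. This models only the
    "read and throw away tokens until the synchronizing token" part of recovery,
    not the state popping that comes before it, and it is not code from yacc. The
    function name skip_to_sync and the token-array representation are
    assumptions made for illustration.

```c
#include <assert.h>

/* Starting at position pos, discard tokens until the synchronizing token
   (for example ';') is found. Returns the index of the token just after it,
   or n if the input ends before the synchronizing token appears. */
static int skip_to_sync(const int *tokens, int n, int pos, int sync) {
    assert(pos >= 0 && pos <= n);
    while (pos < n && tokens[pos] != sync) {
        pos++;                      /* discard one token */
    }
    return pos < n ? pos + 1 : n;   /* resume after the synchronizing token */
}
```

    If an error is detected at index 2 and the next ';' sits at index 5, parsing
    resumes at index 6, so everything from the point of the error through the
    semicolon has been ignored.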

    In our postfix calculator from the beginning of the handout, we can add an error
    production to recover from a malformed expression by discarding the rest of the
    line up to the newline and allowing the next line to be parsed normally:

    S : S E '\n'    { printf("= %d\n", Top()); }
      | error '\n'  { printf("Error! Discarding entire line.\n"); }
      |
      ;

    Like any other yacc rule, one that contains an error can have an associated
    action. It would be typical at this point to clean up after the error and do any other
    necessary housekeeping so that the parse can resume.

    Where should one put error productions? It's more of an art than a science.
    Putting error tokens fairly high up tends to protect you from all sorts of errors
    by ensuring there is always a rule to which the parser can recover. On the other
    hand, you usually want to discard as little input as possible before recovering,
    and error tokens at lower-level rules can help minimize the number of partially
    matched rules that are discarded during recovery.

    One reasonable strategy is to add error tokens fairly high up and use punctuation as
    the synchronizing tokens. For example, in a programming language expecting a
    list of declarations, it might make sense to allow error as one of the alternatives for
    a list entry. If punctuation separates elements of a list, you can use that punctuation
    in error rules to help find synchronization points, skipping ahead to the next
    parameter or the next statement in a list. Trying to add error actions at too low a
    level (say in expression handling) tends to be more difficult because of the lack of
    structure that allows you to determine when the expression ends and where to pick
    back up. Sometimes adding error actions into more than one level may introduce
    conflicts into the grammar because more than one error action is a possible
    recovery.

    Another mechanism: deliberately add incorrect productions into the grammar
    specification, allow the parser to help you recognize these illegal forms, and then
    use the action to handle the error manually. For example, let's say you're writing a
    Java compiler and you know some poor C programmer is going to forget that you
    can't specify the size in a Java array declaration. You could add an alternate array
    declaration that allows for a size, but knowing it is illegal, the action for this
    production reports an error and discards the erroneous size token.

    Most compilers tend to focus their efforts on trying to recover from the most
    common error situations (forgetting a semicolon, mistyping a keyword), but don't
    put a lot of effort into dealing with more wacky inputs. Error recovery is largely a
    trial and error process. For fun, you can experiment with your favorite compilers
    to see how good a job they do at recognizing errors (they are expected to hit this
    one perfectly), reporting errors clearly (less than perfect), and gracefully recovering
    (far from perfect).

    2) Other Features

    This handout doesn't detail all the great features available in yacc and bison;
    please refer to the man pages and online references for more details on tokens and
    types, conflict resolution, symbol attributes, embedded actions, error handling, and
    so on.
