Introduction Lex

download Introduction Lex

of 21

Transcript of Introduction Lex

  • 8/13/2019 Introduction Lex

    1/21

    Prin

    ciplesofCompilers

    -2/03/2006

    PabloR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Introduction to Lex

    General Description Input file Output file How matching is done Regular expressions Local names Using Lex

  • 8/13/2019 Introduction Lex

    2/21

    Prin

    ciplesofCompilers

    -2/03/2006

    PabloR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    General Description

    Lex is a program that automaticallygenerates code for scanners.

    Input: a description of the tokens in theform of regular expressions, together

    with the actions to be taken when eachexpression is matched.

    Output: a text file with C source codedefining a procedure yylex()that is a tableimplementing the DFA for the regular

    expressions.

  • 8/13/2019 Introduction Lex

    3/21

    Prin

    ciplesofCompilers

    -2/03/2006

    PabloR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Input File

    Lex input file is divided in threeparts

    /* declarations*/

    %%

    /* rules */

    %%

    /* auxiliary functions*/

  • 8/13/2019 Introduction Lex

    4/21

    Prin

    ciplesofCompilers

    -2/03/2006

    PabloR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Input File Declaration

    The declaration part includes theassignment of names to regularexpressions in the form:

    It can also include C code external to thedefinition of yylex()within %{and %}inthe first column.

    Also, it is possible to specify someoptions with the sintax %option

  • 8/13/2019 Introduction Lex

    5/21

    Prin

    ciplesofCompilers

    -2/03/2006

    PabloR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Input File - Rules

    The rules part specifies what to dowhen a regular expression ismatched

    Actions are normal C sentences(can be a complex C sentencebetween {}).

    %{ %} enclosed text appearingbefore the first rule may be used todeclare local variables.

  • 8/13/2019 Introduction Lex

    6/21

    Prin

    ciplesofCompilers

    -2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Input File Aux Functions

    The auxiliary functions part is onlyC code.

    It includes function definitions forevery function needed in the rulepart

    It can also contain the main()function if the scanner is going tobe used as a standalone program.

    The main()function must call thefunction yylex()

  • 8/13/2019 Introduction Lex

    7/21

    Prin

    ciplesofCompilers

    -2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Code Generated by Lex

    The output of Lex is a file calledlex.yy.c

    It is a C function that returns aninteger, i.e., a code for a Token

    If it contains the main()functiondefinition, it must be compiled torun.

    Otherwise, the code can be anexternal function declaration for thefunctionint yylex()

  • 8/13/2019 Introduction Lex

    8/21

    Prin

    ciplesofCompilers

    -2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    How matching is done

    By running the generated scanner, itanalyses its input looking for strings thatmatch any of its patterns, and thenexecutes the action.

    If more than one match is found, it selectsthe regular expression matching thelongest string.

    If it finds two or more matches of the samelength, the one listed first is selected.

    If no match is found, then the default ruleis executed: i.e., the next character in theinput is copied to the output.

  • 8/13/2019 Introduction Lex

    9/21

    Prin

    ciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Regular expressions ::= a the character a ::= sthe string s, even if s contains metacharacters

    ::=\a the character a when a is a metacharacter (e.g., *) ::= . any character except newline ::=[+] any of the character ::= [-]any character from to

    ::= [^+] any character except those ::= * zero or more repetitions of ::= + one or more repetitions of ::= ? zero or one repetitions of ::= | or ::= followed by

    ::= ()same as ::= {} the named regular expression in the

    definitions part

  • 8/13/2019 Introduction Lex

    10/21

    Prin

    ciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Internal names

    The rules, inside the actiondefinition, can refer to the followingvariables: yytext, the string being matched

    (lexeme)

    yyin, the input file yyout, the output file ECHO, the default rule action yyval, the global variable for

    communicating the Attribute for aToken to the Parser

  • 8/13/2019 Introduction Lex

    11/21

    Prin

    ciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Using Lex

    There are several lex versions.We're going to use flex.

    In order to maximize compatibility,use -loption when compiling, and%option noyywrapin the definitionpart of the Lex input file.

  • 8/13/2019 Introduction Lex

    12/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 1

    %{#include %}

    %%[0-9]+ { printf("%s\n", yytext); }.|\n ;

    %%

    main(){yylex();

    }

  • 8/13/2019 Introduction Lex

    13/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 2

    %{int c=0, w=0, l=0;

    %}word [^ \t\n]+

    eol \n

    %%{word} {w++; c+=yyleng;};

    {eol} {c++; l++;}. {c++;}

    %%

    main(){yylex();printf("%d %d %d\n", l, w, c);

    }

  • 8/13/2019 Introduction Lex

    14/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 3

    %{int tokenCount=0;

    %}

    %%[a-zA-Z]+ { printf("%d WORD \"%s\"\n",

    ++tokenCount, yytext); }[0-9]+ { printf("%d NUMBER \"%s\"\n",

    ++tokenCount, yytext); }[^a-zA-Z0-9]+ { printf("%d OTHER \"%s\"\n",

    ++tokenCount, yytext); }

    %%main() { yylex(); }

  • 8/13/2019 Introduction Lex

    15/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 4

    %{#include

    int lineno =1;

    %}

    line .*\n

    %%

    {line} {printf("%5d %s", lineno++, yytext); }

    %%int main() {

    yylex(); return 0; }

  • 8/13/2019 Introduction Lex

    16/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 5

    %{#include

    %}

    comment_line \/\/.*\n

    %%{comment_line} { printf("%s\n", yytext); }

    .*\n ;%%

    int main() {

    yylex(); return 0;

    }

  • 8/13/2019 Introduction Lex

    17/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 7

    %{

    #include

    %}digit [0-9]

    number {digit}+%%

    {number} { int n = atoi(yytext);printf("%x", n); }

    . {;}%%

    int main() {

    yylex(); return 0;}

  • 8/13/2019 Introduction Lex

    18/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 8 (1/4)%{

    #include "globals.h"

    #include "util.h"

    #include "scan.h"int lineno=0;FILE *listing; // used to output source code listing

    FILE *code; // used to output assembly codeFILE *source; // used to input tiny program source

    code

    int TraceScan = 1;int EchoSource = 1;

    /* lexeme of identifier or reserved word */

    char tokenString[MAXTOKENLEN+1];

    %}

    digit [0-9]number {digit}+

    letter [a-zA-Z]

    identifier {letter}+

    newline \nwhitespace [ \t]+

  • 8/13/2019 Introduction Lex

    19/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 8 (2/4)%%"if" {return IF;}

    "then" {return THEN;}"else" {return ELSE;}"end" {return END;}

    "repeat" {return REPEAT;}"until" {return UNTIL;}

    "read" {return READ;}"write" {return WRITE;}

    ":=" {return ASSIGN;}"=" {return EQ;}"

  • 8/13/2019 Introduction Lex

    20/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 8 (3/4)

    {number} {return NUM;}{identifier} {return ID;}

    {newline} {lineno++;}

    {whitespace} {/* skip whitespace */}

    "{" { char c;do{ c = input();

    if (c == '\n') lineno++;

    } while (c != '}');}

  • 8/13/2019 Introduction Lex

    21/21

    PrinciplesofCompilers-2/03/2006

    Pab

    loR.

    Fillottrani

    FREEUNIVERSITY OF

    BOZENBOLZANOFaculty of

    Computer Science

    Example 8 (4/4)%%TokenType getToken(void)

    { static int firstTime = TRUE;TokenType currentToken;

    if (firstTime)

    { firstTime = FALSE;lineno++;

    yyin = source=stdin;

    yyout = listing=stdout; }

    currentToken = (TokenType)yylex();strncpy(tokenString,yytext,MAXTOKENLEN);if (TraceScan) {

    fprintf(listing,"\t%d: ",lineno);

    printToken(currentToken,tokenString);

    }

    return currentToken; }

    int main(){

    TraceScan = TRUE;

    while( getToken() != ENDFILE);return 0; }