The Role of Lexical Analyzer Compiler Design Lexical Analysis.

The Role of Lexical The Role of Lexical AnalyzerAnalyzerCompiler Design Lexical Analysis

OutlineOutlineLexical Analysis vs. ParsingTokens, Patterns and LexemesAttributes for TokensLexical Errors

Lexical AnalysisLexical AnalysisManual approach – by hand

◦ To identify the occurrence of each lexeme◦ To return the information about the identified

tokenAutomatic approach - lexical-analyzer

generator◦ Compiles lexeme patterns into code that

functions as a lexical analyzer◦ e.g. Lex, Flex, …◦ Steps

Regular expressions - notation for lexeme patterns Nondeterministic automata Deterministic automata Driver - code which simulates automata

The Role of the Lexical The Role of the Lexical AnalyzerAnalyzerRead input charactersTo group them into lexemesProduce as output a sequence of

tokens◦input for the syntactical analyzer

Interact with the symbol table◦Insert identifiers

The Role of the Lexical The Role of the Lexical AnalyzerAnalyzerto strip out

◦comments◦whitespaces: blank, newline, tab, …◦other separators

to correlate error messages generated by the compiler with the source program◦to keep track of the number of newlines

seen◦to associate a line number with each

error message

Lexical Analyzer Lexical Analyzer ProcessesProcessesScanning

◦to not require input tokenization◦deletion of comments◦compaction of consecutive white

spaces into oneLexical analysis

◦to produce sequence of tokens as output

Lexical Analysis vs. Lexical Analysis vs. ParsingParsingSimplicity of design

◦ Separation of lexical from syntactical analysis -> simplify at least one of the tasks

◦ e.g. parser dealing with white spaces -> complex

◦ Cleaner overall language designImproved compiler efficiency

◦ Liberty to apply specialized techniques that serves only lexical tasks, not the whole parsing

◦ Speedup reading input characters using specialized buffering techniques

Enhanced compiler portability◦ Input device peculiarities are restricted to

the lexical analyzer

Tokens, Patterns, Tokens, Patterns, LexemesLexemes Token - pair of:

◦ token name – abstract symbol representing a kind of lexical unit keyword, identifier, …

◦ optional attribute value Pattern

◦ description of the form that the lexeme of a token may take

◦ e.g. for a keyword the pattern is the character sequence

forming that keyword for identifiers the pattern is a complex structure that is

matched by many strings Lexeme

◦ a sequence of characters in the source program matching a pattern for a token

Examples of TokensExamples of TokensToken Informal Description Sample Lexemes

if characters i, f if

else characters e, l, s, e else

comparison < or > or <= or >= or == or !=

<=, !=

id Letter followed by letters and digits

pi, score, D2

number Any numeric constant 3.14159, 0, 02e23

literal Anything but “, surrounded by “

“core dumped”

Examples of TokensExamples of TokensOne token for each keyword

◦Keyword pattern = keyword itselfTokens for operators

◦Individually or in classesOne token for all identifiersOne or more tokens for constants

◦Numbers, literal stringsTokens for each punctuation

symbol◦( ) , ;

Attributes for TokensAttributes for Tokensmore than one lexeme can match a

patterntoken number matches 0, 1, 100, 77,…lexical analyzer must return

◦ Not only the token name◦ Also an attribute value describing the lexeme

represented by the tokentoken id may have associated information

like◦ lexeme◦ type◦ location – in order to issue error messages

token id attribute◦ pointer to the symbol table for that identifier

Tricky Problems in Token Tricky Problems in Token RecognitionRecognitionFortran 90 example

◦assignment

DO 5 I = 1.25DO5I = 1.25

◦do loop

DO 5 I = 1,25

Example of Attribute Example of Attribute ValuesValuesE = M * C ** 2

◦<id, pointer to symbol table entry for E>◦<assign_op>◦<id, pointer to symbol-table entry for

M>◦<mult_op>◦<id, pointer to symbol-table entry for

C>◦<exp_op>◦<number, integer value 2>

Lexical ErrorsLexical Errorscan not be detected by the lexical

analyzer alone◦fi (a == f(x) ) …

lexical analyzer is unable to proceed ◦none of the patterns matches any prefix

of the remaining input◦“panic mode” recovery strategy

delete one/successive characters from the remaining input

insert a missing character into the remaining input

replace a character transpose two adjacent characters

OutlineOutlineInput Buffering

◦Buffer Pairs◦Sentinels

Input BufferingInput BufferingHow to speed the reading of source

program ?to look one additional character aheade.g.

◦ to see the end of an identifier you must see a character which is not a letter or a digit not a part of the lexeme for id

◦ in C - ,= , < ->, ==, <=

two buffer scheme that handles large lookaheads safely

sentinels – improvement which saves time checking buffer ends

Buffer PairsBuffer PairsBuffer size NN = size of a disk block (4096)read N characters into a buffer

◦one system call◦not one call per character

read < N characters we encounter eof

two pointers to the input are maintained◦ lexemeBegin – marks the beginning of

the current lexeme◦ forward – scans ahead until a pattern

match is found

SentinelsSentinelsforward pointer

◦to test if it is at the end of the buffer◦to determine what character is read

(multiway branch)sentinel

◦added at each buffer end◦can not be part of the source program◦character eof is a natural choice

retains the role of entire input end when appears other than at the end of a

buffer it means that the input is at an end

Lookahead Code with Lookahead Code with SentinelsSentinelsswitch(*forward++)

case eof:

if(forward is at the end of the first buffer)

reload second buffer;

forward = beginning of the second buffer;

elseif(forward is at the end of the second buffer)

reload first buffer;

forward = beginning the first buffer;

/* eof within a buffer marks the end of input */

terminate lexical analysis;

break;

cases for the other characters

Potential ProblemsPotential Problemsrunning out of buffer space

◦N ~ 4 x 1000◦long character strings > N◦concatenation of components (Java +

operator)long lookahead

◦languages where keywords are not reserved◦in PL/I: DECLARE (ARG1, ARG2,…)◦ambiguous identifier resolved by the parser

keyword procedure name

BibliographyBibliographyAlfred V. Aho, Monica S. Lam,

Ravi Sethi, Jeffrey D. Ullman – Compilers, Principles, Tehcniques and Tools, Second Edition, 2007

The Role of Lexical Analyzer Compiler Design Lexical Analysis.

Documents

Transcript of The Role of Lexical Analyzer Compiler Design Lexical Analysis.

COMP-421 Compiler Design · Three approaches to the implementation of a lexical analyzer – Use a lexical-analyzer generator – Write a lexical analyzer in a systems programming

Lexical Analyzer (Checker). 2 Lexical Analyzer Lexical Analyzer reads the source program character by character to produce tokens. Normally a lexical.

1 Lexical Analysis. 2 Structure of a Compiler Source Language Target Language Semantic Analyzer Syntax Analyzer Lexical Analyzer Front End Code Optimizer.

Lexical Analyzer

CHAPTER 5 Compiler 5.1 Basic Compiler Concepts. Basic Compiler Concepts 1. Lexical Analysis (Lexical Analyzer 或 Scanner) Read the source program one character.

CS416 Compiler Design - SJTUjiangli/teaching/CS308/CS308-slides02.pdf• Lexical Analyzer reads the source program character by character to produce tokens. ... Compiler Principles

UNIT - III INTRODUCTION TO COMPILERS · UNIT - III INTRODUCTION TO COMPILERS Phase structure of Compiler and entire compilation process. Lexical Analyzer: The Role of the Lexical

Lexical Analyzer - CSCI4160: Compiler Design and Software ...

Fall 2003CS416 Compiler Design1 Lexical Analyzer Lexical Analyzer reads the source program character by character to produce tokens. Normally a lexical.

UNIT I INTRODUCTION TO COMPILING 2. 3. · UNIT I – INTRODUCTION TO COMPILING 1. Define compiler? ... Use a lexical analyzer generator, such as Lex compiler, to produ e the lexical

Lexical Analyzer (Checker)

1 Lexical Analysis and Lexical Analyzer Generators Chapter 3 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.

Compiler Construction Lexical Analysis

1 Pass Compiler 1. 1.Introduction 1.1 Types of compilers 2.Stages of 1 Pass Compiler 2.1 Lexical analysis 2.2. syntactical analyzer 2.3. Code generation.

Parsing Compiler Baojian Hua bjhua@ustc.edu.cn. Front End source code abstract syntax tree lexical analyzer parser tokens IR semantic analyzer.

COMPILER DESIGN LAB MANUAL - WordPress.com · 2017-09-19 · COMPILER DESIGN LAB SYLLABUS SL. No. Experiment Name Page No. 1 Design a lexical analyzer for given language and the lexical

Lexical Analysis Dragon Book: chapter 3. Compiler structure Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer.

1 Using Lex. Flex – Lexical Analyzer Generator A language for specifying lexical analyzers Flex compilerlex.yy.clang.l C compiler -lfl a.outlex.yy.c a.outtokenssource.

Lexical Analysis/Scanningcse.iitkgp.ac.in/~goutam/IIITKalyani/compiler/lect/Lect2.pdf · Lexical Analysis/Scanning Lect 2 GoutamBiswas. ... •A scanner or lexical analyzer of a language,

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Lexical Analyzer Lexical Analyzer reads the source program character by character to produce tokens.