

COMPLEXITY ANALYSIS AND MONADIC SPECIFICATION OF

MEMOIZED FUNCTIONAL PARSERS

by

Barbara Szydlowski

A Thesis Submitted to the Faculty of Graduate Studies and Research

through the School of Computer Science in Partial Fulfillment of the Requirements for the Degree of

Master of Science at the University of Windsor

Windsor, Ontario, Canada 1996

National Library of Canada

Acquisitions and Bibliographic Services

395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

© Barbara Szydlowski 1996. All Rights Reserved.

ABSTRACT

One approach to implementing parsers in a purely functional programming language is to model them as functions, and to define a set of higher-order combinators that allow one to build larger parsers out of smaller components. These combinators implement grammar constructions such as alternation or sequencing. They can be used to construct parsers with structures resembling the BNF notation of the grammars of the languages being processed. Such parsers are modular and easy to modify and understand. A major disadvantage of this approach is that the resulting parsers use a top-down, fully backtracking strategy that may lead to enormous time and space requirements.

The efficiency of parsers can be improved by adding bookkeeping features that eliminate unnecessary backtracking. In this thesis we investigate a technique called memoization. A memoized parser computes its result based on previously computed results that have been stored in a memo-table.

A parser is a program that determines the syntactic structure of an input sequence of symbols in some language. It may produce some kind of abstract syntax tree as output. We consider the simplest type of parsers, language recognizers, which can be thought of as programs that determine only whether the input sequence belongs to a given language. We show that memoized recognizers constructed for an arbitrary grammar have $O(n^3)$ time complexity, where $n$ is the length of the input to be processed. The space required to store the memo-table is (at most) $O(n^3)$. In purely functional programming languages that support updateable in-place variables, the space requirements could be reduced to $O(n^2)$.

Monads, which are abstract structures from Category Theory, have proven useful for addressing many computational problems in purely functional programming. The monadic approach allows one to build basic parsers and combinators out of components that represent various programming language features such as state, exceptions, or non-determinism. These features automatically become the characteristics of the resulting parser. We show how memoized recognizers could be implemented in a fully modular way using the monadic approach. We also describe how the technique could be extended to improve the efficiency of more complex language processors, such as syntax-directed evaluators.

ACKNOWLEDGMENTS

The author wishes to express sincere appreciation to Dr. Richard Frost for helpful discussions, support, and his assistance in the preparation of this manuscript. In addition, special thanks to Dr. Yung Tsin and Dr. Richard Caron for their constructive comments on the final draft of this thesis.

TABLE OF CONTENTS

ABSTRACT

TABLE OF CONTENTS

LIST OF FIGURES

INTRODUCTION

1 CONSTRUCTING PURELY FUNCTIONAL LANGUAGE PROCESSORS

2 THE TYPE OF PARSERS

3 BNF NOTATION AND FUNCTIONAL LANGUAGE PROCESSORS

3.1 Basic parsers

3.2 Combinators

4 NON-DETERMINISTIC LANGUAGE PROCESSORS

7 MONADS AND FUNCTIONAL PROGRAMMING

8 MONADIC CONSTRUCTION OF LANGUAGE PROCESSORS

INTRODUCTION TO CATEGORY THEORY AND MONADS

MONADIC CONSTRUCTION OF PURELY FUNCTIONAL RECOGNIZERS

2 TYPE OF THE MONADIC RECOGNIZERS

3 BASIC MONADS

3.1 The identity monad

3.2 Exceptions

3.3 Non-determinism

4 COMBINING MONADS

MONADIC CONSTRUCTION OF MEMOIZED LANGUAGE PROCESSORS USING TYPE CONSTRUCTOR CLASSES IN GOFER

1 THE SYSTEM OF TYPE CONSTRUCTOR CLASSES IN GOFER

2 MONADS AND TYPE CONSTRUCTOR CLASSES

4.1 The state monad

4.2 The parametrized list monad

4.3 The memo-table

5 PARSERS

5.1 The type of the monadic parsers

5.2 The class Parser

5.3 The parametrized input monad

5.4 Memoized parsers

6 COMPLEXITY OF MEMOIZED LANGUAGE PROCESSORS

CHAPTER 6

APPENDIX A

MEMOIZING PURELY-FUNCTIONAL TOP-DOWN BACKTRACKING LANGUAGE PROCESSORS

1 CONSTRUCTING MODULAR NON-DETERMINISTIC LANGUAGE PROCESSORS IN FUNCTIONAL PROGRAMMING LANGUAGES

2 MEMOIZING LANGUAGE PROCESSORS

3 MEMOIZATION IN PURELY-FUNCTIONAL LANGUAGES

4 MEMOIZING PURELY-FUNCTIONAL RECOGNIZERS

4.1 The memo-table

4.2 The memoized recognizers

4.3 The algorithm

5 COMPLEXITY ANALYSIS

5.1 Elementary functions

5.2 The size of the memo-table

5.3 Memo-table lookup and update

5.4 Recognizers

5.5 Alternation

5.6 Sequencing

5.7 Merging the results returned when a recognizer is applied to a list of start positions

5.8 The execution tree

5.9 Proof of $O(n^3)$ time complexity

6 A MONADIC APPROACH TO INCORPORATE MEMOIZATION

6.1 Monads

6.2 Non-memoized monadic recognizers

6.3 Memoized monadic recognizers

7 MEMOIZING PARSERS AND SYNTAX-DIRECTED EVALUATORS

IMPLEMENTATION OF MONADIC LANGUAGE PROCESSORS USING TYPE CONSTRUCTOR CLASSES IN GOFER

VITA AUCTORIS

LIST OF FIGURES

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9

Figure 10

Figure 11

Figure 12

Figure 13

Figure 14

Figure 15

Figure 16

Figure 17

Figure 18

Figure 19

Figure 20

Figure 21

Basic parsers

Basic recognizers

Example definitions of orelse and then

Example diagram

Functor F mapping category C into D

Natural transformation mapping functor F into functor G

Monad laws

List monad

Kleisli triple laws

Kleisli triple for lists

The type of monadic recognizers

The identity monad

The exception monad with zero and plus

The list monad with zero and plus

The class Recognizer

The state monad

The parametrized list monad

The class Parser

The parametrized input monad

Figure 22 A functional program containing a definition of the

Figure 23 The relationship between the grammar and the program implementing the recognizer

Figure 24 Definition of the Fibonacci function

Figure 25 A memoized version of the Fibonacci function

Figure 26 The relationship between a recognizer and its memoized version

Chapter 1

INTRODUCTION

1 Constructing purely functional language processors

One approach to implementing language processors in a purely functional programming language is to model them as functions, and to define a set of higher-order combinators that allow one to build larger parsers out of smaller components. This approach dates back to Burge's book on recursive programming techniques [BR75], and it has been popularized in functional programming by Wadler [WA85], Frost [FR92], Hutton [HU92], and others. According to this approach, a parser is a program that takes a string of tokens as input and yields, as its result, some kind of abstract tree that describes the grammatical structure of the string.

Because a parser might not consume all of the input string, it is convenient to represent parsers as functions that, when applied to the input string of tokens, return a pair: a value (an abstract tree) and the unconsumed part of the input. Furthermore, a parser might fail on its input. One way of distinguishing between success and failure is to have parsers return a list of pairs rather than a single pair, with the convention that a singleton list denotes success and an empty list denotes failure [WA85].

We start by defining the types for parsers and recognizers. Next, we define basic parsers and combinators. The implementation language for this thesis is Gofer [JN94], a purely functional programming language that supports many useful features, such as lambda expressions and type constructor classes [JNA93]. These features will be discussed later in this thesis.

2 The type of parsers

Using the conventions described in the previous section, we define our parsers to be of the type:

    type Parser a = String -> [(a, String)]

That is, a parser of the type Parser a is a function from the input string of characters to the list of pairs (value-of-type-a, rest-of-the-string).

A recognizer is a language processor that simply determines whether or not the input string of characters belongs to a defined language. Using the same principles as for parsers, we define our recognizers to be functions from the input string of characters to the list containing the unconsumed part of the input. A singleton list of results denotes success; an empty list of results denotes failure.

    type Recognizer = String -> [String]

Throughout the remainder of this thesis we will also introduce definitions of parsers and recognizers that are of slightly different types than those given above. For example, instead of applying language processors to the string of tokens yet to be processed (with the assumption that the first character to be processed is the first character of the input string), we can apply each of them to the whole input string and a single start position (the position specifies the first character to be processed). One advantage of this representation is that the input string remains unchanged through the whole process of parsing or recognition, and there is no need for each processor to return an unconsumed part of it. Instead, each parser or recognizer returns a start position for the next processor. The output of processors is more compact and therefore more suitable for storing (for example, in a memo-table). The corresponding modified types of parsers and recognizers are given below.

    type ParserInt a = String -> Int -> [(a, Int)]
    type RecognizerInt = String -> Int -> [Int]
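To make the position-based representation concrete, the following is a minimal Haskell sketch (assuming GHC rather than Gofer) of basic recognizers of the type RecognizerInt. The names emptyRInt, failRInt and termRInt are hypothetical, chosen by analogy with the thesis's emptyR, failR and termR; they are not taken from the thesis.

```haskell
-- A position-based recognizer maps the (unchanging) input string and a
-- start position to the list of positions where recognition can continue.
type RecognizerInt = String -> Int -> [Int]

-- Succeeds without consuming input: the start position is returned as is.
emptyRInt :: RecognizerInt
emptyRInt _ i = [i]

-- Always fails, whatever the input and position.
failRInt :: RecognizerInt
failRInt _ _ = []

-- Recognizes the single character c at position i and moves on to i + 1.
termRInt :: Char -> RecognizerInt
termRInt c inp i
  | i >= 0 && i < length inp && inp !! i == c = [i + 1]
  | otherwise                                  = []

-- termRInt 'a' "abc" 0  evaluates to [1]
-- termRInt 'b' "abc" 0  evaluates to []
-- termRInt 'c' "abc" 2  evaluates to [3]
```

Here `!!` takes time linear in the position; a real implementation would more likely store the input in an array to get constant-time access, but the list keeps the sketch self-contained.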

3 BNF notation and functional language processors

In BNF notation, grammars are constructed by defining a set of terminals and a set of productions. The symbol ε denotes an empty production, the symbol | denotes alternation, and juxtaposition denotes sequencing. More complex productions can be built from simpler ones by combining them using alternation or sequencing.

Corresponding language processors can be consuucted by d e k g the

functiom empty and term that correspond to an empty production and a

terminal, and the higher-order functions orelse and then rhu correspond to

altemation and sequencing in BNF respectively. Similady as ci BNF, luger

parsen can be b d t from smaller componentj by combining them using the

alternation or sequencing (higherorder) operators. The structure of the

rd t ing panen closely resembles the structure of the underlying grammars.

3.1 Basic parsers

In this section we define three basic parsers and recognizers that can be used as building blocks for more complex language processors. The parser emptyP v always succeeds without consuming any of the input string, and it returns a value v. The parser failP always fails, regardless of its input. The parser termP c processes a single character c at the beginning of the input string. It fails if the first character to be processed is not c, or if the input is an empty string. Example definitions are given below.

    type Parser a = String -> [(a, String)]

    emptyP :: a -> Parser a
    emptyP v inp = [(v, inp)]

    failP :: Parser a
    failP inp = []

    termP :: Char -> Parser Char
    termP c inp = case inp of
                    []     -> failP inp
                    (x:xs) -> if x == c then [(c, xs)]
                              else failP inp

Figure 1 Basic parsers

Similarly, one can define the corresponding recognizers. The recognizer emptyR always succeeds, returning its input unchanged. The recognizer failR always fails. The recognizer termR c consumes a single character c at the beginning of the input string, or fails if the first character is not c, or if the input is an empty string.

    type Recognizer = String -> [String]

    emptyR :: Recognizer
    emptyR inp = [inp]

    failR :: Recognizer
    failR inp = []

    termR :: Char -> Recognizer
    termR c inp = case inp of
                    []     -> failR inp
                    (x:xs) -> if x == c then [xs]
                              else failR inp

The following illustrates the use of the above definitions.
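A minimal transcription of these recognizers into standard Haskell (an assumption: GHC rather than Gofer), with sample applications and their results shown in comments:

```haskell
type Recognizer = String -> [String]

-- Succeeds, returning the whole input as the unconsumed part.
emptyR :: Recognizer
emptyR inp = [inp]

-- Fails on every input.
failR :: Recognizer
failR _ = []

-- Consumes c at the front of the input, or fails.
termR :: Char -> Recognizer
termR c inp = case inp of
  []     -> failR inp
  (x:xs) -> if x == c then [xs] else failR inp

-- emptyR "abc"      evaluates to ["abc"]
-- failR "abc"       evaluates to []
-- termR 'a' "abc"   evaluates to ["bc"]
-- termR 'b' "abc"   evaluates to []
-- termR 'a' ""      evaluates to []
```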

It is not difficult to modify the above definitions so that the language processors built with them accept as parameters a single start position and the whole input string. However, rather than rewriting all the definitions, let us consider the relationship between the types of the two corresponding processors. A parser that is applied to an unconsumed part of the input string is of the type:

    type Parser a = String -> [(a, String)]

The corresponding parser that is applied to a string and a single start position is of the type:

    type ParserInt a = String -> Int -> [(a, Int)]

We can generalize the type of the first parser by abstracting over the type of the strings (considering it as an additional parameter). The type of the first parser could be written as:

    type Parser a b = b -> [(a, b)]

In a similar way we can parametrize the type of the second parser by abstracting over the actual representation of a "position":

    type ParserInt a b = String -> b -> [(a, b)]

We can now write the definition of the type ParserInt in terms of the type Parser:

    type ParserInt a b = String -> Parser a b

Later in this thesis we shall see the advantages of this approach. The basic parsers and parser combinators of the type ParserInt no longer need to be defined explicitly (except for the definition of term, which depends on the actual representation of the input). They arise as special instances of lifting operations of the type Parser to operations of the type ParserInt. The same applies to the definitions of the corresponding recognizers.
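A minimal sketch of this lifting, assuming the generalized types given above. The helpers lift0 and lift2 and the parser termPInt are hypothetical names used only for illustration; the thesis's own lifting mechanism (via type constructor classes) is different. The point is that emptyP and orelseP are written once for any input type b, and only term depends on the representation of a position.

```haskell
-- Generalized parser type: a is the result type, b the input representation.
type Parser a b = b -> [(a, b)]

-- Position-based parsers receive the whole input string separately.
type ParserInt a b = String -> Parser a b

-- Written once, for any input representation b.
emptyP :: a -> Parser a b
emptyP v inp = [(v, inp)]

orelseP :: Parser a b -> Parser a b -> Parser a b
orelseP p q inp = p inp ++ q inp

-- Hypothetical lifting helpers: the fixed input string is threaded through.
lift0 :: Parser a b -> ParserInt a b
lift0 p _ = p

lift2 :: (Parser a b -> Parser a b -> Parser a b)
      -> ParserInt a b -> ParserInt a b -> ParserInt a b
lift2 op p q s = op (p s) (q s)

-- Only term needs a representation-specific definition (here: Int positions).
termPInt :: Char -> ParserInt Char Int
termPInt c s i
  | i >= 0 && i < length s && s !! i == c = [(c, i + 1)]
  | otherwise                              = []
```

For example, `lift2 orelseP (termPInt 'a') (termPInt 'b') "abc" 0` evaluates to `[('a', 1)]`.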

3.2 Combinators

Basic parsers can be combined using the operators orelse and then to form more complex parsers. The operator then corresponds to sequencing in BNF. It applies the second parser to the result returned by the first one. The operator orelse corresponds to alternation in BNF. It applies two parsers to the same input and concatenates their results. Example definitions of these operators for the recognizers are given below. The notation p `orelse` q is equivalent to orelse p q; backquotes are used in Gofer to denote an infix operator.

    thenR :: Recognizer -> Recognizer -> Recognizer
    (p `thenR` q) inp
      | rp /= []  = q (rp !! 0)
      | otherwise = []
      where rp = p inp

    orelseR :: Recognizer -> Recognizer -> Recognizer
    (p `orelseR` q) inp = p inp ++ q inp

The recognizer (p `thenR` q) fails if the recognizer p fails. Otherwise, the recognizer q is applied to the first element of the list returned by p. The expression (rp!!0) denotes the first element of the list rp (the element at index 0); the symbol /= is used in Gofer to represent the "not equal" operator.

Using the definitions above we can construct recognizers with structures closely resembling the structures of the underlying grammars.

    a_then_b :: Recognizer
    a_then_b = termR 'a' `thenR` termR 'b'

    a_or_b :: Recognizer
    a_or_b = termR 'a' `orelseR` termR 'b'

    ? a_then_b "abc"
    ["c"]
    ? a_or_b "abc"
    ["bc"]
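Because thenR continues only with the first result of p, it can reject an input that a fully backtracking recognizer would accept. The sketch below contrasts it with a hypothetical variant thenAll that continues with every result of p; thenAll and optA are illustrative names, not definitions from the thesis. thenAll anticipates the non-deterministic sequencing discussed in the next section.

```haskell
type Recognizer = String -> [String]

emptyR :: Recognizer
emptyR inp = [inp]

termR :: Char -> Recognizer
termR c (x:xs) | x == c = [xs]
termR _ _ = []

orelseR :: Recognizer -> Recognizer -> Recognizer
orelseR p q inp = p inp ++ q inp

-- Same behaviour as the thesis's thenR: continue with the FIRST result only.
thenR :: Recognizer -> Recognizer -> Recognizer
thenR p q inp = case p inp of
  []      -> []
  (r : _) -> q r

-- Hypothetical fully backtracking sequencing: continue with EVERY result.
thenAll :: Recognizer -> Recognizer -> Recognizer
thenAll p q inp = concatMap q (p inp)

-- An optional 'a', with the empty alternative listed first.
optA :: Recognizer
optA = emptyR `orelseR` termR 'a'
```

Here `optA "ab"` gives `["ab", "b"]`. `thenR optA (termR 'b') "ab"` tries only `"ab"` and returns `[]`, whereas `thenAll optA (termR 'b') "ab"` also tries `"b"` and returns `[""]`.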

4 Nondeterministic language processors

Representing language processors as functions that return a list of results has one advantage: it is relatively easy to modify the processors so that they can return more than one result. One approach to modifying the definitions of the basic processors and combinators so that they can be used to build nondeterministic parsers is first to modify their types, so that they accept a list of inputs as a parameter and return the list of outputs as their result. For example, the type of the recognizers could be defined as

    type RecognizerAmb = [String] -> [String]
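A minimal sketch of recognizers of the type RecognizerAmb, assuming that a single-input recognizer is turned into a list-to-list one by applying it to every input and concatenating the outputs. liftAmb, thenAmb, orelseAmb and emptyAmb are hypothetical names. Note that sequencing becomes plain function composition.

```haskell
type Recognizer = String -> [String]
type RecognizerAmb = [String] -> [String]

termR :: Char -> Recognizer
termR c (x:xs) | x == c = [xs]
termR _ _ = []

-- Hypothetical lifting: run r on every input and collect all outputs.
liftAmb :: Recognizer -> RecognizerAmb
liftAmb r = concatMap r

-- Sequencing: q consumes all the outputs of p.
thenAmb :: RecognizerAmb -> RecognizerAmb -> RecognizerAmb
thenAmb p q = q . p

-- Alternation: both recognizers start from the same inputs.
orelseAmb :: RecognizerAmb -> RecognizerAmb -> RecognizerAmb
orelseAmb p q inps = p inps ++ q inps

-- Empty production: every input is passed through unchanged.
emptyAmb :: RecognizerAmb
emptyAmb inps = inps
```

With these definitions, `(liftAmb (termR 'a') `thenAmb` liftAmb (termR 'b')) ["abc"]` returns `["c"]`, and an optional `'a'` followed by `'b'` accepts `"ab"` through the second alternative without any special backtracking code.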

Our first implementation of memoized recognizers in the purely functional language Miranda [TU90] is based on this approach [FS96] (a copy is given in Appendix A). This approach has two main disadvantages: 1) most of the definitions of basic parsers and combinators must be modified, and 2) the new definitions are often more complex and difficult to understand than the definitions of the corresponding deterministic processors.

A better approach is to start with recognizers that are of the type String -> String (they either succeed or return an internal error) and to think about ambiguous recognizers as functions that involve an additional effect of nondeterminism. The nondeterministic recognizers are of the type String -> [String], where the resulting list can have any number of elements (an empty list denotes failure). The definitions of the initial recognizers are almost identical to those presented earlier in this chapter; the definitions of the corresponding nondeterministic recognizers arise automatically as special instances of lifting operations that turn operations returning a single result into operations that return a list of results.
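One way to see this idea in standard Haskell is to write the recognizers once against an abstract effect m and then choose the effect. The sketch below uses GHC's MonadPlus class (mzero for failure, mplus for alternation) as a stand-in for the thesis's own classes, and Maybe as an assumed representation of "succeed or fail". RecognizerM and the function names are hypothetical.

```haskell
import Control.Monad (MonadPlus, mzero, mplus)

-- A recognizer whose result is wrapped in an effect m.
type RecognizerM m = String -> m String

emptyM :: Monad m => RecognizerM m
emptyM = return

termM :: MonadPlus m => Char -> RecognizerM m
termM c (x:xs) | x == c = return xs
termM _ _ = mzero

-- Sequencing is the monad's bind; alternation is mplus.
thenM :: Monad m => RecognizerM m -> RecognizerM m -> RecognizerM m
thenM p q inp = p inp >>= q

orelseM :: MonadPlus m => RecognizerM m -> RecognizerM m -> RecognizerM m
orelseM p q inp = p inp `mplus` q inp

-- One definition; the effect is chosen by the caller's type annotation.
optAThenB :: MonadPlus m => RecognizerM m
optAThenB = (emptyM `orelseM` termM 'a') `thenM` termM 'b'
```

With `m = []`, `optAThenB "ab"` backtracks into the second alternative and returns `[""]`. With `m = Maybe`, whose `mplus` keeps only the first success, the same definition commits to the empty alternative and returns `Nothing`, much like thenR.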

Nondeterministic language processors can use backtracking and return multiple results. This ability, however, comes at a price. If an underlying grammar is ambiguous, the corresponding functional processor may have exponential space and time complexity. The main reason for such complexity is the repetition of the same computations during backtracking.

Memoization is a dynamic programming method which allows one to avoid performing the same computation more than once. A memoized functional language processor is a function that takes an additional parameter: a memo-table containing all previously computed results. If the input has been processed before, the processor simply returns the corresponding result from the memo-table. If the input has not been processed yet, the new result is calculated and then the memo-table is updated.

One approach to implementing memoized recognizers is to slightly modify the definitions of basic recognizers and combinators so that the processors built with them accept a memo-table as part of their input and return a memo-table as part of their output. Next, the higher-order function memoize is applied to each recognizer to store its result in the memo-table. This approach is described in detail in Appendix A.
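A minimal sketch of this table-threading approach, assuming position-based recognizers and a memo-table represented as an association list keyed by (recognizer name, start position). This is not the Appendix A code: the names, the table representation and the grammar are illustrative assumptions, duplicate results are not merged, and left-recursive grammars would still loop.

```haskell
-- Hypothetical memo-table: (recognizer name, start position) -> end positions.
type Memo = [((String, Int), [Int])]

-- Input string, start position and table in; end positions and new table out.
type MRecognizer = String -> Int -> Memo -> ([Int], Memo)

emptyM :: MRecognizer
emptyM _ i t = ([i], t)

termM :: Char -> MRecognizer
termM c inp i t
  | i >= 0 && i < length inp && inp !! i == c = ([i + 1], t)
  | otherwise                                  = ([], t)

-- Alternation: both alternatives start at i; the table flows from p to q.
orelseM :: MRecognizer -> MRecognizer -> MRecognizer
orelseM p q inp i t0 =
  let (r1, t1) = p inp i t0
      (r2, t2) = q inp i t1
  in (r1 ++ r2, t2)

-- Sequencing: q runs from every end position of p, threading the table.
thenM :: MRecognizer -> MRecognizer -> MRecognizer
thenM p q inp i t0 = foldl step ([], t1) mids
  where
    (mids, t1) = p inp i t0
    step (acc, t) j = let (rs, t') = q inp j t in (acc ++ rs, t')

-- Return a stored result if present; otherwise compute it and record it.
memoize :: String -> MRecognizer -> MRecognizer
memoize name r inp i t =
  case lookup (name, i) t of
    Just res -> (res, t)
    Nothing  -> let (res, t') = r inp i t
                in (res, ((name, i), res) : t')

-- s ::= 'a' s | empty, memoized under the name "s"
s :: MRecognizer
s = memoize "s" ((termM 'a' `thenM` s) `orelseM` emptyM)
```

Running `s "aa" 0 []` returns the end positions `[2, 1, 0]` together with a table holding one entry for each start position 0, 1 and 2; calling `s` again with that table answers directly from the table.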

Section 6 of Appendix A presents a slightly different approach to implementing memoization. If we consider our initial recognizers to be functions that return a list of values, then memoized recognizers can be represented as functions that, when applied to a memo-table, return a list of values paired with the modified memo-table. One advantage of this approach is that there is no need to define memoized versions of basic recognizers and combinators. They arise automatically as special instances of lifting computations of the type [a] into computations of the type State -> ([a], State).
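A minimal sketch of computations of the shape State -> ([a], State), with a lifting of plain list computations and the two combining operations. StateList, liftList, bindSL, plusSL and tick are hypothetical names; in a memoizing recognizer the state s would be the memo-table, while here a simple counter stands in for it.

```haskell
-- A state-passing computation that returns a list of values.
newtype StateList s a = StateList { runStateList :: s -> ([a], s) }

-- Lift a plain list computation: the state passes through untouched.
liftList :: [a] -> StateList s a
liftList xs = StateList (\st -> (xs, st))

-- Sequencing: feed every value to k, threading the state left to right.
bindSL :: StateList s a -> (a -> StateList s b) -> StateList s b
bindSL m k = StateList $ \s0 ->
  let (xs, s1) = runStateList m s0
      step (acc, st) x = let (ys, st') = runStateList (k x) st
                         in (acc ++ ys, st')
  in foldl step ([], s1) xs

-- Alternation: concatenate the values, threading the state through both.
plusSL :: StateList s a -> StateList s a -> StateList s a
plusSL m n = StateList $ \s0 ->
  let (xs, s1) = runStateList m s0
      (ys, s2) = runStateList n s1
  in (xs ++ ys, s2)

-- A purely stateful step: counts how often it is executed.
tick :: StateList Int ()
tick = StateList (\n -> ([()], n + 1))
```

For example, `runStateList (bindSL (liftList [1, 2, 3]) (\x -> bindSL tick (\_ -> liftList [x * 10]))) 0` evaluates to `([10, 20, 30], 3)`: the list computation is unchanged, and the state has been threaded through each branch.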

We have already suggested earlier that exploring the relationship between the types of two programs may help us to avoid unnecessary rewriting of one program into the other. Modifying a program by rewriting its components is time consuming. Furthermore, the correctness of the initial program does not guarantee the correctness of its modified version. The monadic approach allows one to transform one program into the other in such a way that certain properties of the initial program are preserved. The basic ideas of this approach come from Category Theory and were introduced to Computing Science by Eugenio Moggi [MO89, MO90].

6 Monads

The notion of monads comes from Category Theory [W1, BA90, PI91]. Informally speaking, a monad over a category is an abstract structure that allows one to reason about the objects of the category in terms of "how these objects interrelate" [W1]. In Computing Science a category of interest is a category C of types and programs, and the relationship in question is a function on types in C [MO89, MO90].

In order to define this function, one should distinguish the type of values a program produces from the type of the program itself. For example, a program that is "effect free" does nothing but return a value of some type a. Therefore, the type of such a program is always identical to the type of the explicitly returned value. Consider a nondeterministic (or ambiguous) computation. This computation returns a set of possible results. Such a set could be represented, for example, as a list (that is, the result could be of the type [a]). Using the same principles, a computation that handles exceptions could be of the (algebraic) type Raise String | Return a. In each of these cases the type of the computation can be defined as a function of the type of the value the computation produces.

A monad in a category of types and programs is a triple: a function on types T (described above) that defines the type of program, together with two operations (that can be interpreted as composition and identity) that allow the combination of such programs. The type constructor T abstracts over the "effect" the program incorporates. The type of the two operations is defined in terms of the type constructor T. Having defined a monad, one can write a program as a set of components of the abstract type T a and use the two operations to combine them. In other words, the structure of the resulting program does not depend on the "effect" the program incorporates.

7 Monads and functional programming

Monads abstract over the kind of "effect" that is added to a program. This idea inspired Wadler [WA90] to introduce monads as a tool for structuring purely functional programs. Programs written in purely functional programming languages such as Haskell [PT96], Gofer [JN94], or Miranda [TU90] are sometimes very difficult to modify if one wants to add "effects" such as state or interactive I/O, or just to print some error messages. Wadler noticed that monads can be used to easily incorporate such effects.

One advantage of the "monadic" approach is that the "effects" are not "visible" in most of the function definitions of the program. The kind of effects the program includes can be determined by examining the definition of the monad. In order to add a new "effect", one simply has to change the definition of the monad and make some additional, usually trivial, local changes.

8 Monadic construction of language processors

It has been noted by Wadler [WA90, WA92] that the monadic approach allows one to build language processors that are more modular and easier to modify than traditional ones. A parser can be thought of as a program that deals with interactive input. When applied to the input string, it returns a pair: a value and the unconsumed part of the input. A nondeterministic parser can be thought of as a program that incorporates two "effects": interactive input and nondeterminism. Similarly, a deterministic parser is a program that combines interactive input and the ability to fail (that is, either it returns a value v, or it fails producing no value; this can be captured by the type constructor Ok v | Fail).
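A minimal sketch of a deterministic parser that combines the two effects, using the type Ok v | Fail from the text. DParser, termD, thenD and orelseD are hypothetical names. All the effect plumbing (checking for Fail, passing the rest of the input on) is written by hand here; that is exactly what a monad would hide.

```haskell
-- The failure effect: a value or nothing.
data Result a = Ok a | Fail
  deriving (Eq, Show)

-- Interactive input (String in, rest out) combined with failure.
type DParser a = String -> Result (a, String)

termD :: Char -> DParser Char
termD c (x:xs) | x == c = Ok (c, xs)
termD _ _ = Fail

-- Sequencing: run p, then q on the rest, pairing the two values.
thenD :: DParser a -> DParser b -> DParser (a, b)
thenD p q inp = case p inp of
  Fail         -> Fail
  Ok (v, rest) -> case q rest of
    Fail          -> Fail
    Ok (w, rest') -> Ok ((v, w), rest')

-- Alternation: keep p's result if it succeeds, otherwise try q.
orelseD :: DParser a -> DParser a -> DParser a
orelseD p q inp = case p inp of
  Fail -> q inp
  r    -> r
```

For instance, `thenD (termD 'a') (termD 'b') "abc"` evaluates to `Ok (('a', 'b'), "c")`, and the same sequence applied to `"acb"` gives `Fail`.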

Defining a monad that represents a single "effect" is not difficult. It is also usually possible (using an ad-hoc approach) to define a "combined" monad that represents a composition of features. However, "how to combine arbitrary monads?" was a long-standing question and a topic of research in the area of so-called "monadic functional programming". Attempts at finding a general technique for composing two arbitrary monads were made by King and Wadler [KN92], Cenciarelli and Moggi [CE93], Jones and Duponcheel [JNB93], Steele [ST94], and Espinosa [ES95], and ended with partial successes, yielding techniques that were not general. More recently, Liang, Hudak, and Jones have proposed a new method, based on the theory of monad transformers [W95], that allows one to compose monads in a fully modular way. One application of the technique was the construction of modular programming language interpreters.

Using the above technique we have implemented different types of language processors. Our processors are fully modular: they are built up from components that represent various programming language features, such as nondeterminism, exceptions (used to represent determinism), interactive input (parsers), and state (memoization). This approach allowed us to easily extend the technique we initially used to memoize functional recognizers to improve the complexity of other language processors (such as, for example, syntax-directed evaluators). The details of this implementation can be found in Chapter 5.

9 Organization of this thesis

The remainder of this thesis is organized as follows.

Chapter 2 gives a brief introduction to Category Theory. We introduce basic definitions and explain the category-theoretic notions of monads and Kleisli triples.

Chapter 3 describes how Category Theory monads and Kleisli triples are represented in purely functional programming. We start this presentation with a short description of the category-theoretic semantics of computations proposed by Eugenio Moggi.

Chapter 4 presents how purely functional recognizers can be constructed using the monadic approach. We begin by discussing monadic recognizers that incorporate a single "effect". Next, we describe how monads can be combined to yield recognizers that involve a combination of different effects.

Chapter 5 describes the details of the implementation of memoized language processors using type constructor classes in Gofer. We start by discussing the implementation of memoized recognizers. Next, we show how easily the technique for memoizing functional recognizers can be extended to more complex language processors.

Chapter 6 concludes by summarizing the main advantages of using the monadic approach to construct purely functional language processors.

Appendix A contains a copy of the paper summarizing our early efforts to implement memoization in the purely functional programming language Miranda. This paper includes a detailed description of the memoization algorithm together with its formal complexity analysis.

Appendix B contains the Gofer source code that implements memoized monadic language processors using type constructor classes.

Chapter 2

INTRODUCTION TO CATEGORY THEORY AND MONADS

1 Introduction

In Category Theory mathematical concepts are represented using abstract diagrams. Such diagrams consist of vertices representing objects in a category, and directed edges (arrows) representing the mappings between these objects. A diagram is called commutative if, for each pair of vertices X and Y, any two paths formed from directed edges leading from X to Y yield, by composition of the corresponding mappings, equal mappings from X to Y. A diagram such as the one in Figure 4 can be used to represent the category of sets and functions. If f, g, and h are functions such that $f : X \to Y$, $g : X \to Z$, and $h : Z \to Y$, then the diagram is commutative if $f = h \circ g$, where $\circ$ denotes the usual composition of functions. The same diagram may be used in many other contexts where X, Y, and Z represent objects, f, g, and h represent mappings between them, and the operation $\circ$ defines how two mappings can be composed.

Category Theory offers an abstract view of mathematical concepts. The concepts are abstracted from the context in which they were made precise, and therefore they can be instantiated into other contexts that were not considered before. This section gives a brief overview of basic concepts in Category Theory and presents a category-theoretic introduction to monads. The presentation here is based on the basic textbooks on Category Theory [W1, BA90, PI91].

A category is a collection of objects and a collection of arrows (also called morphisms), together with two operations:

• identity, which assigns to each object A an arrow $Id_A$ (the arrow pointing from the object A to itself),

• composition, which assigns to each pair of arrows $f : A \to B$, $g : B \to C$ an arrow $g \circ f : A \to C$ called their composite.

The composition of morphisms must obey the associative law, that is, for any morphisms $f : A \to B$, $g : B \to C$, $h : C \to D$ the condition

$$h \circ (g \circ f) = (h \circ g) \circ f$$

must be satisfied. Another requirement of a category is that Id is an identity for the composition. That is, for any morphism $f : A \to B$, the following condition must be satisfied:

$$Id_B \circ f = f = f \circ Id_A$$

The categories of interest from the functional programming point of view are those where objects are types and morphisms are programs. Identity is the identity program. Composition is the way of combining two programs.

3 Functors

A functor is a morphism of categories. Given two categories C and D, a functor $F : C \to D$ is a pair of functions:

• the object function, which maps each object A of the category C into the corresponding object F(A) of the category D,

• the arrow function, which assigns to each arrow f in C the corresponding arrow F(f) in D.

Each functor is required to preserve the identity and the structure of the composition of morphisms, that is, for any two composable morphisms f and g and the identity morphism $Id_A$ in C, the following conditions must be satisfied:

$$F(g \circ f) = F(g) \circ F(f), \qquad F(Id_A) = Id_{F(A)}$$

The graphical representation of a functor is given in Figure 5.

Figure 5 Functor F mapping category C into D

An endofunctor is a functor from a category to itself. It maps objects of a category C, and mappings between them, into corresponding objects and mappings of the same category C. For example, the list functor consists of:

• a type constructor on lists that maps a type a into the corresponding type [a],

    type List a = [a]

• a standard library function map that can be thought of as a mapping of a program from a to b into a program from [a] to [b]:

    map :: (a -> b) -> [a] -> [b]
    map f []     = []
    map f (x:xs) = f x : map f xs
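The two functor conditions can be checked for map on sample data. This is a sanity check on finite examples, not a proof; the function names and samples are illustrative.

```haskell
-- F(Id) = Id: mapping the identity changes nothing.
preservesIdentity :: [Int] -> Bool
preservesIdentity xs = map id xs == id xs

-- F(g . f) = F(g) . F(f): one pass with g . f equals two passes.
preservesComposition :: (Int -> Int) -> (Int -> Int) -> [Int] -> Bool
preservesComposition f g xs = map (g . f) xs == (map g . map f) xs

samples :: [[Int]]
samples = [[], [1], [1, 2, 3], [5, -4, 0, 7]]
```

Both `all preservesIdentity samples` and `all (preservesComposition (+ 1) (* 2)) samples` evaluate to `True`.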

4 Natural transformations

Given two functors $F, G : C \to D$ that are mappings between the same categories, a natural transformation $\gamma$ from F to G is a mapping that assigns to each object c of C an arrow $\gamma_c : F c \to G c$. In pictorial representation the natural transformation $\gamma$ can be thought of as a way of "sliding" the diagram defining the functor F onto the diagram that defines the functor G, such that all parallelograms (like those shown in Figure 6) are commutative; that is, for every arrow $f : A \to B$ in C, $G(f) \circ \gamma_A = \gamma_B \circ F(f)$.

Figure 6 Natural transformation mapping functor F into functor G

Natural transformations are families of arrows. If F and G are two functors (type constructors) and a is any type (an object in C), we can think of a natural transformation as a polymorphic function of the type F a -> G a [ES95]. It is not difficult to find examples of natural transformations in purely functional programming languages. For instance, a polymorphic function list that takes as argument an element of any type a and returns a singleton list of the same type is a natural transformation.

    type F a = a
    type G a = [a]

    list :: F a -> G a
    list x = [x]
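The naturality condition can be checked for list on sample values. For the identity functor F the arrow function is the function itself, and for G it is map, so the condition reads map f . list = list . f. The sketch also checks reverse, another polymorphic list function, as a second example; both are sample-based checks, not proofs.

```haskell
type F a = a
type G a = [a]

list :: F a -> G a
list x = [x]

-- G(f) . list = list . F(f), where F(f) = f and G(f) = map f.
naturalAt :: (Int -> String) -> Int -> Bool
naturalAt f x = (map f . list) x == (list . f) x

-- reverse :: [a] -> [a] is natural too: map f . reverse = reverse . map f.
naturalReverse :: (Int -> Int) -> [Int] -> Bool
naturalReverse f xs = (map f . reverse) xs == (reverse . map f) xs
```

For instance, `naturalAt show 7` compares `["7"]` with `["7"]` and is `True`.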

5 Functor categories

Natural transformations can be composed. The composition of two natural transformations is a natural transformation. Composition is also associative, and for each functor F there exists an identity natural transformation 1_F : F → F (a mapping of a functor into itself). Therefore, given the categories C and D, we can formally construct a functor category D^C that has functors F : C → D as its objects and natural transformations between such functors as its morphisms (see [MMI], [PA90], or [PI91] for details of this construction).

Owing to the fact that it is very convenient to abstract over the objects of a category and to reason about the category only in terms of functors and natural transformations, functor categories are extensively used in Category Theory. An example of a functor category is a monad: a functor category with one object. Monads are described in the next section.

6 Monads

In Category Theory, a monad over a category C is a triple (T, η, μ), where T : C → C is an endofunctor (a functor mapping to and from the same category), and η and μ are two natural transformations defined as follows:

$$\eta : \mathrm{Id}_C \to T, \qquad \mu : T \circ T \to T.$$

For the triple (T, η, μ) to be a monad, three laws, called the associative law and the left and right unit (identity) laws, must hold:

$$\mu \circ T\mu = \mu \circ \mu T \qquad \text{(associative law)}$$

$$\mu \circ \eta T = \mathrm{id}_T = \mu \circ T\eta \qquad \text{(left and right unit laws)}$$

These laws (if satisfied) guarantee that the triple forms a functor category over the category C.
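To make the laws concrete, the following sketch instantiates (T, η, μ) with Haskell's `Maybe` type, a choice made here purely for illustration: η is `Just`, μ flattens `Maybe (Maybe a)`, and the whiskered transformations Tμ and Tη are obtained with `fmap`. The names `eta`, `mu`, `assocLaw`, and `unitLaws` are hypothetical.

```haskell
-- eta : Id -> T, with T = Maybe
eta :: a -> Maybe a
eta = Just

-- mu : T . T -> T, flattening one level of Maybe
mu :: Maybe (Maybe a) -> Maybe a
mu (Just m) = m
mu Nothing  = Nothing

-- Associative law: mu . T mu == mu . mu T
-- (T mu is fmap mu; mu T is mu used at the type Maybe (Maybe (Maybe a)))
assocLaw :: Maybe (Maybe (Maybe Int)) -> Bool
assocLaw x = (mu . fmap mu) x == (mu . mu) x

-- Unit laws: mu . eta T == id == mu . T eta
unitLaws :: Maybe Int -> Bool
unitLaws m = (mu . eta) m == m && (mu . fmap eta) m == m
```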

7 Kleisli triples

Kleisli triples are alternative descriptions of monads, and there is a one-to-one correspondence between the two (see [MO91] for the proof). A Kleisli triple over a category C is a triple (T, η, _*), where

• T : Obj(C) → Obj(C) (T is a function on objects, not a functor),
• η_A : A → TA for A ∈ Obj(C),
• f* : TA → TB for f : A → TB,

and the following conditions hold:

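$$\eta_A^{*} \circ f = \mathrm{id}_{TA} \circ f \qquad \text{(right unit)}$$

$$f^{*} \circ \eta_A = f \qquad \text{(left unit)}$$

$$g^{*} \circ (f^{*} \circ h) = (g^{*} \circ f)^{*} \circ h \qquad \text{(associativity)}$$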
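For the list type, the extension operator of a Kleisli triple can be taken to be f* = concat . map f. The sketch below, under that assumption, checks the three conditions on sample values; `etaL`, `ext`, and the law-checking helpers are hypothetical names.

```haskell
-- Kleisli triple for lists: T = [], eta x = [x], and ext f plays the role of f*.
etaL :: a -> [a]
etaL x = [x]

ext :: (a -> [b]) -> [a] -> [b]
ext f = concat . map f

-- Right unit: eta* = id (here applied directly to a list)
rightUnit :: [Int] -> Bool
rightUnit xs = ext etaL xs == xs

-- Left unit: f* . eta = f
leftUnit :: (Int -> [Int]) -> Int -> Bool
leftUnit f x = (ext f . etaL) x == f x

-- Associativity: g* . (f* . h) = (g* . f)* . h
assoc :: (Int -> [Int]) -> (Int -> [Int]) -> (Int -> [Int]) -> Int -> Bool
assoc f g h x = (ext g . (ext f . h)) x == (ext (ext g . f) . h) x
```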

Given a monad (T, η, μ), one can construct the corresponding Kleisli triple (T, η, _*) by restricting the endofunctor T to objects and defining f* = μ_B ∘ T f for f : A → TB. Conversely, given a Kleisli triple, the corresponding monad can be constructed by extending the function T to an endofunctor, with T f = (η_B ∘ f)* for f : A → B and μ_A = (id_TA)*.

Monads are more widely used in Category Theory than Kleisli triples. They have the advantage of being defined only in terms of functors and natural transformations, which makes them more suitable for abstract manipulation. Kleisli triples, according to Moggi [MO89], are easier to justify from a computational perspective.

8 Kleisli Categories

Given a Kleisli triple (T, η, _*) over a category C, the corresponding Kleisli category C_T can be defined as follows:

• the objects of C_T are the same as those of C,
• a morphism f : A → B in C_T is a morphism f : A → TB in C,
• the composition of two morphisms f : A → TB and g : B → TC in C_T is defined as g* ∘ f,
• this composition must be associative, with η as its left and right unit.

If an underlying category C is a category of types and programs, then given a Kleisli triple (T, η, _*) we can formally construct a Kleisli category C_T of types and programs over the category C. The objects in C_T are types (as in C); the morphisms in C_T are programs from the type a to the type T b, where the endofunctor T is a function on types in C. The expression g* ∘ f represents the composition of the two programs f and g. Moggi's categorical semantics of computations, which is discussed in the next chapter, is based on this idea.

Chapter 3

COMPUTATIONS AND MONADS

1 Introduction

Kleisli categories were originally proposed by Eugenio Moggi as a convenient framework for structuring the semantics of programming languages [MO89, MO90]. The principle underlying Moggi's work on monads was the distinction between simple data-valued functions and functions that perform computations. A data-valued function is one that simply returns its value (and does nothing else). By contrast, a function that performs a computation can encompass ideas such as exceptions, state, or nondeterminism, and as a consequence, it can implicitly produce more results than the result explicitly returned.

Wadler [WA90] noticed that Moggi's idea of using monads to structure the semantics of computations fits well into the purely functional programming environment, and proposed monads as a technique for structuring functional programs. He showed that monads can be used to express "imperative features" like updatable state, exceptions, nondeterminism, or I/O in pure functional languages, while retaining the strong reasoning principles valid for these languages.

This chapter presents a brief overview of Moggi's categorical semantics of computations based on monads. It also describes how category-theoretic monads are represented in purely functional programming languages. The presentation here owes much to the papers of Moggi [MO89, MO90], Wadler [WA90, WA92], and Hill and Clarke [HI94].

2 Categorical semantics of computations

The basic idea behind Moggi's categorical semantics of computations is that, in order to interpret a programming language in a category C, one has to distinguish the object A of values (of the type A) from the object TA of computations (computations that produce a value of the type A). If T is a unary operation on objects in C that maps objects of values into corresponding objects of computations, then a program from A to B can be identified with a morphism from A (the set of values of the type A) to TB (the set of computations that produce a value of the type B) in C. In other words, a program is a function from values to computations.

In category-theoretic terms, T is the object-mapping part of an endofunctor on C; in the context of functional programming, T is a type constructor (a function on types). Moggi calls an operator T "a notion of computation", since it abstracts away from the actual type of values computations may produce. Examples of notions of computation that are of particular interest to functional programming are as follows:

• computations with side effects, which denote a mapping from a state to a pair: a value and the modified state

type T a = State -> (a, State),

• nondeterministic computations, which denote the set of all possible values

type T a = [a],

• computations with exceptions, which denote either a value or an exception

data T a = Raise String | Return a,

• interactive input, which denotes a function from the input string of tokens to a pair: the first token and the rest of the input

type T a = [a] -> (a, [a]),

• interactive output, which denotes a pair: a value and a function that maps a string (the output of the rest of the program) into a string (the output of the whole program)

type T a = (a, String -> String).
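As an illustration of the first notion (computations with side effects), the following sketch fixes State to be an Int counter, an assumption made only for this example, and shows how two stateful steps are sequenced so that the second sees the state left by the first. The names `unitT`, `bindT`, `tick`, and `twoTicks` are hypothetical.

```haskell
-- State is assumed to be a simple Int counter for this sketch.
type State = Int
type T a = State -> (a, State)

-- A pure computation: deliver a value, leave the state unchanged.
unitT :: a -> T a
unitT a = \s -> (a, s)

-- Sequencing: run m, then pass its value and the updated state to k.
bindT :: T a -> (a -> T b) -> T b
bindT m k = \s -> let (a, s') = m s in k a s'

-- A hypothetical primitive: return the current counter and increment it.
tick :: T Int
tick = \s -> (s, s + 1)

-- Two ticks in sequence, returning both observed counter values.
twoTicks :: T (Int, Int)
twoTicks = tick `bindT` \a -> tick `bindT` \b -> unitT (a, b)
```

Starting from state 0, `twoTicks 0` yields `((0, 1), 2)`: the pair of observed values and the final state.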

Rather than focusing on a specific notion of computation, Moggi proposed Kleisli triples for modeling notions of computation and Kleisli categories for modeling categories of programs. The components of the Kleisli triple (T, η, _*) can be interpreted as follows. The endofunctor T is a function on types that maps the type of values into the type of corresponding computations. The natural transformation η applied to a value returns a computation producing this value. The expression g* ∘ f, where f : A → TB and g : B → TC, has the following meaning: first apply f to some value of the type A to produce a computation of the type TB, then evaluate this computation to obtain a value of the type B, and finally apply g to this value and return a computation of the type TC as a result. This expression corresponds to the sequencing of two computations. The expression η a may be interpreted as a "pure" (i.e., effect-free) computation that does nothing but deliver a value. The composition (g* ∘ f) a represents a computation that includes all of the "effects" of f followed by applying g to the value computed by f.
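The reading of g* ∘ f as sequencing can be seen with the exception notion of computation: if f raises an exception, g is never applied and the exception becomes the result of the whole composition. A minimal Haskell sketch, where `eta`, `ext` (standing for _*), and the programs `decrement` and `divide10` are hypothetical names chosen for illustration:

```haskell
-- Computations with exceptions, as in the list of notions of computation above.
data T a = Raise String | Return a deriving (Show, Eq)

-- eta: a pure computation that just delivers its value.
eta :: a -> T a
eta = Return

-- ext g is g*: propagate an exception, otherwise apply g to the value produced.
ext :: (a -> T b) -> T a -> T b
ext _ (Raise e)  = Raise e
ext g (Return a) = g a

-- Two hypothetical programs of type Int -> T Int.
decrement :: Int -> T Int
decrement n = Return (n - 1)

divide10 :: Int -> T Int
divide10 0 = Raise "division by zero"
divide10 n = Return (10 `div` n)
```

Here `(ext divide10 . decrement) 3` evaluates to `Return 5`, while `(ext divide10 . decrement) 1` evaluates to `Raise "division by zero"`.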

In a similar vein one can interpret monads. Given a Kleisli triple (T, η, _*) over a category C, the corresponding monad is (T, η, μ), where T is a functor (that is, T is a pair of mappings: an object mapping and a morphism mapping). The natural transformation η has the same meaning as for Kleisli triples; the natural transformation μ (which is of the type T (T a) → T a) can be thought of as a way of "flattening" a computation of computations into a single computation.
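For computations with side effects (T a = State -> (a, State)), flattening a computation of computations means running the outer computation to obtain an inner one, and then running the inner one on the updated state. A sketch, assuming State = Int; `mu` and `outer` are illustrative names:

```haskell
-- State is assumed to be an Int for this sketch.
type State = Int
type T a = State -> (a, State)

-- mu : T (T a) -> T a. Run the outer computation, which yields an inner
-- computation and a new state, then run the inner computation on that state.
mu :: T (T a) -> T a
mu mm = \s -> let (m, s') = mm s in m s'

-- An example computation of computations: the outer step increments the state
-- and yields an inner computation that reads the state (times 10) and increments it again.
outer :: T (T Int)
outer = \s -> (\s2 -> (s2 * 10, s2 + 1), s + 1)
```

Starting from state 0, `mu outer 0` gives `(10, 2)`: the outer step moves the state to 1, and the inner step reads 1 and moves the state to 2.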

By using monads, Moggi defined the semantics of computations with "effects" in a way that is independent of the kind of effect these computations incorporate. Each effect is simply an instance of the same "notion of computation". Note that "no effect" (a pure computation) is also such an instance. Based on the categorical semantics of computations, Moggi built a system called the computational λ-calculus, which can be used for proving equivalence of programs. The analysis of this system is beyond the scope of this thesis. A detailed description of the computational λ-calculus can be found in Moggi's papers [MO89, MO90].

3 Category Theory monads in functional programming

In functional programming, monads are usually presented as a kind of abstract data type. The type definition includes the definition of the monad itself and definitions of primitive operations related to the particular effect the monad represents (see [WA92] and many other papers). For example, if the effect in mind is state, the primitive operations may include: new (which creates a new structure representing the state), lookup (which searches the state), and update (which updates it). These operations clearly have nothing in common with Category Theory, and they are in many cases application dependent. The functional definitions of the triples, however, closely resemble their corresponding category-theoretic definitions [HI94]. This section explores the relationship between the two.

Given a Category Theory monad (T, η, μ), the functional programming monad is represented by a quadruple (M, map, unit, join) [PA90], where:

• M is a type constructor, for example

type M a = [a],

• map is a higher-order function (analogous to the standard map on lists):

map :: (a -> b) -> M a -> M b,

• unit represents the natural transformation η

unit :: a -> M a,

• join represents the natural transformation μ

join :: M (M a) -> M a.

If a is a type of values, then M a represents the type of programs that return values of the type a. The pair M and map models a functor: the type constructor M is a mapping on objects (types), and the higher-order function map is a mapping on arrows (programs). The definitions of the functions unit and join depend on the effect the monad represents. The function unit converts values to corresponding computations. The function join "flattens" a computation of computations into a single computation. A good example of join is the standard library function concat, which "flattens" a list of lists into a single list (see the example below). The quadruple must satisfy laws equivalent to the monad laws given in the previous chapter.

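$$\mathit{join} \circ \mathit{unit} = \mathit{id} = \mathit{join} \circ \mathit{map}\ \mathit{unit} \qquad \text{(left and right unit)}$$

$$\mathit{join} \circ \mathit{map}\ \mathit{join} = \mathit{join} \circ \mathit{join} \qquad \text{(associativity)}$$

Figure 7 Monad laws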

Lists represent nondeterministic computations (computations that return a set of possible results). The monad for lists [WA92] is given below. The type constructor M defines the type of computations. The function unit creates a one-element list. The function join takes a list of lists and concatenates all sublists into a single list. The function map is the standard map on lists.

type M a = [a]

unit :: a -> M a
unit = \x -> [x]

join :: M (M a) -> M a
join = concat

map :: (a -> b) -> M a -> M b
map f = \x -> case x of
                []     -> []
                (y:ys) -> f y : map f ys

An expression of the form \x -> e is called a lambda expression, and denotes a function that takes an argument x and returns the value of the expression e. Therefore, the function unit could equally well be defined as unit x = [x]. The definition given in Figure 8, however, is more expressive and corresponds more closely to the type of unit.
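The list quadruple above can be checked against the laws of Figure 7 on sample values. The sketch below renames unit, join, and map to `unitL`, `joinL`, and `mapL` so that they do not clash with Haskell's Prelude; the law-checking helpers `unitLaws` and `assocLaw` are hypothetical.

```haskell
-- The list monad as a quadruple (M, map, unit, join), with M a = [a].
unitL :: a -> [a]
unitL = \x -> [x]

joinL :: [[a]] -> [a]
joinL = concat

mapL :: (a -> b) -> [a] -> [b]
mapL f = \xs -> case xs of
  []     -> []
  (y:ys) -> f y : mapL f ys

-- Figure 7, left and right unit: joinL . unitL == id == joinL . mapL unitL
unitLaws :: [Int] -> Bool
unitLaws xs = (joinL . unitL) xs == xs && (joinL . mapL unitL) xs == xs

-- Figure 7, associativity: joinL . mapL joinL == joinL . joinL
assocLaw :: [[[Int]]] -> Bool
assocLaw xsss = (joinL . mapL joinL) xsss == (joinL . joinL) xsss
```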

3.2 Kleisli triples in functional programs

In Wadler's more recent papers, the use of monads bears closer resemblance to Kleisli triples. Given a Category Theory Kleisli triple (T, η, _*), the corresponding "functional programming triple" is represented by a triple (M, unit, bind), where:

• M and unit have the same meaning as for monads in the previous section,
• bind is a polymorphic function such that the expression (f x `bind` g) corresponds to (g* ∘ f) x.

The type of the function bind is bind :: M a -> (a -> M b) -> M b; the meaning of the expression (f x `bind` g) is: first apply the function f to x to produce a computation of the type M a, evaluate this computation, apply the function g to the result, and return a computation of the type M b. The function bind thus provides composition in a Kleisli category of computations. The triple must obey the three laws given below, which are equivalent to the laws given in the previous chapter.

unit a `bind` \b -> n = n[a/b]   (left unit)
m `bind` \b -> unit b = m   (right unit)
m `bind` (\a -> n `bind` \b -> o) = (m `bind` \a -> n) `bind` \b -> o   (associativity)

where n[a/b] denotes n with a substituted for b, and in the associativity law a does not occur free in o.

Figure 9 Kleisli triple laws

The Kleisli triple for lists, equivalent to the monad defined earlier, is given in Figure 10 [WA92].

type M a = [a]

unit :: a -> M a
unit = \x -> [x]

bind :: M a -> (a -> M b) -> M b
x `bind` y = case x of
               []     -> []
               (a:as) -> (y a) ++ (as `bind` y)

Figure 10 Kleisli triple for lists

The functionality of the function bind in the example above is straightforward: if the first computation returns a list of possible results, then the second computation must be applied to each element of this list; the results of each application should be combined, so that the final result is a single list. The operator ++ in the example above denotes list concatenation. This operator could be replaced with any other associative operator that combines two lists (e.g., merge). The functions map and join from the previous section can be defined in terms of unit and bind as follows [WA92]:

map f x = x `bind` \a -> unit (f a)
join x = x `bind` \a -> a
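The Kleisli triple of Figure 10 and the derived map and join can be exercised together. In this Haskell sketch the derived functions are called `mapB` and `joinB` (hypothetical names chosen only to avoid Prelude clashes); the three laws of Figure 9 are checked on small examples, and the derived functions are compared with the Prelude's `map` and `concat`.

```haskell
-- The list Kleisli triple (M, unit, bind) of Figure 10, with M a = [a].
unit :: a -> [a]
unit = \x -> [x]

bind :: [a] -> (a -> [b]) -> [b]
x `bind` y = case x of
  []     -> []
  (a:as) -> y a ++ (as `bind` y)

-- map and join recovered from unit and bind.
mapB :: (a -> b) -> [a] -> [b]
mapB f x = x `bind` \a -> unit (f a)

joinB :: [[a]] -> [a]
joinB x = x `bind` \a -> a
```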

Moggi [MO89] proposed Kleisli triples as a representation of computations with "effects". His claim was that Kleisli triples were more convenient for expressing computations than monads. The same seems to be true in functional programming. The functionality of the bind operator as a composition of two programs is intuitive and easy to understand. By contrast, the functions map and join are rather difficult to justify from a computational point of view. In the remainder of this thesis we will use the formulation of monads as Kleisli triples.

Chapter 4

MONADIC CONSTRUCTION OF PURELY FUNCTIONAL RECOGNIZERS

1 Introduction

Monads are a powerful tool in functional programming. If a program is written using a monad to pass around a variable (like the state or an exception), then it is easy to change what is passed around simply by changing the monad. Only the parts of the program that deal directly with the quantity concerned need to be altered; parts which merely pass it on stay the same.

This chapter describes how monads can be used to construct functional recognizers. We start by defining the type of the monadic recognizers. Next we give the definitions of basic monads and discuss how these monads can be used to build processors that incorporate a single "effect". The remainder of this chapter addresses the problem of combining monads. Different "effects" can be combined by using parametrized monads [LI95] to yield recognizers that involve a combination of different features.

2 The type of the monadic recognizers

In the introduction of this thesis we have defined the recognizers as functions that, applied to some input, consume as much of it as possible and return the unconsumed part of the input for further processing. If the input is represented as a string of tokens to be processed, the type of the recognizers can be written as

type Recognizer = String -> String.

Suppose that we want to have recognizers that return exactly one result or fail otherwise. This can be achieved by modifying the type above as follows

type RecognizerEx = String -> Ex String

where the type constructor Ex is defined as

data Ex a = Ok a | Fail.

(That is, a result of the form Ok v represents the successful recognition of the input with the single value v returned; Fail represents failure.)

Similarly, the recognizers that return a list of possible results (where an empty list denotes failure) can be represented using the type

type RecognizerList = String -> [String].
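As an illustration of the two types, here is a sketch of recognizers for a single character at each type. The names `termEx`, `termList`, and `optionalA` are hypothetical; `optionalA` shows how the list type can carry more than one possible remainder of the input.

```haskell
data Ex a = Ok a | Fail deriving (Show, Eq)

type RecognizerEx   = String -> Ex String
type RecognizerList = String -> [String]

-- Recognize the character c at the front of the input, returning the rest.
termEx :: Char -> RecognizerEx
termEx c (x:xs) | x == c = Ok xs
termEx _ _               = Fail

termList :: Char -> RecognizerList
termList c (x:xs) | x == c = [xs]
termList _ _               = []

-- An optional 'a': both "consume an 'a'" and "consume nothing" are kept as results.
optionalA :: RecognizerList
optionalA inp = termList 'a' inp ++ [inp]
```

For example, `optionalA "abc"` yields `["bc", "abc"]`, whereas `termEx 'a' "xbc"` yields `Fail`.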

The monadic approach allows one to abstract over the "effect" the recognizers incorporate. Suppose that M is a "monadic" type constructor that represents some feature. By defining the type of the processors in terms of the type constructor M, we can make the type of the recognizers independent of the effect they incorporate. Owing to the fact that the representation of the input (a string of tokens) is application dependent, we can make it into a parameter of the type Recognizer.

type Recognizer a = a -> M a

Figure 11 The type of monadic recognizers

The type definition in Figure 11 should be interpreted as follows: a recognizer is a function that, applied to a value of some type a, returns (instead of a result of the type a) a computation that produces a result of the type a. Such a computation may encompass ideas such as state, exceptions, or nondeterminism. Later in this chapter we will discuss the advantages of defining the type of recognizers this way.

3 Basic monads

3.1 The identity monad

The identity monad represents computations as the values they deliver (i.e., "effect-free" computations). It is the starting point to which other capabilities can be added. The monad is represented by a triple: the type constructor Id, which can be thought of as an identity function on types, the function unit (which is an identity function), and the function bind, which applies the function k to the value produced by the computation x.

type Id a = a

unit :: a -> Id a
unit = \a -> a

bind :: Id a -> (a -> Id b) -> Id b
x `bind` k = k x

Figure 12 The identity monad

Using the operators unit and bind of the identity monad we can define the "identity" ("effect-free") recognizers: recognizers that either succeed, returning a single result, or, if something goes wrong, simply produce an internal error. The "monadic" definitions of the recognizer empty and of the sequencing operator then are given below.

type Recognizer a = a -> Id a

emptyR :: Recognizer a
emptyR = unit

thenR :: Recognizer a -> Recognizer a -> Recognizer a
(p `thenR` q) inp = p inp `bind` \x1 ->
                    q x1 `bind` \x2 -> unit x2

The definition of the recognizer (p `thenR` q) can be interpreted as follows. First the recognizer p is applied to the initial input, and the value returned by p is bound to the variable x1. Next the recognizer q is applied to the input x1, and its value is bound to the variable x2. Finally the value x2 is converted into the corresponding computation, and this computation is returned as the result.
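The identity-monad definitions can be run directly in Haskell. This sketch adds a hypothetical recognizer `dropOne`, which consumes one character (and, having no notion of failure, simply returns the empty string on empty input), to show how `thenR` threads the remaining input from one recognizer to the next.

```haskell
-- The identity monad of Figure 12.
type Id a = a

unit :: a -> Id a
unit = \a -> a

bind :: Id a -> (a -> Id b) -> Id b
x `bind` k = k x

-- Recognizers over the identity monad.
type Recognizer a = a -> Id a

emptyR :: Recognizer a
emptyR = unit

thenR :: Recognizer a -> Recognizer a -> Recognizer a
(p `thenR` q) inp = p inp `bind` \x1 -> q x1 `bind` \x2 -> unit x2

-- A hypothetical recognizer for this sketch: drop one leading character.
dropOne :: Recognizer String
dropOne = drop 1
```

For example, `(dropOne `thenR` dropOne) "abc"` yields `"c"`.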

The identity monad does not provide us with the notion of failure. This notion is required to give meaningful definitions of the recognizers fail and term. In addition, in order to define the operator orelse we need to specify the notion of "choice". The monad defined in the next subsection provides us with both of these notions.

3.2 Exceptions

Exceptions in purely functional programming languages were studied by Spivey, who, independently of Moggi, noticed that monads are a useful tool for representing exceptions in functional programs [SP90]. We can think of a value of the type Ex a (defined in Figure 13) as a computation that either succeeds with a single value of the type a, or fails producing no value. Therefore exceptions correspond to deterministic choice.

In addition to the operators unit and bind, it makes sense to define two additional operations on the values of the type Ex a [LI95]. The operation plus defines a composition (in the sense of generalized addition) of two computations of the type Ex a. For exceptions, plus can be interpreted as a (deterministic) choice operator that returns the first computation if it succeeds, and the second otherwise. The operation zero is an identity for plus, and it represents a computation that always fails. The operations plus and zero are not part of the exception monad. They simply provide a different way of combining computations of the type Ex a.

We shall see later in this chapter that the exception monad is not the only monad for which it makes sense to define zero and plus. Another example is the monad for lists, where plus could be defined as concatenation (or merge) of two lists, with the empty list as its identity (zero). The operators bind (with identity unit) and plus (with identity zero) provide the means to structure programs in a modular way.

data Ex a = Ok a | Fail

unit :: a -> Ex a
unit = \a -> Ok a

bind :: Ex a -> (a -> Ex b) -> Ex b
Fail `bind` k = Fail
(Ok a) `bind` k = k a

zero :: Ex a
zero = Fail

plus :: Ex a -> Ex a -> Ex a
Fail `plus` x = x
x `plus` _ = x

Figure 13 The exception monad with zero and plus

We can now define the monadic recognizers fail and term as well as the operator orelse. The definitions of empty and then given in the previous subsection do not have to be modified. We simply modify the type of the recognizers and replace unit and bind of the identity monad with the corresponding operators of the exception monad.

type Recognizer a = a -> Ex a

orelseR :: Recognizer a -> Recognizer a -> Recognizer a
(p `orelseR` q) inp = p inp `plus` q inp

failR :: Recognizer a
failR inp = zero

termR :: Char -> Recognizer String
termR c inp = case inp of
                []     -> zero
                (x:xs) -> if x == c
                          then unit xs
                          else zero

Using the above definitions we can build (deterministic) recognizers, for example:

s :: Recognizer String
s = (termR 'a' `thenR` s `thenR` s) `orelseR` emptyR

Although the underlying grammar is ambiguous, the recognizer s defined above returns only one result. The empty string returned as the result means that the whole input string has been successfully read as an s.
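Putting the exception monad and the recognizers together gives a complete, deterministic version of s that can be run in Haskell. This sketch assumes the definitions above, with failure in termR expressed by zero; the `deriving` clause on Ex is added only so results can be printed and compared.

```haskell
data Ex a = Ok a | Fail deriving (Show, Eq)

unit :: a -> Ex a
unit = \a -> Ok a

bind :: Ex a -> (a -> Ex b) -> Ex b
Fail   `bind` _ = Fail
(Ok a) `bind` k = k a

zero :: Ex a
zero = Fail

-- Deterministic choice: keep the first result if it succeeded.
plus :: Ex a -> Ex a -> Ex a
Fail `plus` y = y
x    `plus` _ = x

type Recognizer a = a -> Ex a

emptyR :: Recognizer a
emptyR = unit

thenR, orelseR :: Recognizer a -> Recognizer a -> Recognizer a
(p `thenR` q) inp   = p inp `bind` \x1 -> q x1 `bind` \x2 -> unit x2
(p `orelseR` q) inp = p inp `plus` q inp

failR :: Recognizer a
failR _ = zero

termR :: Char -> Recognizer String
termR c inp = case inp of
  []     -> zero
  (x:xs) -> if x == c then unit xs else zero

-- s ::= 'a' s s | empty
s :: Recognizer String
s = (termR 'a' `thenR` s `thenR` s) `orelseR` emptyR
```

Because the first alternative is tried first and kept whenever it succeeds, s consumes as many a's as it can: `s "aaaaa"` evaluates to `Ok ""` and `s "aab"` to `Ok "b"`.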

We will use the list monad to transform our deterministic recognizers into recognizers that return a set of possible results. The definition of the monad for lists given below is almost identical to that given in Figure 10. The only difference is that the operator ++ (list concatenation) has been replaced with the operator merge_res, which merges two lists sorted in ascending order with duplicates removed. The same operator merge_res is also a suitable plus for the monad.

type List a = [a]

unit :: a -> [a]
unit = \x -> [x]

bind :: Ord b => [a] -> (a -> [b]) -> [b]
x `bind` y = case x of
               []     -> []
               (a:as) -> (y a) `merge_res` (as `bind` y)

zero :: [a]
zero = []

plus :: Ord a => [a] -> [a] -> [a]
plus = merge_res

Figure 14 The list monad with zero and plus

The deterministic recognizers can now be transformed into nondeterministic ones by modifying their type

type Recognizer a = a -> [a],

and by replacing unit, bind, zero, and plus of the exception monad with the corresponding operators of the list monad. If these changes are made, the recognizer s from the previous subsection behaves as follows.
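A sketch of the list-based version, assuming merge_res is written `mergeRes` (a hyphen is not allowed in a Haskell name) and that the element type is ordered, which merging requires. Because s can always stop early via emptyR, applied to an input of a's it returns every suffix of that input, in ascending order.

```haskell
-- merge_res, renamed mergeRes: merges two ascending lists,
-- keeping a single copy of elements present in both.
mergeRes :: Ord a => [a] -> [a] -> [a]
mergeRes [] ys = ys
mergeRes xs [] = xs
mergeRes (x:xs) (y:ys)
  | x < y     = x : mergeRes xs (y:ys)
  | x > y     = y : mergeRes (x:xs) ys
  | otherwise = x : mergeRes xs ys

-- The list monad of Figure 14; the Ord constraints are needed by mergeRes.
unit :: a -> [a]
unit = \x -> [x]

bind :: Ord b => [a] -> (a -> [b]) -> [b]
x `bind` y = case x of
  []     -> []
  (a:as) -> y a `mergeRes` (as `bind` y)

zero :: [a]
zero = []

plus :: Ord a => [a] -> [a] -> [a]
plus = mergeRes

type Recognizer a = a -> [a]

emptyR :: Recognizer a
emptyR = unit

thenR :: Ord a => Recognizer a -> Recognizer a -> Recognizer a
(p `thenR` q) inp = p inp `bind` \x1 -> q x1 `bind` \x2 -> unit x2

orelseR :: Ord a => Recognizer a -> Recognizer a -> Recognizer a
(p `orelseR` q) inp = p inp `plus` q inp

termR :: Char -> Recognizer String
termR c inp = case inp of
  []     -> zero
  (x:xs) -> if x == c then unit xs else zero

s :: Recognizer String
s = (termR 'a' `thenR` s `thenR` s) `orelseR` emptyR
```

With these definitions, `s "aaaaa"` yields `["", "a", "aa", "aaa", "aaaa", "aaaaa"]`: one entry for each possible amount of input consumed.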

4 Combining monads

4.1 Construction of a combined monad

So far in this chapter we have discussed recognizers that take as an argument a string of characters to be processed. Suppose that we decide to modify them so that they accept two parameters: the whole input string and a single position that indicates which character should be processed first. As we have shown in Chapter 1, the type of such processors can be expressed in terms of the type Recognizer as follows

type RecognizerNew a = String -> Recognizer a.

The additional effect that recognizers of the type RecognizerNew incorporate can be represented using the state reader monad [WA90, WA92]. Owing to the fact that we would like to add this effect on top of the effects that are already in place (exceptions or lists), we will use the parametrized state reader monad [LI95], which is shown in Figure 15.

Assume that we already have a monad (M, unit, bind). A monad parametrized over M can be constructed by defining a new bind operation in terms of the old bind, and by defining a function lift,

lift :: M a -> MNew a

that lifts operations of the type M a into operations of the new type MNew a. If the underlying monad has zero and plus, then the corresponding operators zero and plus for the new monad can be defined in terms of the old ones. The main advantage of this technique is that the definitions of the operators unit, bind, zero, and plus of a combined monad are independent of the choice of the base monad.

4.2 The parametrized state reader monad

The state reader monad abstracts over computations that read from the state but never update it. Owing to the fact that the "post-state" is always assumed to be identical to the "pre-state", there is no need to return it. The definition of the parametrized state reader monad is given in Figure 15.

type StRM m s a = s -> m a

liftStRM :: m a -> StRM m s a
liftStRM x = \t -> x

unitStRM :: a -> StRM m s a
unitStRM x = liftStRM (unit x)

bindStRM :: StRM m s a -> (a -> StRM m s b) -> StRM m s b
(a `bindStRM` k) t = a t `bind` \va ->
                     k va t

zeroStRM :: StRM m s a
zeroStRM = \t -> zero

plusStRM :: StRM m s a -> StRM m s a -> StRM m s a
(x `plusStRM` y) s = x s `plus` y s

Figure 15 The parametrized state reader monad

The type constructor StRM defines the type of computations that are applied to some state (of the type s) and return a computation of the type m a. The operations unit and bind are defined in terms of the unit and bind of the underlying monad. Although the state reader monad does not have its own plus and zero, these operations can be defined in terms of the corresponding operations of the base monad.

We can now define the type of our recognizers in terms of the type of the parametrized state reader monad.

type Recognizer a = a -> StRM m String a

That is, our new recognizers are functions that, applied to a value of the type a (the current position) and a state (an input string), return a computation of the type m a. The definitions of fail, empty, then, and orelse do not have to be modified (except for the fact that the operators unit, bind, zero, and plus now have different types). The only function that must be modified is the recognizer term.

termInt :: Char -> Int -> StRM m String Int
termInt c x s
  | (x < 1) || (x > length s) || (s !! (x-1) /= c) = zero
  | otherwise = unit (x+1)

If the base monad of the parametrized state reader monad is the monad for exceptions, the type of the recognizers is

type Recognizer a = a -> StRM Ex String a.

In other words, our new recognizers, when applied to a single start position and an input string of characters, either return a value of the form Ok v (where v represents a single start position for the next processor), or return the value Fail (which represents failure). One advantage of defining the new recognizers this way is that the previously given definitions of fail, then, orelse, and empty do not have to be modified. We simply replace the operators unit, bind, plus, and zero with the corresponding operators of the parametrized state reader monad.

The recognizer s defined earlier, which is now of the type

Int -> StRM Ex String Int

behaves as follows.

The result Ok 6 returned by the recognizer means that the whole input string has been successfully recognized as an s (position 6 of the input string is the end of the string).

If the base monad of the parametrized state reader monad is the monad for lists, the type of the recognizers is

type Recognizer a = a -> StRM [] String a.

The recognizer s, when applied to a start position and the whole input string, now returns a set of possible start positions for the next processor.

5 Concluding remarks

The technique described in this chapter allows one to construct programs out of components that represent various programming language features. The technique can be implemented in any purely functional programming language. However, in a programming language that does not support overloaded operators: 1) different names must be used for each operator unit, bind, plus, and zero; 2) for each combination of features the corresponding operators must be defined explicitly. It is up to the programmer to determine which definitions of the operators to use in a given context.

The system of constructor classes in Gofer [JNA93] allows one to define classes of types with overloaded operators. The main advantage of using constructor classes to implement "combined" monads is that there is no need to use different names for operators such as bind and unit when they are used in different contexts (that is, when they are parts of definitions of different monads). The type checker automatically determines which definition of bind or unit to use. The construction of language processors using constructor classes is described in the next chapter.

Chapter 5

MONADIC CONSTRUCTION OF MEMOIZED LANGUAGE PROCESSORS USING TYPE

CONSTRUCTOR CLASSES IN GOFER

1 The system of type constructor classes in Gofer

The system of type constructor classes in Gofer allows one to define classes of types with overloaded operators [WA93]. Overloading enables the definition and use of functions in which the meaning of a function symbol may depend on the types of its arguments.

Classes can be related in a class hierarchy. For example, one class may be defined as a "subclass" of another (one of its "superclasses"), or it may be composed of other classes. For each class a set of suitable operations (methods) can be defined. A subclass inherits all of the methods of its superclasses.

2 Monads and type constructor classes

Each monad is a triple (M, unit, bind) where the types of the two operators unit and bind are defined in terms of the type constructor M. These types always have the same structure (no matter what feature the monad represents). Therefore we can define a monad as a class parametrized over the type constructor M, with two methods: unit and bind.

As we have shown in the previous chapter, for some monads it makes sense to define two additional operators: plus and zero. The structure of the types of these two operators is "effect" independent. We can define a class MonadPlusZero as a subclass of the class Monad that has two additional methods: plus and zero.

class Monad m where
  unit :: a -> m a
  bind :: m a -> (a -> m b) -> m b

class Monad m => MonadPlusZero m where
  plus :: m a -> m a -> m a
  zero :: m a

There are three parts in any class declaration. In the example above, the first line of the declaration (called the header) introduces the name Monad for the class and indicates that the class has a single parameter, represented by the type variable m. The second part (the signature part) is a list of function (method) declarations. For each instance of the class Monad we can define two methods: unit and bind. The third part (not present in the example above) may contain default definitions of the methods. For example, Figure 17 shows the class Recognizer with default definitions of the functions emptyR, thenR, orelseR, and failR.

All the basic monads defined in the previous chapter can now be represented as instances of the classes Monad and MonadPlusZero (if applicable). Instances of a type class in Gofer are defined using declarations similar to those used to define the corresponding type class. For example, the following declarations specify that Monad Ex and MonadPlusZero Ex are instances of the classes Monad m and MonadPlusZero m respectively

instance Monad Ex where
  unit = unitEx
  bind = bindEx

instance MonadPlusZero Ex where
  plus = plusEx
  zero = zeroEx

(where the operations with the suffix "Ex" are those defined previously for the exception monad).

Similarly, parametrized monads can be represented as instances that inherit (denoted using the symbol "=>") the monad operations from the base monad:

instance Monad m => Monad (StRM m s) where
  unit = unitStRM
  bind = bindStRM

instance MonadPlusZero m => MonadPlusZero (StRM m s) where
  zero = zeroStRM
  plus = plusStRM

3 The class Recognizer

We have shown in the previous chapter that most of the basic recognizers and combinators can be defined in terms of the operators unit, bind, plus, and zero. Therefore it is convenient to define the class of recognizers as a subclass of the class MonadPlusZero.

class MonadPlusZero m => Recognizer m a where
    term    :: Char -> a -> m a
    emptyR  :: a -> m a
    orelseR :: (a -> m a) -> (a -> m a) -> (a -> m a)
    thenR   :: (a -> m a) -> (a -> m a) -> (a -> m a)
    failR   :: m a

    emptyR = unit
    (p `thenR` q) inp = p inp `bind` \x1 ->
                        q x1  `bind` \x2 ->
                        unit x2
    (p `orelseR` q) inp = (p inp) `plus` (q inp)
    failR = zero

Figure 17 The class Recognizer

One advantage of defining the recognizers this way is that once the underlying monad m is defined, the corresponding operators empty, then, orelse, and fail are defined automatically. The definition of term depends on the representation of the input (which is either a string, or a pair: an integer number representing a start position and a string). Therefore we can define two instances of the class Recognizer (where termInt and termChar are the two definitions of term that were given in the previous chapter).

instance MonadPlusZero (StRM m String) => Recognizer (StRM m String) Int where
    term = termInt

instance MonadPlusZero m => Recognizer m String where
    term = termChar

We can now define the recognizer s.

By simply changing the type definition of s we can change its behavior. For example, if the recognizer s is of the type

s :: Recognizer (StRM [] String) Int => Int -> StRM [] String Int

then it behaves as follows:

s 1 "aaaaa" => [1, 2, 3, 4, 5, 6]

If we change its type to

s :: Recognizer (StRM Ex String) Int => Int -> StRM Ex String Int

then the same recognizer applied to the same input returns the value

Ok 6

More examples can be found in Appendix B.
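The claim that only the type of s determines its behavior can be checked with a small Haskell sketch. Here a recognizer is a function from a start position and the whole input to a computation in any MonadPlus m; termR, thenR, orelseR and emptyR are our hypothetical stand-ins for the class methods, and s follows the grammar s ::= 'a' s s | empty used in Appendix A. Because Haskell's list monad appends results instead of merging them, the helper asSet removes duplicates to mimic the set of positions reported above.

```haskell
import Control.Monad (MonadPlus (..))
import Data.List (nub, sort)

-- A recognizer: start position -> whole input -> end positions in monad m.
type Rec m = Int -> String -> m Int

termR :: MonadPlus m => Char -> Rec m
termR c x inp
  | x < 1 || x > length inp || inp !! (x - 1) /= c = mzero
  | otherwise = return (x + 1)

emptyR :: Monad m => Rec m
emptyR x _ = return x

thenR :: Monad m => Rec m -> Rec m -> Rec m
thenR p q x inp = p x inp >>= \y -> q y inp

orelseR :: MonadPlus m => Rec m -> Rec m -> Rec m
orelseR p q x inp = p x inp `mplus` q x inp

-- s ::= 'a' s s | empty
s :: MonadPlus m => Rec m
s = (termR 'a' `thenR` (s `thenR` s)) `orelseR` emptyR

-- The list monad may report the same end position several times.
asSet :: [Int] -> [Int]
asSet = sort . nub
```

At type [Int], asSet (s 1 "aaaaa") gives [1, 2, 3, 4, 5, 6]; at type Maybe Int the same definition gives Just 6, the Maybe counterpart of Ok 6.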

4 Memoized recognizers

Memoization involves interaction with state. Stateful computations can be represented in purely functional programs by using the state monad [WA90, WA92]. We start by presenting the definition of this monad.

4.1 The state monad

The monad defined below can be used for adding state operations to a purely functional program. The type of programs that interact with the state is defined as the type of a function that takes as its parameter an initial state and returns its value paired with the final state. The function unit takes a value and a state and returns the same value paired with the initial state. In other words, the function unit a is an "identity" state transformer. The function bind combines two "stateful" computations. First, the computation a is evaluated in the initial state t; next, the function k is applied to the value va returned by a and to the new state ta.

type St s a = s -> (a, s)

unitSt :: a -> St s a
unitSt a = \t -> (a, t)

bindSt :: St s a -> (a -> St s b) -> St s b
(a `bindSt` k) t = k va ta
    where (va, ta) = a t

instance Monad (St s) where
    unit = unitSt
    bind = bindSt

Figure 18 The state monad
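In Haskell the state monad of Figure 18 needs a newtype wrapper before it can be made a Monad instance. A minimal sketch, in which getSt, putSt and tick are hypothetical helpers that are not part of the thesis:

```haskell
import Control.Monad (ap, liftM)

-- St s a = s -> (a, s), wrapped so that it can be a Monad instance.
newtype St s a = St {runSt :: s -> (a, s)}

instance Functor (St s) where
  fmap = liftM

instance Applicative (St s) where
  pure a = St (\t -> (a, t)) -- unitSt: the identity state transformer
  (<*>) = ap

instance Monad (St s) where
  -- bindSt: run a in state t, then k on its value va and the new state ta
  St a >>= k = St (\t -> let (va, ta) = a t in runSt (k va) ta)

-- Hypothetical helpers for reading and replacing the state.
getSt :: St s s
getSt = St (\t -> (t, t))

putSt :: s -> St s ()
putSt t = St (\_ -> ((), t))

-- Example: return the current counter and increment it.
tick :: St Int Int
tick = getSt >>= \n -> putSt (n + 1) >>= \_ -> return n
```

Running two ticks from state 10, runSt (tick >>= \a -> tick >>= \b -> return (a, b)) 10, yields ((10, 11), 12): bind passes the state produced by the first computation on to the second.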


Owing to the fact that the purpose of memoization is to improve the efficiency of processors that return a set of possible results, the memoized recognizers involve a combination of two features: state and nondeterminism. In order to combine these features we introduce the parametrized monad for lists.

unitLsM :: Monad m => a -> ListM m a
unitLsM x = liftLsM (unit x)

bindLsM :: Monad m => ListM m a -> (a -> ListM m b) -> ListM m b
x `bindLsM` k = x `bind` \x1 ->
                foldr plusLsM zeroLsM (map k x1)

liftLsM :: Monad m => m a -> ListM m a
liftLsM x = x `bind` \x1 ->
            unit [x1]

instance Monad m => Monad (ListM m) where
    unit = unitLsM
    bind = bindLsM

instance Monad m => MonadPlusZero (ListM m) where
    zero = zeroLsM
    plus = plusLsM

zeroLsM :: Monad m => ListM m a
zeroLsM = unit []

plusLsM :: Monad m => ListM m a -> ListM m a -> ListM m a
(x `plusLsM` y) = x `bind` \x1 ->
                  y `bind` \x2 -> unit (x1 `merge_res` x2)
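The list operations above can be tried out in Haskell as plain functions over m [a]. The sketch drops the Monad and MonadPlusZero instances because merging with merge_res needs an Ord constraint that a Haskell Monad instance cannot carry; mergeRes is our version of merge_res (merge two ascending lists, dropping duplicates), and the base monad in the usage example is Identity rather than the state monad.

```haskell
import Data.Functor.Identity (Identity (..))

-- ListM m a = m [a]: a sorted, duplicate-free list of results inside m.
type ListM m a = m [a]

-- Merge two ascending lists, dropping duplicates (the role of merge_res).
mergeRes :: Ord a => [a] -> [a] -> [a]
mergeRes [] ys = ys
mergeRes xs [] = xs
mergeRes (x : xs) (y : ys)
  | x < y = x : mergeRes xs (y : ys)
  | x > y = y : mergeRes (x : xs) ys
  | otherwise = x : mergeRes xs ys

liftLsM :: Monad m => m a -> ListM m a
liftLsM x = x >>= \x1 -> return [x1]

unitLsM :: Monad m => a -> ListM m a
unitLsM x = liftLsM (return x)

zeroLsM :: Monad m => ListM m a
zeroLsM = return []

plusLsM :: (Monad m, Ord a) => ListM m a -> ListM m a -> ListM m a
plusLsM x y = x >>= \x1 -> y >>= \x2 -> return (mergeRes x1 x2)

-- Apply k to every result of x and merge all the result lists.
bindLsM :: (Monad m, Ord b) => ListM m a -> (a -> ListM m b) -> ListM m b
bindLsM x k = x >>= \x1 -> foldr plusLsM zeroLsM (map k x1)
```

For example, runIdentity (bindLsM (Identity [1, 2, 3]) (\x -> Identity [x, x + 1])) evaluates to [1, 2, 3, 4]: the result lists [1, 2], [2, 3] and [3, 4] are merged into one duplicate-free list.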

If the base monad for the parametrized list monad is the monad for state, then the combination of the two monads yields computations of the type

ListM (St s) a = s -> ([a], s)

In other words, each computation is applied to an initial state and returns a list of values paired with the modified state. (Note that combining the same monads in the reverse order yields computations of a different type.)

Our memoization algorithm applies to recognizers that take as parameters a single position and the whole input string; therefore, on top of the two features (state and nondeterminism), they involve one more "effect" that can be represented using the parametrized state reader monad. We can define the type of memoized recognizers in terms of the types of the three monads as follows:

type Recognizer s = Int -> StRM (ListM (St s)) String Int

4.3 The memo-table

The introduction of the state monad does not immediately improve the performance of memoized recognizers. Operations that access the state are required to store and retrieve the previously computed results. We have decided to represent the state as a list of pairs. The first component of each pair is an integer number that acts as an index to the memo-table; the second component is a list of pairs (recognizer-name, recognizer-value). Owing to the fact that different processors may return values of different types, it is convenient to parametrize the type of the memo-table over the type of values it stores.

type State v = [(Int, [(String, v)])]

The purpose of the function lookupSt is to return the value that corresponds to a start position and a recognizer name (given as parameters). The function updateSt, given a start position, a recognizer name, and the value returned by the recognizer, updates the corresponding entry of the memo-table. The function newSt creates a new memo-table. The definitions of these functions can be found in Appendix B.
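Since the definitions in Appendix B are not reproduced here, the following Haskell sketch shows one plausible implementation over the State v representation above. The argument orders, the Maybe result of lookupSt, and the choice to pre-allocate one slot per start position 1 to n + 1 in newSt are all our assumptions.

```haskell
-- Memo-table: start position -> [(recognizer name, stored value)].
type State v = [(Int, [(String, v)])]

-- Hypothetical: an empty table with one slot per start position 1 .. n+1.
newSt :: Int -> State v
newSt n = [(i, []) | i <- [1 .. n + 1]]

-- The value stored for a start position and recognizer name, if any.
lookupSt :: Int -> String -> State v -> Maybe v
lookupSt i name table = lookup i table >>= lookup name

-- Record (or overwrite) the value for a start position and recognizer name.
updateSt :: Int -> String -> v -> State v -> State v
updateSt i name v table
  | any ((== i) . fst) table = map upd table
  | otherwise = table ++ [(i, [(name, v)])]
  where
    upd (j, entries)
      | j == i = (j, (name, v) : filter ((/= name) . fst) entries)
      | otherwise = (j, entries)
```

Note that updateSt returns a new table rather than modifying the old one, which is the source of the space cost discussed in Section 6.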

The top-level function memoizeRec is applied to each recognizer to store its result in the memo-table. The function is defined in terms of the unit and bind operators of the state monad.

memoizeRec :: Recognizer (StRM (ListM (St (State [Int]))) String) Int
           => String
           -> (Int -> StRM (ListM (St (State [Int]))) String Int)
           -> (Int -> StRM (ListM (St (State [Int]))) String Int)

The definition of the function memoizeRec corresponds closely to the memoization algorithm. In order to memoize the recognizer f that is applied to a start position i and the whole input string of tokens s, we first search the memo-table for the result that corresponds to the recognizer name and the start position i. If the result was computed before, this result is returned. Otherwise, the new result is computed and the memo-table is updated.
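The lookup, compute, update cycle can be illustrated without the monad stack by threading the memo-table by hand. In this Haskell sketch a memoized recognizer takes a start position, the whole input and a table, and returns a sorted, duplicate-free list of end positions together with the updated table. The combinators termR, thenR, orelseR, emptyR and the table helpers are hypothetical simplifications of the monadic versions, and only the recognizer ms from the text is modelled.

```haskell
-- Memo-table: start position -> [(recognizer name, end positions)].
type Table = [(Int, [(String, [Int])])]

-- Position -> whole input -> table -> (end positions, new table).
type MRec = Int -> String -> Table -> ([Int], Table)

mergeRes :: [Int] -> [Int] -> [Int]
mergeRes [] ys = ys
mergeRes xs [] = xs
mergeRes (x : xs) (y : ys)
  | x < y = x : mergeRes xs (y : ys)
  | x > y = y : mergeRes (x : xs) ys
  | otherwise = x : mergeRes xs ys

lookupSt :: Int -> String -> Table -> Maybe [Int]
lookupSt i name t = lookup i t >>= lookup name

-- Record a new result; only called after a failed lookupSt.
updateSt :: Int -> String -> [Int] -> Table -> Table
updateSt i name v t = case lookup i t of
  Nothing -> (i, [(name, v)]) : t
  Just entries -> (i, (name, v) : entries) : filter ((/= i) . fst) t

-- Return a stored result, or compute it and store it.
memoizeRec :: String -> MRec -> MRec
memoizeRec name f i inp t = case lookupSt i name t of
  Just res -> (res, t)
  Nothing -> let (res, t') = f i inp t in (res, updateSt i name res t')

termR :: Char -> MRec
termR c x inp t
  | x >= 1 && x <= length inp && inp !! (x - 1) == c = ([x + 1], t)
  | otherwise = ([], t)

emptyR :: MRec
emptyR x _ t = ([x], t)

orelseR :: MRec -> MRec -> MRec
orelseR p q x inp t0 =
  let (r1, t1) = p x inp t0
      (r2, t2) = q x inp t1
   in (mergeRes r1 r2, t2)

-- Apply q at every position returned by p, threading the table through.
thenR :: MRec -> MRec -> MRec
thenR p q x inp t0 = foldl step ([], t1) r1
  where
    (r1, t1) = p x inp t0
    step (acc, t) y = let (r, t') = q y inp t in (mergeRes acc r, t')

ms :: MRec
ms = memoizeRec "ms" ((termR 'a' `thenR` (ms `thenR` ms)) `orelseR` emptyR)
```

With an initially empty table, fst (ms 1 "aaaaa" []) is [1, 2, 3, 4, 5, 6], and the returned table holds exactly one entry for each start position 1 to 6, because every later call at a position already in the table is answered by lookupSt.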

The memoized recognizer ms, defined as

ms :: Recognizer (StRM (ListM (St (State [Int]))) String) Int
   => Int -> StRM (ListM (St (State [Int]))) String Int
ms = memoizeRec "ms" ((term 'a' `thenR` (ms `thenR` ms)) `orelseR` emptyR)

when applied to the string "aaaaa" at position 1 returns the following result:

5 Parsers

5.1 The type of the monadic parsers

The monadic approach allows one to easily extend the techniques presented in the previous sections to more complex language processors. Recall once again the type of parsers:

type Parser a = s -> (a, s)

The type of parsers corresponds directly to the type of the state monad where the state is a sequence of tokens to be processed. Such a monad is sometimes called the input monad [WA90, WA92]. Similarly to the type of monadic recognizers, we can express the type of parsers in terms of the "monadic" type constructor m.

type Parser a = m a

5.2 The class Parser

Basic parsers and combinators can be defined in terms of the operators unit, bind, plus, and zero. We can define the class of parsers as a subclass of the class MonadPlusZero.

class MonadPlusZero m => Parser m a where
    term   :: Char -> m a
    empty  :: a -> m a
    orelse :: m a -> m a -> m a
    fail   :: m a

    empty  = unit
    orelse = plus
    fail   = zero

Figure 20 The class Parser

(Note that we have not defined the operator then for parsers. The reason is that the monadic operator bind, which integrates the sequencing of functions with the processing of their values, is much more convenient to use.)

Having defined the class of parsers, we can build various parsers in a modular way by combining appropriate monads. In addition to the monads defined so far, we will use the parametrized input monad, which is described in the next subsection.

5.3 The parametrized input monad

The input monad is identical to the monad for state where the state is a sequence of tokens. The type of programs is the type of functions from a string (the input to the program) to a computation that involves a pair: a value and a string (the input to the rest of the program).

type InpM m s a = s -> m (a, s)

unitInpM :: Monad m => a -> InpM m s a
unitInpM a = liftInpM (unit a)

bindInpM :: Monad m => InpM m s a -> (a -> InpM m s b) -> InpM m s b
(a `bindInpM` k) inp = a inp `bind` \(va, outa) -> k va outa

liftInpM :: Monad m => m a -> InpM m s a
liftInpM x inp = x `bind` \x1 ->
                 unit (x1, inp)

zeroInpM :: MonadPlusZero m => InpM m s a
zeroInpM = \inp -> zero

plusInpM :: MonadPlusZero m => InpM m s a -> InpM m s a -> InpM m s a
(x `plusInpM` y) = \inp -> x inp `plus` y inp

instance Monad m => Monad (InpM m s) where
    unit = unitInpM
    bind = bindInpM

instance MonadPlusZero m => MonadPlusZero (InpM m s) where
    zero = zeroInpM
    plus = plusInpM
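A Haskell sketch of the parametrized input monad, again wrapped in a newtype; it is the same construction as the standard state monad transformer. The parser item, which consumes one character, is a hypothetical example added for illustration.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus (..), ap, liftM)

-- InpM m s a = s -> m (a, s): the input monad on top of a base monad m.
newtype InpM m s a = InpM {runInpM :: s -> m (a, s)}

instance Monad m => Functor (InpM m s) where
  fmap = liftM

instance Monad m => Applicative (InpM m s) where
  pure a = InpM (\inp -> return (a, inp)) -- unitInpM
  (<*>) = ap

instance Monad m => Monad (InpM m s) where
  -- bindInpM: hand the remaining input outa on to the continuation
  InpM a >>= k = InpM (\inp -> a inp >>= \(va, outa) -> runInpM (k va) outa)

instance MonadPlus m => Alternative (InpM m s) where
  empty = InpM (\_ -> mzero) -- zeroInpM
  InpM x <|> InpM y = InpM (\inp -> x inp `mplus` y inp) -- plusInpM

instance MonadPlus m => MonadPlus (InpM m s)

-- Hypothetical example parser: consume one character, fail on empty input.
item :: MonadPlus m => InpM m String Char
item = InpM next
  where
    next (c : cs) = return (c, cs)
    next [] = mzero
```

With Maybe as the base monad the parser is deterministic: runInpM (item >>= \c1 -> item >>= \c2 -> return [c1, c2]) "abc" gives Just ("ab", "c"). With the list monad, runInpM (item <|> return 'z') "ab" keeps both alternatives, [('a', "b"), ('z', "ab")].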

If the underlying monad of the input monad is the monad for lists, the type of the corresponding computations is

InpM [] s a = s -> [(a, s)]

which is exactly the same as the type of nondeterministic parsers defined earlier. If the base monad is the exception monad, the resulting parsers are deterministic: they either return a value of the form Ok (a, s), or they return the value Fail that denotes failure.

To obtain parsers that take as parameters a single position and the whole input string of tokens, we simply apply the parametrized state reader monad on top of the parametrized input monad. The type of the resulting parsers is

StRM (InpM m Int) String a = String -> Int -> m (a, Int)

That is, each parser, when applied to the input string of tokens and a single position, returns a computation that involves a pair: the value of the parser and the start position for the next parser. The corresponding parser term can be defined as follows:

termEInt :: MonadPlusZero m => Char -> StRM (InpM m Int) String Int
termEInt c s x | (x < 1) || (x > length s) || (s !! (x-1) /= c) = zero
               | otherwise = unit (eval c, x+1)

where the function eval is application dependent and returns the value that corresponds to the character given as its parameter. The corresponding instance of the class Parser is defined below.

instance MonadPlusZero (StRM (InpM m Int) String) => Parser (StRM (InpM m Int) String) Int where
    term = termEInt

New parsers or evaluators can now be constructed by combining simpler processors using the operators bind and orelse. For example, an evaluator with the structure that corresponds to the recognizer s defined earlier can be constructed as follows.

e = (term 'a' `bind` \x1 -> e `bind` \x2 ->
     e `bind` \x3 -> unit (x1 + x2 + x3)) `orelse` empty 0

Suppose that the function eval is defined as eval 'a' = 1. If the type of the evaluator e is defined as

e :: Parser (StRM (InpM [] Int) String) Int => StRM (InpM [] Int) String Int

then the evaluator applied to the string "aaaaa" at position 1 returns a set of values.

If the type of the same evaluator is defined as

e :: Parser (StRM (InpM Ex Int) String) Int => StRM (InpM Ex Int) String Int

then the same evaluator returns a single value.
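The evaluator e can be run in Haskell by collapsing StRM (InpM m Int) String into a single newtype P, assuming it unfolds to String -> Int -> m (a, Int) as stated above. The names P, runP, orelse, emptyP and term are ours (emptyP avoids a clash with Alternative's empty), and the catch-all equation eval _ = 0 is added only to make eval total.

```haskell
import Control.Monad (MonadPlus (..), ap, liftM)
import Data.List (nub, sort)

-- Input string -> start position -> m (value, next start position).
newtype P m a = P {runP :: String -> Int -> m (a, Int)}

instance Monad m => Functor (P m) where
  fmap = liftM

instance Monad m => Applicative (P m) where
  pure a = P (\_ x -> return (a, x))
  (<*>) = ap

instance Monad m => Monad (P m) where
  P p >>= k = P (\inp x -> p inp x >>= \(a, x') -> runP (k a) inp x')

orelse :: MonadPlus m => P m a -> P m a -> P m a
orelse (P p) (P q) = P (\inp x -> p inp x `mplus` q inp x)

emptyP :: Monad m => a -> P m a
emptyP = return

-- Application-specific token value, as in the text (plus a catch-all).
eval :: Char -> Int
eval 'a' = 1
eval _ = 0

term :: MonadPlus m => Char -> P m Int
term c = P t
  where
    t inp x
      | x < 1 || x > length inp || inp !! (x - 1) /= c = mzero
      | otherwise = return (eval c, x + 1)

e :: MonadPlus m => P m Int
e =
  (term 'a' >>= \x1 -> e >>= \x2 -> e >>= \x3 -> return (x1 + x2 + x3))
    `orelse` emptyP 0
```

At type Maybe the evaluator returns Just (5, 6): the value 5 and the next position 6. At the list type it returns every parse of every prefix; removing duplicates leaves the pairs (k, k + 1) for k from 0 to 5, since each recognized 'a' contributes 1.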

5.4 Memoized language processors

The above processors can be memoized by replacing the monad for exceptions or nondeterminism with the combined monad for lists and state (exactly the same combination as was previously introduced to memoize recognizers). The values stored in the memo-table are now lists of pairs of integer numbers. The first component of each pair is the value returned by the evaluator; the second component is the start position for the next processor. The type of the resulting evaluators is

StRM (InpM (ListM (St (State [(Int, Int)]))) Int) String Int

Although the above type definition may look a little complex, this is in fact the only addition that is required to obtain memoized processors. A memoized evaluator me can now be defined as

me = memoize "me" ((term 'a' `bind` \x1 -> me `bind` \x2 -> me `bind` \x3 -> unit (x1 + x2 + x3))
                   `orelse` empty 0)

When applied to the string "aaaaa" at position 1 it returns a pair: a set of values and a memo-table.

6 Complexity of memoized language processors

A detailed complexity analysis of memoized functional recognizers can be found in Appendix A. The analysis there holds for monadic recognizers as described in this chapter. Each memoized recognizer is built up out of three monads: the list monad, the state monad, and the state reader monad. The use of the list monad guarantees that each processor is applied to all elements of the input list, and the corresponding results are combined using the function merge_res (which merges two lists sorted in ascending order, removing duplicates). The same function merge_res is used to combine the results of alternative processors (orelse). The use of the state monad to implement memoization guarantees that the memo-table is passed to each recognizer as a parameter and the modified memo-table is returned as its result. Finally, the state reader monad enables access to the current position of the input string.

The memo-table is structured as a list of (n + 1) pairs, where the first element of each pair is an integer number i representing the start position; the second element is a list of results corresponding to each application of a recognizer at the position i. Each result is a list of at most (n + 1) integer numbers. The number of recognizers is independent of the size of the input, therefore the size of the memo-table is $O(n^2)$. Owing to the fact that purely functional programming languages do not (in general) allow variables to be destructively updated, each update of the memo-table results in the creation of a new, modified memo-table. The number of possible updates is linear in the size of the input, therefore the space required by the algorithm is $O(n^3)$.
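The two bounds can be summarized as follows, writing R for the number of recognizers (a constant independent of n; the symbol R is our notation, and at most one update is made per pair of start position and recognizer):

$$
\text{size of memo-table} \;\le\; \underbrace{(n+1)}_{\text{start positions}} \cdot \underbrace{R}_{\text{recognizers}} \cdot \underbrace{(n+1)}_{\text{integers per result}} \;=\; O(n^2)
$$

$$
\text{total space} \;\le\; \underbrace{(n+1)\,R}_{\text{updates}} \cdot \underbrace{O(n^2)}_{\text{size of each new table}} \;=\; O(n^3)
$$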

This performance can be improved in a purely functional language that supports updatable objects. In his first paper about monads [WA90], Wadler noticed that it is possible, within the monadic framework, to add updatable in-place arrays to purely functional programming languages without compromising the strong reasoning principles valid for these languages. He proposed to implement such arrays as an abstract data type with a set of well-defined operations. The encapsulation of an array guarantees that the programmer cannot duplicate it. Combined with the use of monadic sequencing, it guarantees single threading of the array through the program.

Based on Wadler's work on monads, Launchbury and Peyton Jones presented a way of securely encapsulating stateful computations that manipulate multiple mutable objects [LA93, LA94, LA95]. Updatable variables are currently implemented in the Glasgow Haskell Compiler.

Time and space complexity of more complex processors, such as parsers or syntax-directed evaluators, is application dependent. If the number of different results to be returned is exponential in the length of the input, the corresponding language processor will have exponential complexity. One advantage of the monadic approach to building memoized language processors is that the programmer can easily switch on and off the features the processors incorporate (by simply redefining the type of the processor). In other words, depending on the application, a memoized or non-memoized version of the processor can be used.

Chapter 6

CONCLUSION

In this thesis we have described how memoization can be implemented to improve the efficiency of purely functional language processors. An important contribution of this thesis is the monadic specification of such processors. Monads provide a general technique for adding various features to purely functional programs. We have shown how memoization can be treated as one such feature.

The monadic approach allows one to abstract over the features a program incorporates. A monadic program is built from components of an abstract type m a. To add a new "effect" to the program, one simply changes the meaning of "m" and then adds or modifies only those components that deal directly with the effect being added. Assuming the correctness of the initial program, only the parts that have been added or modified must be tested.

The program given in Appendix B illustrates how expressive, modular, and easy-to-modify applications one can develop using the system of constructor classes and monads. The program consists of a library of monads and a library of simple monadic language processors and combinators. New language processors can be built from simpler components by combining them using higher-order combinators. The behavior of such processors can be changed by simply modifying their types.

We have used the technique proposed by Liang, Hudak, and Jones [LI95] to construct memoized language processors, in a fully modular way, from components that represent various "effects". We have been pleasantly surprised by the flexibility of this method, in particular, by the ease with which it was possible to extend the technique for memoizing purely functional recognizers to more complex language processors. The extended technique can be used to memoize parsers as well as programs that are constructed as executable attribute grammars [Wt, FR95].


M. P. Jones and L. Duponcheel. Composing monads. Technical report YALEU/DCS/RR-1004, Department of Computer Science, Yale University, New Haven, Connecticut, December 1993.

M. P. Jones. The implementation of the Gofer functional programming system. Technical report YALEU/DCS/RR-1030, Department of Computer Science, Yale University, New Haven, Connecticut, USA, May 1994.

D. J. King and P. Wadler. Combining monads. In Glasgow Workshop on Functional Programming, Springer-Verlag Workshops in Computing Series, pages 134-143, Ayr, Scotland, July 1992.

J. Launchbury. Lazy imperative programming. In ACM SIGPLAN Workshop on State in Programming Languages, Copenhagen, Denmark, June 1993.

J. Launchbury and S. Peyton Jones. Lazy functional state threads. In Programming Languages Design and Implementation, ACM Press, Orlando, Florida, 1994.

J. Launchbury and S. Peyton Jones. State in Haskell. LASC 8(4), pages 293-341, December 1995.

S. Liang, P. Hudak, and M. Jones. Monad transformers and modular interpreters. In Conference Record of POPL '95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, CA, January 1995.

S. Mac Lane. Categories for the Working Mathematician. Springer-Verlag, 1971.

E. Moggi. Computational lambda-calculus and monads. In Proceedings of the Fourth Annual Symposium on Logic in Computer Science, pages 14-23. IEEE, June 1989.

E. Moggi. An abstract view of programming languages. Technical report ECS-LFCS-90-113, Department of Computer Science, University of Edinburgh, Scotland, April 1990.

B. C. Pierce. Basic Category Theory for Computer Scientists. MIT Press, 1991.

J. Peterson and K. Hammond, editors. Haskell 1.3, a non-strict, purely functional language. Technical report YALEU/DCS/RR-1106, Department of Computer Science, Yale University, New Haven, Connecticut, USA, May 1996.

M. Spivey. A functional theory of exceptions. Science of Computer Programming, 14(1), pages 25-42, June 1990.

G. L. Steele. Building interpreters by composing monads. In Principles of Programming Languages, pages 472-492. ACM Press, January 1994.

D. A. Turner. An overview of Miranda. In D. A. Turner, editor, Research Topics in Functional Programming. ACM, June 1990.

P. Wadler. How to replace failure by a list of successes. Conference on Functional Programming and Computer Architecture, LNCS 201, Springer-Verlag, September 1985.

P. Wadler. Comprehending monads. In Conference on Lisp and Functional Programming, pages 61-78, June 1990.

P. Wadler. The essence of functional programming. In Principles of Programming Languages, pages 1-14, January 1992.

Appendix A

MEMOIZING PURELY-FUNCTIONAL TOP-DOWN BACKTRACKING LANGUAGE PROCESSORS

Richard A. Frost and Barbara Szydlowski, School of Computer Science, University of Windsor, Windsor, Ontario, Canada N9B 3P4

November 1996

Language processors may be implemented directly as functions. In a programming language that supports higher-order functions, large processors can be built by combining smaller components using higher-order functions corresponding to alternation and sequencing in the BNF notation of the grammar of the language to be processed. If the higher-order functions are defined to implement a top-down backtracking parsing strategy, the processors are modular and, owing to the fact that they resemble BNF notation, are easy to understand and modify. A major disadvantage of this approach is that the

processors while preserving their modularity. We show that memoized functional recognizers constructed for arbitrary non-left-recursive grammars have $O(n^3)$ complexity, where n is the length of the input to be processed. The paper also shows how the initial processors could have been memoized using a monadic

1 Constructing modular nondeterministic language processors in functional programming languages

One approach to implementing language processors in a modern functional programming language is to define a number of higher-order functions which, when used as infix operators (denoted in this paper by the prefix $), enable processors to be built with structures that have a direct correspondence to the grammars defining the languages to be processed. For example, the function s, defined in the functional program in Fig. 22, is a recognizer for the language defined by the grammar s ::= 'a' s s | empty, if the functions term, orelse, then, and empty are defined as shown in the next few pages of this paper.

s = (a $then s $then s) $orelse empty
a = term 'a'

This approach, which is described in detail in Hutton [10], was originally proposed by Burge [2] and further developed by Wadler [18] and Fairbairn [4]. It is now frequently used by the functional-programming community for language prototyping and natural-language processing. In the following, we describe the approach with respect to language recognizers, although the technique can be readily extended to parsers, syntax-directed evaluators and executable specifications of attribute grammars [1, 6, 7, 12].

According to the approach, recognizers are functions mapping lists of inputs to lists of outputs. Each entry in the input list is a sequence of tokens to be analyzed. Each entry in the output list is a sequence of tokens yet to be processed. Using the notion of "failure as a list of successes" [18], an empty output list signifies that a recognizer has failed to recognize the input. Multiple entries in the output occur when the input is ambiguous. In the examples in this paper it is assumed that all tokens are single characters. The notation of the programming language Miranda [17] is used throughout, rather than a functional pseudo-code, in order that readers can experiment with the definitions directly.

The types token and recognizer may be defined as follows, where == means "is a synonym for", x -> y denotes the type of functions from objects of type x to objects of type y, and square brackets denote a list.

token      == char
recognizer == [[token]] -> [[token]]

That is, a recognizer takes a list of lists of tokens as input and returns a list of lists of tokens as result. Note that this differs from the type found in many other papers on functional recognizers. The reason for this difference is that it simplifies the memoization process, as will be explained later.

The simplest type of recognizer is one that recognizes a single token at the beginning of a sequence of tokens. Such recognizers may be constructed using the higher-order function term defined below. The notation x :: y declares x to be of type y. The function concat takes a list of lists as input and concatenates the sublists to form a single list. map is a higher-order function which takes a function and a list as input and returns a list that is obtained by applying the function to each element in the input list. Function application is denoted by juxtaposition, i.e. f x means f applied to x. Function application has higher precedence than any operator, and round brackets are used for grouping. The empty list is denoted by [] and the notation x : y denotes the list obtained by adding the element x to the front of the list y. The applicable equation is chosen through pattern matching on the left-hand side in order from top to bottom, together with the use of guards following the keyword if.

term :: token -> recognizer
term c inputs = (concat . map test_for_c) inputs
                where
                test_for_c []     = []
                test_for_c (t:ts) = [ts], if t = c
                test_for_c (t:ts) = [],   if t ~= c
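For readers without a Miranda system, a Haskell transcription of term is sketched below. Haskell uses | guards in place of Miranda's if clauses and == for the equality test; the names Token, Recognizer and testForC are our renderings of token, recognizer and test_for_c.

```haskell
type Token = Char

-- A recognizer maps a list of inputs to a list of remaining inputs.
type Recognizer = [[Token]] -> [[Token]]

term :: Token -> Recognizer
term c inputs = (concat . map testForC) inputs
  where
    -- One remaining input on success, none on failure.
    testForC [] = []
    testForC (t : ts)
      | t == c = [ts]
      | otherwise = []
```

For example, term 'c' ["cxyz", "abc"] gives ["xyz"]: the first input starts with 'c' and contributes its remainder, while the second contributes nothing.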

The following illustrates use of term in the construction of two recognizers c and d, and the subsequent application of these recognizers to three inputs. The notation x => y is to be read as "y is the result of evaluating the expression x". The empty list in the second example signifies that c failed to recognize a token 'c' at the beginning of the input string. The notation "x1 ... xn" is shorthand for ['x1', ..., 'xn'].

d = term 'd'

Alternate recognizers may be built using the higher-order function orelse defined below. The operator ++ appends two lists.

orelse :: recognizer -> recognizer -> recognizer
(p $orelse q) inputs = p inputs ++ q inputs

According to this definition, when a recognizer p $orelse q is applied to a list of inputs inputs, the value returned is computed by appending the results returned by the separate application of p to inputs and q to inputs. The following illustrates use of orelse in the construction of a recognizer c_or_d and the subsequent application of this recognizer to three inputs.

c_or_d = c $orelse d

c_or_d ["abc"]  => []
c_or_d ["cxyz"] => ["xyz"]
c_or_d ["dxyz"] => ["xyz"]
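The same three applications can be reproduced in Haskell. In this sketch orelse is an ordinary function used with backquotes instead of Miranda's $ prefix, and cOrD stands for c_or_d; both are hypothetical Haskell renderings.

```haskell
type Recognizer = [String] -> [String]

term :: Char -> Recognizer
term c = concatMap testForC
  where
    testForC (t : ts) | t == c = [ts]
    testForC _ = []

-- p `orelse` q: apply both to the same inputs and append the results.
orelse :: Recognizer -> Recognizer -> Recognizer
orelse p q inputs = p inputs ++ q inputs

c, d, cOrD :: Recognizer
c = term 'c'
d = term 'd'
cOrD = c `orelse` d
```

Evaluating cOrD on ["abc"], ["cxyz"] and ["dxyz"] gives [], ["xyz"] and ["xyz"], matching the results listed above.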

Sequencing of recognizers is obtained through use of the higher-order function then:

then :: recognizer -> recognizer -> recognizer
(p $then q) inputs = [],  if r = []
                   = q r, otherwise
                     where r = p inputs

According to this definition, when a recognizer p $then q is applied to a list of inputs inputs, the result returned is an empty list if p fails when applied to inputs; otherwise the result is obtained by applying q to the result returned by p. (Note that in general, then does not have the same effect as reverse composition. In particular, replacing p $then q by q . p will result in non-terminating computations for certain kinds of recursively-defined recognizers.) The following illustrates use of then in the construction of a recognizer c_then_d, and the subsequent application of c_then_d to two inputs:

The "empty" recognizer, which always succeeds and which returns the complete list of inputs as output to be processed, is implemented as the identity function:

empty inputs = inputs
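A Haskell version of then and empty follows. Since then is a reserved word in Haskell, the sketch names the combinator andThen, and it calls the empty recognizer empty' to keep it distinct from Alternative's empty; both names are ours.

```haskell
type Recognizer = [String] -> [String]

term :: Char -> Recognizer
term c = concatMap testForC
  where
    testForC (t : ts) | t == c = [ts]
    testForC _ = []

-- p `andThen` q: fail if p fails, otherwise apply q to everything p left.
andThen :: Recognizer -> Recognizer -> Recognizer
andThen p q inputs
  | null r = []
  | otherwise = q r
  where
    r = p inputs

-- The empty recognizer always succeeds and consumes nothing.
empty' :: Recognizer
empty' inputs = inputs
```

For example, (term 'c' `andThen` term 'd') ["cdxyz"] gives ["xyz"], and the same recognizer applied to ["dxyz"] gives [].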

The functions term, orelse, then, and empty, as defined above, may be used to construct recognizers whose definitions have a direct structural relationship with the context-free grammars of the languages to be recognized. Fig. 23 illustrates this relationship.

BNF grammar of the language        The program
terminals = {'a'}                  a = term 'a'

Figure 23 The relationship between the grammar and the program implementing the recognizer

The example application given below illustrates use of the recognizer s and shows that the prefixes of the input "aaa" can be successfully recognized in nine different ways. Empty strings in the output, denoted by "", correspond to cases where the whole input "aaa" has been recognized as an s. The output shows that there are five ways in which this can happen. The two strings in the output consisting of "a" correspond to cases where the prefix "aa" has been recognized, leaving "a" for subsequent processing. The output shows that there are two ways in which this can happen. The string in the output consisting of the two letters "aa" corresponds to the case where the prefix "a" has been recognized, leaving "aa" for subsequent processing. This can only happen in one way, when "a" is recognized as an s. The ninth entry, "aaa", corresponds to s recognizing the empty prefix (through empty) and leaving the whole input for subsequent processing.
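The count of nine can be checked by transcribing the whole recognizer s into Haskell, using the hypothetical names andThen for then and emptyR for empty:

```haskell
import Data.List (sort)

type Recognizer = [String] -> [String]

term :: Char -> Recognizer
term c = concatMap testForC
  where
    testForC (t : ts) | t == c = [ts]
    testForC _ = []

orelse :: Recognizer -> Recognizer -> Recognizer
orelse p q inputs = p inputs ++ q inputs

andThen :: Recognizer -> Recognizer -> Recognizer
andThen p q inputs
  | null r = []
  | otherwise = q r
  where
    r = p inputs

emptyR :: Recognizer
emptyR inputs = inputs

-- s ::= 'a' s s | empty  (andThen is left-associative by default)
a, s :: Recognizer
a = term 'a'
s = (a `andThen` s `andThen` s) `orelse` emptyR
```

Sorting the output of s ["aaa"] gives five empty strings, two copies of "a", one "aa" and one "aaa", nine entries in total.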

The major advantage of this approach is that the processors created are modular executable specifications of the languages to be processed. Components can be defined, compiled and executed directly. For example, (a $then s $then s) is a recognizer that may be executed directly, as for example:

The advantages of building language processors using this technique come at a price. The processors employ a naive top-down fully-backtracking search strategy and consequently exhibit exponential time and space behavior in the worst case. In the following, we show how this problem can be overcome through a process of memoization. We begin by discussing techniques that have been proposed by other researchers concerning the use of memoization with top-down backtracking language processors. We then describe how memoization can be achieved at the source-code level in purely-functional programming languages and show how the technique can be adapted to improve the efficiency of top-down backtracking recognizers. We provide a formal description of the algorithm and a proof of the complexity result. In addition, we show how the same result can be obtained in a more structured way by use of a monad. We conclude with a discussion of how the approach can be used with parsers and executable attribute grammars.

2 Memoizing language processors

Memoization [9, 14] involves a process by which functions are made to automatically recall previously-computed results. Conventional implementations involve maintenance of memo-tables which store

reference to its memo-table. If the input has been processed before, the previously-computed result is returned. If the input has not been processed before, the result is computed using the original definition of the function, with the exception that all recursive calls use the memoized version; the memo-table is then updated and the result returned.
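The lookup, compute, update cycle just described can be sketched in Haskell on a function unrelated to parsing. Because a purely-functional language has no updateable objects, the table is passed in and returned explicitly, which anticipates the variant of memoization discussed below; memoFib and its association-list table are hypothetical illustrations, not part of the paper.

```haskell
-- Memo-table: argument -> previously computed result.
type Memo = [(Int, Integer)]

-- Memoized Fibonacci with the table passed in and returned explicitly.
memoFib :: Int -> Memo -> (Integer, Memo)
memoFib n table = case lookup n table of
  Just r -> (r, table) -- computed before: reuse it
  Nothing ->
    let (r, table') = compute n table -- recursive calls are memoized too
     in (r, (n, r) : table') -- record the result, then return it
  where
    compute k t
      | k < 2 = (toInteger k, t)
      | otherwise =
          let (x, t1) = memoFib (k - 1) t
              (y, t2) = memoFib (k - 2) t1
           in (x + y, t2)
```

fst (memoFib 30 []) is 832040, and the final table contains exactly 31 entries (arguments 0 to 30), because each argument is computed once and afterwards found by lookup.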

Many of the efficient algorithms for recognition and parsing make use of some kind of table to store well-formed substrings of the input and employ a form of memoization. Earley's algorithm [3] is an example. In most of these algorithms, the parsing and the table update and lookup are intertwined. This results in relatively complex processors that are not modular. Norvig [16] has shown how memoization can be used to obtain a modular processor, with properties similar to Earley's algorithm, by memoizing a simple modular top-down backtracking parser generator. Norvig's memoized parser generator cannot accommodate left-recursive productions but would appear to be as efficient and general as Earley's algorithm in all other respects. According to Norvig, the memoized recognizers have cubic complexity compared to the exponential behavior of the original unmemoized versions.

In Norvig's technique, memoization is implemented at the source-code level in Common Lisp through definition and use of a function called memoize. When memoize is applied to a function f, it modifies the global definition of f such that the new definition refers to and possibly updates a memo-table. A major advantage of Norvig's approach is that programs may, in some cases, be made more efficient with no change to the source-code definition. In Norvig's approach, both the process of memoizing a function and the process of updating the memo-table make use of Common Lisp's updateable function-name space. This precludes direct use of Norvig's approach when language processors are to be constructed in a purely-functional programming language, where updateable objects are not permitted.

Leermakers [12] and Augusteijn [1] have also described how memoization can be used to improve the complexity of functional top-down backtracking language processors, but have not indicated how the memoization process itself would be achieved. In particular, they have not addressed the question of how memoization would be achieved in a purely-functional implementation of the language processors.

A functional programming language is one in which functions are first-class objects and may, for example, be put in lists, passed to other functions as arguments, and returned by functions as results. A purely-functional language, such as Miranda [lq], LML [5], and Haskell [8], is one in which functions provide the only control structure and side-effects, such as assignment, are not allowed. This restriction is a necessary condition for referential transparency, a property of programs that simplifies reasoning about them and which is one of the major advantages of the purely-functional programming style [19].

Owing to the fact that side-effects are forbidden, purely-functional languages do not accommodate any form of updateable object. Consequently, Norvig's technique for improving the efficiency of top-down backtracking language processors cannot be implemented directly in any purely-functional language. However, we can adapt Norvig's approach if we use a variation of memoization that has been described by Field and Harrison [SI] and investigated in detail by Khoshnevisan [11]. This memoization technique differs from conventional approaches in that memo-tables are associated with the inputs to and outputs from functions, rather than with the functions themselves. A function may be memoized by modifying its definition to accept a table as part of its input, to refer to this table before computing a result, and to update the table before returning it as part of the output. The memo-table is passed as an input to the top-level call of a recursively defined function and is threaded through all recursive calls.

To illustrate this technique, we show how the Fibonacci function can be memoized. We begin with a textbook definition given in Fig. 24.

    fib 0 = 1
    fib 1 = 1
    fib n = fib (n - 1) + fib (n - 2),  if n >= 2

Figure 24 Definition of the Fibonacci function

Defined in this way, evaluation of the Fibonacci function has exponential complexity. The cause of the exponential behavior is the replication of computation in the two recursive calls. This replication can be avoided by memoization. We begin by modifying the definition of fib so that it accepts a table as part of its input and returns a table as part of its result. In the modified definition, round brackets and commas are used to denote tuples. The table t1, which is output from the first recursive call of tfib, is passed as input to the second recursive call of tfib. The table t2, which is output from the second recursive call, is returned as the result of the top-level call of tfib:

    tfib (0, t) = (1, t)
    tfib (1, t) = (1, t)
    tfib (n, t) = (r1 + r2, t2)
                  where (r1, t1) = tfib (n - 1, t)
                        (r2, t2) = tfib (n - 2, t1)
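The table-threading idea can be mirrored in Python. This is a sketch that assumes the memo-table is an immutable tuple of (input, result) pairs, which is our choice of representation. As in tfib, the table produced by the first recursive call is passed to the second. Since nothing has been stored yet, the table comes back unchanged.

```python
from typing import Tuple

Table = Tuple[Tuple[int, int], ...]   # assumed representation: immutable (input, result) pairs

def tfib(n: int, t: Table) -> Tuple[int, Table]:
    if n < 2:                     # tfib (0, t) = (1, t); tfib (1, t) = (1, t)
        return 1, t
    r1, t1 = tfib(n - 1, t)       # table t1 is output by the first recursive call ...
    r2, t2 = tfib(n - 2, t1)      # ... and threaded into the second
    return r1 + r2, t2            # t2 is returned from the top-level call

print(tfib(10, ()))  # (89, ())
```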

Note that tfib still has exponential behavior. When applied to an input, it returns the table unchanged. Rather than modifying the definition of tfib directly to make use of the memo-table, as is done by Field and Harrison and by Khoshnevisan, we choose to abstract the table lookup and update process into a general-purpose higher-order function memo, which we can apply to tfib to obtain a memoized version. This variation is comparable to Norvig's technique. When memo is applied to a function f, it returns a new function newf whose behavior is exactly the same as that of f, except that it refers to, and possibly updates, the memo-table given in the input.

In the definition of the function memo below, the expression mr $pos 1 denotes the first element of the list of memoized results mr. The definition of lookup makes use of a list comprehension, [r | (y, r) <- t; y = i], which is to be read as "the list of all r such that the pair (y, r) is a member of the table t and y is equal to the index i."

    memo f = newf
             where newf (i, t) = (r1, t1)
                   where (r1, t1) = (mr $pos 1, t),        if mr ~= []
                                  = (r2, update i r2 t2),  if mr = []
                         (r2, t2) = f (i, t)
                         mr       = lookup i t

    update i r t = (i, r) : t
    lookup i t   = [r | (y, r) <- t; y = i]
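Here is a Python rendering of memo, lookup and update under the same assumed representation (a tuple of (input, result) pairs, newest first). It is a sketch only. memo returns a new function that consults the table first. On a miss, it runs the original f and records the result. The toy function square is hypothetical and serves only to exercise memo.

```python
from typing import Callable, List, Tuple

Table = Tuple[Tuple[int, int], ...]
Fn = Callable[[int, Table], Tuple[int, Table]]

def lookup(i: int, t: Table) -> List[int]:
    # [r | (y, r) <- t; y = i]
    return [r for (y, r) in t if y == i]

def update(i: int, r: int, t: Table) -> Table:
    # update i r t = (i, r) : t
    return ((i, r),) + t

def memo(f: Fn) -> Fn:
    def newf(i: int, t: Table) -> Tuple[int, Table]:
        mr = lookup(i, t)
        if mr:                          # mr ~= []: return the first stored result
            return mr[0], t
        r2, t2 = f(i, t)                # mr = []: compute with the original f ...
        return r2, update(i, r2, t2)    # ... and record the result in the table
    return newf

def square(i: int, t: Table) -> Tuple[int, Table]:   # toy function for illustration
    return i * i, t

msq = memo(square)
print(msq(3, ()))            # (9, ((3, 9),))
print(msq(3, ((3, 100),)))   # (100, ((3, 100),)): the stored value is returned
```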

We can now complete the process of memoizing the Fibonacci function by applying memo to the two recursive calls in the definition of tfib, as shown in Fig. 25. The result is a function called mfib which has linear complexity.

Figure 25 A memoized version of the Fibonacci function
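The following is a Python sketch of what the text describes: tfib with memo applied to its two recursive calls. It is not a transcription of Fig. 25. The counter body_evaluations is instrumentation that we add to show that each value of n from 0 to 30 is computed exactly once. It is not part of the technique.

```python
from typing import Callable, List, Tuple

Table = Tuple[Tuple[int, int], ...]
Fn = Callable[[int, Table], Tuple[int, Table]]

def lookup(i: int, t: Table) -> List[int]:
    return [r for (y, r) in t if y == i]

def update(i: int, r: int, t: Table) -> Table:
    return ((i, r),) + t

def memo(f: Fn) -> Fn:
    def newf(i: int, t: Table) -> Tuple[int, Table]:
        mr = lookup(i, t)
        if mr:
            return mr[0], t
        r2, t2 = f(i, t)
        return r2, update(i, r2, t2)
    return newf

body_evaluations = 0  # instrumentation only

def mfib(n: int, t: Table) -> Tuple[int, Table]:
    global body_evaluations
    body_evaluations += 1
    if n < 2:
        return 1, t
    r1, t1 = memo(mfib)(n - 1, t)    # memo applied to the first recursive call
    r2, t2 = memo(mfib)(n - 2, t1)   # and to the second, with t1 threaded through
    return r1 + r2, t2

value, table = mfib(30, ())
print(value, body_evaluations, len(table))  # 1346269 31 30
```

The top-level call is not itself memoized, which is why the final table holds 30 entries (for n = 0, ..., 29) rather than 31.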

Some readers may realize that it is only necessary to store the two most-recently computed values of the Fibonacci sequence in the memo-table. Modifying the function update accordingly would decrease the space requirements of mfib but would improve neither time nor space complexity.

It should also be noted that there are many other ways to improve the complexity of the Fibonacci function. We do not claim that the use of memoization is the most appropriate technique in this application. We have chosen to use the Fibonacci function as an example so that our technique can be easily compared with that described by Norvig, who also used the Fibonacci example for expository purposes.

The technique described above is not as elegant as Norvig's, in the sense that the process of memoization has resulted in changes to the definition of the Fibonacci function at the source-code level. Later we show how to reduce the number of changes required for memoization and limit them to local changes only.

A memoized functional recognizer is a function that takes, as an extra parameter, a memo-table containing all previously computed results. One approach to memoization is to modify the definitions of the functions term, $orelse, and $then so that the recognizers built with them accept a memo-table as part of their input and return a memo-table as part of their output. Next, a higher-order function memoize is applied to each recognizer to create a memoized version of it.

4.1 The memo-table

In order to improve efficiency we have chosen to store the input sequence of tokens in the memo-table and to represent the points at which a recognizer is to begin processing by a list of numbers which are used as indexes into that sequence.

The memo-table is structured as a list of n + 1 triples, where n is the length of the input sequence:

    memo_table == [(num, token, [(rec_name, [num])])]
    rec_name   == [char]

The last element of the memo-table holds a special token # representing the end of the input. The first component of the ith triple is the integer i, which acts as an index into memo-table entries. The second component is the ith token in the input sequence. The third component is a list of pairs representing all successful recognitions of the input sequence starting at position i. The first component of each pair is a recognizer name; the second component is a list of integers. The presence of a number j, where i ≤ j ≤ n + 1, in this list indicates that the recognizer succeeded when applied to the input sequence beginning at position i and finishing at position j - 1.

Initially, the third component of each triple in the memo-table is an empty list. The following example shows the initial table corresponding to the input "aaa".
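Here is a minimal Python sketch of how such an initial table can be built. It assumes tuples for the triples and for the initially empty lists of results. initial_table is a hypothetical helper name.

```python
from typing import Tuple

Entry = Tuple[int, str, tuple]   # (index i, ith token, recognitions starting at i)

def initial_table(tokens: str) -> Tuple[Entry, ...]:
    # n + 1 triples; the last one holds the end-of-input token '#'
    padded = list(tokens) + ["#"]
    return tuple((i, tok, ()) for i, tok in enumerate(padded, start=1))

print(initial_table("aaa"))
# ((1, 'a', ()), (2, 'a', ()), (3, 'a', ()), (4, '#', ()))
```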

Two operations are required for table lookup and update. The operation lookup, applied to an index i, a recognizer name name and a memo-table t, returns the previously computed list of end positions where the recognizer name succeeded in processing the input beginning at position i, wrapped in a list; the result is the empty list if no such computation has been recorded. The operation update, applied to an index i, a result res and a memo-table t, returns a new memo-table with the ith entry updated. A result is a pair consisting of a recognizer name and a list of successful end-positions. update adds the result res to the list of successful recognitions corresponding to the ith token.

    lookup i name t = [],                                            if i > #t
                    = [bs | (x, bs) <- third (t $pos i); x = name],  otherwise
                      where third (x, y, z) = z

    update i res t = map (add_res i res) t
                     where add_res i res (x, term, res_list)
                             = (x, term, res : res_list),  if x = i
                             = (x, term, res_list),        otherwise
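The following Python rendering uses the tuple representation assumed above. It is a sketch, not the thesis code. lookup returns the stored list of end positions wrapped in a list. update builds a new table rather than modifying the old one in place.

```python
from typing import List, Tuple

Table = Tuple[Tuple[int, str, tuple], ...]
Result = Tuple[str, List[int]]          # (recognizer name, successful end positions)

def lookup(i: int, name: str, t: Table) -> List[List[int]]:
    if i > len(t):                       # [] if i > #t
        return []
    _, _, results = t[i - 1]             # t $pos i, with 1-based positions
    return [bs for (x, bs) in results if x == name]

def update(i: int, res: Result, t: Table) -> Table:
    def add_res(entry):
        x, term, res_list = entry
        return (x, term, (res,) + res_list) if x == i else entry
    return tuple(add_res(e) for e in t)  # map (add_res i res) t: builds a new table

t0 = ((1, "a", ()), (2, "a", ()), (3, "#", ()))
t1 = update(1, ("ma", [2]), t0)
print(lookup(1, "ma", t1))   # [[2]]
print(lookup(1, "ms", t1))   # []
```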

The function memoize takes as input a recognizer name n, a recognizer f, a list of positions where the recognizer should begin processing the input, and a memo-table. For each start position in the list, the function memoize first calls the function lookup to determine whether this application of the recognizer has been computed previously. If lookup returns an empty list, the recognizer is applied, a new result is calculated and the function update is used to add the result to the memo-table. Otherwise the previously computed result is returned. Results returned for each of the start positions are merged with the removal of duplicates.

    memoize n f ([], t)     = ([], t)
    memoize n f (b : bs, t) = (merge_res r1 rs, trs)
                              where (r1, t1)  = (mr $pos 1, t),             if mr ~= []
                                              = (r2, update b (n, r2) t2),  otherwise
                                    (r2, t2)  = f ([b], t)
                                    (rs, trs) = memoize n f (bs, t1)
                                    mr        = lookup b n t
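Below is a Python sketch of memoize under the same assumptions. merge_res is simplified to a sorted, duplicate-free union rather than a linear merge. The toy recognizer step, which consumes one token, and the calls log are hypothetical instrumentation. On the second application every start position is found in the table, so step is not called again and the table is returned unchanged.

```python
from typing import Callable, List, Tuple

Table = Tuple[Tuple[int, str, tuple], ...]
Rec = Callable[[List[int], Table], Tuple[List[int], Table]]

def lookup(i: int, name: str, t: Table) -> List[List[int]]:
    if i > len(t):
        return []
    return [bs for (x, bs) in t[i - 1][2] if x == name]

def update(i: int, res: Tuple[str, List[int]], t: Table) -> Table:
    return tuple((x, tok, (res,) + rl) if x == i else (x, tok, rl) for (x, tok, rl) in t)

def merge_res(xs: List[int], ys: List[int]) -> List[int]:
    # simplification: sorted union without duplicates
    return sorted(set(xs) | set(ys))

def memoize(n: str, f: Rec) -> Rec:
    def memoized(bs: List[int], t: Table) -> Tuple[List[int], Table]:
        if not bs:                                   # memoize n f ([], t) = ([], t)
            return [], t
        b, rest = bs[0], bs[1:]
        mr = lookup(b, n, t)
        if mr:                                       # previously computed result
            r1, t1 = mr[0], t
        else:                                        # compute, then record in the table
            r2, t2 = f([b], t)
            r1, t1 = r2, update(b, (n, r2), t2)
        rs, trs = memoized(rest, t1)                 # thread t1 to the remaining positions
        return merge_res(r1, rs), trs
    return memoized

calls: List[List[int]] = []                          # instrumentation only

def step(bs: List[int], t: Table) -> Tuple[List[int], Table]:
    calls.append(bs)                                 # toy recognizer: consumes one token
    return [b + 1 for b in bs], t

t0 = ((1, "a", ()), (2, "a", ()), (3, "#", ()))
m = memoize("step", step)
r, t1 = m([1, 2], t0)
r2, t2 = m([2, 1], t1)
print(r, r2, len(calls))   # [2, 3] [2, 3] 2
```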

4.2 The memoized recognizers

The definitions of term, $then, and $orelse given in section 1 are modified to take as input a list of positions where the recognizer should begin processing the input, and a memo-table. Owing to the fact that the entire input sequence of tokens is represented in the memo-table, there is no need for the recognizers to explicitly return unprocessed segments of the input. Instead they return a list of numbers which index into the input sequence.

The next modification to the definitions of $orelse and $then is to allow threading of the memo-table through recursive calls. The function term is modified owing to the fact that the input sequence is now stored in the memo-table. The function merge_res is used to combine results and to remove duplicates that arise if the same segment of the input can be recognized in more than one way by a recognizer. For recognition purposes such duplicates can be considered equal.

    mterm c (bs, t) = ((concat . map test_for_c) bs, t)
                      where test_for_c b = [],      if b > #t
                                         = [],      if second (t $pos b) ~= c
                                         = [b + 1], if second (t $pos b) = c

    second (x, y, z) = y

    (p $m_orelse q) (bs, t) = (merge_res rp rq, tq)
                              where (rp, tp) = p (bs, t)
                                    (rq, tq) = q (bs, tp)

These functions can now be used to improve the complexity of functional recognizers whilst preserving their natural simplicity and modularity. As an example, Figure 26 shows the relationship between the original recognizer for the grammar s ::= 'a' s s | empty and the memoized version. Note that it is not necessary to change the definition of empty, nor is it necessary to memoize the recognizers constructed with mterm.

    The original recognizer:
        s = (a $then s $then s) $orelse empty
        a = term 'a'

    The memoized version:
        ms = memoize "ms" ((ma $m_then ms $m_then ms) $m_orelse empty)
        ma = mterm 'a'

Figure 26 The relationship between a recognizer and its memoized version
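To see the pieces working together, here is a self-contained Python sketch of the memoized recognizer ms for s ::= 'a' s s | empty. It uses the tuple representation assumed in the earlier sketches. The definition of $m_then is not given in this section, so m_then below is our assumption. It threads the table from p into q, and if p fails it returns the table that p produced. empty passes its start positions through unchanged.

```python
from typing import Callable, List, Tuple

Table = Tuple[Tuple[int, str, tuple], ...]
Rec = Callable[[List[int], Table], Tuple[List[int], Table]]

def initial_table(tokens: str) -> Table:
    return tuple((i, tok, ()) for i, tok in enumerate(list(tokens) + ["#"], start=1))

def merge_res(xs: List[int], ys: List[int]) -> List[int]:
    return sorted(set(xs) | set(ys))           # simplified duplicate-free merge

def lookup(i: int, name: str, t: Table) -> List[List[int]]:
    return [] if i > len(t) else [bs for (x, bs) in t[i - 1][2] if x == name]

def update(i: int, res: Tuple[str, List[int]], t: Table) -> Table:
    return tuple((x, tok, (res,) + rl) if x == i else (x, tok, rl) for (x, tok, rl) in t)

def memoize(n: str, f: Rec) -> Rec:
    def memoized(bs: List[int], t: Table) -> Tuple[List[int], Table]:
        if not bs:
            return [], t
        mr = lookup(bs[0], n, t)
        if mr:
            r1, t1 = mr[0], t
        else:
            r2, t2 = f([bs[0]], t)
            r1, t1 = r2, update(bs[0], (n, r2), t2)
        rs, trs = memoized(bs[1:], t1)
        return merge_res(r1, rs), trs
    return memoized

def mterm(c: str) -> Rec:
    # basic recognizer: reads tokens from the table, never updates it
    return lambda bs, t: ([b + 1 for b in bs if b <= len(t) and t[b - 1][1] == c], t)

def empty(bs: List[int], t: Table) -> Tuple[List[int], Table]:
    return bs, t                                # succeeds without consuming input

def m_orelse(p: Rec, q: Rec) -> Rec:
    def rec(bs, t):
        rp, tp = p(bs, t)
        rq, tq = q(bs, tp)                      # the table from p is threaded into q
        return merge_res(rp, rq), tq
    return rec

def m_then(p: Rec, q: Rec) -> Rec:              # assumed definition (not shown in the text)
    def rec(bs, t):
        rp, tp = p(bs, t)
        return ([], tp) if not rp else q(rp, tp)   # keep tp even when p fails
    return rec

ma = mterm("a")

def ms(bs: List[int], t: Table) -> Tuple[List[int], Table]:
    # ms = memoize "ms" ((ma $m_then ms $m_then ms) $m_orelse empty)
    return memoize("ms", m_orelse(m_then(ma, m_then(ms, ms)), empty))(bs, t)

result, table = ms([1], initial_table("aa"))
print(result)                  # [1, 2, 3]
print(lookup(1, "ms", table))  # [[1, 2, 3]]
```

The result [1, 2, 3] contains n + 1 = 3, which signals that the whole input "aa" has been recognized.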

4.3 The Algorithm

We begin our description of the algorithm by presenting an example. Suppose that the string 'aa' is to be processed using the memoized recognizer ms defined in Figure 26. The initial input is as follows, where the second component of the tuple is the initial memo-table:

Owing to the fact that no results have been computed yet and that ms is an alternation ($m_orelse) of two recognizers, the first alternative of ms is applied to the initial input. This recognizer is itself a sequence of the recognizers ma and ms $m_then ms; therefore the first of this sequence, i.e. ma, is applied to the initial input. The recognizer ma succeeds in recognizing an 'a' and returns a result consisting of a pair whose first element is [2], indicating that the first element of the sequence of tokens has been consumed, and whose second element is the unchanged memo-table (because basic recognizers do not update the memo-table). The evaluation tree at this point is as follows, where ? indicates values yet to be computed. Sequence is denoted by continuous lines and alternation by broken lines.

    ms [1] => [?]

Next, ms $m_then ms is applied to this result. The application of the first ms in this sequence results in a computation similar to the initial application, except that the starting position is [2]. The same holds when ms is applied to position [3]. The third element of the input memo-table corresponds to the end-of-input. The recognizer ma applied at position [3] fails, returning an empty list, and thus ma $m_then ms $m_then ms fails. The recognizer mempty applied at the same position returns as its result a tuple whose first element is the list [3]. Now the results of both alternatives of the recognizer ms have been determined, and the value of ms applied at position [3] is computed. The following shows the evaluation tree when all values up to ms [2] have been computed:

    ms $m_then ms [2] => [?]

    ma $m_then ms $m_then ms [2] => [3]     mempty [2] => [2]

Note that when the recognizer ms is applied to the position [3] for the second time, the corresponding result is simply copied from the memo-table.

When a recognizer is applied to a list that contains more than one element, the result is obtained by applying the recognizer to each element in the list and merging the results. This is illustrated below:

The final result is:

The following is a more formal description of the algorithm:

1. Input:

   a. A context-free, non-left-recursive grammar with productions and terminals represented using the functions mterm, $m_then, $m_orelse, and mempty. The start symbol for the grammar is the name of the first recognizer to be applied.

   b. A pair whose first component is the list [1], and whose second component is a memo-table corresponding to the input sequence of tokens.

2. Output:

   a. A pair whose first component is a list of positions where the recognition process of the input sequence of tokens (starting from the first token) was successfully completed. The second component is the final value of the memo-table.

3. Method:

   a. At each step we apply a recognizer to a list of start positions and a memo-table:
      • If the list is empty, the result is an empty list and the unchanged memo-table.
      • Otherwise, we first apply the recognizer to the first element of the list and the memo-table. The result is a list r1 and a possibly modified memo-table t1. Then we apply the recognizer to the rest of the list and the memo-table t1. The result of this application is a list R and a memo-table t2. The final result is a pair: a list obtained by merging r1 and R, and the table t2.

   b. Application of a recognizer m at a position j begins by reference to the current memo-table:
      • If the jth row of the memo-table contains a result corresponding to m, this result is returned.
      • Otherwise a new result is computed, the memo-table is updated and the result returned.

   c. Each recognizer can be either the basic recognizer mempty, a basic recognizer constructed using mterm, or a combination constructed from two or more components using $m_then or $m_orelse.
      • Results for basic recognizers are obtained immediately by applying the corresponding function.
      • For sequences or alternations, the results of the components are computed first and then combined to obtain the final result.

5 Complexity analysis

We now show that memoized recognizers have a worst-case time complexity of O(n^3), compared to the exponential behavior of the unmemoized form. The analysis is concerned only with the variation of time with the length of the input list of tokens. Although a grammar could be very complex, its size will always be independent of the length of the input.

5.1 Elementary operations

We assume that the following operations require a constant amount of time:

1. Testing if two values are equal, less than, etc.

2. Extracting the value of a component of a tuple.

3. Adding an element to the front of a list.

4. Obtaining the value of the ith element of a list whose length depends upon the size of the grammar but not on the size of the input list.

5.2 The size of the memo-table

The memo-table is structured as a list of (n + 1) tuples, where n is the length of the input sequence of tokens. The first component of each tuple is an integer ranging from 1 to n + 1. The second component of a tuple whose first component is i is the ith token in the input. The third component is a list of pairs (recognizer_name, result). Owing to the fact that the grammar is fixed, the number of recognizers, denoted by r, is constant. Therefore, for each tuple in the memo-table, the length of the list of pairs is ≤ r.

The second component of each pair is a list of positions, represented by integers, where the corresponding recognizer succeeded in completing the recognition of a segment of the input. The length of the lists that correspond to the ith tuple is at most (n - i + 2), owing to the fact that a recognizer applied to input at position i may succeed at any position j, i ≤ j ≤ n + 1.

5.3 Memo-table lookup and update

The function lookup, applied to an index i, a recognizer name, and a memo-table, first searches the memo-table to access the ith element, then searches the list of results in the ith tuple to access the element that corresponds to the given recognizer name. Reaching the ith element of a list of n + 1 tuples takes O(n) steps, and the list of results in a tuple has at most r entries, so the function lookup requires O(n) time.

The function update, applied to an index i, a result res, and a memo-table, returns a new memo-table with the ith tuple updated. The result res is added in front of the list of successful recognitions corresponding to the ith token. Because update maps over all n + 1 tuples of the memo-table, it requires O(n) time.

5.4 Basic recognizers

Application of the recognizer mempty simply creates a pointer to the input. This takes constant time. Application of a recognizer mterm a to a single start position i requires the ith entry in the memo-table to be examined to see if the ith token is equal to a. If there is a match, the result i + 1 is added to the list of results returned by mterm; otherwise the recognizer fails. Since reaching the ith entry takes O(n) steps, this operation is O(n).

Note that we are only considering, here and in the next two subsections, the time required to apply a recognizer to a single position in the input list. We consider application of a recognizer to a more-than-one-element list later.

Assuming that the results p [i] and q [i] have been computed, application of a memoized recognizer (p $m_orelse q) to a single start position [i] involves the following steps:

• one memo-table lookup, O(n), and, if the recognizer has not been applied before:

• merging of two results, each of which is in the worst case of length n + 1, O(n),

• one memo-table update, O(n).

Assume that p [i] has already been calculated. In the worst case the result is the list [i, i + 1, ..., n + 1]. Assume also that q [i, i + 1, ..., n + 1] has already been calculated. Now, application of a memoized recognizer p $m_then q to a single start position i involves the following steps:

• one memo-table lookup (O(n)) and, if the lookup fails,

• computation of the result, plus

• one memo-table update (O(n)).

5.7 Merging results returned when a recognizer is applied to a list of start positions

The function merge_res is also used to combine the results returned by a single memoized recognizer when it is applied to a list of start positions with more than one entry (see the definition of memoize). Suppose a recognizer f is applied to a k-element list of start positions [1, ..., k]. The corresponding evaluation tree is as follows:

    f [k]

Assuming that the results of f [i] and f [i + 1, ..., k] have already been computed, computation of f [i, i + 1, ..., k] requires one memo-table lookup (O(n)) and one merge of two lists which are, in the worst case, of length n + 1. The total time is O(n). Note also that application of a recognizer f to a k-element list of start positions results in an execution tree with 2*k + 1 nodes representing applications of the recognizer f.
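The count of 2*k + 1 nodes can be checked with a short recurrence. We assume, following the definition of memoize, that the tree for f [1, ..., k] consists of a root, a child for the single position f [1], and the subtree for f [2, ..., k], and that the recursion ends at a leaf for the empty list:

```latex
N(0) = 1, \qquad N(k) = N(k-1) + 2 \quad (k \ge 1)
\;\Longrightarrow\; N(k) = 2k + 1 .
```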

The analysis so far can be summarized in terms of execution trees (such as those shown earlier). Each non-leaf node of an execution tree corresponds to an application of a recognizer to a list of start positions, or to an application of $m_orelse or $m_then. Leaf nodes correspond either to an application of mempty or of mterm a for some a, or to a computation that has been performed before and stored in the memo-table:

Lemma 1: We have shown that the results corresponding to mempty, mterm a for some a, and lookup can be computed in O(n) time.

Lemma 2: We have also shown that results corresponding to non-leaf nodes can be computed in O(n) time, provided that the values of their children are available.

5.9 Proof of O(n^3) time complexity

Theorem

Given an arbitrary context-free non-left-recursive grammar G, the corresponding memoized functional recognizer requires O(n^3) time to process an input sequence of length n. If the grammar is not ambiguous, the time complexity is O(n^2).

Let f_1, f_2, ..., f_r be a set of recognizers corresponding to the grammar G, and let f_1 correspond to the start symbol in the grammar. We begin by applying the recognizer f_1 to the list [1]. This application yields an execution tree similar to the ones shown earlier. We will show that for an arbitrary grammar the number of nodes in such a tree is O(n^2), and that if the grammar is not ambiguous this number reduces to O(n). Owing to the fact that the time required to perform the computation at each node is linear in the length of the input sequence (Lemma 1 and Lemma 2), this concludes the proof of the theorem.

For simplicity, assume that each recognizer is either mempty, mterm a for some a, or of the form (p $m_orelse q) or (p $m_then q) for some p and q. In practice, recognizers can be a combination ($m_orelse and/or $m_then) of more than two recognizers, but the number will always be bounded by the size of the grammar and will be independent of the length of the input sequence of tokens.

Suppose that the recognizer f_1 is of the form (f_i $m_orelse f_j) for some 2 ≤ i, j ≤ r. Suppose also that the recognizer f_i is of the form (f_k $m_then f_p) for some 2 ≤ k, p ≤ r with k ≠ i and p ≠ i. The corresponding tree in the worst case is as follows:

    f_1 [1]

Consider the expansion of those subtrees that correspond to an application of a recognizer to the one-element list of start positions [1]. Owing to the fact that the grammar is non-left-recursive and that it consists of r recognizers, after a maximum number of steps in each path, which depends only on the size of the grammar, there must be an application of a recognizer that consumes some input. It follows that the total number of applications of a recognizer to a one-element list is independent of the length of the input. For the same reason, the total number of applications of a recognizer to a more-than-one-element list (in the worst case an (n + 1) element list) is independent of the length of the input.

When the first step is completed, there will be only O(r) subtrees to be further expanded. This is because the result corresponding to a pair (recognizer, start position) is calculated only once. If the same recognizer is applied to the same start position again, the corresponding result is simply copied from the memo-table. At the next stage, the same procedure is repeated for each recognizer that is applied to the list [2]. The only difference is that now O(r) subtrees must be expanded, not just one. Only O(r) nodes will be generated for each recognizer applied to the list [2], and O(r) nodes for each application of a recognizer to a more-than-one-element list.

At the ith step, there will be O(r) nodes corresponding to an application of a recognizer to the list [i], and O(r) nodes corresponding to an application of a recognizer to a more-than-one-element list (in the worst case, an (n - i + 2) element list). The total number of steps is n + 1. Owing to the fact that an application of a recognizer to a k-element list yields a tree that contains 2*k + 1 nodes, as discussed in subsection 5.7, the total number of nodes is given by the following, where c is proportional to the number of recognizers:
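A bound of the kind this argument describes can be written as follows. This is our reconstruction under stated assumptions and not necessarily the original formula. We assume that at step i there are at most c nodes for applications to [i], and at most c applications to lists of length at most n - i + 2, each of which contributes at most 2(n - i + 2) + 1 nodes. Substituting m = n - i + 2 gives:

```latex
N \;\le\; \sum_{i=1}^{n+1} c\,\bigl(1 + 2(n - i + 2) + 1\bigr)
  \;=\; c \sum_{m=1}^{n+1} (2m + 2)
  \;=\; c\,(n+1)(n+4) \;=\; O(n^2).
```

Combined with the O(n) cost per node from Lemmas 1 and 2, this gives O(n^3) time.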

If the grammar is not ambiguous, then each input sequence of tokens can be recognized in just one way. Therefore, each recognizer applied at some position i will return at most a one-element list as its result. The corresponding formula for unambiguous grammars, given below, concludes the proof.
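Under the same assumptions as in our reconstruction of the general case, one way to write the unambiguous bound is as follows. If every result list has at most one element, each application of a recognizer to a list produces at most 2·1 + 1 = 3 nodes, so:

```latex
N \;\le\; \sum_{i=1}^{n+1} c\,(1 + 3) \;=\; 4c\,(n+1) \;=\; O(n),
```

and with O(n) time per node the total is O(n^2).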

6 A monadic approach to incorporate memoization

So far, we have used an ad hoc method to redefine the recognizer functions in order to incorporate memoization. This method is susceptible to error. In fact, an earlier version of the paper contained an insidious error: the function $m_then was defined as follows:

According to this definition, if the recognizer p fails, then the memo-table is returned unchanged in the result ([], t). This error would result in exponential complexity for certain grammars when applied to certain inputs which fail to be recognized. In this section, we show how the recognizer definitions can be modified in a structured way which reduces the possibility of such errors. The method treats memoization as a specific instance of the more general notion of adding features to purely functional programs.
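The following Python sketch, which uses the same assumed tuple representation as the earlier sketches, contrasts two sequencing combinators. m_then keeps the memo-table produced by p. The hypothetical variant m_then_buggy behaves as described above and returns the input table when p fails. The recorded failure of the memoized recognizer p survives in the first case but is lost in the second, so any later application of p at that position would have to be recomputed.

```python
from typing import Callable, List, Tuple

Table = Tuple[Tuple[int, str, tuple], ...]
Rec = Callable[[List[int], Table], Tuple[List[int], Table]]

def initial_table(tokens: str) -> Table:
    return tuple((i, tok, ()) for i, tok in enumerate(list(tokens) + ["#"], start=1))

def lookup(i: int, name: str, t: Table) -> List[List[int]]:
    return [] if i > len(t) else [bs for (x, bs) in t[i - 1][2] if x == name]

def update(i: int, res: Tuple[str, List[int]], t: Table) -> Table:
    return tuple((x, tok, (res,) + rl) if x == i else (x, tok, rl) for (x, tok, rl) in t)

def memoize(n: str, f: Rec) -> Rec:
    def memoized(bs, t):
        if not bs:
            return [], t
        mr = lookup(bs[0], n, t)
        if mr:
            r1, t1 = mr[0], t
        else:
            r2, t2 = f([bs[0]], t)
            r1, t1 = r2, update(bs[0], (n, r2), t2)   # failures ([]) are recorded too
        rs, trs = memoized(bs[1:], t1)
        return sorted(set(r1) | set(rs)), trs
    return memoized

def mterm(c: str) -> Rec:
    return lambda bs, t: ([b + 1 for b in bs if b <= len(t) and t[b - 1][1] == c], t)

def m_then(p: Rec, q: Rec) -> Rec:
    def rec(bs, t):
        rp, tp = p(bs, t)
        return ([], tp) if not rp else q(rp, tp)   # keeps the table produced by p
    return rec

def m_then_buggy(p: Rec, q: Rec) -> Rec:
    def rec(bs, t):
        rp, tp = p(bs, t)
        return ([], t) if not rp else q(rp, tp)    # BUG: discards p's table updates
    return rec

p = memoize("p", m_then(mterm("a"), mterm("b")))   # fails on "ac" after reading 'a'
t0 = initial_table("ac")
_, t_ok = m_then(p, mterm("c"))([1], t0)
_, t_bad = m_then_buggy(p, mterm("c"))([1], t0)
print(lookup(1, "p", t_ok), lookup(1, "p", t_bad))   # [[]] []
```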

6.1 Monads

Monads were introduced to computing science by Moggi [15], who noticed that reasoning about programs that involve the handling of state, exceptions, I/O, or nondeterminism can be simplified if these features are expressed using monads. Inspired by Moggi's ideas, Wadler [21] proposed monads as a way of structuring functional programs. The main idea behind monads is to distinguish between the type of values and the type of computations that deliver these values. A monad is a triple (M, unit, bind) where M is a type constructor, and unit and bind are two polymorphic functions. M can be thought of as a function on types that maps the type of values into the type of computations producing these values. unit is a function that takes a value and returns a corresponding computation; the type of unit is a -> M a. The function bind represents the sequencing of two computations, where the value returned by the first computation is made available to the second (and possibly subsequent) computation. The type of bind is

    bind :: M a -> (a -> M b) -> M b

The identity monad [21] below represents computations as the values they deliver.

    unit1 :: * -> idm *
    unit1 x = x

The state monad (also defined in [21]) is an abstraction over computations that deal with state. The definition is given below.

    unit2 :: * -> stm *
    unit2 a = f
              where f t = (a, t)

    bind2 :: stm * -> (* -> stm **) -> stm **
    (m $bind2 k) = f
                   where f x = (b, z)
                               where (b, z) = k a y
                                     where (a, y) = m x
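In Python, both monads can be sketched as plain functions. bind1 is not shown in the text, and we assume the standard identity-monad definition m $bind1 k = k m. A state computation is modelled as a function from a state to a (value, new state) pair, exactly as in unit2 and bind2. The counter computation tick is a toy example of ours.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
S = TypeVar("S")

# Identity monad: a computation is simply the value it delivers.
def unit1(x: A) -> A:
    return x

def bind1(m: A, k: Callable[[A], B]) -> B:
    return k(m)

# State monad: a computation is a function from a state to (value, new state).
def unit2(a: A) -> Callable[[S], Tuple[A, S]]:
    return lambda t: (a, t)

def bind2(m, k):
    def f(x):
        a, y = m(x)       # run the first computation on the incoming state
        b, z = k(a)(y)    # pass its value and the new state to the next one
        return b, z
    return f

tick = lambda t: (t, t + 1)   # toy state computation: return the counter, then increment
prog = bind2(tick, lambda a: bind2(tick, lambda b: unit2(a + b)))
print(prog(5))                                   # (11, 7)
print(bind1(unit1(3), lambda x: unit1(x + 1)))   # 4
```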

We will use the identity monad and the state monad to construct non-memoized and memoized monadic recognizers respectively. In the description below, we refer to a third monad, which we use as an analogy in the construction of our monadic recognizers. This is the monad for lists [21]. Owing to the fact that our recognizers can be applied to a list of inputs, it is necessary to have a well-structured way of doing that.

    list * == [*]

    unit :: * -> list *
    unit a = [a]

    bind :: list * -> (* -> list **) -> list **
    [] $bind y      = []
    (a : x) $bind y = (y a) ++ (x $bind y)
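The list monad in Python is a direct transcription. bind applies a function to every element and concatenates the resulting lists, which is the pattern that term and memoize follow over a list of start positions. This is a sketch for illustration.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def unit(a: A) -> List[A]:
    return [a]

def bind(xs: List[A], y: Callable[[A], List[B]]) -> List[B]:
    # [] $bind y = [];  (a : x) $bind y = (y a) ++ (x $bind y)
    if not xs:
        return []
    return y(xs[0]) + bind(xs[1:], y)

print(bind([1, 2, 3], lambda a: [a, 10 * a]))   # [1, 10, 2, 20, 3, 30]
```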

6.2 Non-memoized monadic recognizers

In order to use monads to provide a structured method for adding new effects to a functional program, we begin by identifying all functions that will be involved in those effects. We then replace those functions, which can be of any type a -> b, by functions of type a -> M b. In effect, we change the program so that selected function applications return a computation on a value rather than the value itself. This computation may be used to add features such as state to the program. In order to effect this change, we use the function unit to convert values into computations that return the value but do not contribute to the new effects, and the function bind to apply a function of type a -> M b to a computation of type M a. Having made these changes, the original program can be obtained by using the identity monad idm, as shown below. In order to add new effects such as state or exceptions, we simply change the monad and make minimal local changes as required to the rest of the program. In the following subsection, we show how to add the new effect of memoization by replacing the identity monad with the state monad stm and making some local changes.

The non-memoized recognizers introduced earlier in this paper were functions taking a list of input sequences of tokens and returning a similar list of sequences yet to be processed. The definition of the non-memoized monadic recognizers differs slightly in that the list of inputs is represented by a pair: a list of start positions and the whole input sequence of tokens. Owing to the fact that the input sequence remains unchanged during the execution of the program, there is no need for any recognizer to return it.

In order to construct non-memoized monadic recognizers, we start by defining the type of non-memoized recognizers. We define it using the type constructor idm of the identity monad.

    rec * == [char] -> [num] -> idm *

That is, a recognizer of type a is a function that, applied to an input string and a list of start positions, returns an "identity" computation of type a. We can now define the function term_1 which, when applied to a character, returns a function that is always applied to a one-element list of positions.

    term_1 :: char -> rec [num]
    term_1 c s [x] = unit1 [],       if (x > #s) \/ (s!(x-1) ~= c)
                   = unit1 [x + 1],  otherwise

In analogy with the bind operator for the list monad, the function term, which when applied to a character returns a recognizer that can be applied to a list of start positions with more than one element, is defined as follows.

    term :: char -> rec [num]
    term c s []       = unit1 []
    term c s (x : xs) = term_1 c s [x] $bind1 f
                        where f a = term c s xs $bind1 g
                                    where g b = unit1 (merge_res a b)

The definitions of orelse, then, and empty are given below. Note that we have replaced the append operator ++ with the function merge_res, which combines two lists, removing duplicates.

    orelse :: rec [num] -> rec [num] -> rec [num]
    (p $orelse q) s input = p s input $bind1 f
                            where f a = q s input $bind1 g
                                        where g b = unit1 (merge_res a b)

    then :: rec [num] -> rec [num] -> rec [num]
    (p $then q) s input = p s input $bind1 f
                          where f a = unit1 [],  if a = []
                                    = q s a,     if a ~= []

    empty :: rec [num]
    empty s x = unit1 x
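The following Python sketch mirrors these definitions in the identity monad, with positions that are 1-based as in the text. merge_res is simplified to a sorted, duplicate-free union. The recognizer is named s_rec (our choice) so that it does not clash with the input-string parameter s.

```python
from typing import List

# Identity monad
def unit1(x):
    return x

def bind1(m, k):
    return k(m)

def merge_res(a: List[int], b: List[int]) -> List[int]:
    return sorted(set(a) | set(b))          # simplified duplicate-free merge

def term_1(c: str):
    def rec(s: str, xs: List[int]):
        (x,) = xs                           # always applied to a one-element list
        if x > len(s) or s[x - 1] != c:
            return unit1([])
        return unit1([x + 1])
    return rec

def term(c: str):
    def rec(s: str, xs: List[int]):
        if not xs:
            return unit1([])
        return bind1(term_1(c)(s, [xs[0]]),
                     lambda a: bind1(rec(s, xs[1:]), lambda b: unit1(merge_res(a, b))))
    return rec

def orelse(p, q):
    return lambda s, inp: bind1(p(s, inp),
                                lambda a: bind1(q(s, inp), lambda b: unit1(merge_res(a, b))))

def then(p, q):
    return lambda s, inp: bind1(p(s, inp), lambda a: unit1([]) if a == [] else q(s, a))

def empty(s, x):
    return unit1(x)

ta = term("a")

def s_rec(s: str, inp: List[int]) -> List[int]:
    # s ::= 'a' s s | empty, written without memoization
    return orelse(then(ta, then(s_rec, s_rec)), empty)(s, inp)

print(s_rec("aa", [1]))   # [1, 2, 3]
```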

Notice that we have not rewritten the application of merge_res using bind1. The reason for this is that we know that, in this application, merge_res will not be involved in memoization, and therefore the result of its application can be viewed as a value rather than a computation.

6.3 Memoized monadic recognizers

We now consider the state monad as given earlier and define the two operations on the state: lookup and update. The type of the state is [(num, [([char], [num])])].

    lookup :: num -> [char] -> stm [[num]]
    lookup ind name st = ([], st),                                            if ind > #st
                       = ([bs | (x, bs) <- snd (st!(ind-1)); x = name], st),  otherwise

    update :: num -> [char] -> [num] -> stm *
    update ind name val st
        = (undef, map (update_mt_entry ind name val) st)

    update_mt_entry ind name val (x, list) = (x, (name, val) : list),  if x = ind
                                           = (x, list),                otherwise

We define the type of the memoized recognizers in terms of the type constructor of the state monad stm:

    rec * == [char] -> [num] -> stm *

and define the function memoize (there is an analogy here to the definitions of term and of bind for the list monad).

memoize :: [char] -> rec [num] -> rec [num]
memoize name f s []     = unit2 []
memoize name f s (x:xs) = memoize1 name f s [x] $bind2 g
                          where g a = memoize name f s xs $bind2 h
                                      where h b = unit2 (merge_res a b)

memoize1 :: [char] -> rec [num] -> rec [num]
memoize1 name f s [i] = lookup i name $bind2 g
                        where g a = unit2 (a!0),      if a ~= []
                                  = f s [i] $bind2 h, otherwise
                                    where h b = update i name b $bind2 r
                                                where r any = unit2 b
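The sketch below puts the pieces together as a Haskell approximation of memoize and memoize1. It makes the same assumptions as before: a hand-rolled state transformer, lookupM/updateM in place of lookup/update, and thenR in place of then. For brevity, term does not touch the memo-table and simply wraps its result with unit2. The names recognize and sMemo are illustrative and are not part of the thesis's code.

```haskell
-- Memo table: for each input position, (recognizer name, end positions) pairs.
type Table = [(Int, [(String, [Int])])]

-- State transformer over the memo table (the thesis's stm).
type Stm a = Table -> (a, Table)

unit2 :: a -> Stm a
unit2 a t = (a, t)

bind2 :: Stm a -> (a -> Stm b) -> Stm b
bind2 m k t = let (a, t') = m t in k a t'

lookupM :: Int -> String -> Stm [[Int]]
lookupM ind name st
  | ind > length st = ([], st)
  | otherwise       = ([bs | (x, bs) <- snd (st !! (ind - 1)), x == name], st)

updateM :: Int -> String -> [Int] -> Stm ()
updateM ind name val st = ((), map entry st)
  where
    entry (x, list)
      | x == ind  = (x, (name, val) : list)
      | otherwise = (x, list)

-- Merge two ascending lists, dropping duplicates (merge_res).
mergeRes :: Ord a => [a] -> [a] -> [a]
mergeRes xs [] = xs
mergeRes [] ys = ys
mergeRes (x:xs) (y:ys)
  | x < y     = x : mergeRes xs (y:ys)
  | y < x     = y : mergeRes (x:xs) ys
  | otherwise = x : mergeRes xs ys

-- Memoized recognizers thread the memo table through every call.
type Rec = String -> [Int] -> Stm [Int]

-- term does not consult the table; it just wraps its result with unit2.
term :: Char -> Rec
term c s js =
  unit2 (foldr mergeRes [] [[j + 1] | j <- js, j >= 1, j <= length s, s !! (j - 1) == c])

orelse :: Rec -> Rec -> Rec
orelse p q s js =
  p s js `bind2` \a -> q s js `bind2` \b -> unit2 (mergeRes a b)

thenR :: Rec -> Rec -> Rec
thenR p q s js = p s js `bind2` \a -> if null a then unit2 [] else q s a

emptyR :: Rec
emptyR _ js = unit2 js

-- memoize: handle the start positions one at a time and merge the results.
memoize :: String -> Rec -> Rec
memoize _ _ _ [] = unit2 []
memoize name f s (x:xs) =
  memoize1 name f s x `bind2` \a ->
  memoize name f s xs `bind2` \b ->
  unit2 (mergeRes a b)

-- memoize1: reuse a stored result, or compute it with f, store it, return it.
memoize1 :: String -> Rec -> String -> Int -> Stm [Int]
memoize1 name f s i =
  lookupM i name `bind2` \found ->
  case found of
    (r:_) -> unit2 r
    []    -> f s [i] `bind2` \b ->
             updateM i name b `bind2` \_ ->
             unit2 b

-- The grammar s ::= a s s | empty, memoized under the name "s".
sMemo :: Rec
sMemo = memoize "s" ((term 'a' `thenR` (sMemo `thenR` sMemo)) `orelse` emptyR)

-- Run a recognizer from position 1 with an empty table for positions 1 .. n+1.
recognize :: Rec -> String -> ([Int], Table)
recognize p s = p s [1] [(i, []) | i <- [1 .. length s + 1]]
```

For `recognize sMemo "aaaaa"` the end positions are `[1,2,3,4,5,6]`. Every position in the final table holds exactly one entry for "s": each (recognizer, position) pair is computed once and afterwards read back from the table.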

The definitions of term, orelse, and then remain unchanged, except that unit1 and bind1 are replaced by unit2 and bind2 respectively. The memo-table is completely hidden in the definitions of term, orelse, and then.

One advantage of this approach comes from identifying all recognizer functions as being involved in the memoization effect. As a result, the monadic form of orelse is straightforward, which reduces the chance of making the kind of error referred to at the beginning of this section. The definition of monadic memoized recognizers is exactly the same as that of the original memoized recognizers, and the complexity analysis presented earlier also holds for memoized monadic recognizers.

The following table shows the results obtained when two recognizers for the grammar s ::= a s s | empty were applied to inputs of various lengths: an unmemoized monadic recognizer and a memoized monadic version. The results suggest that the memoized recognizers have O(n^3) space complexity as well as O(n^3) time complexity.

Unmemoized   | Memoized
out of space | 155,526

More information on the use of monads to structure functional language processors can be found in Wadler [20, 21, 22].

7 Memoizing parsers and syntax-directed evaluators

The memoization technique presented in this paper could readily be extended so that the memo-tables contain parse tables similar to those created by Earley's algorithm [3], or the more compact representation of factored syntax trees suggested by Leiss [13]. However, doing so would not be in keeping with an approach commonly used by the purely-functional programming community in building language processors. That approach avoids the explicit construction of syntax trees unless the trees are specifically required to be displayed as part of the output. Instead of constructing labeled syntax trees that are subsequently evaluated, an alternative approach is used: semantic actions are closely associated with the executable grammar productions, so that semantic attributes are computed directly without the need for an explicit representation of syntax trees. User-defined types can be introduced to accommodate different types of attributes, as has been done in the W/AGE attribute grammar programming language [7]. This approach is viable owing to the lazy evaluation strategy employed by most purely-functional languages. The memoization technique described above can be used to improve the efficiency of such syntax-directed evaluators with two minor modifications:

1. The definition of m_orelse is changed so that the function merge is replaced with a function that removes results regarded as duplicates under application-dependent criteria, which may be less inclusive than the criterion used for recognizers. Results returned by a recognizer are regarded as duplicates if they have the same end points. For recognition purposes, the end points are all that needs to be maintained in the memo-table. With syntax-directed evaluators, the end points may be augmented with semantic values. A single end point pair may have more than one value associated with it, because in some cases syntactic ambiguity may result in semantic ambiguity. Results returned by a language processor would only be regarded as duplicates if they have the same end points and equivalent semantic attributes. The function merge would be replaced by an application-dependent function that identifies and removes such duplication. In this approach, if syntax trees are required as part of the output, they are simply treated as another attribute. In such cases syntactic ambiguity is isomorphic with semantic ambiguity, and the function merge would be replaced by concatenation in the definition of

2. The memo-tables and the update and lookup functions are modified according to the attributes that are required in the application.
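Modification 1 can be pictured with a small Haskell sketch. It assumes results are (end point, attribute) pairs kept in order of end point. mergeResults and its equivalence parameter eqv are illustrative names.
- Passing `\_ _ -> True` gives the recognizer criterion: one result per end point.
- Passing `(==)` keeps distinct semantic values that share an end point.
- Replacing the merge by `(++)` keeps every result, as when syntax trees are treated as attributes.

```haskell
-- A result of a syntax-directed evaluator: an end point and an attribute.
type Result v = (Int, v)

-- Merge two result lists ordered by end point. A result is dropped only if
-- the list being built already holds a result with the same end point whose
-- attribute is equivalent under the application-supplied test eqv.
mergeResults :: (v -> v -> Bool) -> [Result v] -> [Result v] -> [Result v]
mergeResults eqv xs ys = foldr insertResult ys xs
  where
    insertResult r@(e, v) rs
      | any (\(e', v') -> e' == e && eqv v v') rs = rs
      | otherwise                                = insertByEnd r rs
    -- keep the list ordered by end point
    insertByEnd r [] = [r]
    insertByEnd r@(e, _) (r'@(e', _) : rest)
      | e <= e'   = r : r' : rest
      | otherwise = r' : insertByEnd r rest
```

Take `xs = [(2,1),(3,5)]` and `ys = [(2,7),(3,5)]`. The evaluator criterion `(==)` yields `[(2,1),(2,7),(3,5)]`, keeping both values at end point 2. The recognizer criterion yields `[(2,7),(3,5)]`.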

One advantage of this approach is that all unnecessary computation is avoided. Memoization prevents language processors from reprocessing segments of the input already visited. The use of merge, or an application-dependent version of it, removes duplication in sub-components of the result as soon as it can be detected. Note that the complexity of language processors constructed in this way is application dependent. If syntax trees must be represented in full, the language processor may have exponential complexity in the worst case, because for highly ambiguous grammars the number of syntax trees can be exponential in the length of the input. A compact representation of the trees could be produced in polynomial time and then passed on to an evaluator. However, this would detract from the modularity of the language processor. It would also provide no benefit if the trees were subsequently displayed or otherwise processed separately, since that could be an exponential process.
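The exponential worst case can be made concrete with the grammar s ::= a s s | empty from section 6.3. A tree for a^n chooses how the remaining n-1 symbols are split between the two occurrences of s. The number of trees therefore follows the Catalan recurrence. By contrast, the number of end-point pairs a memo-table can record for s grows only quadratically. The Haskell functions below (names ours) compute both counts.

```haskell
-- Number of distinct syntax trees for the input a^n under s ::= a s s | empty.
-- A tree for s -> a s1 s2 splits the remaining n-1 symbols between s1 and s2,
-- so T(0) = 1 and T(m) = sum of T(k) * T(m-1-k) for k = 0 .. m-1.
treeCount :: Int -> Integer
treeCount n = counts !! n
  where
    counts = map t [0 ..]   -- lazily built table of T(0), T(1), ...
    t :: Int -> Integer
    t 0 = 1
    t m = sum [counts !! k * counts !! (m - 1 - k) | k <- [0 .. m - 1]]

-- Number of (start, end) pairs a memo-table for s can hold on input a^n:
-- start positions run from 1 to n+1 and every end point is >= its start.
endPointPairs :: Int -> Int
endPointPairs n = (n + 1) * (n + 2) `div` 2
```

For n = 20, treeCount gives 6564120420 distinct trees, while endPointPairs 20 is only 231 start/end pairs.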

This paper was inspired by Norvig's demonstration that memoization can be implemented at the source-code level in languages such as Common Lisp. There it improves the efficiency of simple language processors without compromising their simplicity. We have shown that Norvig's technique can be adapted for use in purely-functional programming languages that do not admit any form of updateable object. The technique described in this paper can be thought of as complementing Norvig's, in that it enables memoization to be used to improve the efficiency of highly-modular language processors constructed in purely-functional languages.

This application has also illustrated how monads can be used to structure functional programs in order to avoid errors when modifications such as the addition of state are made. We are now exploring the use of monads in the memoization of programs that are constructed as executable

9 Acknowledgments

We would like to thank Young Gil Park, Dimitris Phoukas, and Yung H. Tsin of the University of Windsor for helpful discussions. We also thank the anonymous referees for their careful reading of the paper, for many useful suggestions, and for identifying the error in the code that was discussed in section 6. Richard Frost also acknowledges the support received from the

10 References

[1] Augusteijn, L. (1993) Functional Programming, Program Transformations and Compiler Construction. Philips Research Laboratories. ISBN 90-74445-04-7.

[2] Burge, W. H. (1975) Recursive Programming Techniques. Addison-Wesley Publishing Company, Reading, Massachusetts.

[3] Earley, J. (1970) An efficient context-free parsing algorithm. Commun. ACM 13 (2) 94-102.

[5] Field, A. J. and Harrison, P. G. (1988) Functional Programming. Addison-Wesley Publishing Company, Reading, Massachusetts.

[6] Frost, R. A. (1992) Constructing programs as executable attribute grammars. The Computer Journal 35 (4) 376-389.

[7] Frost, R. A. (1995) W/AGE: The Windsor Attribute Grammar Programming Environment. Schloss Dagstuhl International Workshop on Functional Programming in the Real World.

[8] Hudak, P., Wadler, P., Arvind, Boutel, B., Fairbairn, J., Fasel, J., Hammond, K., Hughes, J., Johnsson, T., Kieburtz, D., Nikhil, R., Peyton Jones, S., Reeve, M., Wise, D. and Young, J. (1992) Report on the programming language Haskell, a non-strict, purely functional language, Version 1.2. ACM SIGPLAN Notices 27 (5).

[9] Hughes, R. J. M. (1985) Lazy memo functions. In Proceedings of the Conference on Functional Programming and Computer Architecture, Nancy, France, September 1985. Springer-Verlag Lecture Notes in Computer Science 201, editors G. Goos and J. Hartmanis, 129-146.

[10] Hutton, G. (1992) Higher-order functions for parsing. Journal of Functional Programming 2 (3).

[11] Khoshnevisan, H. (1990) Efficient memo-table management strategies. Acta Informatica 28, 43-81.

[12] Leermakers, R. (1993) The Functional Treatment of Parsing. Kluwer Academic Publishers, ISBN 0-7923-9374-7.

[13] Leiss, H. (1990) On Kilbury's modification of Earley's algorithm. ACM TOPLAS 12 (4) 610-640.

[14] Michie, D. (1968) 'Memo' functions and machine learning. Nature 218, 19-22.

[15] Moggi, E. (1989) Computational lambda-calculus and monads. IEEE Symposium on Logic in Computer Science, Asilomar, California, June 1989, 14-23.

[16] Norvig, P. (1991) Techniques for automatic memoization with applications to context-free parsing. Computational Linguistics 17 (1) 91-98.

[17] Turner, D. (1985) A lazy functional programming language with polymorphic types. Proc. IFIP Int. Conf. on Functional Programming Languages and Computer Architecture. Nancy, France. Springer-Verlag Lecture Notes in Computer Science 201.

[18] Wadler, P. (1985) How to replace failure by a list of successes. In P. Jouannaud (ed.) Functional Programming Languages and Computer Architecture. Lecture Notes in Computer Science 201, Springer-Verlag, Heidelberg, 113.

[19] Wadler, P. (editor) (1989) Special issue on lazy functional programming. The Computer Journal 32 (2).

[20] Wadler, P. (1990) Comprehending monads. ACM SIGPLAN/SIGACT/SIGART Symposium on Lisp and Functional Programming, Nice, France, June 1990, 61-78.

[21] Wadler, P. (1992) Monads for functional programming. Marktoberdorf Summer School on Program Design Calculi. Springer-Verlag Lecture Notes in Computer Science.

[22] Wadler, P. (1995) Monads for functional programming. Proceedings of the Båstad Spring School on Advanced Functional Programming, ed. J. Jeuring and E. Meijer. Springer-Verlag Lecture Notes in Computer Science 925.

Appendix B

IMPLEMENTATION OF MONADIC LANGUAGE PROCESSORS USING TYPE CONSTRUCTOR CLASSES IN GOFER

class Monad m where
    unit :: a -> m a
    bind :: m a -> (a -> m b) -> m b

class Monad m => MonadPlusZero m where
    plus :: m a -> m a -> m a
    zero :: m a

class MonadPlusZero m => Recognizer m a where
    termR   :: Char -> a -> m a
    emptyR  :: a -> m a
    orelseR :: (a -> m a) -> (a -> m a) -> (a -> m a)
    thenR   :: (a -> m a) -> (a -> m a) -> (a -> m a)
    failR   :: m a

    emptyR = unit
    (p `thenR` q) inp   = p inp `bind` \x1 ->
                          q x1 `bind` \x2 -> unit x2
    (p `orelseR` q) inp = (p inp) `plus` (q inp)
    failR = zero

class MonadPlusZero m => Parser m a where
    term   :: Char -> m a
    empty  :: a -> m a
    orelse :: m a -> m a -> m a
    fail   :: m a

    empty  = unit
    orelse = plus
    fail   = zero


class Recognizer m a => MRecognizer m a where
    memoizeR :: String -> (a -> m a) -> a -> m a

class Parser m a => MParser m a where
    memoize :: String -> m a -> m a

-- LIST MONAD
type List a = [a] in unitLs, bindLs, termEInt

unitLs :: a -> [a]
unitLs a = [a]

bindLs :: [a] -> (a -> [b]) -> [b]
x `bindLs` k = case x of
                 []     -> []
                 (a:x1) -> (k a) `merge_res` (x1 `bindLs` k)

instance Monad [] where
    unit = unitLs
    bind = bindLs

instance MonadPlusZero [] where
    zero = []
    plus = merge_res

merge_res :: [a] -> [a] -> [a]
merge_res x [] = x
merge_res [] y = y
merge_res (x:xs) (y:ys)
    | x << y    = x : merge_res xs (y:ys)
    | y << x    = y : merge_res (x:xs) ys
    | otherwise = x : merge_res xs ys

-- To avoid problems with the class Ord defined in the
-- standard prelude
primitive (<<) "primGenericLt" :: a -> a -> Bool

-- STATE MONAD
type St s a = s -> (a, s)
     in unitSt, bindSt, lookupSt, updateSt, newSt, parseSt, memoizeEv, parseStEv


bindSt :: St s a -> (a -> St s b) -> St s b
(a `bindSt` k) t = k va ta
                   where (va, ta) = a t

instance Monad (St s) where
    unit = unitSt
    bind = bindSt

type State v = [(Int, [(String, v)])]
     in unitSt, bindSt, lookupSt, updateSt, newSt, parseSt, memoizeEv, parseStEv

lookupSt :: Int -> String -> St (State v) [v]
lookupSt ind name t
    | ind > length t = ([], t)
    | otherwise      = ([bs | (x, bs) <- snd (t !! (ind-1)), x == name], t)

updateSt :: Int -> String -> v -> St (State v) ()
updateSt ind name val st
    = ((), map (update_mt_entry ind name val) st)
      where update_mt_entry ind name val (x, list)
                | x == ind  = (x, (name, val) : list)
                | otherwise = (x, list)

-- EXCEPTION MONAD
data Ex a = Fail | Ok a

bindEx :: Ex a -> (a -> Ex b) -> Ex b
(Ok a) `bindEx` k = k a
Fail   `bindEx` k = Fail

instance Monad Ex where
    unit = unitEx
    bind = bindEx

instance MonadPlusZero Ex where
    plus = plusEx
    zero = zeroEx

-- PARAMETRIZED STATE READER MONAD
type StRM m s a = s -> m a
     in unitStRM, liftStRM, bindStRM, zeroStRM, plusStRM, termInt, parseInt,
        termEInt, parseEInt, memoizeRec, parseSt, memoizeEv, parseStEv

liftStRM :: Monad m => m a -> StRM m s a
liftStRM x t = x

unitStRM :: Monad m => a -> StRM m s a
unitStRM x = liftStRM (unit x)

bindStRM :: Monad m => StRM m s a -> (a -> StRM m s b) -> StRM m s b
(a `bindStRM` k) t = a t `bind` \va -> k va t

zeroStRM :: MonadPlusZero m => StRM m s a
zeroStRM = \s -> zero

plusStRM :: MonadPlusZero m => StRM m s a -> StRM m s a -> StRM m s a
(x `plusStRM` y) s = x s `plus` y s

instance Monad m => Monad (StRM m s) where
    unit = unitStRM
    bind = bindStRM

instance MonadPlusZero m => MonadPlusZero (StRM m s) where
    zero = zeroStRM
    plus = plusStRM


-- PARAMETRIZED LIST MONAD
type ListM m a = m [a]
     in unitLsM, bindLsM, plusLsM, zeroLsM, liftLsM, memoizeRec, parseSt,
        memoizeEv, parseStEv

unitLsM :: Monad m => a -> ListM m a
unitLsM x = liftLsM (unit x)

bindLsM :: Monad m => ListM m a -> (a -> ListM m b) -> ListM m b
x `bindLsM` k = x `bind` \x1 -> foldr plusLsM zeroLsM (map k x1)

liftLsM :: Monad m => m a -> ListM m a
liftLsM x = x `bind` \x1 -> unit [x1]

instance Monad m => Monad (ListM m) where
    unit = unitLsM
    bind = bindLsM

instance Monad m => MonadPlusZero (ListM m) where
    zero = zeroLsM
    plus = plusLsM

zeroLsM :: Monad m => ListM m a
zeroLsM = unit []

plusLsM :: Monad m => ListM m a -> ListM m a -> ListM m a
(x `plusLsM` y) = x `bind` \x1 -> y `bind` \x2 -> unit (x1 `merge_res` x2)

-- PARAMETRIZED INPUT MONAD
type InpM m s a = s -> m (a, s)
     in unitInpM, bindInpM, liftInpM, zeroInpM, plusInpM, termEInt, parseEInt,
        memoizeEv, parseStEv

unitInpM :: Monad m => a -> InpM m s a
unitInpM a = liftInpM (unit a)

bindInpM :: Monad m => InpM m s a -> (a -> InpM m s b) -> InpM m s b
(a `bindInpM` k) inp = a inp `bind` \(va, outa) -> k va outa

liftInpM :: Monad m => m a -> InpM m s a
liftInpM x inp = x `bind` \x1 -> unit (x1, inp)

zeroInpM :: MonadPlusZero m => InpM m s a
zeroInpM = \inp -> zero

plusInpM :: MonadPlusZero m => InpM m s a -> InpM m s a -> InpM m s a
(x `plusInpM` y) = \inp -> x inp `plus` y inp

instance Monad m => Monad (InpM m s) where
    unit = unitInpM
    bind = bindInpM

instance MonadPlusZero m => MonadPlusZero (InpM m s) where
    zero = zeroInpM
    plus = plusInpM

-- Recognizers that are applied to a position and the
-- whole list of tokens

termInt :: MonadPlusZero m => Char -> Int -> StRM m String Int
termInt c x s
    | (x < 1) || (x > length s) || (s !! (x-1) /= c) = zero
    | otherwise                                    = unit (x+1)

parseInt :: (Int -> StRM m String Int) -> Int -> String -> m Int
parseInt x inp s = x inp s

instance MonadPlusZero (StRM m String) => Recognizer (StRM m String) Int where
    termR = termInt

memoizeRec ::
    Recognizer (StRM (ListM (St (State [Int]))) String) Int
    => String
    -> (Int -> StRM (ListM (St (State [Int]))) String Int)
    -> (Int -> StRM (ListM (St (State [Int]))) String Int)
memoizeRec name f i s = lookupSt i name `bindSt` \x1 ->
                        if x1 /= [] then unitSt (x1 !! 0)
                        else f i s `bindSt` \x2 ->
                             updateSt i name x2 `bindSt` \() ->
                             unitSt x2

parseSt ::
    (Int -> StRM (ListM (St (State [Int]))) String Int)
    -> Int -> String -> ([Int], State [Int])
parseSt x inp s = x inp s (newSt (length s))

instance Recognizer (StRM (ListM (St (State [Int]))) String) Int
         => MRecognizer (StRM (ListM (St (State [Int]))) String) Int where
    memoizeR = memoizeRec

-- non-deterministic recognizer
r1 :: Recognizer (StRM [] String) Int => Int -> StRM [] String Int
r1 = (termR 'a' `thenR` (r1 `thenR` r1)) `orelseR` emptyR

-- deterministic recognizer
r2 :: Recognizer (StRM Ex String) Int => Int -> StRM Ex String Int
r2 = (termR 'a' `thenR` (r2 `thenR` r2)) `orelseR` emptyR

-- memoized recognizer
r3 :: Recognizer (StRM (ListM (St (State [Int]))) String) Int
      => Int -> StRM (ListM (St (State [Int]))) String Int
r3 = memoizeR "r3" ((termR 'a' `thenR` (r3 `thenR` r3)) `orelseR` emptyR)

-- ? parseInt r1 1 "aaaaa"
-- [1, 2, 3, 4, 5, 6]
-- (10448 reductions, 16026 cells)
-- ? parseInt r2 1 "aaaaa"
-- Ok 6
-- (569 reductions, 813 cells)

-- Recognizers that are applied to a list of tokens and
-- return a list of tokens yet to be recognized

termChar :: MonadPlusZero m => Char -> String -> m String
termChar c inp = case inp of
                   []     -> zero
                   (x:xs) -> if x == c then unit xs else zero

parseChar :: (String -> m String) -> String -> m String
parseChar x s = x s

instance MonadPlusZero m => Recognizer m String where
    termR = termChar

-- EXAMPLES
-- non-deterministic recognizer
r4 :: Recognizer [] String => String -> [String]
r4 = (termR 'a' `thenR` (r4 `thenR` r4)) `orelseR` emptyR

-- deterministic recognizer
r5 :: Recognizer Ex String => String -> Ex String
r5 = (termR 'a' `thenR` (r5 `thenR` r5)) `orelseR` emptyR

-- ? parseChar r4 "aaaaa"
-- ["", "a", "aa", "aaa", "aaaa", "aaaaa"]
-- (4586 reductions, 7672 cells)
-- ? parseChar r5 "aaaaa"
-- Ok ""
-- (143 reductions, 186 cells)
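The point of this appendix is that a single grammar definition can be run in different monads. The sketch below is a rough Haskell analogue. It assumes GHC's standard Functor/Applicative/Monad classes in place of Gofer's Monad class. For lists, plus is a sorted union that stands in for merge_res and primGenericLt, so it carries an Ord constraint. For Ex, plus commits to the first success. That Ex definition is our assumption, since plusEx does not survive in the listing.

```haskell
import Data.List (sort)

-- The exception monad of this appendix.
data Ex a = Fail | Ok a deriving (Eq, Show)

instance Functor Ex where
  fmap _ Fail   = Fail
  fmap f (Ok a) = Ok (f a)

instance Applicative Ex where
  pure = Ok
  Fail <*> _ = Fail
  Ok f <*> x = fmap f x

instance Monad Ex where
  Fail >>= _ = Fail
  Ok a >>= k = k a

-- Haskell counterpart of MonadPlusZero; plus needs Ord because the
-- list instance sorts and removes duplicates.
class Monad m => MonadPlusZero m where
  zero :: m a
  plus :: Ord a => m a -> m a -> m a

-- Sorted union without duplicates (plays the role of merge_res).
sortedUnion :: Ord a => [a] -> [a] -> [a]
sortedUnion xs ys = dedup (sort (xs ++ ys))
  where
    dedup (a:b:rest) | a == b = dedup (b:rest)
    dedup (a:rest)            = a : dedup rest
    dedup []                  = []

instance MonadPlusZero [] where
  zero = []
  plus = sortedUnion

-- Assumed definition: commit to the first alternative that succeeds.
instance MonadPlusZero Ex where
  zero = Fail
  plus (Ok a) _ = Ok a
  plus Fail y   = y

-- A recognizer takes a 1-based position and the whole input and returns
-- the position(s) after the recognized segment, as StRM m String Int does.
type Rec m = Int -> String -> m Int

termR :: MonadPlusZero m => Char -> Rec m
termR c x s
  | x < 1 || x > length s || s !! (x - 1) /= c = zero
  | otherwise                                  = return (x + 1)

thenR :: Monad m => Rec m -> Rec m -> Rec m
thenR p q x s = p x s >>= \y -> q y s

orelseR :: MonadPlusZero m => Rec m -> Rec m -> Rec m
orelseR p q x s = p x s `plus` q x s

emptyR :: Monad m => Rec m
emptyR x _ = return x

-- s ::= a s s | empty, written once and run in either monad.
sR :: MonadPlusZero m => Rec m
sR = (termR 'a' `thenR` (sR `thenR` sR)) `orelseR` emptyR
```

At type `[Int]`, `sR 1 "aaaaa"` evaluates to `[1,2,3,4,5,6]`. At type `Ex Int` it evaluates to `Ok 6`. Both match the outputs recorded above for r1 and r2.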


    | otherwise = unit (eval c, x+1)
      where eval 'a' = 1

parseEInt :: StRM (InpM m Int) String Int -> String -> Int -> m (Int, Int)
parseEInt x s inp = x s inp

instance MonadPlusZero (StRM (InpM m Int) String)
         => Parser (StRM (InpM m Int) String) Int where
    term = termEInt

memoizeEv ::
    Parser (StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String) Int
    => String
    -> StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String Int
    -> StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String Int
memoizeEv name f s i
    = lookupSt i name `bindSt` \x1 ->
      if x1 /= [] then unitSt (x1 !! 0)
      else f s i `bindSt` \x2 ->
           updateSt i name x2 `bindSt` \() ->
           unitSt x2

parseStEv :: StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String Int
             -> String -> Int -> ([(Int,Int)], State [(Int,Int)])
parseStEv x s inp = x s inp (newSt (length s))

instance Parser (StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String) Int
         => MParser (StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String) Int where
    memoize = memoizeEv

-- EXAMPLES
-- non-deterministic evaluator
e1 :: Parser (StRM (InpM [] Int) String) Int
      => StRM (InpM [] Int) String Int
e1 = (term 'a' `bind` \x1 ->
      e1 `bind` \x2 ->
      e1 `bind` \x3 ->
      unit (x1 + x2 + x3)) `orelse` empty 0

-- deterministic evaluator
e2 :: Parser (StRM (InpM Ex Int) String) Int
      => StRM (InpM Ex Int) String Int
e2 = (term 'a' `bind` \x1 ->
      e2 `bind` \x2 ->
      e2 `bind` \x3 ->
      unit (x1 + x2 + x3)) `orelse` empty 0

-- memoized evaluator
e3 :: MParser (StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String) Int
      => StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String Int
e3 = memoize "e3" ((term 'a' `bind` \x1 ->
                    e3 `bind` \x2 ->
                    e3 `bind` \x3 ->
                    unit (x1 + x2 + x3)) `orelse` empty 0)
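Below is a Haskell sketch of the non-deterministic evaluator e1. The parametrized monads are flattened into a plain function type `P a = String -> Int -> [(a, Int)]`, which mirrors `StRM (InpM [] Int) String Int`. As with Gofer's bindLs, bindP merges partial results, so duplicate (value, end) pairs disappear as soon as they arise. The names P, unitP, bindP, orelseP, emptyP, and termP are ours. The catch-all case of eval is an assumption, since only `eval 'a' = 1` survives in the listing.

```haskell
-- An evaluator maps the input and a 1-based start position to ascending,
-- duplicate-free (attribute value, end position) pairs.
type P a = String -> Int -> [(a, Int)]

-- Merge two ascending lists, dropping duplicates (as merge_res does).
mergeRes :: Ord a => [a] -> [a] -> [a]
mergeRes xs [] = xs
mergeRes [] ys = ys
mergeRes (x:xs) (y:ys)
  | x < y     = x : mergeRes xs (y:ys)
  | y < x     = y : mergeRes (x:xs) ys
  | otherwise = x : mergeRes xs ys

unitP :: a -> P a
unitP a _ i = [(a, i)]

-- Run k from every end point of p and merge the results, removing
-- duplicate (value, end) pairs as soon as they arise.
bindP :: Ord b => P a -> (a -> P b) -> P b
bindP p k s i = foldr (\(a, j) acc -> mergeRes (k a s j) acc) [] (p s i)

orelseP :: Ord a => P a -> P a -> P a
orelseP p q s i = mergeRes (p s i) (q s i)

emptyP :: a -> P a
emptyP = unitP

-- A terminal's attribute is given by eval (catch-all case assumed).
termP :: Char -> P Int
termP c s i
  | i < 1 || i > length s || s !! (i - 1) /= c = []
  | otherwise                                  = [(eval c, i + 1)]
  where
    eval 'a' = 1
    eval _   = 0

-- e1 ::= a e1 e1 | empty, summing attributes: it counts the a's consumed.
e1 :: P Int
e1 = (termP 'a' `bindP` \x1 ->
      e1 `bindP` \x2 ->
      e1 `bindP` \x3 ->
      unitP (x1 + x2 + x3)) `orelseP` emptyP 0
```

`e1 "aaa" 1` evaluates to `[(0,1),(1,2),(2,3),(3,4)]`. The full input a^3 has five syntax trees, but they all yield the attribute 3 at end point 4, so they collapse into the single pair (3,4).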

VITA AUCTORIS

Barbara Szydlowski was born on January 2, 1964, in Lublin, Poland. After receiving her high school diploma from the Hetman Jan Zamoyski Gymnasium in 1982, she began her studies at the Marie Curie-Sklodowska University in Lublin. She graduated with a Master's Degree in Mathematics in 1987. She is currently a candidate for the Master's Degree in Science at the University of Windsor.
