COMPLEXITY ANALYSIS AND MONADIC SPECIFICATION OF
MEMOIZED FUNCTIONAL PARSERS
by
Barbara Szydlowski
A Thesis Submitted to the Faculty of Graduate Studies and Research
through the School of Computer Science in Partial Fulfillment of the Requirements for the Degree of
Master of Science at the University of Windsor
Windsor, Ontario, Canada 1996
National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Barbara Szydlowski 1996 © All Rights Reserved
ABSTRACT
One approach to implementing parsers in a purely functional programming language is to model them as functions, and to define a set of higher-order combinators that allow one to build larger parsers out of smaller components. These combinators implement grammar constructions such as alternation or sequencing. They can be used to construct parsers with structures resembling the BNF notation of the grammars of the languages being processed. Such parsers are modular and easy to modify and understand. A major disadvantage of this approach is that the resulting parsers use a top-down, fully backtracking strategy that may lead to enormous time and space requirements.
The efficiency of parsers can be improved by adding bookkeeping features that eliminate unnecessary backtracking. In this thesis we investigate a technique called memoization. A memoized parser computes its result based on previously computed results that have been stored in a memo-table.
A parser is a program that determines the syntactic structure of an input sequence of symbols in some language. It may produce some kind of abstract syntax tree as output. We consider the simplest type of parsers - language recognizers that can be thought of as programs determining only if the input sequence belongs to a given language. We show that memoized recognizers constructed for an arbitrary grammar have $O(n^3)$ time complexity where n is the length of the input to be processed. The space required to store the memo table is (at most) $O(n^3)$. In purely functional programming languages that support updateable in-place variables the space requirements could be reduced to $O(n^2)$.
Monads, which are abstract structures from Category Theory, have proven useful for addressing many computational problems in purely functional programming. The monadic approach allows one to build basic parsers and combinators out of components that represent various programming language features such as state, exceptions, or non-determinism. These features automatically become the characteristics of the resulting parser. We show how memoized recognizers could be implemented in a fully modular way using the monadic approach. We also describe how the technique could be extended to improve efficiency of more complex language processors, such as syntax-directed evaluators.
ACKNOWLEDGMENTS
The author wishes to express sincere appreciation to Dr. Richard Frost for helpful discussions, support, and his assistance in the preparation of this manuscript. In addition, special thanks to Dr. Yung Tsin and Dr. Richard Caron for their constructive comments on the final draft of this thesis.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
INTRODUCTION
1 CONSTRUCTING PURELY FUNCTIONAL LANGUAGE PROCESSORS
2 THE TYPE OF PARSERS
3 BNF NOTATION AND FUNCTIONAL LANGUAGE PROCESSORS
3.1 Basic parsers
3.2 Combinators
4 NON-DETERMINISTIC LANGUAGE PROCESSORS
7 MONADS AND FUNCTIONAL PROGRAMMING
8 MONADIC CONSTRUCTION OF LANGUAGE PROCESSORS
INTRODUCTION TO CATEGORY THEORY AND MONADS
MONADIC CONSTRUCTION OF PURELY FUNCTIONAL RECOGNIZERS
2 TYPE OF THE MONADIC RECOGNIZERS
3 BASIC MONADS
3.1 The identity monad
3.2 Exceptions
3.3 Non-determinism
4 COMBINING MONADS
MONADIC CONSTRUCTION OF MEMOIZED LANGUAGE PROCESSORS USING TYPE CONSTRUCTOR CLASSES IN GOFER
1 THE SYSTEM OF TYPE CONSTRUCTOR CLASSES IN GOFER
2 MONADS AND TYPE CONSTRUCTOR CLASSES
4.1 The state monad
4.2 The parametrized list monad
4.3 The memo-table
5 PARSERS
5.1 The type of the monadic parsers
5.2 The class Parser
5.3 The parametrized input monad
5.4 Memoized parsers
6 COMPLEXITY OF MEMOIZED LANGUAGE PROCESSORS
CHAPTER 6
APPENDIX A
MEMOIZING PURELY-FUNCTIONAL TOP-DOWN BACKTRACKING LANGUAGE PROCESSORS
1 CONSTRUCTING MODULAR NON-DETERMINISTIC LANGUAGE PROCESSORS IN FUNCTIONAL PROGRAMMING LANGUAGES
2 MEMOIZING LANGUAGE PROCESSORS
3 MEMOIZATION IN PURELY-FUNCTIONAL LANGUAGES
4 MEMOIZING PURELY-FUNCTIONAL RECOGNIZERS
4.1 The memo-table
4.2 The memoized recognizers
4.3 The algorithm
5 COMPLEXITY ANALYSIS
5.1 Elementary operations
5.2 The size of the memo-table
5.3 Memo-table lookup and update
5.4 recognizers
5.5 Alternation
5.6 Sequencing
5.7 Merging results returned when a recognizer is applied to a list of start positions
5.8 The execution tree
5.9 Proof of $O(n^3)$ time complexity
6 A MONADIC APPROACH TO INCORPORATE MEMOIZATION
6.1 Monads
6.2 Non-memoized monadic recognizers
6.3 Memoized monadic recognizers
7 MEMOIZING PARSERS AND SYNTAX-DIRECTED EVALUATORS
IMPLEMENTATION OF MONADIC LANGUAGE PROCESSORS USING TYPE CONSTRUCTOR CLASSES IN GOFER
VITA AUCTORIS
LIST OF FIGURES
Figure 1 Basic parsers
Figure 2 Basic recognizers
Figure 3 Example definitions of orelse and then
Figure 4 Example diagram
Figure 5 Functor F mapping category C into D
Figure 6 Natural transformation mapping functor F into functor G
Monad laws
List monad
Kleisli triple laws
Kleisli triple for lists
The type of monadic recognizers
The identity monad
The exception monad with zero and plus
The list monad with zero and plus
The class Recognizer
The state monad
The parametrized list monad
The class Parser
The parametrized input monad
Figure 22 A functional program containing a definition of the
Figure 23 The relationship between the grammar and the program implementing the recognizer
Figure 24 Definition of the Fibonacci function
Figure 25 A memoized version of the Fibonacci function
Figure 26 The relationship between a recognizer and its memoized version
Chapter 1
INTRODUCTION
1 Constructing purely functional language processors
One approach to implementing language processors in a purely functional programming language is to model them as functions, and to define a set of higher-order combinators that allow one to build larger parsers out of smaller components. This approach dates back to Burge's book on recursive programming techniques [BR75], and it has been popularized in functional programming by Wadler [WA85], Frost [FR92], Hutton [HU92], and others. According to the approach, a parser is a program that takes a string of tokens as input and yields, as its result, some kind of abstract tree that describes the grammatical structure of the string.
Owing to the fact that a parser might not consume all of the input string, it is convenient to represent parsers as functions that, when applied to the input string of tokens, return a pair: a value (an abstract tree) and the unconsumed part of the input. Furthermore, a parser might fail on its input. One way of distinguishing between success and failure is to have parsers return a list of pairs rather than a single pair, with the convention that a singleton list denotes success and an empty list denotes failure [WA85].
We start by defining the type for parsers and recognizers. Next, we define basic parsers and combinators. The implementation language for this thesis is Gofer [JN94], a purely functional programming language that supports many useful features, such as lambda expressions and type constructor classes [JNA93]. These features will be discussed later in this thesis.
2 The type of parsers
Using the conventions described in the previous section we define our parsers to be of the type:
    type Parser a = String -> [(a, String)]
That is, a parser of the type Parser a is a function from the input string of characters to the list of pairs (value-of-type-a, rest-of-the-string).
A recognizer is a language processor that simply determines whether or not the input string of characters belongs to a defined language. Using the same principles as for parsers, we define our recognizers to be functions from the input string of characters to the list containing an unconsumed part of the input. A singleton list of results denotes success; an empty list of results denotes failure.
    type Recognizer = String -> [String]
Throughout the remainder of this thesis we will also introduce definitions of parsers and recognizers that are of slightly different types than those given above. For example, instead of applying language processors to the string of tokens yet to be processed (with the assumption that the first character to be processed is the first character of the input string), we can apply each of them to the whole input string and a single start position (the position specifies the first character to be processed). One advantage of this representation is that the input string remains unchanged through the whole process of parsing or recognition, and there is no need for each processor to return an unconsumed part of it. Instead, each parser or recognizer returns a start position for the next processor. The output of processors is more compact and therefore more suitable for storing (for example in a memo-table). The corresponding modified types of parsers and recognizers are given below.
    type ParserInt a = String -> Int -> [(a, Int)]
    type RecognizerInt = String -> Int -> [Int]
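The position-based representation can be illustrated with a minimal sketch, written here in Haskell rather than Gofer. The name termRInt and the choice of 0-based positions are assumptions made for this illustration only; they are not taken from the thesis.

```haskell
-- The recognizer type from the text: whole input, start position, next positions.
type RecognizerInt = String -> Int -> [Int]

-- Hypothetical helper: recognize the character c at position pos (0-based).
-- On success the result is the start position for the next processor.
termRInt :: Char -> RecognizerInt
termRInt c inp pos
  | pos >= 0 && pos < length inp && inp !! pos == c = [pos + 1]
  | otherwise                                      = []
```

Note that the input string is passed along unchanged; only the integer position moves, which is what makes the results compact enough to store in a table.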
3 BNF notation and functional language processors
In BNF notation grammars are constructed by defining a set of terminals and a set of productions. The symbol ε denotes an empty production, the symbol | denotes alternation, and juxtaposition denotes sequencing. More complex productions can be built from simpler ones by combining them using alternation or sequencing.
Corresponding language processors can be constructed by defining the functions empty and term that correspond to an empty production and a terminal, and the higher-order functions orelse and then that correspond to alternation and sequencing in BNF respectively. As in BNF, larger parsers can be built from smaller components by combining them using the alternation or sequencing (higher-order) operators. The structure of the resulting parsers closely resembles the structure of the underlying grammars.
3.1 Basic parsers
In this section we define three basic parsers and recognizers that can be used as building blocks for more complex language processors. The parser emptyP v always succeeds without consuming any of the input string, and it returns a value v. The parser failP always fails, regardless of its input. The parser termP c processes a single character at the beginning of the input string. It fails if the first character to be processed is not c, or if the input is an empty string. Example definitions are given below.
    type Parser a = String -> [(a, String)]
    emptyP :: a -> Parser a
    emptyP v inp = [(v, inp)]
    failP :: Parser a
    failP inp = []
    termP :: Char -> Parser Char
    termP c inp = case inp of
                    []     -> failP inp
                    (x:xs) -> if x == c then [(c, xs)]
                                        else failP inp
Figure 1 Basic parsers
Similarly, one can define the corresponding recognizers. The recognizer emptyR always succeeds, returning its input unchanged. The recognizer failR always fails. The recognizer termR c consumes a single character c at the beginning of the input string or fails if the first character is not c, or if the input is an empty string.
    type Recognizer = String -> [String]
    emptyR :: Recognizer
    emptyR inp = [inp]
    failR :: Recognizer
    failR inp = []
    termR :: Char -> Recognizer
    termR c inp = case inp of
                    []     -> failR inp
                    (x:xs) -> if x == c then [xs]
                                        else failR inp
The following illustrates the use of the above definitions.
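A minimal sketch of such a use, written in Haskell rather than Gofer, might look as follows. The recognizer definitions repeat those above; the list named examples is a hypothetical helper introduced only to collect a few applications and their results.

```haskell
type Recognizer = String -> [String]

emptyR :: Recognizer
emptyR inp = [inp]

failR :: Recognizer
failR _ = []

termR :: Char -> Recognizer
termR c inp = case inp of
  []     -> failR inp
  (x:xs) -> if x == c then [xs] else failR inp

-- Hypothetical collection of sample applications (not from the thesis).
examples :: [[String]]
examples =
  [ emptyR "abc"     -- succeeds, input unchanged: ["abc"]
  , failR "abc"      -- always fails: []
  , termR 'a' "abc"  -- consumes 'a': ["bc"]
  , termR 'b' "abc"  -- first character is not 'b': []
  , termR 'a' ""     -- empty input: []
  ]
```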
It is not difficult to modify the above definitions so that the language processors built with them accept as parameters a single start position and the whole input string. However, rather than rewriting all the definitions, let us consider the relationship between the types of the two corresponding processors. A parser that is applied to an unconsumed part of the input string is of the type:
    type Parser a = String -> [(a, String)]
The corresponding parser that is applied to a string and a single start position is of the type:
    type ParserInt a = String -> Int -> [(a, Int)]
We can generalize the type of the first parser by abstracting over the type of the strings (considering it as an additional parameter). The type of the first parser could be written as:
    type Parser a b = b -> [(a, b)]
In a similar way we can parametrize the type of the second parser by abstracting over the actual representation of a "position":
    type ParserInt a b = String -> b -> [(a, b)]
We can now write the definition of the type ParserInt in terms of the type Parser:
    type ParserInt a b = String -> Parser a b
Later in this thesis we shall see the advantages of this approach. The basic parsers and parser combinators of the type ParserInt no longer need to be defined explicitly (except for the definition of term, which depends on the actual representation of the input). They arise as special instances of lifting operations of the type Parser to the operations of the type ParserInt. The same applies to the definitions of corresponding recognizers.
3.2 Combinators
Basic parsers can be combined using the operators orelse and then to form more complex parsers. The operator then corresponds to sequencing in BNF. It applies the second parser to the result returned by the first one. The operator orelse corresponds to alternation in BNF. It applies two parsers to the same input and concatenates their results. Example definitions of these operators for the recognizers are given below. The notation p `orelse` q is equivalent to orelse p q; back quotes are used in Gofer to denote an infix operator.
    thenR :: Recognizer -> Recognizer -> Recognizer
    (p `thenR` q) inp
        | rp /= []  = q (rp !! 0)
        | otherwise = []
        where rp = p inp
    orelseR :: Recognizer -> Recognizer -> Recognizer
    (p `orelseR` q) inp = p inp ++ q inp
The recognizer (p `thenR` q) fails if the recognizer p fails. Otherwise, the recognizer q is applied to the first element of the list returned by p. The expression (rp !! 0) denotes the first element of the list rp (the element at index 0); the symbol /= is used in Gofer to represent the "not equal" operator.
Using the definitions above we can construct recognizers with structures closely resembling the structures of the underlying grammars.
    a_then_b :: Recognizer
    a_then_b = termR 'a' `thenR` termR 'b'
    a_or_b :: Recognizer
    a_or_b = termR 'a' `orelseR` termR 'b'
    ? a_then_b "abc"
    ["c"]
    ? a_or_b "abc"
    ["bc"]
4 Nondeterministic language processors
Representing language processors as functions that return a list of results has one advantage: it is relatively easy to modify the processors so that they can return more than one result. One approach to modifying the definitions of the basic processors and combinators so that they can be used to build nondeterministic parsers is to first modify their types, so that they accept a list of inputs as parameter and return the list of outputs as their results. For example the type of the recognizers could be defined as
    type RecognizerAmb = [String] -> [String]
Our first implementation of memoized recognizers in the purely functional language Miranda [TU90] is based on this approach [FS96] (a copy is given in Appendix A). This approach has two main disadvantages: 1) most of the definitions of basic parsers and combinators must be modified, 2) the new definitions are often more complex and difficult to understand than the definitions of corresponding deterministic processors.
A better approach is to start with recognizers that are of the type String -> String (they either succeed or return an internal error) and think about ambiguous recognizers as functions that involve an additional effect of nondeterminism. The nondeterministic recognizers are of the type String -> [String] where the resulting list can have any number of elements (an empty list denotes failure). The definitions of the initial recognizers are almost identical to those presented earlier in this chapter; the definitions of the corresponding nondeterministic recognizers arise automatically as special instances of lifting operations that return a single result into operations that return a list of results.
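To make the list-of-results reading concrete, the following Haskell sketch writes the nondeterministic combinators out by hand. This is an illustration only: the thesis obtains such definitions by lifting, not by direct definition, and the names thenN, orelseN, and manyA are assumptions introduced here.

```haskell
type Recognizer = String -> [String]

termR :: Char -> Recognizer
termR c inp = case inp of
  (x:xs) | x == c -> [xs]
  _               -> []

emptyR :: Recognizer
emptyR inp = [inp]

-- Alternation: collect the results of both alternatives.
orelseN :: Recognizer -> Recognizer -> Recognizer
(p `orelseN` q) inp = p inp ++ q inp

-- Sequencing: apply q to every result of p, not just the first one.
thenN :: Recognizer -> Recognizer -> Recognizer
(p `thenN` q) inp = concat [q rest | rest <- p inp]

-- Hypothetical grammar: manyA ::= 'a' manyA | empty
manyA :: Recognizer
manyA = (termR 'a' `thenN` manyA) `orelseN` emptyR
```

Applied to "aab", manyA returns one result for each way of splitting off a prefix of 'a' characters, which is the multiple-result behaviour described above.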
Nondeterministic language processors can use backtracking and return multiple results. This ability, however, comes at a price. If an underlying grammar is ambiguous, the corresponding functional processor may have exponential space and time complexity. The main reason for such complexity is the repetition of the same computations during backtracking.
Memoization is a dynamic programming method which allows one to avoid performing the same computation more than once. A memoized functional language processor is a function that takes an additional parameter - a memo-table containing all previously computed results. If the input has been processed before, the processor simply returns the corresponding result from the memo-table. If the input has not been processed yet, the new result is calculated and then the memo-table is updated.
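The lookup-or-compute-then-update pattern described above can be sketched on the Fibonacci function, which the thesis also uses as its example in Appendix A. The Haskell code below is an illustrative sketch, not the thesis's Miranda or Gofer definition; the association-list table and the names Table, mfib, and store are assumptions.

```haskell
-- Assumed table representation: argument paired with its computed result.
type Table = [(Int, Integer)]

-- Plain definition: recomputes the same values many times.
fib :: Int -> Integer
fib n | n < 2     = toInteger n
      | otherwise = fib (n - 1) + fib (n - 2)

-- Memoized version: takes the table as an extra parameter and
-- returns the result together with the updated table.
mfib :: Int -> Table -> (Integer, Table)
mfib n table =
  case lookup n table of
    Just r  -> (r, table)                    -- processed before: reuse
    Nothing
      | n < 2     -> store (toInteger n) table
      | otherwise ->
          let (a, t1) = mfib (n - 1) table   -- thread the table through
              (b, t2) = mfib (n - 2) t1
          in  store (a + b) t2
  where store r t = (r, (n, r) : t)          -- record the new result
```

Each argument is stored once, so the number of table entries grows linearly with n even though the plain definition makes exponentially many calls.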
One approach to implementing memoized recognizers is to slightly modify the definitions of basic recognizers and combinators so that the processors built with them accept a memo-table as part of their input and return a memo-table as part of their output. Next, the higher-order function memoize is applied to each recognizer to store its result in the memo-table. This approach is described in detail in Appendix A.
Section 6 of Appendix A presents a slightly different approach to implementing memoization. If we consider our initial recognizers to be functions that return a list of values, then memoized recognizers can be represented as functions that, when applied to a memo-table, return a list of values paired with the modified memo-table. One advantage of this approach is that there is no need to define memoized versions of basic recognizers and combinators. They arise automatically as special instances of lifting computations of the type [a] into computations of the type State -> ([a], State).
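The shape of this lifting can be sketched in Haskell as follows. This is a simplified illustration under stated assumptions: the table here is keyed by a single integer, whereas the memo-table in the thesis is more structured, and the names Memo, liftList, and memoize (with this signature) are hypothetical.

```haskell
-- Assumed memo-table: a key paired with the list of results stored for it.
type State  = [(Int, [Int])]
type Memo a = State -> ([a], State)

-- Lift a plain list of results into a computation that leaves the table alone.
liftList :: [a] -> Memo a
liftList xs table = (xs, table)

-- Wrap a computation so that its result is looked up under the given key,
-- or computed and then recorded if the key is not yet present.
memoize :: Int -> Memo Int -> Memo Int
memoize key m table =
  case lookup key table of
    Just rs -> (rs, table)
    Nothing -> let (rs, table') = m table
               in  (rs, (key, rs) : table')
```

The point of the design is that liftList needs no knowledge of memoization and memoize needs no knowledge of the computation it wraps.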
We have already suggested that exploring the relationship between the types of two programs may help us to avoid unnecessary rewriting of one program into the other. Modifying a program by rewriting its components is time consuming. Furthermore, the correctness of the initial program does not guarantee the correctness of its modified version. The monadic approach allows one to transform one program into the other in such a way that certain properties of the initial program are preserved. The basic ideas of this approach come from Category Theory and were introduced to Computing Science by Eugenio Moggi [MO89, MO90].
6 Monads
The notion of monads comes from Category Theory [ML71, BA90, PI91]. Informally speaking, a monad over a category is an abstract structure that allows one to reason about the objects of the category in terms of "how these objects interrelate" [ML71]. In Computing Science a category of interest is a category C of types and programs, and the relationship in question is a function on types in C [MO89, MO90].
In order to define this function, one should distinguish the type of values a program produces from the type of the program itself. For example, a program that is "effect free" does nothing but return a value of some type a. Therefore, the type of such a program is always identical to the type of the explicitly returned value. Consider a non-deterministic (or ambiguous) computation. This computation returns a set of possible results. Such a set could be represented, for example, as a list (that is, the result could be of the type [a]). Using the same principles, a computation that handles exceptions could be of (an algebraic) type Raise String | Return a. In each of these cases the type of the computation can be defined as a function of the type of the value the computation produces.
A monad in a category of types and programs is a triple: a function on types T (described above) that defines the type of program, together with two operations (that can be interpreted as composition and identity) that allow the combination of such programs. The type constructor T abstracts over the "effect" the program incorporates. The type of the two operations is defined in terms of the type constructor T. Having defined a monad, one can write a program as a set of components of the abstract type T a and use the two operations to combine them. In other words, the structure of the resulting program does not depend on the "effect" the program incorporates.
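As a hedged illustration of the triple, the Haskell sketch below gives the two operations for two of the type constructors mentioned above: the exception type Raise String | Return a and lists. The operation names unitE, bindE, unitL, and bindL, and the example safeDiv, are assumptions made for this sketch; the thesis introduces its own definitions in later chapters.

```haskell
-- Exceptions: a computation either raises a message or returns a value.
data Exc a = Raise String | Return a deriving (Eq, Show)

unitE :: a -> Exc a                      -- the "identity" operation
unitE = Return

bindE :: Exc a -> (a -> Exc b) -> Exc b  -- the "composition" operation
bindE (Raise msg) _ = Raise msg
bindE (Return x)  k = k x

-- Non-determinism: a computation returns a list of possible values.
unitL :: a -> [a]
unitL x = [x]

bindL :: [a] -> (a -> [b]) -> [b]
bindL xs k = concat (map k xs)

-- Hypothetical component that may fail.
safeDiv :: Int -> Int -> Exc Int
safeDiv _ 0 = Raise "divide by zero"
safeDiv x y = Return (x `div` y)
```

Programs built only from the unit and bind operations have the same shape for both type constructors; only the definitions of the two operations differ.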
7 Monads and functional programming
Monads abstract over the kind of an "effect" that is added to a program. This idea inspired Wadler [WA90] to introduce monads as a tool for structuring purely functional programs. Programs written in purely functional programming languages such as Haskell [PT96], Gofer [JN94], or Miranda [TU90] are sometimes very difficult to modify if one wants to add "effects" such as state, interactive I/O, or just to print some error messages. Wadler noticed that monads can be used to easily incorporate such effects.
One advantage of the "monadic" approach is that the "effects" are not "visible" in most of the function definitions of the program. The kind of the effects the program includes can be determined by examining the definition of the monad. In order to add a new "effect" one simply has to change the definition of the monad and make some additional, usually trivial, local changes.
8 Monadic construction of language processors
It has been noted by Wadler [WA90, WA92] that the monadic approach allows one to build language processors that are more modular and easier to modify than traditional ones. A parser can be thought of as a program that deals with interactive input. When applied to the input string it returns a pair: a value and the unconsumed part of the input. A non-deterministic parser can be thought of as a program that incorporates two "effects": interactive input and non-determinism. Similarly, a deterministic parser is a program that combines interactive input and the ability to fail (that is, either it returns a value v, or it fails producing no value - this can be captured by the type constructor: Ok v | Fail).
Defining a monad that represents a single "effect" is not difficult. It is also usually possible (using an ad-hoc approach) to define a "combined" monad that represents a composition of features. However, "how to combine arbitrary monads?" was a long-standing question and a topic of research in the area of so-called "monadic functional programming". Attempts at finding a general technique for composing two arbitrary monads were made by King and Wadler [KN92], Cenciarelli and Moggi [CE93], Jones and Duponcheel [JNB93], Steele [ST94], and Espinosa [ES95], and ended with partial successes yielding techniques that were not general. More recently, Liang, Hudak, and Jones have proposed a new method, based on the theory of monad transformers [LI95], that allows one to compose monads in a fully modular way. One application of the technique was the construction of modular programming language interpreters.
Using the above technique we have implemented different types of language processors. Our processors are fully modular; they are built up from components that represent various programming language features, such as nondeterminism, exceptions (used to represent determinism), interactive input (parsers), and state (memoization). This approach allowed us to easily extend the technique we initially used to memoize functional recognizers to improve the complexity of other language processors (such as, for example, syntax-directed evaluators). The details of this implementation can be found in Chapter 5.
9 Organization of this thesis
The remainder of this thesis is organized as follows.
Chapter 2 gives a brief introduction to Category Theory. We introduce basic definitions and explain category-theoretic notions of monads and Kleisli triples.
Chapter 3 describes how Category Theory monads and Kleisli triples are represented in purely functional programming. We start this presentation with a short description of the category-theoretic semantics of computations proposed by Eugenio Moggi.
Chapter 4 presents how purely functional recognizers can be constructed using the monadic approach. We begin by discussing monadic recognizers that incorporate a single "effect". Next, we describe how monads can be combined to yield recognizers that involve a combination of different effects.
Chapter 5 describes the details of implementation of memoized language processors using type constructor classes in Gofer. We start by discussing the implementation of memoized recognizers. Next, we show how easily the technique for memoizing functional recognizers can be extended to more complex language processors.
Chapter 6 concludes, summarizing the main advantages of using the monadic approach to construct purely functional language processors.
Appendix A contains a copy of the paper summarizing our early efforts to implement memoization in the purely functional programming language Miranda. This paper includes a detailed description of the memoization algorithm together with its formal complexity analysis.
Appendix B contains the Gofer source code that implements memoized monadic language processors using type constructor classes.
Chapter 2
INTRODUCTION TO CATEGORY THEORY AND MONADS
1 Introduction
In Category Theory mathematical concepts are represented using abstract diagrams. Such diagrams consist of vertices representing objects in a category, and directed edges (arrows) representing the mappings between these objects. A diagram is called commutative if for each pair of vertices X and Y, any two paths formed from directed edges leading from X to Y yield, by composition of the corresponding mappings, equal mappings from X to Y. A diagram with vertices X, Y, and Z and arrows f, g, and h can be used to represent the category of sets and functions. If f, g, and h are functions such that $f : X \to Y$, $g : X \to Z$, $h : Z \to Y$, then this diagram is commutative if $f = h \circ g$, where $\circ$ denotes usual composition of functions. The same diagram may be used in many other contexts where X, Y, and Z represent objects, f, g, and h represent mappings between them, and the operation $\circ$ defines how two mappings can be composed.
Category Theory offers an abstract view of mathematical concepts. The concepts are abstracted from the context in which they were made precise, and therefore they can be instantiated into other contexts that were not considered before. This section gives a brief overview of basic concepts in Category Theory and presents a category-theoretic introduction to monads. The presentation here is based on the basic textbooks on Category Theory [ML71, BA90, PI91].
A category is a collection of objects, a collection of arrows (also called morphisms), together with two operations:
- identity, that assigns to each object A an arrow $Id_A$ (the arrow pointing from the object A to itself),
- composition, that assigns to each pair of arrows $f : A \to B$, $g : B \to C$ an arrow $g \circ f : A \to C$ called their composite.
The composition of morphisms must obey the associative law, that is for any morphisms
$f : A \to B$, $g : B \to C$, $h : C \to D$
the condition
$h \circ (g \circ f) = (h \circ g) \circ f$ must be satisfied.
Another requirement of a category is that the Id function is an identity for the composition. That is, for any morphism $f : A \to B$, the following condition must be satisfied:
$Id_B \circ f = f = f \circ Id_A$
The categories of interest from the functional programming point of view are those where objects are types and morphisms are programs. Identity is an identity program. Composition is the way of combining two programs.
3 Functors
A functor is a morphism of categories. Given two categories C and D, a functor $F : C \to D$ is a pair of functions:
- the object function that maps each object A of the category C into the corresponding object F(A) of the category D,
- the arrow function that assigns to each arrow f in C the corresponding arrow F(f) in D.
Each functor is required to preserve the identity and the structure of the composition of morphisms, that is for any two morphisms f and g and the identity morphism $Id_A$ in C, the following conditions must be satisfied:
$F(Id_A) = Id_{F(A)}$
$F(g \circ f) = F(g) \circ F(f)$
The graphical representation of a functor is given in Figure 5.
Figure 5 Functor F mapping category C into D
An endofunctor is a functor from a category to itself. It maps objects of a category C, and mappings between them, into corresponding objects and mappings of the same category. In purely functional programming, an example of an endofunctor on the category of types and programs consists of:
- a type constructor on lists that maps a type a into the corresponding type [a]
    type List a = [a]
- a standard library function map that can be thought of as a mapping of a program from a to b into a program from [a] to [b]
    map :: (a -> b) -> [a] -> [b]
    map f [] = []
    map f (x:xs) = f x : map f xs
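The two functor conditions can be spot-checked for the list example with a short Haskell sketch. The property names identityLaw and compositionLaw and the sample functions f and g are assumptions introduced for illustration; checking a few inputs illustrates the laws but does not prove them.

```haskell
-- F(Id) = Id: mapping the identity program changes nothing.
identityLaw :: [Int] -> Bool
identityLaw xs = map id xs == id xs

-- F(g . f) = F(g) . F(f): mapping a composite equals composing the maps.
compositionLaw :: [Int] -> Bool
compositionLaw xs = map (g . f) xs == (map g . map f) xs
  where f = (+ 1)
        g = (* 2)
```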
4 Natural transformations
Given two functors $F, G : C \to D$ that are mappings between the same categories, a natural transformation $\gamma$ from F to G is a mapping that assigns to each object c of C an arrow $\gamma_c : F c \to G c$. In pictorial representation the natural transformation $\gamma$ can be thought of as a way of "sliding" the diagram defining the functor F onto the diagram that defines the functor G, such that all parallelograms (like those shown in Figure 6) are commutative.
Figure 6 Natural transformation mapping functor F into functor G
Natural transformations are families of arrows. If F and G are two type constructors (functors in C) we can think of a natural transformation as a polymorphic function of the type F a -> G a [ES95]. It is not difficult to find examples of natural transformations in purely functional programming languages. For instance a polymorphic function list that takes as argument an element of any type a and returns a singleton list of the same type is a natural transformation.
    type F a = a
    type G a = [a]
    list :: F a -> G a
    list x = [x]
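The commuting-parallelogram condition for list can be written as a small Haskell check: applying a program f and then list gives the same result as applying list and then map f. The property name naturality and the sample function f are assumptions for this sketch, and the type synonyms F and G are expanded for brevity.

```haskell
list :: a -> [a]
list x = [x]

-- Naturality square for list: map f . list == list . f
naturality :: Int -> Bool
naturality x = map f (list x) == list (f x)
  where f = (* 3)
```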
5 Functor categories
Natural transformations can be composed. The composition of two natural transformations is a natural transformation. It is also associative and for each functor F there exists an identity natural transformation $1 : F \to F$ (a mapping of a functor into itself). Therefore, given the categories C and D, we can formally construct a functor category that has functors $F : C \to D$ as its objects and natural transformations between such functors as its morphisms (see [ML71], [BA90], or [PI91] for details of this construction).
Owing to the fact that it is very convenient to abstract over the objects of a category and to reason about the category only in terms of functors and natural transformations, functor categories are extensively used in Category Theory. An example of a functor category is a monad - a functor category with one object. Monads are described in the next section.
6 Monads
In Category Theory, a monad over a category C is a triple (T, η, μ),
where T : C → C is an endofunctor (a functor mapping to and from the
same category), and η and μ are two natural transformations defined as
follows
η : Id_C → T,    μ : T ∘ T → T.
For the triple (T, η, μ) to be a monad, the three laws called the associative
law, and the left and right identity laws must hold:
μ ∘ (T ∘ μ) = μ ∘ (μ ∘ T) - associative law
μ ∘ (η ∘ T) = Id_T = μ ∘ (T ∘ η) - left and right unit law.
These laws (if satisfied) guarantee that the triple forms a functor category
over the category C.
7 Kleisli triples
Kleisli triples are alternative descriptions of monads and there is a one-
to-one correspondence between the two (see W I ] for the proof). A Kleisli
triple over a category C is a triple (T, η, _*), where
T : Obj(C) → Obj(C) (T is a function on objects, not a functor),
η_A : A → TA for A ∈ Obj(C),
f* : TA → TB for f : A → TB,
and the following conditions hold:
η_A* ∘ f = id_TA ∘ f (right unit),
f* ∘ η_A = f (left unit),
g* ∘ (f* ∘ h) = (g* ∘ f)* ∘ h (associativity).
Given a monad (T, η, μ) one can construct the corresponding Kleisli
triple (T, η, _*) by restricting the endofunctor T to objects. Conversely,
given a Kleisli triple the corresponding monad can be constructed by
extending the function T to an endofunctor.
Monads are more widely used in Category Theory than Kleisli triples.
They have the advantage of being defined only in terms of functors and
natural transformations, which makes them more suitable for abstract
manipulation. Kleisli triples, according to Moggi [MO89], are easier to
justify from a computational perspective.
8 Kleisli Categories
Given a Kleisli triple (T, η, _*) over a category C the corresponding
Kleisli category C_T can be defined as follows:
• the objects of C_T are the same as those of C,
• if f : A → B is a morphism in C then f : A → T B is the corresponding [...]
• the composition of two morphisms f : A → T B and g : B → T C in C_T
is defined as g* ∘ f,
• this composition must be associative with η as its left and right unit.
If an underlying category C is a category of types and programs, then
given a Kleisli triple (T, η, _*) we can formally construct a Kleisli category
C_T of types and programs over the category C. The objects in C_T are types
(as in C); the morphisms in C_T are programs from the type a to the type T b,
where the endofunctor T is a function on types in C. The expression g* ∘ f
represents the composition of the two programs: f and g. Moggi's categorical
semantics of computations, which is discussed in the next chapter, is based
on this idea.
Chapter 3
COMPUTATIONS AND MONADS
1 Introduction
Kleisli categories were originally proposed by Eugenio Moggi as a
convenient framework for structuring the semantics of programming
languages [MO89, MO90]. The principle underlying Moggi's work on
monads was the distinction between simple data-valued functions and
functions that perform computations. A data-valued function is one that
simply returns its value (and does nothing else). By contrast, a function that
performs a computation can encompass ideas such as exceptions, state, or
non-determinism, and as a consequence, it can implicitly produce more
results than the result explicitly returned.
Wadler [WA90] noticed that Moggi's ideas of using monads to
structure the semantics of computations fitted well into the purely
functional programming environment and proposed monads as a technique
for structuring functional programs. He showed that monads can be used to
express "imperative features" like updateable state, exceptions, non-
determinism, or I/O in pure functional languages, while retaining the strong
reasoning principles valid for these languages.
This chapter presents a brief overview of Moggi's categorical semantics
of computations based on monads. It also describes how category-theoretic
monads are represented in purely functional programming languages. The
presentation here owes much to the papers of Moggi [MO89, MO90],
Wadler [WA90, WA92], and Hill and Clarke [HC94].
2 Categorical semantics of computations
The basic idea behind Moggi's categorical semantics of computations is
that, in order to interpret a programming language in a category C, one has
to distinguish the object A of values (of the type A) from the object TA of
computations (computations that produce a value of the type A). If T is a
unary operation on objects in C that maps objects of values into
corresponding objects of computations, then a program from A to B can be
identified with a morphism from A (the set of values of the type A) to TB
(the set of computations that produce a value of the type B) in C. In other
words, a program is a function from values to computations.
In category-theoretic terms T is an object mapping part of an
endofunctor in C; in the context of functional programming, T is a type
constructor (a function on types). Moggi calls an operator T "a notion of
computation", since it abstracts away from the actual type of values
computations may produce. Examples of notions of computations that are of
particular interest to functional programming are as follows:
• computations with side effects that denote a mapping from a state to a
pair: value and the modified state
type T a = State -> (a, State),
• non-deterministic computations that denote the set of all possible values
type T a = [a],
• computations with exceptions that denote either a value or an exception
data T a = Raise String | Return a,
• interactive input that denotes a function from the input string of tokens
to a pair: the first token and the rest of the input
type T a = [a] -> (a, [a]),
• interactive output that denotes a pair: a value and a function that maps a
string (the output of the rest of the program) into a string (the output of
the whole program)
type T a = (a, [a] -> [a]).
Rather than focusing on a specific notion of computation, Moggi
proposed Kleisli triples for modeling the notions of computations and Kleisli
categories for modeling categories of programs. The components of the
Kleisli triple (T, η, _*) can be interpreted as follows. The endofunctor T is a
function on types that maps the type of values into the type of
corresponding computations. The natural transformation η applied to a
value returns a computation producing this value. The expression g* ∘ f
where f : A → TB and g : B → TC has the following meaning: first apply f
to some value of the type A to produce a computation of the type T B, then
evaluate this computation to obtain a value of the type B, finally apply g to
this value and return a computation of the type T C as a result. This
expression corresponds to sequencing of two computations. The expression
η a may be interpreted as a "pure" (i.e., effect-free) computation that does
nothing but deliver a value. The composition (g* ∘ f) a represents a
computation that includes all of the "effects" of f followed by applying g to
the value computed by f.
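The reading of g* ∘ f as sequencing can be made concrete for the notion of computation with exceptions listed above. The following Haskell sketch is only an illustration: the names eta, ext, compose, safeRecip and safeSqrt are assumptions introduced here and do not appear in the thesis.

```haskell
-- The "exceptions" notion of computation from the list above.
data T a = Raise String | Return a
  deriving (Eq, Show)

-- eta: a pure computation that does nothing but deliver a value.
eta :: a -> T a
eta = Return

-- Kleisli extension: turns g :: b -> T c into g* :: T b -> T c.
-- A raised exception is propagated; a returned value is passed to g.
ext :: (b -> T c) -> T b -> T c
ext _ (Raise msg) = Raise msg
ext g (Return b)  = g b

-- Kleisli composition g* . f: run f, then feed its value to g.
compose :: (b -> T c) -> (a -> T b) -> a -> T c
compose g f = ext g . f

-- Two hypothetical programs from values to computations.
safeRecip :: Double -> T Double
safeRecip 0 = Raise "division by zero"
safeRecip x = Return (1 / x)

safeSqrt :: Double -> T Double
safeSqrt x
  | x < 0     = Raise "negative argument"
  | otherwise = Return (sqrt x)
```

Here compose safeSqrt safeRecip first takes the reciprocal and then the square root; an exception raised by the first program prevents the second from running.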
In a similar vein one can interpret monads. Given a Kleisli triple (T, η,
_*) over a category C the corresponding monad is (T, η, μ), where T is a
functor (that is T is a pair of mappings: an object mapping and a morphism
mapping). The natural transformation η has the same meaning as for Kleisli
triples; the natural transformation μ (which is of the type T (T a) → T a)
can be thought of as a way of "flattening" a computation of computations
into a single computation.
By using monads, Moggi defined the semantics of computations with
"effects" which is independent of the kind of the effect these computations
incorporate. Each effect is simply an instance of the same "notion of
computation". Note that also "no effect" (or a pure computation) is such an
instance. Based on the categorical semantics of computations Moggi built a
system called computational λ-calculus, that can be used for proving
equivalence of programs. The analysis of the system is beyond the scope of
this thesis. The detailed description of computational λ-calculus can be found
in Moggi's papers [MO89, MO90].
3 Category Theory monads in functional programming
In functional programming, monads are usually presented as a kind of
an abstract data type. The type definition includes the definition of the
monad itself and definitions of primitive operations related to the particular
effect the monad represents (see [WA92] and many other papers). For
example if the effect in mind is state, the primitive operations may include:
new (that creates a new structure representing the state), lookup (that
searches the state), and update (that updates it). These operations have
clearly nothing in common with Category Theory and they are in many
cases application dependent. The functional definitions of the triples,
however, closely resemble their corresponding category-theoretic
definitions [HC94]. This section explores the relationship between the two.
Given a Category Theory monad (T, η, μ), the functional
programming monad is represented by a quadruple (M, map, unit, join)
[WA90], where:
• M is a type constructor, for example
type M a = [a],
• map is a higher order function (analogous to standard map on lists):
map :: (a -> b) -> M a -> M b,
• unit represents the natural transformation η
unit :: a -> M a,
• join represents the natural transformation μ
join :: M (M a) -> M a.
If a is a type of values then M a represents the type of programs that
return values of the type a. A pair M and map models a functor: the type
constructor M is a mapping on objects (types), the higher-order function
map is a mapping on arrows (programs). The definitions of the functions
unit and join depend on the effect the monad represents. The function unit
converts values to corresponding computations. The function join "flattens"
a computation of computations into a single computation. A good example
of join is the standard library function concat that "flattens" a list of lists
into a single list (see example below). The quadruple must satisfy the laws
equivalent to the monad laws given in the previous chapter.
join ∘ unit = id = join ∘ map unit (left and right unit)
join ∘ map join = join ∘ join (associativity)
Figure 7 Monad laws
Lists represent non-deterministic computations (computations that
return a set of possible results). The monad for lists [WA92] is given below.
The type constructor M defines the type of computations. The function unit
creates a one element list. The function join takes a list of lists and
concatenates all sublists into a single list. The function map is a standard
map on lists.
type M a = [a]
unit :: a -> M a
unit = \x -> [x]
join :: M (M a) -> M a
join = concat
map :: (a -> b) -> M a -> M b
map f = \x -> case x of
  [] -> []
  (x:xs) -> f x : map f xs
An expression of the form \x -> e is called a lambda-expression, and
denotes a function that takes an argument x and returns the value of the
expression e. Therefore, the function unit could equally well be defined as:
unit x = [x]. The definition given in Figure 8, however, is more expressive
and corresponds more closely to the type of unit.
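The laws of Figure 7 can be checked on sample data for the list monad. The sketch below is written in Haskell rather than Gofer and reuses the standard map and concat; the predicate names leftUnit, rightUnit and assoc are hypothetical helpers introduced for this illustration only.

```haskell
type M a = [a]

unit :: a -> M a
unit = \x -> [x]

join :: M (M a) -> M a
join = concat

-- join . unit = id, checked at one list.
leftUnit :: M Int -> Bool
leftUnit xs = join (unit xs) == xs

-- join . map unit = id, checked at one list.
rightUnit :: M Int -> Bool
rightUnit xs = join (map unit xs) == xs

-- join . map join = join . join, checked at one list of lists of lists.
assoc :: M (M (M Int)) -> Bool
assoc xsss = join (map join xsss) == join (join xsss)
```

Such pointwise checks do not prove the laws, but they are a quick way to catch a wrong definition of unit or join.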
3.2 Kleisli triples in functional programs
In Wadler's more recent papers, the use of monads bears closer
resemblance to Kleisli triples. Given a Category Theory Kleisli triple (T, η,
_*), the corresponding "functional programming triple" is represented by a
triple (M, unit, bind), where:
• M and unit have the same meaning as for monads from the previous
section,
• bind is a polymorphic function such that the expression (f `bind` g)
corresponds to (g* ∘ f).
The type of the function bind is: bind :: M a -> (a -> M b) -> M b;
the meaning of the expression (f `bind` g) is: first apply the function f to
produce a computation of the type M a, evaluate this computation, apply the
function g to the result, and return a computation of the type M b. The
function bind is simply a composition in a Kleisli category of computations.
The triple must obey the three laws given below, which are equivalent to the
laws given in the previous chapter.
unit a `bind` \b -> n = n [a / b] - left unit
m `bind` \b -> unit b = m - right unit
m `bind` (\a -> n `bind` \b -> o) = (m `bind` \a -> n) `bind` \b -> o
- associativity
where n [a / b] denotes n with a substituted for b
Figure 9 Kleisli triple laws
The Kleisli triple for lists, equivalent to the monad defined earlier, is given
on Figure 10 [WA92].
type M a = [a]
unit :: a -> M a
unit = \x -> [x]
bind :: M a -> (a -> M b) -> M b
x `bind` y = case x of
  [] -> []
  (a:x) -> (y a) ++ (x `bind` y)
Figure 10 Kleisli triple for lists
The functionality of the function bind in the example above is
straightforward: if the first computation returns a list of possible results then
the second computation must be applied to each element of this list; the
results of each application should be combined, so that the final result is a
single list. The operator "++" in the example above denotes list
concatenation. This operator could be replaced with any other associative
operator that combines two lists (e.g., merge). The functions map and join
from the previous paragraph can be defined in terms of unit and bind as
follows [WA92]:
map f x = x `bind` \a -> unit (f a)
join x = x `bind` \a -> a.
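The derived definitions above can be compared with the standard list functions. In the following Haskell sketch the derived functions are renamed mapK and joinK (hypothetical names) so that they do not clash with the Prelude's map.

```haskell
type M a = [a]

unit :: a -> M a
unit x = [x]

-- bind for lists: apply k to every element and concatenate the results.
bind :: M a -> (a -> M b) -> M b
bind []     _ = []
bind (a:as) k = k a ++ (as `bind` k)

-- map and join expressed through unit and bind.
mapK :: (a -> b) -> M a -> M b
mapK f x = x `bind` \a -> unit (f a)

joinK :: M (M a) -> M a
joinK x = x `bind` \a -> a
```

On lists, mapK agrees with map and joinK agrees with concat.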
Moggi [MO89] proposed Kleisli triples as a representation of
computations with "effects". His claim was that Kleisli triples were more
convenient for expressing computations than monads. The same seems to be
true in functional programming. The functionality of the bind operator as a
composition of two programs is intuitive and easy to understand. By
contrast, the functions map and join are rather difficult to justify from
computational point of view. In the remainder of this thesis we will use the
formulation of monads as Kleisli triples.
Chapter 4
MONADIC CONSTRUCTION OF PURELY FUNCTIONAL RECOGNIZERS
1 Introduction
Monads are a powerful tool in functional programming. If a program
is written using a monad to pass around a variable (like the state or
exception) then it is easy to change what is passed around simply by
changing the monad. Only the parts of the program that deal directly with
the quantity concerned need to be altered, parts which merely pass it on will
stay the same.
This chapter describes how monads can be used to construct functional
recognizers. We start by defining the type of the monadic recognizers. Next
we give the definitions of basic monads and discuss how these monads can be
used to build processors that incorporate a single "effect". The remainder of
this chapter addresses the problem of combining monads. Different "effects"
can be combined by using parametrized monads [LI95] to yield the
recognizers that involve a combination of different features.
2 The type of the monadic recognizers
In the introduction of this thesis we have defined the recognizers as
functions that applied to some input consume as much of it as possible and
return the unconsumed part of the input for further processing. If the input
is represented as a string of tokens to be processed, the type of the
recognizers can be written as
type Recognizer = String -> String.
Suppose that we want to have recognizers that return exactly one
result or fail otherwise. This can be achieved by modifying the type above as
follows
type RecognizerEx = String -> Ex String
where the type constructor Ex is defined as
data Ex a = Ok a | Fail.
(That is the result of the form Ok v represents the successful recognition of
the input with the single value v returned; Fail represents failure.)
Similarly, the recognizers that return a list of possible results (where an
empty list denotes failure) can be represented using the type constructor
type RecognizerList = String -> [String].
The monadic approach allows one to abstract over the "effect" the
recognizers incorporate. Suppose that M is a "monadic" type constructor
that represents some feature. By defining the type of the processors in terms
of the type constructor M, we can make the type of the recognizers
independent of the effect they incorporate. Owing to the fact that the
representation of the input strings of tokens is application dependent, we can
make it into a parameter to the type Recognizer.
type Recognizer a = a -> M a
Figure 11 The type of monadic recognizers
The type definition on Figure 11 should be interpreted as follows: a
recognizer is a function that applied to a value of some type a returns
(instead of returning a result of the type a) a computation that produces a
result of the type a. Such a computation may encompass ideas such as state,
exceptions, or non-determinism. Later in this chapter we will discuss the
advantages of defining the type of recognizers this way.
3 Basic monads
3.1 The identity monad
The identity monad represents computations as the values they deliver
(i.e. "effect-free" computations). It is the starting point to which other
capabilities can be added. The monad is represented by a triple: the type
constructor M that can be thought of as an identity function on types, the
function unit (which is an identity function), and the function bind that
applies the function k to the value produced by the computation x.
type Id a = a
unit :: a -> Id a
unit = \a -> a
bind :: Id a -> (a -> Id b) -> Id b
x `bind` k = k x
Figure 12 The identity monad
Using the operators unit and bind of the identity monad we can define
the "identity" ("effect-free") recognizers - the recognizers that either succeed
returning a single result or, if something goes wrong, they simply produce
an internal error. The "monadic" definitions of the recognizer empty and of
the sequencing operator then are given below.
type Recognizer a = a -> Id a
emptyR :: Recognizer a
emptyR = unit
thenR :: Recognizer a -> Recognizer a -> Recognizer a
(p `thenR` q) inp = p inp `bind` \x1 ->
                    q x1 `bind` \x2 -> unit x2
The definition of the recognizer (p `thenR` q) can be interpreted as
follows. First the recognizer p is applied to the initial input and the value
returned by p is bound to the variable x1. Next the recognizer q is applied
to the input x1 and its value is bound to the variable x2. Finally the value x2
is converted into the corresponding computation and this computation is
returned as a result.
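The behaviour of thenR over the identity monad can be tried out on two simple recognizers. The following Haskell sketch assumes two hypothetical recognizers, skipSpaces and skipOne, which are not part of the thesis and exist only to give thenR something to sequence.

```haskell
type Id a = a

unit :: a -> Id a
unit = \a -> a

bind :: Id a -> (a -> Id b) -> Id b
x `bind` k = k x

type Recognizer a = a -> Id a

emptyR :: Recognizer a
emptyR = unit

-- Run p, bind its result to x1, run q on x1, return q's result.
thenR :: Recognizer a -> Recognizer a -> Recognizer a
(p `thenR` q) inp = p inp `bind` \x1 ->
                    q x1 `bind` \x2 -> unit x2

-- Hypothetical recognizers: consume leading blanks / one character.
skipSpaces :: Recognizer String
skipSpaces = dropWhile (== ' ')

skipOne :: Recognizer String
skipOne = drop 1
```

Because the identity monad has no notion of failure, these recognizers always return some remaining input.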
The identity monad does not provide us with the notion of failure.
This notion is required to give meaningful definitions of the recognizers fail
and term. In addition, in order to define the operator orelse we need to
specify the notion of "choice". The monad defined in the next subsection
provides us with both of these notions.
3.2 Exceptions
Exceptions in purely functional programming languages were studied
by Spivey who, independently of Moggi, noticed that monads are a useful
tool for representing exceptions in functional programs [SP90]. We can think
of a value of the type Ex a (defined on Figure 13) as a computation that
either succeeds with a single value of the type a, or it fails producing no
value. Therefore exceptions correspond to deterministic choice.
In addition to operators unit and bind, it makes sense to define two
additional operations on the values of the type Ex a [LI95]. The operation
plus defines a composition (in the sense of generalized addition) of two
programs of the type Ex a. Plus for exceptions can be interpreted as a
(deterministic) choice operator that returns the first computation if it
succeeds, and the second otherwise. The operation zero is an identity for
plus and it represents a computation that always fails. The operations plus
and zero are not part of the exception monad. They simply provide a
different way of combining programs of the type Ex a.
We shall see later in this chapter that the exception monad is not the
only monad for which it makes sense to define zero and plus. Another
example is a monad for lists where plus could be defined as concatenation
(or merge) of two lists with an identity (zero) an empty list. The operators
bind (with identity unit) and plus (with identity zero) provide the means to
structure programs in modular way.
data Ex a = Ok a | Fail
unit :: a -> Ex a
unit = \a -> Ok a
bind :: Ex a -> (a -> Ex b) -> Ex b
Fail `bind` k = Fail
(Ok a) `bind` k = k a
zero :: Ex a
zero = Fail
plus :: Ex a -> Ex a -> Ex a
Fail `plus` x = x
x `plus` _ = x
Figure 13 The exception monad with zero and plus
We can now define the monadic recognizers fail and term as well as
the operator orelse. The definitions of empty and then given in the
previous subsection do not have to be modified. We simply modify the type
of the recognizers and replace unit and bind of the identity monad with the
corresponding operators of the exception monad.
type Recognizer a = a -> Ex a
orelseR :: Recognizer a -> Recognizer a -> Recognizer a
(p `orelseR` q) inp = p inp `plus` q inp
failR :: Ex a
failR = zero
termR :: Char -> Recognizer String
termR c inp = case inp of
  [] -> failR
  (x:xs) -> if x == c
            then unit xs
            else failR
Using the above definitions we can build (deterministic) recognizers,
for example:
s :: Recognizer String
s = (termR 'a' `thenR` s `thenR` s) `orelseR` emptyR
Although the underlying grammar is ambiguous, the recognizer s defined
above returns only one result. The empty string returned as the result means
that the whole input string has been successfully recognized as an s.
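The deterministic recognizer s can be run directly once the pieces are put together. The following is a self-contained Haskell sketch under two assumptions: failR is given the type Ex a (a failing computation, as in the class of recognizers of the next chapter), and Ex derives Eq and Show so that results can be compared and printed.

```haskell
data Ex a = Ok a | Fail
  deriving (Eq, Show)

unit :: a -> Ex a
unit = Ok

bind :: Ex a -> (a -> Ex b) -> Ex b
Fail `bind` _ = Fail
Ok a `bind` k = k a

zero :: Ex a
zero = Fail

-- Deterministic choice: keep the first computation unless it fails.
plus :: Ex a -> Ex a -> Ex a
Fail `plus` y = y
x    `plus` _ = x

type Recognizer a = a -> Ex a

emptyR :: Recognizer a
emptyR = unit

failR :: Ex a
failR = zero

thenR, orelseR :: Recognizer a -> Recognizer a -> Recognizer a
(p `thenR` q) inp   = p inp `bind` \x1 -> q x1 `bind` \x2 -> unit x2
(p `orelseR` q) inp = p inp `plus` q inp

-- Accept the character c at the head of the input, if present.
termR :: Char -> Recognizer String
termR c (x:xs) | x == c = unit xs
termR _ _               = failR

-- s ::= 'a' s s | empty
s :: Recognizer String
s = (termR 'a' `thenR` s `thenR` s) `orelseR` emptyR
```

Applied to "aaa" the recognizer consumes the whole input and returns Ok ""; applied to "aab" it stops in front of the first character it cannot accept and returns Ok "b".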
We will use the list monad to transform our deterministic recognizers
into the recognizers that return a set of possible results. The definition of the
monad for lists given below is almost identical to that given in Figure 10.
The only difference is that the operator ++ (list concatenation) has been
replaced with the operator merge_res which merges two lists sorted in
ascending order with duplicates removed. The same operator merge_res is
also a suitable plus for the monad.
type List a = [a]
unit :: a -> [a]
unit = \x -> [x]
bind :: [a] -> (a -> [b]) -> [b]
x `bind` y = case x of
  [] -> []
  (a:x) -> (y a) `merge_res` (x `bind` y)
zero :: [a]
zero = []
plus :: [a] -> [a] -> [a]
plus = merge_res
Figure 14 The list monad with zero and plus
The deterministic recognizers can now be transformed into non-
deterministic ones by modifying their type
type Recognizer a = a -> [a],
and by replacing unit, bind, zero, and plus of the exception monad with the
corresponding operators of the list monad. If these changes are made the
recognizer s from the previous subsection behaves as follows.
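The list-based recognizers can be sketched in Haskell as follows. The definition of merge_res is not given in this chapter, so the version below is an assumption: an ordered merge of two ascending lists that drops duplicates. It needs an Ord constraint, which is therefore added to the types of bind, plus and the combinators.

```haskell
-- Assumed definition: merge two ascending lists, dropping duplicates.
merge_res :: Ord a => [a] -> [a] -> [a]
merge_res [] ys = ys
merge_res xs [] = xs
merge_res (x:xs) (y:ys)
  | x < y     = x : merge_res xs (y:ys)
  | x > y     = y : merge_res (x:xs) ys
  | otherwise = x : merge_res xs ys

unit :: a -> [a]
unit x = [x]

bind :: Ord b => [a] -> (a -> [b]) -> [b]
[]     `bind` _ = []
(a:as) `bind` k = k a `merge_res` (as `bind` k)

zero :: [a]
zero = []

plus :: Ord a => [a] -> [a] -> [a]
plus = merge_res

type Recognizer a = a -> [a]

emptyR :: Recognizer a
emptyR = unit

thenR, orelseR :: Ord a => Recognizer a -> Recognizer a -> Recognizer a
(p `thenR` q) inp   = p inp `bind` \x1 -> q x1 `bind` \x2 -> unit x2
(p `orelseR` q) inp = p inp `plus` q inp

termR :: Char -> Recognizer String
termR c (x:xs) | x == c = unit xs
termR _ _               = zero

-- The same grammar as before: s ::= 'a' s s | empty
s :: Recognizer String
s = (termR 'a' `thenR` s `thenR` s) `orelseR` emptyR
```

With these definitions s "aaa" yields every suffix that can remain after recognizing an s, in ascending order and without duplicates.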
4 Combining monads
4.1 Construction of a combined monad
So far in this chapter we have discussed recognizers that take as an
argument a string of characters to be processed. Suppose that we decide to
modify them so that they accept two parameters: the whole input string and
a single position that defines which character should be processed first. As
we have shown in chapter 1, the type of such processors can be expressed in
terms of the type Recognizer as follows
type RecognizerNew a = String -> Recognizer a.
The additional effect which recognizers of the type RecognizerNew
incorporate can be represented using the state-reader monad [WA90, WA92].
Owing to the fact that we would like to add this effect on top of the effects
that are already in place (exceptions or lists), we will use the parametrized
state reader monad [LI95] which is shown on Figure 15.
Assume that we already have a monad (M, unit, bind). A monad
parametrized over M can be constructed by defining a new bind operation,
in terms of the old bind, and by defining a function lift,
lift :: M a -> MNew a
that lifts operations of the type M a into operations of the new type
MNew a. If the underlying monad has zero and plus then the
corresponding operators zero and plus for the new monad can be defined in
terms of the old ones. The main advantage of this technique is that the
definition of the operators unit, bind, zero, and plus of a combined monad
are independent of the choice of the base monad.
4.2 The parametrized state reader monad
The state reader monad abstracts over computations that read from the
state but never update it. Owing to the fact that the "post-state" is always
assumed to be identical to the "pre-state", there is no need to return it. The
definition of the parametrized state reader monad is given on Figure 15.
type StRM m s a = s -> m a
liftStRM :: m a -> StRM m s a
liftStRM x = \t -> x
unitStRM :: a -> StRM m s a
unitStRM x = liftStRM (unit x)
bindStRM :: StRM m s a -> (a -> StRM m s b) -> StRM m s b
(a `bindStRM` k) t = a t `bind` \va ->
                     k va t
zeroStRM :: StRM m s a
zeroStRM = \a -> zero
plusStRM :: StRM m s a -> StRM m s a -> StRM m s a
(x `plusStRM` y) s = x s `plus` y s
Figure 15 The parametrized state reader monad
The type constructor StRM defines the type of computations that are
applied to some state (of the type s) and return a computation of the type
m a. The operations unit and bind are defined in terms of the unit and bind
of the underlying monad. Although the state reader monad does not have its
own plus and zero, these operations can be defined in terms of the
corresponding operations of the base monad.
We can now define the type of our recognizers in terms of the type of
the parametrized state reader monad.
type Recognizer a = a -> StRM m String a
That is our new recognizers are functions that applied to a value of the type
a (current position) and a state (an input string) return a computation of the
type m a. The definitions of fail, empty, then, and orelse do not have to be
modified (except for the fact that the operators unit, bind, zero, and plus are
now of the different monad). The only function that must be modified is the
recognizer term.
termInt :: Char -> Int -> StRM m String Int
termInt c x s
  | (x<1) || (x>length s) || (s!!(x-1) /= c) = zero
  | otherwise = unit (x+1)
If the base monad of the parametrized state reader monad is the monad
for exceptions, the type of the recognizers is
type Recognizer a = a -> StRM Ex String a.
In other words, our new recognizers when applied to a single start
position and an input string of characters either return a value of the form
Ok v (that represents a single start position for the next processor), or they
return the value Fail (that represents failure). One advantage of defining the
new recognizers this way is that the previously given definitions of fail,
then, orelse, and empty do not have to be modified. We simply replace the
operators unit, bind, plus, and zero with the corresponding operators of the
parametrized state reader monad.
The recognizer s defined earlier, which is now of the type
Int -> StRM Ex String Int
behaves as follows.
The result Ok 6 returned by the recognizer means that the whole input
string has been successfully recognized as an s (the position 6 of the input
string is the end of the string).
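A position-based recognizer over the exception monad can be sketched in Haskell without overloading by fixing the base monad to Ex. The operator names with the suffix "Ex" and the five-character input used in the usage note are assumptions made for this illustration.

```haskell
data Ex a = Ok a | Fail
  deriving (Eq, Show)

unitEx :: a -> Ex a
unitEx = Ok

bindEx :: Ex a -> (a -> Ex b) -> Ex b
bindEx Fail   _ = Fail
bindEx (Ok a) k = k a

plusEx :: Ex a -> Ex a -> Ex a
plusEx Fail y = y
plusEx x    _ = x

-- State reader over a base monad m; here m is always Ex.
type StRM m s a = s -> m a

unitStRM :: a -> StRM Ex s a
unitStRM x = \_ -> unitEx x

-- Pass the same (read-only) state t to both computations.
bindStRM :: StRM Ex s a -> (a -> StRM Ex s b) -> StRM Ex s b
(a `bindStRM` k) t = a t `bindEx` \va -> k va t

plusStRM :: StRM Ex s a -> StRM Ex s a -> StRM Ex s a
(x `plusStRM` y) t = x t `plusEx` y t

type Recognizer a = a -> StRM Ex String a

emptyR :: Recognizer a
emptyR = unitStRM

thenR, orelseR :: Recognizer a -> Recognizer a -> Recognizer a
(p `thenR` q) inp   = p inp `bindStRM` q
(p `orelseR` q) inp = p inp `plusStRM` q inp

-- Accept character c at the 1-based position x of the input string.
termInt :: Char -> Recognizer Int
termInt c x str
  | x < 1 || x > length str || str !! (x - 1) /= c = Fail
  | otherwise                                      = Ok (x + 1)

s :: Recognizer Int
s = (termInt 'a' `thenR` s `thenR` s) `orelseR` emptyR
```

With the input "aaaaa" and start position 1, s 1 "aaaaa" evaluates to Ok 6, the position just past the end of the string.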
If the base monad of the parametrized state reader monad is the monad
for lists, the type of the recognizers is
type Recognizer a = a -> StRM [] String a.
The recognizer s when applied to a start position and the whole input
string returns now a set of possible start positions for the next processor.
5 Concluding remarks
The technique described in this chapter allows one to construct
programs out of components that represent various programming language
features. The technique can be implemented in any purely functional
programming language. However, in a programming language that does not
support overloaded operators: 1) different names must be used for each
operator unit, bind, plus, and zero, 2) for each combination of features the
corresponding operators must be defined explicitly. It is up to a programmer
to determine which definitions of the operators to use in a given context.
The system of constructor classes in Gofer IJNA931 allows one to
define classes of types with overloaded operators. The main advantage of
using constructor classes to implement "combined" monads is that there is
no need to use different names for the operators such as bind and unit if they
are used in different contexts (that is, if they are parts of definitions of
different monads). The type checker automatically determines which
definition of bind or unit to use. The construction of language processors
using constructor classes is described in the next chapter.
Chapter 5
MONADIC CONSTRUCTION OF MEMOIZED LANGUAGE PROCESSORS USING TYPE
CONSTRUCTOR CLASSES IN GOFER
1 The system of type constructor classes in Gofer
The system of type constructor classes in Gofer allows one to define
classes of types with overloaded operators wA93]. Overloading enables the
definition and use of functions in which the meaning of a function symbol
may depend on the types of its arguments.
Classes can be related in a class hierarchy. For example one class may
be defined as a "subclass" of another (one of its "superclasses"), or it may be
composed of other classes. For each class a set of suitable operations
(methods) can be defined. A subclass inherits all of the methods of its
superclasses.
2 Monads and type constructor classes
Each monad is a triple (M, unit, bind) where the types of the two
operators unit and bind are defined in terms of the type constructor M.
These types always have the same structure (no matter what feature the
monad represents). Therefore we can define a monad as a class parametrized
over the type constructor M, with two methods: unit and bind.
As we have shown in the previous chapter for some monads it makes
sense to define two additional operators: plus and zero. The structure of the
type of these two operators is "effect" independent. We can define a class
MonadPlusZero as a subclass of the class Monad that has two additional
methods: plus and zero.
class Monad m where
  unit :: a -> m a
  bind :: m a -> (a -> m b) -> m b
class Monad m => MonadPlusZero m where
  plus :: m a -> m a -> m a
  zero :: m a
There are three parts in any class declaration. In the example above the
first line (called the header) of the declaration introduces the name Monad
for the class and indicates that the class has a single parameter, represented by
the type variable m. The second part (the signature part) is a list of function
(method) declarations. For each instance of the class Monad we can define
two methods: unit and bind. The third part (not present in the example
above) may contain default definitions of the methods. For example Figure
17 shows the class Recognizer with default definitions of the functions
emptyR, thenR, orelseR, and failR.
All the basic monads defined in the previous chapter can be now
represented as instances of the class Monad and MonadPlusZero (if
applicable). Instances of a type class in Gofer are defined using declarations
similar to those used to define the corresponding type class. For example the
following declarations specify that Monad Ex and MonadPlusZero Ex are
instances of classes Monad m and MonadPlusZero m respectively
instance Monad Ex where
  unit = unitEx
  bind = bindEx
instance MonadPlusZero Ex where
  plus = plusEx
  zero = zeroEx
(where operations with the suffix "Ex" are those defined previously for the
exception monad).
Similarly, parametrized monads can be represented as instances that
inherit (denoted using the symbol "=>") the monad operations from the base
monad.
instance Monad m => Monad (StRM m s) where
  unit = unitStRM
  bind = bindStRM
instance MonadPlusZero m => MonadPlusZero (StRM m s) where
  zero = zeroStRM
  plus = plusStRM
3 The class Recognizer
We have shown in the previous chapter that most of the basic
recognizers and combinators can be defined in terms of the operators unit,
bind, plus, and zero. Therefore it is convenient to define the class of
recognizers as a subclass of the class MonadPlusZero.
class MonadPlusZero m => Recognizer m a where
  termR :: Char -> a -> m a
  emptyR :: a -> m a
  orelseR :: (a -> m a) -> (a -> m a) -> (a -> m a)
  thenR :: (a -> m a) -> (a -> m a) -> (a -> m a)
  failR :: m a
  emptyR = unit
  (p `thenR` q) inp = p inp `bind` \x1 ->
                      q x1 `bind` \x2 ->
                      unit x2
  (p `orelseR` q) inp = (p inp) `plus` (q inp)
  failR = zero
Figure 17 The class Recognizer
One advantage of defining the recognizers this way is that once the
underlying monad m is defined, the corresponding operators empty, then,
orelse, and fail are defined automatically. The definition of term depends on
the representation of the input (which is either a string, or a pair: an integer
number representing a start position and a string). Therefore we can define
two instances of the class Recognizer (where termInt and termChar are the
two definitions of term that were given in the previous chapter).
instance MonadPlusZero (StRM m String) => Recognizer (StRM m String) Int where
    termR = termInt
instance MonadPlusZero m => Recognizer m String where
    termR = termChar
We can now define the recognizer s.
By simply changing the type definition of s we can change its
behavior. For example, if the recognizer s is of the type
s :: Recognizer (StRM [] String) Int => Int -> StRM [] String Int
then it behaves as follows
s 1 "aaaaa" => [1, 2, 3, 4, 5, 6].
If we change its type to
s :: Recognizer (StRM Ex String) Int => Int -> StRM Ex String Int
then the same recognizer applied to the same input returns the value
Ok 6.
More examples can be found in Appendix B.
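The effect of switching the underlying monad can be reproduced in present-day Haskell. The following is only a sketch under stated assumptions: it uses the standard MonadPlus class in place of the thesis's MonadPlusZero, Maybe in place of the exception monad Ex, and a plain function type (the synonym Rec is hypothetical) in place of StRM. The list monad here does not merge duplicates, so they are removed explicitly.

```haskell
import Control.Monad (MonadPlus, mzero, mplus)
import Data.List (nub, sort)

-- A recognizer takes a 1-based start position and the whole input and
-- returns, in some monad m, the position after the recognized prefix.
type Rec m = Int -> String -> m Int

-- Recognize the character c at position i (assumed analogue of termInt).
termR :: MonadPlus m => Char -> Rec m
termR c i inp
  | i < 1 || i > length inp || inp !! (i - 1) /= c = mzero
  | otherwise                                      = return (i + 1)

emptyR :: MonadPlus m => Rec m
emptyR i _ = return i

thenR :: MonadPlus m => Rec m -> Rec m -> Rec m
thenR p q i inp = p i inp >>= \j -> q j inp

orelseR :: MonadPlus m => Rec m -> Rec m -> Rec m
orelseR p q i inp = p i inp `mplus` q i inp

-- s ::= 'a' s s | empty
s :: MonadPlus m => Rec m
s = (termR 'a' `thenR` (s `thenR` s)) `orelseR` emptyR

-- The list monad keeps every alternative; duplicates are removed here.
allEnds :: [Int]
allEnds = sort (nub (s 1 "aaaaa"))

-- Maybe keeps only the first success, much like the exception monad.
firstEnd :: Maybe Int
firstEnd = s 1 "aaaaa"
```

Only the type at which s is used differs between allEnds and firstEnd; the definition of s itself is shared.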
4 Memoized recognizers
Memoization involves interaction with state. Stateful computations
can be represented in purely functional programs by using the state monad
[WA90, WA92]. We start by presenting the definition of this monad.
4.1 The state monad
The monad defined below can be used for adding state operations to a
purely functional program. The type of programs that interact with the state
is defined as the type of a function that takes as its parameter an initial state
and returns its value paired with the final state. The function unit takes a
value and a state and returns the same value paired with the initial state. In
other words, the function unit a is "an identity" state transformer. The
function bind combines two "stateful" computations. First, the computation
x is evaluated in the initial state t; next, the function k is applied to the value
returned by x and to the new state.
type St s a = s -> (a, s)
unitSt :: a -> St s a
unitSt a = \t -> (a, t)
bindSt :: St s a -> (a -> St s b) -> St s b
(a `bindSt` k) t = k va ta
                   where (va, ta) = a t
instance Monad (St s) where
    unit = unitSt
    bind = bindSt
Figure 18 The state monad
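As a small illustration of how unit and bind thread the state, the sketch below restates the two operations in Haskell and runs them on a counter. The operations tick and twoTicks are hypothetical examples and do not appear in the thesis.

```haskell
-- A state transformer: initial state in, value paired with final state out.
type St s a = s -> (a, s)

unitSt :: a -> St s a
unitSt a = \t -> (a, t)

bindSt :: St s a -> (a -> St s b) -> St s b
bindSt x k = \t -> let (va, ta) = x t in k va ta

-- Hypothetical operation: return the current counter and increment it.
tick :: St Int Int
tick = \n -> (n, n + 1)

-- Run tick twice and add the two values that were read.
twoTicks :: St Int Int
twoTicks = tick `bindSt` \a -> tick `bindSt` \b -> unitSt (a + b)
```

Starting from the state 10, twoTicks reads 10 and then 11, so it yields the value 21 and leaves the state at 12.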
Owing to the fact that the purpose of memoization is to improve the
efficiency of processors that return a set of possible results, the memoized
recognizers involve a combination of two features: state and non-determinism.
In order to combine these features we introduce the
parametrized monad for lists.
unitLsM :: Monad m => a -> ListM m a
unitLsM x = liftLsM (unit x)
bindLsM :: Monad m => ListM m a -> (a -> ListM m b) -> ListM m b
x `bindLsM` k = x `bind` \x1 ->
                foldr plusLsM zeroLsM (map k x1)
liftLsM :: Monad m => m a -> ListM m a
liftLsM x = x `bind` \x1 ->
            unit [x1]
instance Monad m => Monad (ListM m) where
    unit = unitLsM
    bind = bindLsM
instance Monad m => MonadPlusZero (ListM m) where
    zero = zeroLsM
    plus = plusLsM
zeroLsM :: Monad m => ListM m a
zeroLsM = unit []
plusLsM :: Monad m => ListM m a -> ListM m a -> ListM m a
(x `plusLsM` y) = x `bind` \x1 ->
                  y `bind` \x2 ->
                  unit (x1 `merge_res` x2)
If the base monad for the parametrized list monad is the monad for
state then the combination of the two monads yields the computations that
are of the type
ListM (St s) a = s -> ([a], s).
In other words, each computation is applied to an initial state and returns a
list of values paired with the modified state. (Note that combining the same
monads in the reverse order yields computations of a different type.)
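To make the combined type concrete, the sketch below writes the operations directly at the unfolded type s -> ([a], s). It is an illustration only: the names ending in LS are hypothetical, sort . nub stands in for the thesis's merge_res, and step is an invented example that counts in the state how often it has been run.

```haskell
import Data.List (nub, sort)

-- ListM (St s) a unfolded: a state transformer yielding a list of results.
type ListSt s a = s -> ([a], s)

unitLS :: a -> ListSt s a
unitLS a = \t -> ([a], t)

zeroLS :: ListSt s a
zeroLS = \t -> ([], t)

-- Run x, then y in the state left by x, and merge the two result lists.
-- sort . nub is an assumed stand-in for merge_res.
plusLS :: Ord a => ListSt s a -> ListSt s a -> ListSt s a
plusLS x y = \t ->
  let (xs, t1) = x t
      (ys, t2) = y t1
  in (sort (nub (xs ++ ys)), t2)

-- Apply k to every result of x, threading the state through each call.
bindLS :: Ord b => ListSt s a -> (a -> ListSt s b) -> ListSt s b
bindLS x k = \t ->
  let (xs, t1) = x t
  in foldr plusLS zeroLS (map k xs) t1

-- Hypothetical step: two alternative results, one more call recorded.
step :: Int -> ListSt Int Int
step n = \calls -> ([n, n + 1], calls + 1)

demo :: ([Int], Int)
demo = (unitLS 1 `bindLS` step `bindLS` step) 0
```

In demo the first step runs once and the second step runs once per result of the first, so the final state records three calls while the merged result list is [1, 2, 3].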
Our memoization algorithm applies to the recognizers that take as
parameters a single position and the whole input string; therefore, on top of
the two features, state and nondeterminism, they involve one more "effect"
that can be represented using the parametrized state reader monad. We can
define the type of memoized recognizers in terms of the types of the three
monads as follows
type Recognizer a = Int -> StRM (ListM (St s)) String Int.
4.3 The memo-table
The introduction of the state monad does not immediately improve
the performance of memoized recognizers. Operations that access the state
are required to store and retrieve the previously computed results. We have
decided to represent the state as a list of pairs. The first component of each
pair is an integer number that acts as an index to the memo-table; the second
component is a list of pairs (recognizer-name, recognizer-value). Owing
to the fact that different processors may return values of different types, it is
convenient to parametrize the type of the memo-table over the type of
values it stores.
type State v = [(Int, [(String, v)])]
The purpose of the function lookupSt is to return the value that
corresponds to a start position and a recognizer name (given as parameters).
The function updateSt, given a start position, recognizer name, and the
value returned by the recognizer, updates the corresponding entry of the
memo-table. The function newSt creates a new memo-table. The definitions
of these functions can be found in Appendix B.
The top level function memoize is applied to each recognizer to store
its result in the memo-table. The function is defined in terms of the unit and
bind operators of the state monad.
memoizeRec :: Recognizer (StRM (ListM (St (State [Int]))) String) Int
    => String -> (Int -> StRM (ListM (St (State [Int]))) String Int)
              -> (Int -> StRM (ListM (St (State [Int]))) String Int)
The definition of the function memoize corresponds closely to the
memoization algorithm. In order to memoize the recognizer f that is applied
to a start position i and the whole input string of tokens s we first search the
memo-table for the result that corresponds to the recognizer name and the
start position i. If the result was computed before, this result is returned.
Otherwise, the new result is computed and the memo-table is updated.
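The lookup-or-compute step just described can be sketched without any monadic machinery by passing the table explicitly. The code below is not the Appendix B definition, which is not reproduced here: Table, lookupT, updateT, Rec and termA are hypothetical names, and the table is threaded by hand rather than through the state monad.

```haskell
-- The memo-table: for each start position, the results per recognizer name.
type Table v = [(Int, [(String, v)])]

-- Look up the result stored for a start position and a recognizer name.
lookupT :: Int -> String -> Table v -> Maybe v
lookupT i name t = lookup i t >>= lookup name

-- Record a result, replacing any earlier entry for the same key.
updateT :: Int -> String -> v -> Table v -> Table v
updateT i name v t = (i, (name, v) : others) : rest
  where
    others = [e | e@(n, _) <- maybe [] id (lookup i t), n /= name]
    rest   = [e | e@(j, _) <- t, j /= i]

-- A recognizer with the table threaded explicitly through it.
type Rec = Int -> String -> Table [Int] -> ([Int], Table [Int])

memoize :: String -> Rec -> Rec
memoize name f i inp t =
  case lookupT i name t of
    Just r  -> (r, t)                       -- computed before: reuse it
    Nothing -> let (r, t1) = f i inp t      -- otherwise compute the result
               in (r, updateT i name r t1)  -- and store it in the table

-- Hypothetical recognizer for a single 'a', used only to exercise memoize.
termA :: Rec
termA i inp t
  | i >= 1 && i <= length inp && inp !! (i - 1) == 'a' = ([i + 1], t)
  | otherwise                                          = ([], t)
```

A second call with the same name and start position finds the stored entry and never runs the underlying recognizer, which is the source of the saving.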
The memoized recognizer ms defined as
ms :: Recognizer (StRM (ListM (St (State [Int]))) String) Int
    => Int -> StRM (ListM (St (State [Int]))) String Int
ms = memoizeRec "ms" ((termR 'a' `thenR` (ms `thenR` ms))
                      `orelseR` emptyR)
when applied to a string "aaaaa" at position 1 returns the following result.
5 Parsers
5.1 The type of the monadic parsers
The monadic approach allows one to easily extend the techniques
presented in the previous sections to more complex language processors.
Recall once again the type of parsers:
type Parser a = s -> (a, s).
The type of parsers corresponds directly to the type of the state monad
where the state is a sequence of tokens to be processed. Such a monad is
sometimes called the input monad [WA90, WA92]. Similarly to the type of
monadic recognizers we can express the type of parsers in terms of the
"monadic" type constructor m.
type Parser a = m a
5.2 The class Parser
Basic parsers and combinators can be defined in terms of the operators
unit, bind, plus, and zero. We can define the class of parsers as a subclass of
the class MonadPlusZero.
class MonadPlusZero m => Parser m a where
    term   :: Char -> m a
    empty  :: a -> m a
    orelse :: m a -> m a -> m a
    fail   :: m a
    empty  = unit
    orelse = plus
    fail   = zero
Figure 20 The class Parser
(Note that we have not defined the operator then for parsers. The
reason for that is that the monadic operator bind, which integrates the
sequencing of functions with the processing of their values, is much more
convenient to use.)
Having defined the class of parsers we can build various parsers in a
modular way by combining appropriate monads. In addition to the monads
defined so far, we will use the parametrized input monad which is described
in the next subsection.
5.3 The parametrized input monad
The input monad is identical to the monad for state where the state is a
sequence of tokens. The type of programs is the type of functions from a
string (the input to the program) to a computation that involves a pair: a value
and a string (the input to the rest of the program).
type InpM m s a = s -> m (a, s)
unitInpM :: Monad m => a -> InpM m s a
unitInpM a = liftInpM (unit a)
bindInpM :: Monad m => InpM m s a -> (a -> InpM m s b) -> InpM m s b
(a `bindInpM` k) inp = a inp `bind` \(va, outa) -> k va outa
liftInpM :: Monad m => m a -> InpM m s a
liftInpM x inp = x `bind` \x1 ->
                 unit (x1, inp)
zeroInpM :: MonadPlusZero m => InpM m s a
zeroInpM = \inp -> zero
plusInpM :: MonadPlusZero m => InpM m s a -> InpM m s a -> InpM m s a
(x `plusInpM` y) = \inp -> x inp `plus` y inp
instance Monad m => Monad (InpM m s) where
    unit = unitInpM
    bind = bindInpM
instance MonadPlusZero m => MonadPlusZero (InpM m s) where
    zero = zeroInpM
    plus = plusInpM
If the underlying monad of the input monad is the monad for lists, the
type of the corresponding computations is
InpM [] s a = s -> [(a, s)]
which is exactly the same as the type of nondeterministic parsers defined
earlier. If the base monad is the exception monad, the resulting parsers are
deterministic: they either return a value of the form Ok (a, s), or they
return the value Fail that denotes failure.
To obtain parsers that take as parameters a single position and the
whole input string of tokens we simply apply the parametrized state reader
monad on top of the parametrized input monad. The type of the resulting
computations is
StRM (InpM m Int) String a = String -> Int -> m (a, Int).
That is, each parser when applied to the input string of tokens and a single
position returns a computation that involves a pair: the value of the parser
and the start position for the next parser. The corresponding parser term can
be defined as follows,
termEInt :: MonadPlusZero m => Char -> StRM (InpM m Int) String Int
termEInt c s x | (x<1) || (x>length s) || (s!!(x-1) /= c) = zero
               | otherwise = unit (eval c, x+1)
where the function eval is application dependent, and returns the value that
corresponds to a character given as its parameter. The corresponding
instance of the class Parser is defined below.
instance MonadPlusZero (StRM (InpM m Int) String) => Parser (StRM (InpM m Int) String) Int where
    term = termEInt
New parsers or evaluators can now be constructed by combining
simpler processors using the operators bind and orelse. For example, an
evaluator with the structure that corresponds to the recognizer s defined
earlier can be constructed as follows.
e = (term 'a' `bind` \x1 -> e `bind` \x2 ->
     e `bind` \x3 -> unit (x1 + x2 + x3)) `orelse` empty 0
Suppose that the function eval is defined as eval 'a' = 1. If the type of
the evaluator e is defined as
e :: Parser (StRM (InpM [] Int) String) Int => StRM (InpM [] Int) String Int
then the evaluator applied to the string "aaaaa" at position 1 returns a set of
values.
If the type of the same evaluator is defined as
e :: Parser (StRM (InpM Ex Int) String) Int => StRM (InpM Ex Int) String Int
then the same evaluator returns a single value.
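The evaluator can be approximated in present-day Haskell as follows. This is a sketch under assumptions rather than the thesis's code: the newtype P (a hypothetical name) unfolds StRM (InpM m Int) String a to String -> Int -> m (a, Int), the standard Alternative and MonadPlus classes replace MonadPlusZero, Maybe stands in for Ex, and eval is given a default case that the thesis does not specify.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus)
import Data.List (nub, sort)

-- Input string and start position in; value and next position out, in m.
newtype P m a = P { runP :: String -> Int -> m (a, Int) }

instance Functor m => Functor (P m) where
  fmap f (P p) = P (\s i -> fmap (\(a, j) -> (f a, j)) (p s i))

instance Monad m => Applicative (P m) where
  pure a    = P (\_ i -> return (a, i))
  pf <*> pa = pf >>= \f -> pa >>= \a -> pure (f a)

instance Monad m => Monad (P m) where
  P p >>= k = P (\s i -> p s i >>= \(a, j) -> runP (k a) s j)

instance MonadPlus m => Alternative (P m) where
  empty       = P (\_ _ -> empty)
  P p <|> P q = P (\s i -> p s i <|> q s i)

-- Application-dependent value of a token; the default case is an assumption.
eval :: Char -> Int
eval 'a' = 1
eval _   = 0

term :: MonadPlus m => Char -> P m Int
term c = P (\s i ->
  if i < 1 || i > length s || s !! (i - 1) /= c
    then empty
    else return (eval c, i + 1))

-- Same structure as the recognizer s, but summing the token values.
e :: MonadPlus m => P m Int
e = (term 'a' >>= \x1 -> e >>= \x2 -> e >>= \x3 -> return (x1 + x2 + x3))
    <|> pure 0

-- All (value, next position) pairs, duplicates removed.
allResults :: [(Int, Int)]
allResults = sort (nub (runP e "aaaaa" 1))

-- Only the first successful result.
firstResult :: Maybe (Int, Int)
firstResult = runP e "aaaaa" 1
```

With the list monad every prefix of the input contributes a pair, whereas with Maybe only the result that consumes the whole input survives.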
5.4 Memoized language processors
The above processors can be memoized by replacing the monad for
exception or nondeterminism with the combined monad for lists and state
(exactly the same combination as was previously introduced to memoize
recognizers). The values stored in the memo-table are now lists of pairs of
integer numbers. The first component of each pair is the value returned by
the evaluator, the second component is the start position for the next
processor. The type of the resulting evaluators is
(StRM (InpM (ListM (St (State [(Int, Int)]))) Int) String) Int.
Although the above type definition may look a little complex, this is
in fact the only addition that is required to obtain memoized processors. A
memoized evaluator me can now be defined as
me = memoize "me" ((term 'a' `bind` \x1 -> me `bind` \x2 ->
                    me `bind` \x3 -> unit (x1 + x2 + x3)) `orelse` empty 0)
When applied to the string "aaaaa" at position 1 it returns a pair: a set of
values and a memo-table.
6 Complexity of memoized language processors
A detailed complexity analysis of memoized functional recognizers can
be found in Appendix A. The analysis there holds for monadic recognizers
as described in this chapter. Each memoized recognizer is built up out of
three monads: the list monad, the state monad, and the state reader monad.
The use of the list monad guarantees that each processor is applied to all
elements of the input list, and the corresponding results are combined using
the function merge_res (that merges two lists sorted in ascending order,
removing duplicates). The same function merge_res is used to combine
results of alternate processors (orelse). The use of the state monad to
implement memoization guarantees that the memo-table is passed to each
recognizer as a parameter and the modified memo-table is returned as its
result. Finally, the state reader monad enables access to the current position
in the input string.
The memo-table is structured as a list of (n + 1) pairs where the first
element of each pair is an integer number i representing the start position;
the second element is a list of results corresponding to each application of a
recognizer at the position i. Each result is a list of at most (n + 1) integer
numbers. The number of recognizers is independent of the size of the input,
therefore the size of the memo-table is O(n^2). Owing to the fact that purely
functional programming languages do not (in general) allow variables to be
destructively updated, each update of the memo-table results in the creation
of a new modified memo-table. The number of possible updates is linear in
the size of the input, therefore the space required by the algorithm is O(n^3).
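The two bounds follow from a short calculation, written out here as a sketch. The symbols r (the number of recognizers, a constant) and |T| (the size of one copy of the memo-table) are introduced only for this illustration.

```latex
\begin{align*}
|T| &\le \underbrace{(n+1)}_{\text{start positions}} \cdot
         \underbrace{r}_{\text{recognizers}} \cdot
         \underbrace{(n+1)}_{\text{entries per result}} = O(n^{2}), \\
\text{space} &\le \underbrace{O(n)}_{\text{table versions}} \cdot |T| = O(n^{3}).
\end{align*}
```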
This performance can be improved in a purely functional language that
supports updatable objects. In his first paper about monads [WA90] Wadler
noticed that it is possible, within the monadic framework, to add updatable
in-place arrays to purely functional programming languages without
compromising strong reasoning principles valid for these languages. He
proposed to implement such arrays as an abstract data type with a set of
well-defined operations. The encapsulation of an array guarantees that the
programmer cannot duplicate it. Combined with the use of monadic
sequencing, it guarantees single threading of the array through the program.
Based on Wadler's work on monads, Launchbury and Peyton Jones
presented a way of securely encapsulating stateful computations that
manipulate multiple mutable objects [LA93, LA94, LA95]. Updatable
variables are currently implemented in the Glasgow Haskell Compiler.
Time and space complexity of more complex processors such as
parsers or syntax-directed evaluators is application dependent. If the number
of different results to be returned is exponential in the length of input, the
corresponding language processor will have exponential complexity. One
advantage of the monadic approach to building memoized language
processors is that the programmer can easily switch on and off the features
the processors incorporate (by simply redefining the type of the processor).
In other words, depending on the application, a memoized or non-memoized
version of the processors can be used.
Chapter 6
CONCLUSION
In this thesis we have described how memoization can be implemented
to improve efficiency of purely functional language processors. An
important contribution of this thesis is the monadic specification of such
processors. Monads provide a general technique of adding various features to
purely functional programs. We have shown how memoization can be
treated as one such feature.
The monadic approach allows one to abstract over features a program
incorporates. A monadic program is built from components of an abstract
type m a. To add a new "effect" to the program one simply changes the
meaning of "m" and then adds or modifies only those components that deal
directly with the "effects" being added. Assuming the correctness of the initial
program, only parts that have been added or modified must be tested.
The program given in Appendix B illustrates how expressive, modular,
and easy to modify applications one can develop using the system of
constructor classes and monads. The program consists of a library of monads
and a library of simple monadic language processors and combinators. New
language processors can be built from simpler components by combining
them using higher-order combinators. The behavior of such processors can
be changed by simply modifying their types.
We have used the technique proposed by Liang, Hudak, and Jones
[LI95] to construct memoized language processors, in a fully modular way,
from components that represent various "effects". We have been pleasantly
surprised by the flexibility of this method, in particular, by the ease with
which it was possible to extend the technique for memoizing purely
functional recognizers to more complex language processors. The extended
technique can be used to memoize parsers as well as programs that are
constructed as executable attribute grammars [Wt, FR95].
NOTE TO USERS
Page(s) not included in the original manuscript are unavailable from the author or university. The manuscript
was microfilmed as received.
M. P. Jones and L. Duponcheel. Composing monads. Technical report YALEU/DCS/RR-1004, Yale University, Dept. of Computer Science, New Haven, Connecticut, December 1993.
M. P. Jones. The implementation of the Gofer functional programming system. Technical report YALEU/DCS/RR-1030, Department of Computer Science, Yale University, New Haven, Connecticut, USA, May 1994.
D. J. King and P. Wadler. Combining monads. In Glasgow Workshop on Functional Programming, Springer-Verlag Workshops in Computing Series, pages 134-143, Ayr, Scotland, July 1992.
J. Launchbury. Lazy imperative programming. In ACM SIGPLAN Workshop on State in Programming Languages, Copenhagen, Denmark, June 1993.
J. Launchbury and S. Peyton Jones. Lazy functional state threads. In Programming Languages Design and Implementation, ACM Press, Orlando, Florida, 1994.
J. Launchbury and S. Peyton Jones. State in Haskell. LASC 8(4), pages 293-341, December 1995.
S. Liang, P. Hudak, and M. Jones. Monad transformers and modular interpreters. In Conference Record of POPL '95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, CA, January 1995.
S. Mac Lane. Categories for the Working Mathematician. Springer-Verlag, 1971.
E. Moggi. Computational lambda-calculus and monads. In Proceedings of the Fourth Annual Symposium on Logic in Computer Science, pages 14-23. IEEE, June 1989.
E. Moggi. An abstract view of programming languages. Technical report ECS-LFCS-90-113, Dept. of Computer Science, University of Edinburgh, Scotland, April 1990.
B. C. Pierce. Basic Category Theory for Computer Scientists. MIT Press, 1991.
J. Peterson and K. Hammond, editors. Haskell 1.3, a non-strict, purely functional language. Technical report YALEU/DCS/RR-1106, Department of Computer Science, Yale University, New Haven, Connecticut, USA, May 1996.
M. Spivey. A functional theory of exceptions. Science of Computer Programming, 14(1), pages 25-42, June 1990.
G. L. Steele. Building interpreters by composing monads. In Principles of Programming Languages, pages 472-492. ACM Press, January 1994.
D. A. Turner. An overview of Miranda. In D. A. Turner, editor, Research Topics in Functional Programming. ACM, June 1990.
P. Wadler. How to replace failure by a list of successes. Conference on Functional Programming and Computer Architecture, LNCS 201, Springer-Verlag, September 1985.
P. Wadler. Comprehending monads. In Conference on Lisp and Functional Programming, pages 61-78, June 1990.
P. Wadler. The essence of functional programming. In Principles of Programming Languages, pages 1-14, January 1992.
Appendix A
MEMOIZING PURELY-FUNCTIONAL TOP-DOWN BACKTRACKING LANGUAGE PROCESSORS
Richard A. Frost and Barbara Szydlowski, School of Computer Science, University of Windsor, Windsor, Ontario, Canada N9B 3P4
November 1996
Language processors may be implemented directly as functions. In a programming language that supports higher-order functions, large processors can be built by combining smaller components using higher-order functions corresponding to alternation and sequencing in the BNF notation of the grammar of the language to be processed. If the higher-order functions are defined to implement a top-down backtracking parsing strategy, the processors are modular and, owing to the fact that they resemble BNF notation, are easy to understand and modify. A major disadvantage of this approach is that the
processors while preserving their modularity. We show that memoized functional recognizers constructed for arbitrary non-left-recursive grammars have O(n^3) complexity where n is the length of the input to be processed. The paper also shows how the initial processors could have been memoized using a monadic approach.
1 Constructing modular non-deterministic language
processors in functional programming languages
One approach to implementing language processors in a modern
functional programming language is to define a number of higher-order
functions which when used as infix operators (denoted in this paper by the
prefix $) enable processors to be built with structures that have a direct
correspondence to the grammars defining the languages to be processed. For
example, the function s, defined in the functional program in Fig. 22, is a
recognizer for the language defined by the grammar s ::= 'a' s s | empty if
the functions term, orelse, then, and empty are defined as shown in the
next few pages of this paper.
s = (a $then s $then s) $orelse empty
a = term 'a'
This approach, which is described in detail in Hutton [10], was
originally proposed by Burge [2] and further developed by Wadler [18] and
Fairbairn [4]. It is now frequently used by the functional-programming
community for language prototyping and natural-language processing. In the
following, we describe the approach with respect to language recognizers
although the technique can be readily extended to parsers, syntax-directed
evaluators and executable specifications of attribute grammars [1, 6, 7, 12].
According to the approach, recognizers are functions mapping lists of
inputs to lists of outputs. Each entry in the input list is a sequence of tokens
to be analyzed. Each entry in the output list is a sequence of tokens yet to be
processed. Using the notion of "failure as a list of successes" [18] an empty
output list signifies that a recognizer has failed to recognize the input.
Multiple entries in the output occur when the input is ambiguous. In the
examples in this paper it is assumed that all tokens are single characters. The
notation of the programming language Miranda [17] is used throughout,
rather than a functional pseudo-code, in order that readers can experiment
with the definitions directly.
The types token and recognizer may be defined as follows where
== means "is a synonym for", x -> y denotes the type of functions from objects
of type x to objects of type y, and square brackets denote a list.
token == char
recognizer == [[token]] -> [[token]]
That is, a recognizer takes a list of lists of tokens as input and returns a list of
lists of tokens as result. Note that this differs from the type found in many
other papers on functional recognizers. The reason for this difference is that
it simplifies the memoization process as will be explained later.
The simplest type of recognizer is one that recognizes a single token at
the beginning of a sequence of tokens. Such recognizers may be constructed
using the higher-order function term defined below. The notation x :: y
declares x to be of type y. The function concat takes a list of lists as input
and concatenates the sublists to form a single list. map is a higher-order
function which takes a function and a list as input and returns a list that is
obtained by applying the function to each element in the input list. Function
application is denoted by juxtaposition, i.e. f x means f applied to x.
Function application has higher precedence than any operator, and round
brackets are used for grouping. The empty list is denoted by [] and the
notation x : y denotes the list obtained by adding the element x to the front
of the list y. The applicable equation is chosen through pattern matching on
the left-hand side in order from top to bottom, together with the use of
guards following the keyword if.
term :: token -> recognizer
term c inputs = (concat . map test_for_c) inputs
                where
                test_for_c [] = []
                test_for_c (t:ts) = [ts], if t = c
                test_for_c (t:ts) = [], if t ~= c
The following illustrates use of term in the construction of two
recognizers c and d, and the subsequent application of these recognizers to
three inputs. The notation x => y is to be read as "y is the result of
evaluating the expression x". The empty list in the second example signifies
that c failed to recognize a token 'c' at the beginning of the input. The
notation "x1 .. xn" is shorthand for ['x1', .. , 'xn'].
d = term 'd'
Alternate recognizers may be built using the higher-order function
orelse defined below. The operator ++ appends two lists.
orelse :: recognizer -> recognizer -> recognizer
(p $orelse q) inputs = p inputs ++ q inputs
According to this definition, when a recognizer p $orelse q is applied to a
list of inputs inputs, the value returned is computed by appending the
results returned by the separate application of p to inputs and q to inputs.
The following illustrates use of orelse in the construction of a recognizer
c_or_d and the subsequent application of this recognizer to three inputs.
c_or_d = c $orelse d
c_or_d ["abc"] => []
c_or_d ["cxyz"] => ["xyz"]
c_or_d ["dxyz"] => ["xyz"]
Sequencing of recognizers is obtained through use of the higher-order
function then:
then :: recognizer -> recognizer -> recognizer
(p $then q) inputs = [], if r = []
                   = q r, otherwise
                     where r = p inputs
According to this definition, when a recognizer p $then q is applied to a list
of inputs inputs, the result returned is an empty list if p fails when applied to
inputs, otherwise the result is obtained by applying q to the result returned
by p. (Note that in general, then does not have the same effect as reverse
composition. In particular, replacing p $then q by q . p will result in
non-terminating computations for certain kinds of recursively-defined
recognizers.) The following illustrates use of then in the construction of a
recognizer c_then_d, and the subsequent application of c_then_d to two
inputs:
The "empty" recognizer, which always succeeds and which returns the
complete list of inputs as output to be processed, is implemented as the
identity function:
empty inputs = inputs
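The four combinators carry over to Haskell almost unchanged, which makes it easy to experiment with them. The transliteration below is a sketch, not the paper's code: then is a reserved word in Haskell, so the sequencing combinator is named thenR (and empty is named emptyR to match), and the operators are written with backquotes instead of the $ prefix.

```haskell
import Data.List (sort)

type Token      = Char
type Recognizer = [[Token]] -> [[Token]]

-- Recognize token c at the front of every input; drop inputs that fail.
term :: Token -> Recognizer
term c = concatMap testForC
  where
    testForC (t : ts) | t == c = [ts]
    testForC _                 = []

-- Alternation: append the results of both recognizers.
orelse :: Recognizer -> Recognizer -> Recognizer
orelse p q inputs = p inputs ++ q inputs

-- Sequencing: stop with failure if p fails, otherwise feed its output to q.
thenR :: Recognizer -> Recognizer -> Recognizer
thenR p q inputs
  | null r    = []
  | otherwise = q r
  where r = p inputs

-- The recognizer that always succeeds and consumes nothing.
emptyR :: Recognizer
emptyR inputs = inputs

-- s ::= 'a' s s | empty
s :: Recognizer
s = (a `thenR` (s `thenR` s)) `orelse` emptyR
  where a = term 'a'
```

Applying s to ["aaa"] yields nine remainders: five empty strings, two copies of "a", one "aa" and the untouched "aaa".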
The functions term, orelse, then, and empty as defined above, may be
used to construct recognizers whose definitions have a direct structural
relationship with the context-free grammars of the languages to be
recognized. Fig. 23 illustrates this relationship.
BNF grammar of the language        The program
s ::= 'a' s s | empty              s = (a $then s $then s) $orelse empty
terminals = {'a'}                  a = term 'a'
Figure 23 The relationship between the grammar and
the program implementing the recognizer
The example application given below illustrates use of the recognizer s
and shows that the prefixes of the input "aaa" can be successfully recognized
in nine different ways. Empty strings in the output, denoted by "",
correspond to cases where the whole input "aaa" has been recognized as an
s. The output shows that there are five ways in which this can happen. The
two strings in the output consisting of "a" correspond to cases where the
prefix "aa" has been recognized leaving "a" for subsequent processing. The
output shows that there are two ways in which this can happen. The string
in the output consisting of two letters "aa" corresponds to the case where
the prefix "a" has been recognized leaving "aa" for subsequent processing.
This can only happen in one way when "a" is recognized as an s.
The major advantage of this approach is that the processors created are
modular executable specifications of the languages to be processed.
Components can be defined, compiled and executed directly. For example,
(a $then s $then s) is a recognizer that may be executed directly as, for
example:
The advantages of building language processors using this technique
come at a price. The processors employ a naive top-down fully-backtracking
search strategy and consequently exhibit exponential time and space
behavior in the worst case. In the following, we show how this problem can
be overcome through a process of memoization. We begin by discussing
techniques that have been proposed by other researchers concerning the use
of memoization with top-down backtracking language processors. We then
describe how memoization can be achieved at the source-code level in
purely-functional programming languages and show how the technique can
be adapted for use to improve the efficiency of top-down backtracking
recognizers. We provide a formal description of the algorithm and a proof of
the complexity result. In addition, we show how the same result can be
obtained in a more structured way by use of a monad. We conclude with a
discussion of how the approach can be used with parsers and executable
attribute grammars.
2 Memoizing language processors
Memoization [9, 14] involves a process by which functions are made to
automatically recall previously-computed results. Conventional
implementations involve maintenance of memo-tables which store
reference to its memo-table. If the input has been processed before, the
previously-computed result is returned. If the input has not been processed
before, the result is computed using the original definition of the function,
with the exception that all recursive calls use the memoized version; the
memo-table is then updated and the result returned.
Many of the efficient algorithms for recognition and parsing make use
of some kind of table to store well-formed substrings of the input and
employ a form of memoization. Earley's algorithm [3] is an example. In
most of these algorithms, the parsing and table update and lookup are
intertwined. This results in relatively complex processors that are not
modular. Norvig [16] has shown how memoization can be used to obtain a
modular processor, with properties similar to Earley's algorithm, by
memoizing a simple modular top-down backtracking parser generator.
Norvig's memoized parser generator cannot accommodate left-recursive
productions but would appear to be as efficient and general as Earley's
algorithm in all other respects. According to Norvig, the memoized
recognizers have cubic complexity compared to exponential behavior of the
original unmemoized versions.
In Norvig's technique, memoization is implemented at the source-code
level in Common Lisp through definition and use of a function called
memoize. When memoize is applied to a function f, it modifies the global
definition of f such that the new definition refers to and possibly updates a
memo-table. A major advantage of Norvig's approach is that programs may,
in some cases, be made more efficient with no change to the source-code
definition. In Norvig's approach, both the process of memoizing a function,
and the process of updating the memo-table, make use of Common Lisp's
updateable function-name space. This precludes direct use of Norvig's
approach when language processors are to be constructed in a
purely-functional programming language where updateable objects are not
permitted.
Leermakers [12] and Augusteijn [1] have also described how
memoization can be used to improve the complexity of functional top-down
backtracking language processors but have not indicated how the
memoization process itself would be achieved. In particular, they have not
addressed the question of how memoization would be achieved in a
purely-functional implementation of the language processors.
A functional programming language is one in which functions are
first-class objects and may, for example, be put in lists, passed to other functions
as arguments, and returned by functions as results. A purely-functional
language, such as Miranda [17], LML [5], and Haskell [8], is one in which
functions provide the only control structure and side-effects, such as
assignment, are not allowed. This restriction is a necessary condition for
referential transparency, a property of programs that simplifies reasoning
about them and which is one of the major advantages of the
purely-functional programming style [19].
Owing to the fact that side-effects are forbidden, purely-functional
languages do not accommodate any form of updateable object.
Consequently, Norvig's technique for improving the efficiency of top-down
backtracking language processors cannot be implemented directly in any
purely-functional language. However, we can adapt Norvig's approach if we
use a variation of memoization that has been described by Field and
Harrison [SI and investigated in detail by Khoshnevisan [11]. This
memoization technique differs from conventional approaches in that
memo-tables are associated with the inputs to and outputs from functions,
rather than with the functions themselves. A function may be memoized by
modifying its definition to accept a table as part of its input, to refer to this
table before computing a result, and to update the table before returning it as
part of the output. The memo-table is passed as an input to the top-level call
of recursively defined functions and is threaded through all recursive calls.
To illustrate this technique, we show how the Fibonacci function can be
memoized. We begin with a textbook definition given in Fig. 24.

    fib 0 = 1
    fib 1 = 1
    fib n = fib (n - 1) + fib (n - 2), if n >= 2

Figure 24 Definition of the Fibonacci function
Defined in this way, evaluation of the Fibonacci function has
exponential complexity. The cause of the exponential behavior is the
replication of computation in the two recursive calls. This replication can be
avoided by memoization. We begin by modifying the definition of fib so
that it accepts a table as part of its input and returns a table as part of its
result. In the modified definition, round brackets and commas are used to
denote tuples. The table t1, which is output from the first recursive call of
tfib, is passed as input to the second recursive call of tfib. The table t2, which
is output from the second recursive call, is returned as result from the
top-level call of tfib:
    tfib (0, t) = (1, t)
    tfib (1, t) = (1, t)
    tfib (n, t) = (r1 + r2, t2)
                  where (r1, t1) = tfib (n - 1, t)
                        (r2, t2) = tfib (n - 2, t1)
Note that tfib still has exponential behavior. When applied to an input,
it returns the table unchanged. Rather than modifying the definition of tfib
directly to make use of the memo-table, as is done in Field and Harrison and
in Khoshnevisan, we choose to abstract the table lookup and update process
into a general-purpose higher-order function memo which we can apply to
tfib to obtain a memoized version. This variation is comparable to Norvig's
technique. When memo is applied to a function f it returns a new function
newf whose behavior is exactly the same as f except that it refers to, and
possibly updates, the memo-table given in the input.
In the definition of the function memo below, the expression mr $pos
1 denotes the first element of the list of memoized results mr. The
definition of lookup makes use of a list comprehension, [r | (y, r) <- t; y = i],
which is to be read as "the list of all r such that the pair (y, r) is a member of
the table t and y is equal to the index i."
    memo f = newf
             where newf (i, t) = (r1, t1)
                     where (r1, t1) = (mr $pos 1, t),        if mr ~= []
                                    = (r2, update i r2 t2),  if mr = []
                           (r2, t2) = f (i, t)
                           mr       = lookup i t

    update i r t = (i, r) : t
    lookup i t   = [r | (y, r) <- t; y = i]
We can now complete the process of memoizing the Fibonacci function by
applying memo to the two recursive calls in the definition of tfib as shown
in Fig. 25. The result is a function called mfib which has linear complexity.
Figure 25 A memoized version of the Fibonacci function
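The table-threading scheme described above can be sketched as follows. This is a minimal illustration in Python rather than Miranda, and it is not the definition from Fig. 25: the names Table and TFun are hypothetical, and the table is assumed to be an association list of (index, result) pairs that is never mutated in place.

```python
from typing import Callable, List, Tuple

Table = List[Tuple[int, int]]                       # (index, result) pairs
TFun = Callable[[int, Table], Tuple[int, Table]]    # table in, table out


def lookup(i: int, t: Table) -> List[int]:
    """All results stored in t under index i (empty list if none)."""
    return [r for (y, r) in t if y == i]


def update(i: int, r: int, t: Table) -> Table:
    """Return a new table with (i, r) added at the front; t is untouched."""
    return [(i, r)] + t


def memo(f: TFun) -> TFun:
    """Wrap f so that it consults the table first and records new results."""
    def newf(i: int, t: Table) -> Tuple[int, Table]:
        mr = lookup(i, t)
        if mr:                       # result already in the table
            return mr[0], t
        r2, t2 = f(i, t)             # compute, threading the table through f
        return r2, update(i, r2, t2)
    return newf


def mfib(n: int, t: Table) -> Tuple[int, Table]:
    """Fibonacci with fib 0 = fib 1 = 1; both recursive calls go through memo."""
    if n < 2:
        return 1, t
    r1, t1 = memo(mfib)(n - 1, t)
    r2, t2 = memo(mfib)(n - 2, t1)   # t1 already holds the result for n - 2
    return r1 + r2, t2


value, table = mfib(10, [])
```

In this sketch the number of calls to mfib grows linearly with n, because the second recursive call always finds its answer in the table returned by the first. Each lookup still scans an association list, so the sketch says nothing about the cost of table access itself.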
Some readers may realize that it is only necessary to store the two
most-recently computed values of the Fibonacci sequence in the memo-table.
Modifying the function update accordingly would decrease the space
requirements of mfib but would improve neither time nor space complexity.
It should also be noted that there are many other ways to improve the
complexity of the Fibonacci function. We do not claim that the use of
memoization is the most appropriate technique in this application. We have
chosen to use the Fibonacci function as an example so that our technique can
be easily compared with that described by Norvig who also used the
Fibonacci example for expository purposes.
The technique described above is not as elegant as Norvig's in the
sense that the process of memoization has resulted in changes to the
definition of the Fibonacci function at the source-code level. Later we show
how to reduce the number of changes required for memoization and limit
them to local changes only.
A memoized functional recognizer is a function that takes, as an extra
parameter, a memo-table containing all previously computed results. One
approach to memoization is to modify the definitions of the functions term,
$orelse, and $then, so that the recognizers built with them accept a
memo-table as part of their input and return a memo-table as part of their
output. Next, a higher-order function memoize is applied to each recognizer
to create a memoized version of it.
4.1 The memo-table
In order to improve efficiency we have chosen to store the input
sequence of tokens in the memo-table and to represent the points at which a
recognizer is to begin processing by a list of numbers which are used as
indexes into that sequence.
The memo-table is structured as a list of triples of length n + 1, where
n is the length of the input sequence:

    memo_table == [(num, token, [(rec_name, [num])])]
    rec_name   == [char]

The last element of the memo-table is a special token # representing the end
of the input. The first component of the ith triple is an integer i. This
number acts as an index into memo-table entries. The second component is
the ith token in the input sequence. The third component is a list of pairs
representing all successful recognitions of the input sequence starting at
position i. The first component of each pair is a recognizer name, the second
component is a list of integer numbers. The presence of a number j, where
i <= j <= n + 1, in this list indicates that the recognizer succeeded when
applied to the input sequence beginning at position i and finishing at
position j - 1.
Initially, the third component of each triple in the memo-table is an
empty list. The following example shows the initial table corresponding to
the input "aaa".
Two operations are required for table lookup and update. The
operation lookup applied to an index i, a recognizer name name and a
memo-table t returns a list of previously computed end positions where the
recognizer name succeeded in processing the input beginning at position i.
The operation update applied to an index i, a result res and a memo-table t,
returns a new memo-table with the ith entry updated. A result is a pair
consisting of a recognizer name and a list of successful end-positions. Update
adds the result res to the list of successful recognitions corresponding to the
ith token.
    lookup i name t = [],                                             if i > #t
                    = [bs | (x, bs) <- third (t $pos i); x = name],   otherwise
                      where third (x, y, z) = z

    update i res t = map (add_res i res) t
                     where add_res i res (x, term, res_list)
                             = (x, term, res : res_list),  if x = i
                             = (x, term, res_list),        otherwise
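As a hedged illustration of the table structure and the two operations, the following Python sketch builds the initial table for an input string and applies lookup and update to it. The helper initial_table is hypothetical (it does not appear in the text), tokens are assumed to be single characters, and $pos is assumed to index from 1.

```python
def initial_table(tokens):
    """One triple (index, token, []) per token, plus the end marker '#'."""
    rows = [(i + 1, tok, []) for i, tok in enumerate(tokens)]
    return rows + [(len(tokens) + 1, "#", [])]


def lookup(i, name, t):
    """End-position lists stored for recognizer `name` at start position i."""
    if i > len(t):
        return []
    _, _, results = t[i - 1]           # 1-based position, 0-based list
    return [bs for (x, bs) in results if x == name]


def update(i, res, t):
    """New table in which res = (name, end_positions) is added to row i."""
    return [(x, tok, [res] + rl) if x == i else (x, tok, rl)
            for (x, tok, rl) in t]


t0 = initial_table("aaa")
# t0 is [(1, 'a', []), (2, 'a', []), (3, 'a', []), (4, '#', [])]
t1 = update(2, ("ms", [3, 4]), t0)     # record a result for "ms" at position 2
```

Because update builds a new list, t0 is left unchanged, which mirrors the absence of updateable objects in the purely-functional setting.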
The function memoize takes as input a recognizer name n, a
recognizer f, a list of positions where the recognizer should begin processing
the input, and a memo-table. For each start position in the list, the function
memoize first calls the function lookup to determine if this application of
the recognizer has been computed previously. If lookup returns an empty
list, the recognizer is applied, a new result is calculated and the function
update is used to add the result to the memo-table. Otherwise the
previously computed result is returned. Results returned for each of the start
positions are merged with the removal of duplicates.
    memoize n f ([], t)   = ([], t)
    memoize n f (b:bs, t) = (merge_res r1 rs, trs)
                            where (r1, t1)  = (mr $pos 1, t),             if mr ~= []
                                            = (r2, update b (n, r2) t2),  otherwise
                                  (r2, t2)  = f ([b], t)
                                  (rs, trs) = memoize n f (bs, t1)
                                  mr        = lookup b n t
4.2 The memoized recognizers
The definitions of term, $then, and $orelse given in section 1 are
modified to take as input a list of positions where the recognizer should
begin processing the input, and a memo-table. Owing to the fact that the
entire input sequence of tokens is represented in the memo-table, there is no
need for the recognizers to explicitly return unprocessed segments of the
input. Instead they return a number as index into the input sequence.
The next modification to the definitions of $orelse and $then is to
allow threading of the memo-table through recursive calls. The function
term is modified owing to the fact that the input sequence is now stored in
the memo-table. The function merge is used to combine and remove
duplicates that arise if the same segment of the input can be recognized in
more than one way by a recognizer. For recognition purposes such
duplicates can be considered equal.
    mterm c (bs, t) = ((concat . (map test_for_c1)) bs, t)
                      where test_for_c1 b = [],       if b > #t
                            test_for_c1 b = [],       if second (t $pos b) ~= c
                            test_for_c1 b = [b + 1],  if second (t $pos b) = c
                            second (x, y, z) = y

    (p $m_orelse q) (bs, t) = (merge_res rp rq, tq)
                              where (rp, tp) = p (bs, t)
                                    (rq, tq) = q (bs, tp)
These functions can now be used to improve the complexity of
functional recognizers whilst preserving their structural simplicity and
modularity. As an example, Figure 26 shows the relationship between the
original recognizer for the grammar s ::= 'a' s s | empty and the memoized
version. Note that it is not necessary to change the definition of empty nor is
it necessary to memoize the recognizers constructed with mterm.
    The original recognizer        The memoized version

    s = (a $then s $then s)        ms = memoize "ms"
        $orelse empty                     ((ma $m_then ms $m_then ms)
                                            $m_orelse empty)
    a = term 'a'                   ma = mterm 'a'

Figure 26 The relationship between a recognizer and its memoized version
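The pieces described in this section can be put together in a small runnable sketch. It is written in Python rather than Miranda and makes several assumptions that are not taken from the text: m_then is our own guess at a sequencing combinator (it passes the table on even when its first component fails), mempty returns its start positions unchanged, merge_res is implemented with sets, and initial_table is a hypothetical helper.

```python
def merge_res(xs, ys):
    """Union of two position lists, sorted, without duplicates (assumed)."""
    return sorted(set(xs) | set(ys))


def initial_table(tokens):
    rows = [(i + 1, tok, []) for i, tok in enumerate(tokens)]
    return rows + [(len(tokens) + 1, "#", [])]


def lookup(i, name, t):
    if i > len(t):
        return []
    return [bs for (x, bs) in t[i - 1][2] if x == name]


def update(i, res, t):
    return [(x, tok, [res] + rl) if x == i else (x, tok, rl)
            for (x, tok, rl) in t]


def mterm(c):
    """Basic recognizer for token c; never touches the table."""
    def rec(bs, t):
        return [b + 1 for b in bs if b <= len(t) and t[b - 1][1] == c], t
    return rec


def mempty(bs, t):
    return list(bs), t


def m_orelse(p, q):
    def rec(bs, t):
        rp, tp = p(bs, t)
        rq, tq = q(bs, tp)            # q sees the table produced by p
        return merge_res(rp, rq), tq
    return rec


def m_then(p, q):
    def rec(bs, t):
        rp, tp = p(bs, t)
        if not rp:
            return [], tp             # keep p's table updates on failure
        return q(rp, tp)
    return rec


def memoize(name, f):
    def rec(bs, t):
        if not bs:
            return [], t
        b, rest = bs[0], bs[1:]
        mr = lookup(b, name, t)
        if mr:                        # computed before, possibly as []
            r1, t1 = mr[0], t
        else:
            r2, t2 = f([b], t)
            r1, t1 = r2, update(b, (name, r2), t2)
        rs, trs = rec(rest, t1)
        return merge_res(r1, rs), trs
    return rec


ma = mterm("a")


def ms(bs, t):                        # indirection allows the recursive use of ms
    return _ms(bs, t)


_ms = memoize("ms", m_orelse(m_then(ma, m_then(ms, ms)), mempty))

result, table = ms([1], initial_table("aa"))
```

For the input "aa" the sketch yields the end positions [1, 2, 3]; the presence of 3 (that is, n + 1) indicates that the whole input was recognized. Failed applications are stored as an empty list of end positions, so lookup still reports them as already computed.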
4.3 The Algorithm
We begin our description of the algorithm by presenting an example.
Suppose that the string "aa" is to be processed using the memoized
recognizer ms defined in Figure 26. The initial input is as follows, where the
second component of the tuple is the initial memo-table:
Owing to the fact that no results have been computed yet and that ms
is an alternation ($m_orelse) of two recognizers, the first alternative of ms
is applied to the initial input. This recognizer is itself a sequence of the
recognizers ma and ms $then ms, therefore the first of this sequence, i.e.
ma, is applied to the initial input. The recognizer ma succeeds in
recognizing an 'a' and returns a result consisting of a pair with first element
[2], indicating that the first element of the sequence of tokens has been
consumed, and the memo-table unchanged (because basic recognizers do not
update the memo-table). The evaluation tree at this point is as follows, where
? indicates values yet to be computed. Sequencing is denoted by continuous
lines and alternation by broken lines.
    ms [1] => [?]
Next, ms $then ms is applied to the result. The application of the
first ms in this sequence results in a similar computation to the initial
application except that the starting position is [2]. The same holds when ms
is applied to position [3]. The third element of the input memo-table
corresponds to the end-of-input. The recognizer ma applied at position [3]
fails, returning an empty list, and thus ma ms ms fails. The recognizer
mempty applied at the same position returns as result a tuple whose first
element is the list [3]. Now the results of both alternatives of the recognizer
ms have been determined and the value of ms applied at position [3] is
computed. The following shows the evaluation tree when all values up to
ms [2] have been computed.
    ms ms [2] => [?]
    ma ms ms [2] => [3]        mempty [2] => [2]
Note that when the recognizer ms is applied to the position [3] for the
second time, the corresponding result is simply copied from the memo-table.
When a recognizer is applied to a list that contains more than one
element, the result is obtained by applying the recognizer to each element in
the list and merging the results. This is illustrated below:
The final result is:
The following is a more formal description of the algorithm:
1. Input:
   a. A context-free, non-left-recursive grammar with productions and
      terminals represented using functions mterm, $m_then, $m_orelse,
      and mempty. The start symbol for the grammar is the name of the
      first recognizer to be applied.
   b. A pair whose first component is the list [1], and whose second
      component is a memo-table corresponding to the input sequence of
      tokens.
2. Output:
   a. A pair whose first component is a list of positions where the recognition
      process of the input sequence of tokens (starting from the first token)
      was successfully completed. The second component is the final state of
      the memo-table.
3. Method:
   a. At each step we apply a recognizer to a list of start positions and a
      memo-table:
      • If the list is empty, the result is an empty list and the unchanged
        memo-table.
      • Otherwise, we first apply the recognizer to the first element of the
        list and the memo-table. The result is a list r1 and a possibly
        modified memo-table t1. Then we apply the recognizer to the rest
        of the list and the memo-table t1. The result of this application is a
        list R and a memo-table t2. The final result is a pair: a list obtained
        by merging r1 and R, and the table t2.
   b. Application of a recognizer m at a position j begins by reference to the
      current memo-table:
      • If the jth row of the memo-table contains a result corresponding to
        m, this result is returned.
      • Otherwise a new result is computed, the memo-table is updated
        and the result returned.
   c. Each recognizer can be either the basic recognizer mempty, a basic
      recognizer constructed using mterm, or it can be a combination
      constructed from two or more components using $m_then or
      $m_orelse.
      • Results for basic recognizers are obtained immediately by applying
        the corresponding function.
      • For sequences or alternations, the results of the components are
        computed first and then combined to obtain the final result.
5 Complexity analysis
We now show that memoized recognizers have worst-case time
complexity of O(n^3) compared to exponential behavior of the unmemoized
form. The analysis is concerned only with the variation of time with the
length of the input list of tokens. Although a grammar could be very
complex, its size will always be independent of the length of the input.
5.1 Elementary operations
We assume that the following operations require a constant amount of
time:
1. Testing if two values are equal, less than, etc.
2. Extracting the value of a component of a tuple.
3. Adding an element to the front of a list.
4. Obtaining the value of the ith element of a list whose length depends
   upon the size of the grammar but not on the size of the input list.
5.2 The size of the memo-table
The memo-table is structured as a list of (n + 1) tuples, where n is the
length of the input sequence of tokens. The first component of each tuple is
an integer ranging from 1 to n + 1. The second component of a tuple whose
first component is i, is the ith token in the input. The third component is a
list of pairs (recognizer_name, result). Owing to the fact that the grammar
is fixed, the number of recognizers, denoted by r, is constant. Therefore, for
each tuple in the memo-table, the length of the list of pairs is <= r.
The second component of each pair is a list of positions represented by
integers where the corresponding recognizer succeeded in completing the
recognition of a segment of the input. The length of the lists that correspond
to the ith tuple is at most (n - i + 2) owing to the fact that a recognizer
applied to input at position i may succeed at any position j, i <= j <= n + 1.
5.3 Memo-table lookup and update
The function lookup applied to an index i, a recognizer name, and a
memo-table, first searches the memo-table to access the ith element, then it
searches the list of results in the ith tuple to access the element that
corresponds to the given recognizer name. The function lookup requires
O(n) time.
The function update applied to an index i, a result res, and a
memo-table, returns a new memo-table with the ith tuple updated. The result
res is added in front of the list of successful recognitions corresponding to
the ith token. The function update requires O(n) time.
5.4 Basic recognizers
Application of the recognizer mempty simply creates a pointer to the
input. This takes constant time. Application of a recognizer mterm a to a
single start position i, requires the ith entry in the memo-table to be
examined to see if the ith token is equal to a. If there is a match then the
result i + 1 is added to the list of results returned by mterm. Otherwise the
recognizer fails. This operation is O(n).
Note that we are only considering, here and in the next two
sub-sections, the time required to apply a recognizer to a single position in
the input list. We consider application of a recognizer to a
more-than-one-element list later.
Assuming that the results p [i] and q [i] have been computed, application
of a memoized recognizer (p $m_orelse q) to a single start position [i]
involves the following steps:
• one memo-table lookup - O(n) - and, if the recognizer has not been
  applied before:
• merging of two result lists, each of which is in the worst case of
  length n + 1, - O(n),
• one memo-table update - O(n).
Assume that p [i] has already been calculated. In the worst case the
result is the list [i, i + 1, .., n + 1]. Assume also that q [i, i + 1, .., n + 1]
has already been calculated. Now, application of a memoized recognizer
p $m_then q to a single start position i involves the following steps:
• one memo-table lookup (O(n)) and if the lookup fails,
• computation of the result plus
• one memo-table update (O(n)).
5.7 Merging results returned when a recognizer is applied to a list of
start positions
The function merge is also used to combine the results returned by a
single memoized recognizer when applied to a list of start positions with
more than one entry. (See the definition of memoize). Suppose a recognizer
f is applied to a k-element list of start positions [1, .., k]. The corresponding
evaluation tree is as follows:
    f [k]
Assuming that the results of f [i] and f [i + 1, .., k] have already been
computed, computation of f [i, i + 1, .., k] requires one memo-table lookup
(O(n)) and one merge, of two lists which are in the worst case of length
n + 1. The total time is O(n). Note also that application of a recognizer f to
a k-element list of start positions results in an execution tree with 2*k + 1
nodes representing applications of the recognizer f.
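The linear bound on merging relies on both result lists being kept in ascending order without duplicates. The text does not give the definition of merge, so the following Python sketch is only one way it could be realized under that assumption: a single pass over both lists, taking time proportional to their combined length.

```python
def merge_res(xs, ys):
    """Merge two ascending, duplicate-free lists into one such list.

    Each loop iteration advances at least one index, so the running time
    is linear in len(xs) + len(ys).
    """
    out = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            out.append(xs[i])
            i += 1
        elif ys[j] < xs[i]:
            out.append(ys[j])
            j += 1
        else:                      # same end position found twice: keep one
            out.append(xs[i])
            i += 1
            j += 1
    out.extend(xs[i:])             # at most one of these two is non-empty
    out.extend(ys[j:])
    return out
```

With lists of length at most n + 1, as in the worst case considered above, a merge of this kind stays within O(n).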
The analysis so far can be summarized in terms of execution trees
(such as those shown earlier). Each non-leaf node of an execution tree
corresponds to an application of a recognizer to a list of start positions, or to
an application of m_orelse or m_then. Leaf nodes correspond either to an
application of mempty, or mterm a for some a, or to a computation that
has been performed before and stored in the memo-table:
Lemma 1: We have shown that the result corresponding to mempty,
mterm a for some a, and lookup can be computed in O(n) time.
Lemma 2: We have also shown that results corresponding to non-leaf
nodes can be computed in O(n) time provided that the values of their
children are available.
5.9 Proof of O(n^3) time complexity
Theorem
Given an arbitrary context-free non-left-recursive grammar G, the
corresponding memoized functional recognizer requires O(n^3) time to
process an input sequence of length n. If the grammar is not ambiguous, the
time complexity is O(n^2).
Let f_1, f_2, .., f_r be a set of recognizers corresponding to the grammar G,
and let f_1 correspond to the start symbol in the grammar. We begin by
applying the recognizer f_1 to the list [1]. This application yields an execution
tree similar to the ones shown earlier. We will show that for an arbitrary
grammar the number of nodes in such a tree is O(n^2) and if the grammar is
not ambiguous this number reduces to O(n). Owing to the fact that the time
required to perform computations at each node is linear in the length of the
input sequence (Lemma 1 and Lemma 2), this concludes the proof of the
theorem.
For simplicity assume that each recognizer is either mempty, mterm a
for some a, or is of the form (p $m_orelse q), or (p $m_then q) for some p
and q. In practice recognizers can be a combination ($m_orelse and/or
$m_then) of more than two recognizers, but the number will always be
bounded by the size of the grammar and will be independent of the length of
the input sequence of tokens.
Suppose that the recognizer f_1 is of the form (f_i $m_orelse f_j) for some
2 <= i, j <= r. Suppose also that the recognizer f_i is of the form
(f_k $m_then f_p) for some 2 <= k, p <= r and k ≠ i, p ≠ i. The corresponding
tree in the worst case is as follows:
    f_1 [1]
Consider the expansion of those subtrees that correspond to an
application of a recognizer to the one-element list of start positions [1].
Owing to the fact that the grammar is non-left-recursive and that it consists
of r recognizers, after a maximum number of steps in each path, which
depends only on the size of the grammar, there must be an application of a
recognizer that consumes some input. It follows that the total number of
applications of a recognizer to a one-element list is independent of the length
of the input. For the same reason, the total number of applications of a
recognizer to a more than one-element list (in the worst case an (n + 1)
element list) is independent of the length of the input.
When the first step is completed, there will be only O(r) subtrees to be
further expanded. This is because the result corresponding to a pair
(recognizer, start position) is calculated only once. If the same recognizer is
applied to the same start position again, the corresponding result is simply
copied from the memo-table. At the next stage, the same procedure is
repeated for each recognizer that is applied to the list [2]. The only
difference is that now O(r) subtrees must be expanded, not just one. Only
O(r) nodes will be generated for each recognizer applied to the list [2], and
O(r) nodes for each application of a recognizer to a more-than-one-element
list.
At the ith step, there will be O(r) nodes corresponding to an
application of a recognizer to the list [i], and O(r) nodes corresponding to an
application of a recognizer to a more-than-one-element list (in the worst case,
an (n - i + 2) element list). The total number of steps is n + 1. Owing to the
fact that an application of a recognizer to a k-element list yields a tree that
contains 2*k + 1 nodes, as discussed in subsection 5.7, the total number of
nodes is given by the following, where c is proportional to the number of
If the grammar is not ambiguous then each input sequence of tokens
can be recognized in just one way. Therefore, each recognizer applied at
some position i will return at most a one-element list as result. The
corresponding formula for unambiguous grammars given below concludes
the proof.
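The two node counts can be worked out explicitly. The following is a hedged reconstruction from the argument above, not a transcription of the original formulas: it assumes that at step i there are at most c applications of a recognizer to the one-element list [i] and at most c applications to a list of at most n - i + 2 elements, each of the latter contributing 2k + 1 nodes with k = n - i + 2.

```latex
% Ambiguous case: one node per single-position application plus
% 2(n - i + 2) + 1 nodes per list application, summed over n + 1 steps.
\[
  \sum_{i=1}^{n+1} c\,\bigl(1 + 2(n - i + 2) + 1\bigr)
  \;=\; c\,\bigl((n+1)(n+2) + 2(n+1)\bigr)
  \;=\; c\,(n+1)(n+4)
  \;=\; O(n^2)
\]

% Unambiguous case: every result list has at most one element (k <= 1),
% so each list application contributes at most 3 nodes.
\[
  \sum_{i=1}^{n+1} c\,(1 + 3) \;=\; 4c\,(n+1) \;=\; O(n)
\]
```

Multiplying by the O(n) cost per node given by Lemma 1 and Lemma 2 gives O(n^3) in general and O(n^2) for unambiguous grammars, in agreement with the theorem.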
6 A monadic approach to incorporate memoization
So far, we have used an ad hoc method to redefine the recognizer
functions in order to incorporate memoization. This method is susceptible
to error. In fact, an earlier version of the paper contained an insidious error:
the function m_then was defined as follows:
According to this definition, if the recognizer p fails, then the memo-table
is returned unchanged in the result ([], t). This error would result in
exponential complexity for certain grammars when applied to certain inputs
which fail to be recognized. In this section, we show how the recognizer
definitions can be modified in a structured way which reduces the possibility
of such errors. The method treats memoization as a specific instance of the
more general notion of adding features to purely functional programs.
6.1 Monads
Monads were introduced to computing science by Moggi [15] who
noticed that reasoning about programs that involve handling of the state,
exceptions, I/O, or nondeterminism can be simplified, if these features are
expressed using monads. Inspired by Moggi's ideas, Wadler [21] proposed
monads as a way of structuring functional programs. The main idea behind
monads is to distinguish between the type of values and the type of
computations that deliver these values. A monad is a triple (M, unit, bind)
where M is a type constructor, and unit and bind are two polymorphic
functions. M can be thought of as a function on types, that maps the type of
values into the type of computations producing these values. unit is a
function that takes a value and returns a corresponding computation; the
type of unit is a -> M a. The function bind represents sequencing of two
computations where the value returned by the first computation is made
available to the second (and possibly subsequent) computation. The type of
bind is M a -> (a -> M b) -> M b.
The identity monad [21] below represents computations as the values
they deliver.

    unit1 :: * -> idm *
    unit1 x = x
The state monad (also defined in [21]) is an abstraction over
computations that deal with the state. The definition is given below.

    unit2 :: * -> stm *
    unit2 a = f
              where f t = (a, t)

    bind2 :: stm * -> (* -> stm **) -> stm **
    (m $bind2 k) = f
                   where f x = (b, z)
                               where (b, z) = k a y
                                              where (a, y) = m x
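To make the behavior of unit2 and bind2 concrete, here is a minimal Python sketch in which a stateful computation is represented as a function from a state to a (value, new state) pair. The tick computation and the integer counter state are hypothetical and serve only to show how bind2 threads the state from one computation to the next.

```python
def unit2(a):
    """Computation that returns a and leaves the state unchanged."""
    return lambda t: (a, t)


def bind2(m, k):
    """Run m, pass its value to k, and run the result on m's final state."""
    def f(x):
        a, y = m(x)          # (a, y) = m x
        return k(a)(y)       # (b, z) = k a y
    return f


def tick(t):
    """Hypothetical computation: return the counter, then increment it."""
    return t, t + 1


# Two ticks in sequence; the state flows through without being mentioned
# by the code that combines the two values.
prog = bind2(tick, lambda a: bind2(tick, lambda b: unit2((a, b))))

value, final_state = prog(0)   # value is (0, 1), final_state is 2
```

The code that builds prog never names the state explicitly, which is the property exploited later to hide the memo-table inside the recognizer combinators.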
We will use the identity and the state monad to construct
non-memoized and memoized monadic recognizers respectively. In the
description below, we refer to a third monad which we use as an analogy in
the construction of our monadic recognizers. This is the monad for lists [21].
Owing to the fact that our recognizers can be applied to a list of inputs, it is
necessary to have a well structured way of doing that.

    list * == [*]

    unit :: * -> list *
    unit a = [a]

    bind :: list * -> (* -> list **) -> list **
    [] $bind y    = []
    (a:x) $bind y = (y a) ++ (x $bind y)
6.2 Non-memoized monadic recognizers
In order to use monads to provide a structured method for adding new
effects to a functional program, we begin by identifying all functions that
will be involved in those effects. We then replace those functions, which can
be of any type a -> b, by functions of type a -> M b. In effect, we change the
program so that selected function applications return a computation on a
value rather than the value itself. This computation may be used to add
features such as state to the program. In order to effect this change, we use
the function unit to convert values into computations that return the value
but do not contribute to the new effects, and the function bind is used to
apply a function of type a -> M b to a computation of type M a. Having
made these changes, the original program can be obtained by using the
identity monad idm, as shown below. In order to add new effects such as
state, or exceptions, we simply change the monad and make minimal local
changes as required to the rest of the program. In the following subsection,
we show how to add the new effect of memoization by replacing the
identity monad with the state monad stm, and making some local changes.
The non-memoized recognizers introduced earlier in this paper were
functions taking a list of input sequences of tokens and returning a similar
list of sequences yet to be processed. The definition of the non-memoized
monadic recognizers differs slightly in that the list of inputs is represented by
a pair: a list of start positions and the whole input sequence of tokens.
Owing to the fact that the input sequence remains unchanged during the
execution of the program, there is no need for any recognizer to return it.
In order to construct non-memoized monadic recognizers, we start by
defining the type of non-memoized recognizers. We define it using the type
constructor idm of the identity monad.
That is, a recognizer of type a is a function that applied to an input string
and a list of start positions returns an "identity" computation of type a. We
can now define the function term_1, which when applied to a character,
returns a function that is always applied to a one-element list of positions.

    term_1 :: char -> rec [num]
    term_1 c s [x] = unit1 [],     if (x > #s) \/ (s!(x-1) ~= c)
                   = unit1 [x+1],  otherwise
In analogy with the bind operator for the list monad, the function
term, which when applied to a character returns a recognizer that can be
applied to more than one-element list of start positions, is defined as follows.

    term :: char -> rec [num]
    term c s []     = unit1 []
    term c s (x:xs) = term_1 c s [x] $bind1 f
                      where f a = term c s xs $bind1 g
                                  where g b = unit1 (merge_res a b)
The definitions of orelse, then, and empty are given below. Note that
we have replaced the append operator ++ with the function merge_res that
combines two lists removing duplicates.

    orelse :: rec [num] -> rec [num] -> rec [num]
    (p $orelse q) s input = p s input $bind1 f
                            where f a = q s input $bind1 g
                                        where g b = unit1 (merge_res a b)

    then :: rec [num] -> rec [num] -> rec [num]
    (p $then q) s input = p s input $bind1 f
                          where f a = unit1 [],  if a = []
                                    = q s a,     if a ~= []

    empty :: rec [num]
    empty s x = unit1 x

Notice that we have not rewritten the application of merge_res using
bind1. The reason for this is that we know that in this application,
merge_res will not be involved in memoization and therefore the result of
its application can be viewed as a value rather than a computation.
6.3 Memoized monadic recognizers
We now consider the state monad as given earlier and define the two
operations on the state: lookup and update. The type of the state is
[(num, [([char], [num])])].

    lookup :: num -> [char] -> stm [[num]]
    lookup ind name st
      = ([], st),                                                  if ind > #st
      = ([bs | (x, bs) <- (snd (st!(ind-1))); x = name], st),      otherwise

    update :: num -> [char] -> [num] -> stm []
    update ind name val st
      = (undef, map (update_mt_entry ind name val) st)

    update_mt_entry ind name val (x, list)
      = (x, (name, val) : list),  if x = ind
      = (x, list),                otherwise
We define the type of the memoized recognizers in terms of the type
constructor of the state monad stm:

    rec * == [char] -> [num] -> stm *

and define the function memoize (there is an analogy to the definition of
term and bind for the list monad here).

    memoize :: [char] -> rec [num] -> rec [num]
    memoize name f s []     = unit2 []
    memoize name f s (x:xs) = memoize1 name f s [x] $bind2 g
                              where g a = memoize name f s xs $bind2 h
                                          where h b = unit2 (merge_res a b)

    memoize1 :: [char] -> rec [num] -> rec [num]
    memoize1 name f s [i] = lookup i name $bind2 g
                            where g a = unit2 (a!0),       if a ~= []
                                      = f s [i] $bind2 h,  otherwise
                                        where h b = update i name b $bind2 r
                                                    where r any = unit2 b
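The monadic definitions above can be exercised with the following Python sketch, in which a recognizer takes the input string and a list of start positions and returns a state computation over the memo-table. Several details are our own assumptions rather than the definitions from the text: term is written directly instead of via term_1, merge_res uses sets, update returns None in place of undef, and initial_state is a hypothetical helper that builds an empty table with n + 1 rows.

```python
def unit2(a):
    return lambda st: (a, st)


def bind2(m, k):
    def f(st):
        a, st1 = m(st)
        return k(a)(st1)
    return f


def merge_res(xs, ys):
    return sorted(set(xs) | set(ys))


def lookup(ind, name):
    def comp(st):
        if ind > len(st):
            return [], st
        return [bs for (x, bs) in st[ind - 1][1] if x == name], st
    return comp


def update(ind, name, val):
    def comp(st):
        new = [(x, [(name, val)] + l) if x == ind else (x, l) for (x, l) in st]
        return None, new              # the returned value is never inspected
    return comp


def memoize1(name, f):
    def rec(s, pos):                  # pos is a one-element list [i]
        i = pos[0]

        def g(a):
            if a:                     # found in the table (possibly as [])
                return unit2(a[0])
            return bind2(f(s, [i]),
                         lambda b: bind2(update(i, name, b),
                                         lambda _: unit2(b)))
        return bind2(lookup(i, name), g)
    return rec


def memoize(name, f):
    def rec(s, xs):
        if not xs:
            return unit2([])
        return bind2(memoize1(name, f)(s, xs[:1]),
                     lambda a: bind2(rec(s, xs[1:]),
                                     lambda b: unit2(merge_res(a, b))))
    return rec


def term(c):
    def rec(s, xs):
        return unit2([x + 1 for x in xs if x <= len(s) and s[x - 1] == c])
    return rec


def empty(s, xs):
    return unit2(list(xs))


def orelse(p, q):
    def rec(s, xs):
        return bind2(p(s, xs),
                     lambda a: bind2(q(s, xs),
                                     lambda b: unit2(merge_res(a, b))))
    return rec


def then(p, q):
    def rec(s, xs):
        return bind2(p(s, xs), lambda a: unit2([]) if not a else q(s, a))
    return rec


def initial_state(s):
    return [(i, []) for i in range(1, len(s) + 2)]


a = term("a")


def ms(s, xs):                        # indirection for the recursive grammar
    return _ms(s, xs)


_ms = memoize("ms", orelse(then(a, then(ms, ms)), empty))

result, st = ms("aa", [1])(initial_state("aa"))
```

None of orelse, then, term, or empty mentions the memo-table; only lookup and update touch it, and bind2 takes care of passing it along. In particular, then cannot accidentally discard table updates made by a failing first component, which is the kind of error described at the start of this section.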
The definitions of term, orelse, and then remain unchanged except
that unit1 and bind1 are replaced by unit2 and bind2 respectively. The
memo-table is completely hidden in the definition of term, orelse, and then.
One of the advantages is that having identified all recognizer functions as
being involved in the memoization effect, the monadic form of orelse is
straightforward and thereby this approach reduces the chance of making the
kind of error referred to at the beginning of this section. The definition of
monadic memoized recognizers is exactly the same as with the original
memoized recognizers, and the complexity analysis presented earlier holds
also for memoized monadic recognizers.
The following table shows the results obtained when an unmemoized
monadic recognizer for the grammar s ::= a s s | empty and a memoized
monadic version were applied to inputs of various length. The results
suggest that the recognizers also have O(n^3) space complexity as well as
O(n^3) time complexity.

    unmemoized   | memoized
    out of space | 155,526
More information on the use of monads to structure functional
language processors can be found in Wadler [20,21,22].
7 Memoizing parsers and syntax-directed evaluators
The memoization technique presented in this paper could be readily
extended so that the memo-tables contain parse tables similar to those
created by Earley's algorithm [3], or the more compact representation of
factored syntax trees suggested by Leiss [13]. However, to do so would not
be in keeping with an approach that is commonly used by the
purely-functional programming community in building language processors.
That approach is to avoid the explicit construction of syntax trees unless the
trees are specifically required to be displayed as part of the output. Instead of
constructing labeled syntax trees which are subsequently evaluated, an
alternative approach is used: semantic actions are closely associated with the
executable grammar productions so that semantic attributes are computed
directly without the need for the explicit representation of syntax trees.
User-defined types can be introduced to accommodate different types of
attributes as has been done in the W/AGE attribute grammar programming
language m. This approach is viable owing to the lazy-evaluation strategy
employed by most purely-functional languages. The memoization technique
described above can be used to improve the efficiency of such syntax-directed
evaluators with two minor modifications:
1. The definition of m_orelse is changed so that the function merge
   is replaced with a function that removes results that are regarded as
   duplicates under application-dependent criteria which may be less
   inclusive than the criterion used for recognizers. Results that are
   returned by a recognizer are regarded as duplicates if they have the
   same end points. For recognition purposes the end points are all
   that is required to be maintained in the memo-table. With syntax-
   directed evaluators, the end points may be augmented with
   semantic values. A single end point pair may have more than one
   value associated with it. In some cases syntactic ambiguity may
   result in semantic ambiguity. Results returned by a language
   processor would only be regarded as being duplicates if they have
   the same end points and have equivalent semantic attributes. The
   function merge would be replaced by an application-dependent
   function that identifies and removes such duplication. In this
   approach, if syntax trees are required as part of the output, they are
   simply treated as another attribute. In such cases syntactic
   ambiguity is isomorphic with semantic ambiguity and the function
   merge would be replaced by concatenation.
2. The memo-tables and the update and lookup functions are
   modified according to the attributes that are required in the
   application.
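The three duplicate criteria named in the first modification can be contrasted in a short Haskell sketch. Everything here is an assumption made for illustration: the pair layout `Result v`, the function names, and the use of plain equality as the notion of "equivalent semantic attributes" are not taken from the thesis.

```haskell
import Data.List (nub, sort)

-- Hypothetical result layout: an end point paired with a semantic value.
type Result v = (Int, v)

-- Recognizer criterion: two results are duplicates if their end points agree.
mergeEnds :: [Int] -> [Int] -> [Int]
mergeEnds xs ys = nub (sort (xs ++ ys))

-- Evaluator criterion: duplicates must agree on end point AND on value
-- (equality stands in for an application-dependent equivalence).
mergeVals :: Ord v => [Result v] -> [Result v] -> [Result v]
mergeVals xs ys = nub (sort (xs ++ ys))

-- Syntax trees as the attribute: every tree is distinct, so nothing can be
-- removed and merging degenerates to concatenation.
mergeTrees :: [Result t] -> [Result t] -> [Result t]
mergeTrees = (++)
```

For instance, `mergeVals [(3,6),(4,7)] [(3,6),(3,8)]` keeps both values that end at position 3 and yields `[(3,6),(3,8),(4,7)]`, whereas `mergeEnds [3,4] [3,3]` collapses to `[3,4]`.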
One advantage that derives from this approach is that all unnecessary
computation is avoided. Memoization prevents language processors from
reprocessing segments of the input already visited and the use of merge, or
an application-dependent version of it, removes duplication in sub-
components of the result as soon as it is possible to detect it. It should be
noted that the complexity of language processors constructed in this way is
application dependent. If syntax trees are required to be represented in full,
the language processor may have exponential complexity in the worst case
owing to the fact that the number of syntax trees can be exponential in the
length of the input for highly ambiguous grammars. A compact
representation of the trees could be produced in polynomial time and the
trees could then be passed on to an evaluator. However, this would detract
from the modularity of the language processor and would provide no benefit
if the trees were to be subsequently displayed or otherwise processed
separately as this could be an exponential process.
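The exponential growth in the number of syntax trees can be made concrete for the grammar s ::= a s s | empty used earlier. The following Haskell sketch counts the distinct trees for an input of n a's; the name `treeCounts` and the lazily shared list used to avoid recomputation are choices made for this illustration only.

```haskell
-- treeCounts !! n is the number of distinct syntax trees for an input of
-- n a's under s ::= a s s | empty. A tree for n >= 1 consists of one 'a'
-- followed by two subtrees that share the remaining n - 1 characters.
treeCounts :: [Integer]
treeCounts = map count [0 ..]
  where
    count :: Int -> Integer
    count 0 = 1
    count n = sum [ treeCounts !! i * treeCounts !! (n - 1 - i)
                  | i <- [0 .. n - 1] ]
```

Here `take 8 treeCounts` evaluates to `[1,1,2,5,14,42,132,429]`. These are the Catalan numbers, which grow exponentially in n, so any processor that enumerates every tree in full must do exponential work on such inputs.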
This paper was inspired by Norvig's demonstration that memoization
can be implemented at the source-code level in languages such as Common
Lisp to improve efficiency of simple language processors without
compromising their simplicity. We have shown that Norvig's technique can
be adapted for use in purely-functional programming languages that do not
admit any form of updateable object. The technique described in this paper
can be thought of as complementing that of Norvig's in that it enables
memoization to be used to improve the efficiency of highly-modular
language processors constructed in purely-functional languages.
This application has also illustrated how monads can be used to
structure functional programs in order to avoid errors when modifications
such as the addition of state are made. We are now exploring the use of
monads in the memoization of programs that are constructed as executable
9 Acknowledgments
We would like to thank Young Gil Park, Dimitris Phoukas, and Yung
H. Tsin of the University of Windsor for helpful discussions, and the
anonymous referees for careful reading of the paper, for many useful
suggestions, and for identifying the error in the code that was discussed in
section 6. Richard Frost also acknowledges the support received from the
10 References
[1] Augusteijn, L. (1993) Functional Programming, Program Transformations and Compiler Construction. Philips Research Laboratories. ISBN 90-74445-04-7.
[2] Burge, W. H. (1975) Recursive Programming Techniques. Addison-Wesley Publishing Company, Reading, Massachusetts.
[3] Earley, J. (1970) An efficient context-free parsing algorithm. Commun. ACM 13 (2) 94-102.
[5] Field, A. J. and Harrison, P. G. (1988) Functional Programming. Addison-Wesley Publishing Company, Reading, Massachusetts.
[6] Frost, R. A. (1992) Constructing programs as executable attribute grammars. The Computer Journal 35 (4) 376-389.
[7] Frost, R. A. (1995) W/AGE The Windsor Attribute Grammar Programming Environment. Schloss Dagstuhl International Workshop on Functional Programming in the Real World.
[8] Hudak, P., Wadler, P., Arvind, Boutel, B., Fairbairn, J., Fasel, J., Hammond, K., Hughes, J., Johnsson, T., Kieburtz, D., Nikhil, R., Peyton Jones, S., Reeve, M., Wise, D. and Young, J. (1992) Report on the programming language Haskell, a non-strict, purely functional language, Version 1.2. ACM SIGPLAN Notices 27 (5).
[9] Hughes, R. J. M. (1985) Lazy memo functions. In Proceedings, Conference on Functional Programming and Computer Architecture, Nancy, France, September 1985. Springer-Verlag Lecture Note Series 201, editors G. Goos and J. Hartmanis, 129-146.
[10] Hutton, G. (1992) Higher-order functions for parsing. Journal of Functional Programming 2 (3).
[11] Khoshnevisan, H. (1990) Efficient memo-table management strategies. Acta Informatica 28, 43-81.
[12] Leermakers, R. (1993) The Functional Treatment of Parsing. Kluwer Academic Publishers, ISBN 0-7923-9374-7.
[13] Leiss, H. (1990) On Kilbury's modification of Earley's algorithm. ACM TOPLAS 12 (4) 610-640.
[14] Michie, D. (1968) 'Memo' functions and machine learning. Nature 218, 19-22.
[15] Moggi, E. (1989) Computational lambda-calculus and monads. IEEE Symposium on Logic in Computer Science, Asilomar, California, June 1989, 14-23.
[16] Norvig, P. (1991) Techniques for automatic memoisation with applications to context-free parsing. Computational Linguistics 17 (1) 91-98.
[17] Turner, D. (1985) A lazy functional programming language with polymorphic types. Proc. IFIP Int. Conf. on Functional Programming Languages and Computer Architecture. Nancy, France. Springer Verlag Lecture Notes in Computer Science 201.
[18] Wadler, P. (1985) How to replace failure by a list of successes, in P. Jouannaud (ed.) Functional Programming Languages and Computer Architectures. Lecture Notes in Computer Science 201, Springer-Verlag, Heidelberg, 113.
[19] Wadler, P. (editor) (1989) Special issue on lazy functional programming. The Computer Journal 32 (2).
[20] Wadler, P. (1990) Comprehending monads. ACM SIGPLAN/SIGACT/SIGART Symposium on Lisp and Functional Programming, Nice, France, June 1990, 61-78.
[21] Wadler, P. (1992) Monads for functional programming. Marktoberdorf Summer School on Program Design Calculi. Springer Verlag Lecture Notes in Computer Science.
[22] Wadler, P. (1995) Monads for functional programming. Proceedings of the Båstad Spring School on Advanced Functional Programming, ed. J. Jeuring and E. Meijer. Springer Verlag Lecture Notes in Computer Science 925.
Appendix B
IMPLEMENTATION OF MONADIC LANGUAGE PROCESSORS USING TYPE CONSTRUCTOR CLASSES IN GOFER
class Monad m where
    unit :: a -> m a
    bind :: m a -> (a -> m b) -> m b

class Monad m => MonadPlusZero m where
    plus :: m a -> m a -> m a
    zero :: m a

class MonadPlusZero m => Recognizer m a where
    termR   :: Char -> a -> m a
    emptyR  :: a -> m a
    orelseR :: (a -> m a) -> (a -> m a) -> (a -> m a)
    thenR   :: (a -> m a) -> (a -> m a) -> (a -> m a)
    failR   :: m a

    emptyR = unit
    (p `thenR` q) inp = p inp `bind` \x1 ->
                        q x1  `bind` \x2 ->
                        unit x2
    (p `orelseR` q) inp = (p inp) `plus` (q inp)
    failR = zero

class MonadPlusZero m => Parser m a where
    term   :: Char -> m a
    empty  :: a -> m a
    orelse :: m a -> m a -> m a
    fail   :: m a

    empty  = unit
    orelse = plus
    fail   = zero
class Recognizer m a => MRecognizer m a where
    memoizeR :: String -> (a -> m a) -> a -> m a

class Parser m a => MParser m a where
    memoize :: String -> m a -> m a

-- LIST MONAD
type List a = [a]
     in unitLs, bindLs, termEInt

unitLs :: a -> [a]
unitLs a = [a]

bindLs :: [a] -> (a -> [b]) -> [b]
x `bindLs` k = case x of
                 []     -> []
                 (a:x1) -> (k a) `merge_res` (x1 `bindLs` k)

instance Monad [] where
    unit = unitLs
    bind = bindLs

instance MonadPlusZero [] where
    zero = []
    plus = merge_res

merge_res :: [a] -> [a] -> [a]
merge_res x []  = x
merge_res [] y  = y
merge_res (x:xs) (y:ys)
    | x << y    = x : merge_res xs (y:ys)
    | y << x    = y : merge_res (x:xs) ys
    | otherwise = x : merge_res xs ys

-- To avoid problems with the class Ord defined in the
-- standard prelude
primitive (<<) "primGenericLt" :: a -> a -> Bool

-- STATE MONAD
type St s a = s -> (a, s)
     in unitSt, bindSt, lookupSt, updateSt, newSt,
        parseSt, memoizeEv, parseStEv

bindSt :: St s a -> (a -> St s b) -> St s b
(a `bindSt` k) t = k va ta
                   where (va, ta) = a t

instance Monad (St s) where
    unit = unitSt
    bind = bindSt

type State v = [(Int, [(String, v)])]
     in unitSt, bindSt, lookupSt, updateSt,
        newSt, parseSt, memoizeEv, parseStEv

lookupSt :: Int -> String -> St (State v) [v]
lookupSt ind name t
    | ind > length t = ([], t)
    | otherwise      = ([bs | (x,bs) <- snd (t !! (ind-1)), x == name], t)

updateSt :: Int -> String -> v -> St (State v) ()
updateSt ind name val st
    = ((), map (update_mt_entry ind name val) st)
      where update_mt_entry ind name val (x, list)
                | x == ind  = (x, (name, val) : list)
                | otherwise = (x, list)

-- EXCEPTION MONAD
data Ex a = Fail | Ok a

bindEx :: Ex a -> (a -> Ex b) -> Ex b
(Ok a) `bindEx` k = k a
Fail   `bindEx` k = Fail

instance Monad Ex where
    unit = unitEx
    bind = bindEx

instance MonadPlusZero Ex where
    plus = plusEx
    zero = zeroEx

-- PARAMETRIZED STATE READER MONAD
type StRM m s a = s -> m a
     in unitStRM, liftStRM, bindStRM, zeroStRM, plusStRM, termInt,
        parseInt, termEInt, parseEInt, memoizeRec, parseSt,
        memoizeEv, parseStEv

liftStRM :: Monad m => m a -> StRM m s a
liftStRM x t = x

unitStRM :: Monad m => a -> StRM m s a
unitStRM x = liftStRM (unit x)

bindStRM :: Monad m => StRM m s a -> (a -> StRM m s b) -> StRM m s b
(a `bindStRM` k) t = a t `bind` \va -> k va t

zeroStRM :: MonadPlusZero m => StRM m s a
zeroStRM = \s -> zero

plusStRM :: MonadPlusZero m
            => StRM m s a -> StRM m s a -> StRM m s a
(x `plusStRM` y) s = x s `plus` y s

instance Monad m => Monad (StRM m s) where
    unit = unitStRM
    bind = bindStRM

instance MonadPlusZero m => MonadPlusZero (StRM m s) where
    zero = zeroStRM
    plus = plusStRM

-- PARAMETRIZED LIST MONAD
type ListM m a = m [a]
     in unitLsM, bindLsM, plusLsM, zeroLsM, liftLsM, memoizeRec,
        parseSt, memoizeEv, parseStEv

unitLsM :: Monad m => a -> ListM m a
unitLsM x = liftLsM (unit x)

bindLsM :: Monad m => ListM m a -> (a -> ListM m b) -> ListM m b
x `bindLsM` k = x `bind` \x1 ->
                foldr plusLsM zeroLsM (map k x1)

liftLsM :: Monad m => m a -> ListM m a
liftLsM x = x `bind` \x1 ->
            unit [x1]

instance Monad m => Monad (ListM m) where
    unit = unitLsM
    bind = bindLsM

instance Monad m => MonadPlusZero (ListM m) where
    zero = zeroLsM
    plus = plusLsM

zeroLsM :: Monad m => ListM m a
zeroLsM = unit []

plusLsM :: Monad m => ListM m a -> ListM m a -> ListM m a
(x `plusLsM` y) = x `bind` \x1 ->
                  y `bind` \x2 ->
                  unit (x1 `merge_res` x2)

-- PARAMETRIZED INPUT MONAD
type InpM m s a = s -> m (a, s)
     in unitInpM, bindInpM, liftInpM, zeroInpM, plusInpM, termEInt,
        parseEInt, memoizeEv, parseStEv

unitInpM :: Monad m => a -> InpM m s a
unitInpM a = liftInpM (unit a)

bindInpM :: Monad m => InpM m s a -> (a -> InpM m s b) -> InpM m s b
(a `bindInpM` k) inp = a inp `bind` \(va, outa) -> k va outa

liftInpM :: Monad m => m a -> InpM m s a
liftInpM x inp = x `bind` \x1 ->
                 unit (x1, inp)

zeroInpM :: MonadPlusZero m => InpM m s a
zeroInpM = \inp -> zero

plusInpM :: MonadPlusZero m => InpM m s a -> InpM m s a -> InpM m s a
(x `plusInpM` y) = \inp -> x inp `plus` y inp

instance Monad m => Monad (InpM m s) where
    unit = unitInpM
    bind = bindInpM

instance MonadPlusZero m => MonadPlusZero (InpM m s) where
    zero = zeroInpM
    plus = plusInpM

-- Recognizers that are applied to a position and the
-- whole list of tokens

termInt :: MonadPlusZero m => Char -> Int -> StRM m String Int
termInt c x s
    | (x < 1) || ((x > length s) || (s !! (x-1) /= c)) = zero
    | otherwise                                         = unit (x+1)

parseInt :: (Int -> StRM m String Int) -> Int -> String -> m Int
parseInt x inp s = x inp s

instance MonadPlusZero (StRM m String)
    => Recognizer (StRM m String) Int where
    termR = termInt

memoizeRec ::
    Recognizer (StRM (ListM (St (State [Int]))) String) Int
    => String
    -> (Int -> StRM (ListM (St (State [Int]))) String Int)
    -> (Int -> StRM (ListM (St (State [Int]))) String Int)
memoizeRec name f i s = lookupSt i name `bindSt` \x1 ->
                        if x1 /= [] then unitSt (x1 !! 0)
                        else f i s `bindSt` \x2 ->
                             updateSt i name x2 `bindSt` \() ->
                             unitSt x2

parseSt ::
    (Int -> StRM (ListM (St (State [Int]))) String Int)
    -> Int -> String -> ([Int], State [Int])
parseSt x inp s = x inp s (newSt (length s))

instance Recognizer (StRM (ListM (St (State [Int]))) String) Int
    => MRecognizer (StRM (ListM (St (State [Int]))) String) Int
    where memoizeR = memoizeRec

-- non-deterministic recognizer
r1 :: Recognizer (StRM [] String) Int
      => Int -> StRM [] String Int
r1 = (termR 'a' `thenR` (r1 `thenR` r1))
     `orelseR` emptyR

-- deterministic recognizer
r2 :: Recognizer (StRM Ex String) Int
      => Int -> StRM Ex String Int
r2 = (termR 'a' `thenR` (r2 `thenR` r2))
     `orelseR` emptyR

-- memoized recognizer
r3 :: Recognizer (StRM (ListM (St (State [Int]))) String) Int
      => Int -> StRM (ListM (St (State [Int]))) String Int
r3 = memoizeR "r3" ((termR 'a' `thenR` (r3 `thenR` r3))
                    `orelseR` emptyR)

-- ? parseInt r1 1 "aaaaa"
-- [1, 2, 3, 4, 5, 6]
-- (10448 reductions, 16026 cells)
-- ? parseInt r2 1 "aaaaa"
-- Ok 6
-- (569 reductions, 813 cells)

-- Recognizers that are applied to a list of tokens and
-- return a list of tokens yet to be recognized

termChar :: MonadPlusZero m => Char -> String -> m String
termChar c inp = case inp of
                   []     -> zero
                   (x:xs) -> if x == c
                             then unit xs
                             else zero

parseChar :: (String -> m String) -> String -> m String
parseChar x s = x s

instance MonadPlusZero m => Recognizer m String where
    termR = termChar

-- EXAMPLES
-- non-deterministic recognizer
r4 :: Recognizer [] String => String -> [String]
r4 = (termR 'a' `thenR` (r4 `thenR` r4))
     `orelseR` emptyR

-- deterministic recognizer
r5 :: Recognizer Ex String => String -> Ex String
r5 = (termR 'a' `thenR` (r5 `thenR` r5))
     `orelseR` emptyR

-- ? parseChar r4 "aaaaa"
-- ["", "aa", "aaa", "aaaa", "aaaaa"]
-- (4586 reductions, 7672 cells)
-- ? parseChar r5 "aaaaa"
-- Ok []
-- (143 reductions, 186 cells)

    | otherwise = unit (eval c, x+1)
eval 'a' = 1

parseEInt :: StRM (InpM m Int) String Int -> String -> Int -> m (Int, Int)
parseEInt x s inp = x s inp

instance MonadPlusZero (StRM (InpM m Int) String)
    => Parser (StRM (InpM m Int) String) Int where
    term = termEInt

memoizeEv ::
    Parser (StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String) Int
    => String
    -> StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String Int
    -> StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String Int
memoizeEv name f s i
    = lookupSt i name `bindSt` \x1 ->
      if x1 /= []
      then unitSt (x1 !! 0)
      else f s i `bindSt` \x2 ->
           updateSt i name x2 `bindSt` \() ->
           unitSt x2

parseStEv :: StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String Int
             -> String -> Int -> ([(Int,Int)], State [(Int,Int)])
parseStEv x s inp = x s inp (newSt (length s))

instance Parser (StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String) Int
    => MParser (StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String) Int
    where memoize = memoizeEv

-- EXAMPLES
-- non-deterministic evaluator
e1 :: Parser (StRM (InpM [] Int) String) Int
      => StRM (InpM [] Int) String Int
e1 = (term 'a' `bind` \x1 ->
      e1       `bind` \x2 ->
      e1       `bind` \x3 ->
      unit (x1 + x2 + x3))
     `orelse` empty 0

-- deterministic evaluator
e2 :: Parser (StRM (InpM Ex Int) String) Int
      => StRM (InpM Ex Int) String Int
e2 = (term 'a' `bind` \x1 ->
      e2       `bind` \x2 ->
      e2       `bind` \x3 ->
      unit (x1 + x2 + x3))
     `orelse` empty 0

-- memoized evaluator
e3 :: MParser (StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String) Int
      => StRM (InpM (ListM (St (State [(Int,Int)]))) Int) String Int
e3 = memoize "e3" ((term 'a' `bind` \x1 ->
                    e3       `bind` \x2 ->
                    e3       `bind` \x3 ->
                    unit (x1 + x2 + x3))
                   `orelse` empty 0)
VITA AUCTORIS
Barbara Szydlowski was born on January 2, 1964, in Lublin, Poland.
After receiving her high school diploma from the Hetman Jan Zamoyski
Gymnasium in 1982, she began her study at the Marie Curie-Sklodowska
University in Lublin. She graduated with a Master's Degree in Mathematics in
1987. Currently she is a candidate for a Master's Degree in Science at the
University of Windsor.