
International Journal of Bifurcation and Chaos, Vol. 14, No. 2 (2004) 599–621
© World Scientific Publishing Company

LANGUAGE PROCESSING BY DYNAMICAL SYSTEMS

PETER BEIM GRABEN∗
Institute of Linguistics and Center for the Dynamics of Complex Systems,
University of Potsdam, Germany
[email protected]

BRYAN JURISH
Berlin-Brandenburg Academy of Sciences, Berlin, Germany

DOUGLAS SADDY and STEFAN FRISCH
Institute of Linguistics, Center for the Dynamics of Complex Systems,
University of Potsdam, Germany

Received January 5, 2002; Revised September 30, 2002

We describe a part of the stimulus sentences of a German language processing ERP experiment using a context-free grammar and represent different processing preferences by its unambiguous partitions. The processing is modeled by deterministic pushdown automata. Using a theorem proven by Moore, we map these automata onto discrete time dynamical systems acting at the unit square, where the processing preferences are represented by a control parameter. The actual states of the automata are rectangles lying in the unit square that can be interpreted as cylinder sets in the context of symbolic dynamics theory. We show that applying a wrong processing preference to a certain input string leads to an unwanted invariant set in the parser's dynamics. Then, syntactic reanalysis and repair can be modeled by a switching of the control parameter — in analogy to phase transitions observed in brain dynamics. We argue that ERP components are indicators of these bifurcations and propose an ERP-like measure of the parsing model.

Keywords: Language processing; local ambiguity; ambiguity resolution; pushdown automata; Gödel codes; symbolic dynamics; cylinder sets; entropy.

1. Introduction

Phase transitions in the human brain are well known in the processes of cognitive motor control [Kelso et al., 1992; Engbert et al., 1997] and they have been successfully modeled by nonlinear field theories of cortical activation [Jirsa, 2004; Wright et al., 2004]. In human language processing, phase transitions were observed by Rączaszek et al. [1999] using continuously varying prosodic parameters for the disambiguation of speech, and they were modeled by Kawamoto [1993] for lexical ambiguity resolution using a Hopfield net. Following Başar [1980], beim Graben et al. [2000b] and Frisch et al. [2004] have considered event-related brain potentials (ERPs) and event-related cylinder entropies as order parameters of the brain, indicating disorder–order phase transitions when the control parameters, i.e. the experimental conditions, assume critical values.

∗Address for correspondence: Institute of Linguistics, University of Potsdam, P.O. Box 601553, 14415 Potsdam, Germany.


In language processing ERP experiments the control parameters of the brain are given by the manipulations of the sentence material, e.g. preferred versus dispreferred continuations of local syntactic ambiguities. In German, as in many other languages, native speakers have been shown to prefer the grammatical function of the subject for an ambiguous nominal constituent rather than the object (see [Frisch et al., 2004]). If the preferred analysis according to this so-called subject preference strategy is disconfirmed by following sentence material, processing problems arise and revision (a so-called reanalysis) must be initiated (see [Fodor & Ferreira, 1999] for an overview).

Since the manipulations in language processing ERP experiments are symbolic, they are not well suited for quantitative dynamical modeling. Therefore, most efforts in modeling language processing have been undertaken in developing symbol manipulating automata [Aho & Ullman, 1972; Hopcroft & Ullman, 1979]. According to this approach one considers formal languages which can be processed by finite control machines. A formal language is a subset of all strings obtained by the concatenation of "letters", the so-called terminals, from a finite alphabet. Such a subset might be generated by a grammar, i.e. a finite production system of "rules" describing the substitution of strings of letters and auxiliary symbols, called nonterminals, by strings of terminals and/or nonterminals. A context-free grammar is a production system where each rule must describe the substitution of one single nonterminal by a string of terminals and/or nonterminals. A context-free grammar is called locally ambiguous if there are two or more rules expanding the same nonterminal symbol. One refers to the history of rules applied to obtain a certain string of terminals as a derivation. A left derivation is a derivation where always the leftmost nonterminal symbol is expanded. A context-free grammar is called ambiguous if there is more than one left derivation yielding the same sequence of terminal symbols. Languages generated by unambiguous context-free grammars can be processed by deterministic pushdown automata. These are devices with a finite control and with a one-sided infinite memory tape, the so-called stack. A simple subclass of pushdown automata are deterministic top-down recognizers with only one internal state. Languages with local ambiguities can also be processed with deterministic pushdown automata; however, they need a look-ahead into the terminal string to be processed. Without this look-ahead, the processing of locally ambiguous languages may lead to garden path effects when the automaton makes wrong predictions about the structure of a string.

In the last decades, physicists have continued to be interested in the physics of symbolic computation. In his compelling review, Bennett [1982] describes "ballistic" and "Brownian" computers that are computationally equivalent to Turing machines. These dynamical systems generate trajectories "isomorphic with the desired computation" from initial conditions regarded as inputs (see also [Crutchfield, 1994]). The formal equivalence of Turing machines and nonlinear dynamical systems has been proven by Moore [1990, 1991b]. He demonstrated that any symbolic system can be mapped onto a piecewise affine linear map defined at the unit square by interpreting the symbols as integers from some b-adic number system.

In this paper, we use Moore's construction to map top-down recognizers (parsers) that recognize context-free languages onto such dynamical systems. To capture the dynamics of nondeterministic parsers we decompose a locally ambiguous grammar into a set of unambiguous grammars that can be processed by deterministic top-down recognizers. After representing these parsers by nonlinear dynamical systems, we map their numbers onto the values of a control parameter, thus obtaining one bifurcating dynamical system at the unit square. This control parameter reflects the parsing strategy. Then, psycholinguistic phenomena such as garden path effects in the processing of ambiguous structures [Frisch et al., 2004] can be roughly modeled as a kind of phase transition in the dynamical system, elicited by a switching of the control parameter.

We pursue three goals in this study. Firstly, we argue that cognitive computation is essentially transient dynamics in the sense of Bennett [1982] and Crutchfield [1994]. That is, a problem is translated into an initial condition of a system from which the dynamics evolves along a transient trajectory that is isomorphic to a running computer program. Finally, an attractor or some given state is reached which is considered as the solution of the problem. Secondly, we suggest an approach that bridges the cleft between dynamical models on the one hand and symbolic computation on the other hand [van Gelder, 1998; Marcus, 2001], namely by arguing that symbolic dynamics provides an interface between dynamical system theory and the theory of formal languages and automata. And thirdly, we make use of these findings in order to demonstrate that cognitive (symbolic) computation is essentially nonlinear dynamics, as follows from Moore's proof [Moore, 1990, 1991b].

The organization of the paper is as follows. In Sec. 2 we discuss again the language processing ERP experiment of Frisch et al. [2004] and we present a toy-model of parsing by dynamical systems. Subsequently, we provide a formal theory of dynamical language processing in Sec. 3. In Sec. 4 we shall address some problems and open questions by offering possible generalizations of our approach.

2. A Toy-Model of Language Processing

In order to illustrate our modeling approach, we consider a part of the sentence material used by Frisch et al. [2004] in their ERP experiment on the processing of ambiguous pronouns. In this study, the following two conditions — among others — were tested:

(1) Nachdem die Kommissarin den Detektiv getroffen hatte, sah sie_s [den Schmuggler]_o.
    after the cop the detective met had saw she_s [the smuggler]_o
    "After the cop had met the detective, she saw the smuggler."

(2) Nachdem die Kommissarin den Detektiv getroffen hatte, sah sie_o [der Schmuggler]_s.
    after the cop the detective met had saw she_o [the smuggler]_s
    "After the cop had met the detective, the smuggler saw her."

Here, we have tagged the nominal constituents of the main clause (the second part after the comma) of the sentences (1) and (2) by their grammatical roles, where s denotes "subject" and o denotes "object". The sentences (1) and (2) of the ERP experiment are therefore mapped onto the strings so and os, respectively, constituting the formal language L = {so, os} over the alphabet {s, o}.

2.1. Context-free grammars

Next we ask for a grammar describing the language L. Since we are only interested in describing the main clauses of the sentences (1) and (2), we propose the following context-free grammar G:

S → s o (3)

S → o s (4)

where S means "sentence" (the start symbol of the grammar). The rule (3) assigns subject to the pronoun (sie: "she") and object to the second noun phrase of the clause (den Schmuggler: "the smuggler"), while rule (4) does the opposite: the pronoun becomes object and the second noun phrase becomes subject (der Schmuggler). By applying the rules (3), (4) one obtains the context-free language L = L(G) = {so, os}. Our toy-grammar G = {(3), (4)} is locally ambiguous. That means that there is more than one possibility to expand the nonterminal S of the grammar into a string of terminals consisting of s or o (for a formal treatment of local ambiguity see Sec. 3.3).

Let us look for a coarse-grained model of the processing strategies. Psycholinguists have found that a subject interpretation of a constituent which is ambiguous between subject and object is preferred, and that a later disambiguation towards an object interpretation is more difficult to process [Schlesewsky et al., 2000; Frisch et al., 2002]. Due to this subject preference strategy, the first encountered noun phrase is interpreted to be the subject of a clause [Frisch et al., 2004]. This strategy corresponds to the rule (3) of the toy-grammar because (3) creates the sequence so for the matrix clause. Contrarily, the rule (4) assigns the object role to the pronoun, which is the dispreferred interpretation. We can thus split the locally ambiguous grammar (3), (4) into two unambiguous ones: the subgrammar G1, consisting only of the rule (3), models the subject preference strategy, while the subgrammar G2, which is formed by the single rule (4), produces the string os, hence modeling example (2). The subgrammars G1 and G2 generate the formal languages L1 = L(G1) = {so} and L2 = L(G2) = {os}, respectively. We show in Sec. 3.3 that any locally ambiguous context-free grammar can be partitioned into a finite set of locally unambiguous context-free subgrammars.

2.2. Top-down parser

So far we have modeled the sentence material and the processing preferences of the ERP experiment that has been described by Frisch et al. [2004]. Next we shall construct a processing model. It is well known from computer science that context-free languages are recognized by pushdown automata [Aho & Ullman, 1972; Hopcroft & Ullman, 1979]. The languages produced by our toy-grammars G1 and G2 (rules (3) and (4)) can be processed by a subclass of pushdown automata, the so-called top-down parsers, which are the simplest of these devices (cf. Sec. 3.4). A top-down parser consists of two tapes: the input tape, containing the string being processed, and a stack memory for storing temporary information. Both tapes can be described by strings γ, w, where γ = γ_{i_0}, …, γ_{i_{k−1}} denotes the concatenation of k symbols on the stack and w = w_{j_1}, …, w_{j_l} denotes a sequence consisting of l symbols in the input. While the input may only contain terminal symbols, the stack is able to store terminal and nonterminal symbols as well. The doublet (γ, w) of stack and input is called the actual pair of the automaton.^1 The top-down parser only has access to the first symbol Z = γ_{i_0} on the stack and to the first item a = w_{j_1} in the input. Depending on these "visible" symbols the parser determines its actions on input and stack.

Let us consider the language L1 = L(G1) = {so} as an example. A top-down parser τ1 that recognizes L1 can be described by a table of actual pairs [Allen, 1987] that is shown in Table 1. At the first step the parser is initialized with the start symbol Z = S on the stack and the whole string w = so in the input. The device "sees" the start symbol at the top of the stack (indicated by the bold font) and first recognizes that it is a nonterminal of the grammar G1. Then it looks into the list of production rules to find a rule with the start symbol on the left-hand side in order to predict the right-hand side. Since G1 is unambiguous, the parser uniquely finds the rule (3) for expanding S into so; that is the operation given in the fourth column of Table 1.

After the first step the actual pair is (so, so), where the visible entries are printed in bold. Now the parser recognizes that the leading symbol on the stack is the terminal s. In this case the parser tries to attach the predicted symbol to the current input. This is possible if both symbols agree. By attachment the parser cancels the coincident symbols of stack and input tape and it shifts both tapes one step to the left, thus yielding state two. Now the leading symbols of stack and input tape are again corresponding terminals, so that the parser performs a further attachment. Finally, the automaton reaches its accepting state where both tapes, stack and input, are empty (or, technically speaking, they contain the empty word ε).

As for the language L1, we do the construction of the recognizing top-down parser for the other language L2 in the same manner, thus obtaining an automaton τ2.

Table 1. Top-down parser τ1 processing so according to grammar G1. The leading symbols at stack and input tape are printed in bold fonts.

Step   Stack   Input   Operation
0.     S       so      predict by rule (3)
1.     so      so      attach
2.     o       o       attach
3.     ε       ε       accept

Table 2. Top-down parser τ1 processing os ∈ L2 according to grammar G1. The leading symbols at stack and input tape are printed in bold fonts.

Step   Stack   Input   Operation
0.     S       os      predict by rule (3)
1.     so      os      attachment fails
2.     so      os      nonaccepting final state

^1 The term "state" has quite different meanings in computational linguistics and in dynamical system theory, respectively. However, as will become clear subsequently, the actual pairs of a top-down parser correspond to the (macro-)states of the assigned dynamical system. We shall therefore refer to actual pairs as "states" in the dynamical sense, unless stated otherwise (as in Sec. 3.3).


What happens if one employs a wrong processing strategy? We shall discuss this issue using the example of processing the string os ∈ L2 by the parser τ1. Table 2 shows this process. The parsing process fails and ends up in a nonaccepting state of the automaton. This can be regarded as a very crude model of a garden path interpretation in psycholinguistics [Fodor & Ferreira, 1999]. In automata theory such mis-parses must be corrected by backtracking [Aho & Ullman, 1972], which may serve as a model of psycholinguistic reanalysis [Fodor & Ferreira, 1999].
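To make these table-of-actual-pairs mechanics concrete, the following Python sketch (our own illustration, not part of the original paper; all names are our choices) implements such a single-state top-down recognizer and reproduces the traces of Tables 1 and 2:

    def parse(rules, stack, inp):
        """Single-state top-down recognizer; prints its table of actual pairs.
        Assumes a locally unambiguous grammar: one right-hand side per nonterminal."""
        step = 0
        while stack and inp:
            print(step, stack, inp)
            if stack[0] in rules:             # nonterminal on top: predict
                stack = rules[stack[0]] + stack[1:]
            elif stack[0] == inp[0]:          # matching terminal: attach
                stack, inp = stack[1:], inp[1:]
            else:                             # mismatch: attachment fails
                print(step + 1, stack, inp, 'nonaccepting final state')
                return False
            step += 1
        print(step, stack or 'ε', inp or 'ε')
        return stack == '' == inp             # accept by empty stack and input

    G1 = {'S': 'so'}        # unambiguous subgrammar G1, i.e. rule (3)
    parse(G1, 'S', 'so')    # succeeds, cf. Table 1
    parse(G1, 'S', 'os')    # garden path, cf. Table 2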

2.3. Gödel encoding

The parsing processes described in the last subsection are obviously reminiscent of dynamics. Tables 1 and 2 provide dynamic computation in the sense of Bennett [1982] and Crutchfield [1994]: the problem to be solved is formulated as an initial condition, namely an actual pair of the parser; the computational process is a sequence of such states, i.e. a transient trajectory; and the result of the computation corresponds to an attractor, either wanted or unwanted. This raises the question whether it is possible to translate parsing into nonlinear dynamics. Indeed, it is. As mentioned above, the actual pairs (γ, w) of the top-down parser can be considered as states belonging to some phase space. We shall discuss in this subsection how this symbolic space is mapped onto a vector space (for a formal proof, see Sec. 3.4).

The idea of the construction is to map the common terminals s, o and the nonterminal S of the two languages L1 and L2 onto integer numbers. This mapping g is called a Gödel encoding [Gödel, 1931; Hofstadter, 1979]. The way in which this is achieved is completely arbitrary. We shall choose the following encoding

g(o) = 0
g(s) = 1
g(S) = 2 .    (5)

From these Gödel codes of the symbols we compute the values of symbol strings simply by a b-adic number expansion of proper fractions, where we use the number of terminals and nonterminals together, b1 = 3, for representing the stack, and the number of terminal symbols, b2 = 2, for representing the input tape. Let us consider the first two rows of Table 1 as an example. Row one contains the actual pair (S, so). The start symbol S (Gödel code = 2) is interpreted as the fraction 0.2_3 in the 3-adic number system. Hence, we obtain the decimal fraction 0.2_3 = 2 × 3^{−1} = 0.6667_{10}. The decimal representation of the input tape is 0.10_2 = 1 × 2^{−1} + 0 × 2^{−2} = 0.5_{10}. Thus, the actual pair (S, so) of the parser is mapped onto the doublet (0.6667, 0.5), which is a point in the unit square [0, 1] × [0, 1]. The same calculation maps the actual pair (so, so) of the second row of Table 1 onto the point (0.3333, 0.5). We shall refer to these b-adic fractions also as Gödel codes, and we shall use the symbols g1 and g2 for the extension of the map (5) to the contents of the stack and the input tape, respectively.
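As an illustration, these Gödel fractions can be computed in a few lines of Python (a sketch of ours; the encoding dictionary g and the function name goedel are our choices):

    # Goedel encoding of Eq. (5): symbols -> integer digits
    g = {'o': 0, 's': 1, 'S': 2}

    def goedel(string, base):
        """b-adic fraction of a symbol string, most significant digit first."""
        return sum(g[sym] * base ** -(k + 1) for k, sym in enumerate(string))

    print(goedel('S', 3))    # 0.6667...: stack coordinate of the actual pair (S, so)
    print(goedel('so', 2))   # 0.5:       input coordinate of the actual pair (S, so)
    print(goedel('so', 3))   # 0.3333...: stack coordinate of the actual pair (so, so)

Here the stack is expanded with base b1 = 3 and the input tape with base b2 = 2, as in the text.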

Now we have to address one of the most essential issues of our construction. For theoretical reasons (which will be thoroughly treated in the next section) we have to interpret the actual pairs of the parser not as points in the unit square but as rectangles lying in this phase space. We therefore consider the tapes of the parser not as being empty following the stored contents γ and w, e.g. so ε at the input tape. Instead, we allow for any arbitrary continuation that is considered to be unknown, i.e.

[so] = {so ooo…, so oos…, so oso…, …, so sss…}    (6)

We will see in the next section that these sets of strings all having a common building block are nothing else than those cylinder sets which have been used by beim Graben et al. [2000b] and by Frisch et al. [2004] to analyze event-related brain potentials. Let us consider the infimum and the supremum of these sets represented by their Gödel fractions. The Gödel number of the infimum is given by g2(so ooo…) = g2(so), since the continuation ooo… contributes nothing. But the Gödel number of the supremum is g2(so) + 2^{−2} × g2(sss…). From an expansion into a geometric series we recognize that the value of the infinite binary fraction g2(sss…) = 0.111…_2 is one. Hence the supremum is obtained as g2(so) + 2^{−2}. We generalize these results to the formulas

inf(g_{1;2}([γ])) = g_{1;2}(γ)
sup(g_{1;2}([γ])) = g_{1;2}(γ) + b_{1;2}^{−|γ|}    (7)

for any string γ of length |γ| over an alphabet of b_{1;2} symbols that constitutes the cylinder set [γ] (for a proof see Sec. 3.2).

Using this representation of strings by cylinder sets we eventually map the actual pairs (γ, w) of the parsers onto Cartesian products of intervals

(γ, w) ↦ [g1(γ), g1(γ) + b1^{−|γ|}] × [g2(w), g2(w) + b2^{−|w|}]    (8)

which are rectangles in the unit square. For the actual pairs of the first two rows of Table 1 we therefore get the rectangles [0.6667, 1.0000] × [0.5000, 0.7500] and [0.3333, 0.4444] × [0.5000, 0.7500].
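Continuing the sketch above (reusing goedel), the interval bounds of Eq. (7) turn an actual pair into the rectangle of Eq. (8); the output reproduces the two rectangles just computed:

    def rectangle(stack, inp):
        """Rectangle of Eq. (8) for an actual pair (stack, input) in the unit square."""
        x, y = goedel(stack, 3), goedel(inp, 2)
        return (x, x + 3.0 ** -len(stack)), (y, y + 2.0 ** -len(inp))

    print(rectangle('S', 'so'))    # ((0.6667, 1.0), (0.5, 0.75))
    print(rectangle('so', 'so'))   # ((0.3333, 0.4444), (0.5, 0.75))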

2.4. Parsing dynamics

Using the Gödel encoding of actual pairs we are able to map the complete parsing process onto a trajectory of rectangles through the unit square. Table 3 shows the trajectory for the parse presented in Table 1.

Table 3. Trajectory of the parse of so ∈ L1 processed by the top-down parser τ1 according to grammar G1.

Step   Rectangle
0      [0.6667, 1.0000] × [0.5000, 0.7500]
1      [0.3333, 0.4444] × [0.5000, 0.7500]
2      [0.0000, 0.3333] × [0.0000, 0.5000]
3      [0.0000, 1.0000] × [0.0000, 1.0000]

Fig. 1. Graphical representation of the trajectory of the parse of so ∈ L1 processed by the top-down parser τ1 according to grammar G1 (four panels of the unit square, axes stack versus input). The numbers of the rectangles correspond to the steps in Table 3.

This trajectory is graphically displayed in Fig. 1, which suggests that there is a deterministic flow defined at the unit square which drives a rectangular set of points through the state space by stretching or squashing it. This flow actually exists and it can be constructed from the possible operations of the top-down parser. As we have seen in Sec. 2.2, the parser only has access to the topmost symbols of the stack and of the input tape. Therefore, the actual pair (Z, a) of topmost symbols completely determines the next operation of the parser. Since Z could be a terminal or a nonterminal and since a must be a terminal, these symbols provide a partition of the unit square into a grid of basic rectangles by their Gödel numbers, which are displayed in Fig. 1 by the dotted lines. At each of these dotted rectangles the parser operates in one of the following three ways:

(9) Predict: if Z is a nonterminal symbol, expand it into a string of terminals and/or nonterminals according to a unique rule of the grammar; the content of the input remains unchanged. By expanding Z into a string γ_{i_1}, …, γ_{i_{k−1}} it is mapped onto the symbol γ_{i_1}. This entails a shift of the rectangle along the x-axis in the Gödel representation. Additionally, the content of the stack becomes longer, yielding a compression of the rectangle along the x-axis. From Fig. 1, we see that this happens by mapping the rectangle 0 onto the rectangle 1.

(10) Attach: if Z is a terminal symbol, compare it with a in the input tape. If they agree, delete both from stack and input. That means the actual pair (a, a) is mapped onto the actual pair (ε, ε). According to the Gödel encoding, the rectangle corresponding to (a, a) is extended and shifted along the x- and y-axes such that it coincides with the unit square. This operation was performed by mapping rectangle 2 onto rectangle 3 in Fig. 1. If, on the other hand, Z does not agree with a, the parser arrives at a nonaccepting state.

(11) Accept: the parse successfully terminates when the whole unit square [0, 1] × [0, 1] is covered by the last rectangle. Otherwise, any rectangle corresponding to an actual pair (Z, a) where none of these operations applies is an invariant set that is mapped onto itself.

The striking result of our construction due to Moore's proof is that the deterministic dynamics of a top-down parser is a piecewise affine linear map defined at those cells of the partition of the unit square which correspond to the actual pairs (Z, a) of one symbol in the stack and one symbol in the input tape, respectively. Formally, we recognize that a parser τp is given by the map Φp acting as

[x_{t+1}, y_{t+1}]^T = Φp([x_t, y_t]^T) = [a_x^{p,(i,j)}, a_y^{p,(i,j)}]^T + diag(λ_x^{p,(i,j)}, λ_y^{p,(i,j)}) · [x_t, y_t]^T ,    (12)

where [x_t, y_t]^T is a point in the unit square, [x_{t+1}, y_{t+1}]^T is its iterate, and (i, j) denote the domains of definition that are given by the Gödel numbers i = g(Z), j = g(a) of the topmost symbols of stack and input tape providing the partition of the unit square. The coefficients a_x^{p,(i,j)} and a_y^{p,(i,j)} of the parser τp describe a parallel translation of a state, whereas the matrix elements λ_x^{p,(i,j)} and λ_y^{p,(i,j)} mediate the stretching (λ > 1) or squeezing (λ < 1) of a rectangle. We present the values of these coefficients for the parser τ1 accepting the language L1 = {so} in Table 4.

In Fig. 2 we display the domains of definition and the coefficients of the map (12) for the parser τ1. Figure 2 encodes the possible operations of the top-down parser by colors, showing the domains of definition of the piecewise affine linear functions in (a) and their images in (b). Red rectangles, which are labeled by their corresponding actual pairs in the Gödel numbering, represent the predict operations, where the leading symbol of the stack is expanded into a string of symbols, thus entailing a translation of the rectangle in connection with its contraction along the x-axis. In (b) the red rectangles having the same labels are those images.

Table 4. Coefficients of the map (12) for the top-down parser τ1 joined to grammar G1.

Actual Pair    a_x        a_y        λ_x       λ_y
(0, 0)          0.0000     0.0000    3.0000    2.0000
(0, 1)          0.0000     0.0000    1.0000    1.0000
(1, 0)          0.0000     0.0000    1.0000    1.0000
(1, 1)         −1.0000    −1.0000    3.0000    2.0000
(2, 0)          0.1111     0.0000    0.3333    1.0000
(2, 1)          0.1111     0.0000    0.3333    1.0000
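To make Eq. (12) and Table 4 concrete, here is a small Python sketch of ours that iterates the map Φ1. We use the exact values 1/9 and 1/3 behind the rounded table entries 0.1111 and 0.3333, and a tiny epsilon guards the cell lookup against floating-point boundary effects; all names are our own:

    # Coefficients (ax, ay, lx, ly) of the map Phi_1, Eq. (12), taken from Table 4.
    PHI1 = {(0, 0): (0.0, 0.0, 3.0, 2.0),  (0, 1): (0.0, 0.0, 1.0, 1.0),
            (1, 0): (0.0, 0.0, 1.0, 1.0),  (1, 1): (-1.0, -1.0, 3.0, 2.0),
            (2, 0): (1/9, 0.0, 1/3, 1.0),  (2, 1): (1/9, 0.0, 1/3, 1.0)}

    def phi(coeffs, x, y):
        """One iteration of the piecewise affine map, Eq. (12)."""
        i = min(int(3 * x + 1e-9), 2)      # Goedel cell of the stack coordinate
        j = min(int(2 * y + 1e-9), 1)      # Goedel cell of the input coordinate
        ax, ay, lx, ly = coeffs[(i, j)]
        return ax + lx * x, ay + ly * y

    x, y = 0.7, 0.6                        # a point inside the rectangle of (S, so)
    for t in range(3):
        x, y = phi(PHI1, x, y)
        print(t + 1, round(x, 4), round(y, 4))   # visits the rectangles of Table 3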

Fig. 2. (a) Domains of definition of the parsing map (12) for the top-down parser τ1. The colors of the rectangles encode the operation of the parser at the corresponding actual pairs. Red: predict; the leading symbol of the stack is expanded into a string of symbols. Blue: attach; the leading symbol of the stack is a terminal agreeing with the leading symbol at the input tape. The blue rectangles are extended to the whole unit square. Yellow: do not accept; in the yellow domains the map is the identity. (b) Images of the red rectangles from (a). The red rectangles are the images of those red rectangles from (a) labeled by the same actual pair (i, j).


Fig. 3. (a) Domains of definition of the parsing map (12) for the top-down parser τ2. (b) Images of the red rectangles from (a). For the color scheme, see Fig. 2.

Blue rectangles in Fig. 2(a) represent the attach operation. Their images are the whole unit square. Yellow rectangles in Fig. 2(a) represent actual pairs where neither predict nor attach nor accept can be applied.

So far we have discussed the dynamics of the parser τ1 processing the string so according to the subject preference strategy. Additionally, we also present the map of the parser τ2, which processes the string os with respect to the grammar G2, in Fig. 3. Obviously, as Figs. 2 and 3 reveal, the domains of definition are the same for all of the parsers' maps. Only the images of the red rectangles differ, due to the different grammars which are used to expand the nonterminal symbols on the stack.

After performing this construction of a discrete time deterministic dynamical system for the two top-down parsers τ1 and τ2, any string of the language L = {so, os} can be processed by each of these dynamical systems. To achieve this, the string must be translated into an interval at the y-axis by the b2-adic Gödel encoding g2. Moreover, one has to compute the image of the start symbol S that initializes the stack of the parsers under the Gödel encoding g1. Thus we obtain a rectangle R0 in the unit square as the image of the first actual pair. To carry out a simulation, we prepare an ensemble of points x^i_0 = [x^i_0, y^i_0]^T, i = 1, 2, …, M, randomly chosen in R0 as initial conditions of the dynamics. From each of these points the first iterate x^i_1 = Φp(x^i_0) of the map (12) is computed. The envelope of this set yields the rectangle R1 in the unit square corresponding to the second actual pair of the parse. By performing the iteration procedure recursively, x^i_t = Φp(x^i_{t−1}), we obtain an ensemble of M trajectories {x^i_t | t ∈ N_0, i = 1, 2, …, M} exploring the phase space of the parsing dynamics. The envelopes of the sets of all states at the trajectories at each time t provide a sequence of rectangles R_t that are equivalent to the actual pairs of the parser τp due to the Gödel encoding. In this sense we have constructed a dynamical toy-model of language processing.

2.5. Diagnosis and repair

In Sec. 2.2 we have seen that processing the string os according to the "wrong" grammar G1 leads to the nonaccepting final state (so, os). This actual pair is situated within the yellow rectangle (1, 0) in the unit square (see Fig. 3) where all parsing maps are evidently given by the identity. The rectangle corresponding to this actual pair is therefore an invariant set of the map (12). Hence, we recognize that both the top-down parser τ1 and its associated map Φ1 lead to a garden path interpretation by processing os. In automata theory, such a dead-end can only be left by backtracking [Aho & Ullman, 1972].


Fig. 4. (a) Domains of definition of the repair map for the control parameter r = 1/2. (b) Images of the red rectangles from (a). For the color scheme, see Fig. 2.

We shall demonstrate in this section that our dynamical system model of language processing allows us to establish a dynamics managing reanalysis without any backtracking.

Up to now, we have considered the two maps Φ1 and Φ2 constructed from the parsers τ1 and τ2 as being distinct dynamical systems. However, from the physical point of view it is much more natural to consider them as only one dynamical system depending on an additional control parameter, because all these maps are living at the same state space that is partitioned in the same way. For the sake of convenience we will normalize the possible values of the control parameter r to the unit interval [0, 1]. We take the predecessor of the parser's number p as the control parameter, r = p − 1. Thus, the maps Φp become parameterized as Φr.

What happens when the wrong parsing strategy r = 0 (p = 1) is used for processing the string os that must be correctly parsed by the strategy r = 1 (p = 2)? As we know, the dynamics reaches an unwanted invariant set represented by the actual pair of Gödel numbers (1, 0). Assuming that the map Φ0 is iteratively invoked by some "cognitive control unit" (regarded as a "homunculus"), this higher level system eventually recognizes the nonacceptable attractor (1, 0) in the parser's phase space. We may call this recognition process diagnosis [Fodor & Ferreira, 1999]. After diagnosing the garden path analysis, the control unit will turn the control parameter of the parsing dynamics towards the right choice r = 1. However, this will not help at all because we know that the state (1, 0) is an invariant set for every value of the control parameter. The solution is to introduce an intermediary dynamical system to repair the failed parse. We assign this repair dynamics to the control parameter r = 1/2. Figure 4 presents the domains of definition and the images of this map.

The map Φ1/2 is the identity on almost every cell of the partition, but not at (1, 0). Having this intermediary map available, the control unit changes the control parameter from r = 0 to r = 1/2 for one iteration. This causes the dynamics to leave the invariant set after the next iteration. The system does not trace the trajectory into the past, as would be necessary for backtracking. Finally, the control unit proceeds with r = 1. Figure 5 shows the complete parse with garden path interpretation (r = 0), diagnosis and repair (r = 0 ↦ r = 1/2), and completion (r = 1).

In dynamical system theory, changing control parameters beyond critical values causes qualitative changes in the structure of the flow at the state space, i.e. bifurcations. Thus, we see that our language processing dynamics can be modeled in analogy to a bifurcating dynamical system when garden path interpretations are diagnosed and repaired.^2
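The following sketch illustrates this control loop for the point dynamics, reusing phi and PHI1 from the sketch in Sec. 2.4. A caveat: the paper specifies Φ1/2 and Φ2 only graphically (Figs. 3 and 4), so the coefficient tables PHI2 and PHIR below are our own reconstruction. The assumed repair rewrites the stack prefix so to os, i.e. a translation by −2/9 along the x-axis:

    PHI2 = dict(PHI1)                                    # map of tau_2 for grammar G2;
    PHI2[(2, 0)] = PHI2[(2, 1)] = (-1/9, 0.0, 1/3, 1.0)  # predict by rule (4): S -> os

    PHIR = {cell: (0.0, 0.0, 1.0, 1.0) for cell in PHI1} # repair map: identity, except
    PHIR[(1, 0)] = (-2/9, 0.0, 1.0, 1.0)                 # so -> os at the cell (1, 0)

    MAPS = {0.0: PHI1, 0.5: PHIR, 1.0: PHI2}

    x, y, r = 0.7, 0.3, 0.0    # a point in the rectangle of (S, os); strategy Phi_0
    for t in range(1, 5):
        x1, y1 = phi(MAPS[r], x, y)
        if (x1, y1) == (x, y) and r == 0.0:   # diagnosis: unwanted invariant set
            r = 0.5                           # one iteration of the repair dynamics,
            x1, y1 = phi(MAPS[r], x, y)
            r = 1.0                           # then completion with grammar G2
        x, y = x1, y1
        print(t, 'r =', r, round(x, 4), round(y, 4))   # cf. steps 1-4 of Fig. 5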

Fig. 5. Graphical representation of the diagnosis and repair trajectories. Reanalysis of the parse of os ∈ L2 processed by the dynamics Φ0 according to grammar G1: steps 0 and 1; the state 1 corresponds to the unwanted garden path. This is recognized by the control unit, which subsequently turns the control parameter to r = 1/2 (diagnosis). Step 2 yields the repaired state. Then the control unit changes the control parameter to r = 1 to complete the parse according to the right grammar G2: steps 2–4.

2.6. Parsing entropy

In the language processing experiment that motivates our dynamical system modeling, Frisch et al. [2004] observed a P600 ERP component elicited by the revision of the subject preference strategy. The authors also presented an alternative analysis of the ERPs by means of symbolic dynamics, where ERP components are usually reflected by decreases of the cylinder entropy, thus indicating disorder–order phase transitions of the brain dynamics [beim Graben et al., 2000b]. In this subsection we shall present a way to measure the entropy of the parsing states of our model, which is justified by the equivalence — shown in the next section — of the actual pairs of a parser with cylinder sets of a symbolic dynamics.

Entropy is a measure of uncertainty based on probability distributions. Shannon and Weaver [1949] defined the entropy of a discrete distribution {(i, p_i) | i = 1, …, n} by

H = − Σ_{i=1}^{n} p_i log p_i .    (13)

In dynamical system theory the probability p_i for occupying the ith cell B_i of a partition of the phase space can be estimated by the relative dwelling time of typical trajectories in B_i [Ott, 1993]. We shall use a similar argument utilizing the relative area of a rectangle in the unit square with respect to a measurement partition. To be precise, we define

p_i(R) = area(R ∩ B_i) / area(R) ,    (14)

where the geometrical function area(R) = (x2 − x1) · (y2 − y1) determines the area of the rectangle R = [x1, x2] × [y1, y2].
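In Python, Eqs. (13) and (14) amount to a few lines (our sketch; since the exact mesh of Fig. 6 is not tabulated in the paper, the example uses a uniform 6 × 6 grid as a stand-in for the measurement partition):

    from math import log

    def entropy(rect, cells):
        """Shannon entropy, Eq. (13), of a rectangle w.r.t. a measurement partition,
        with cell probabilities given by relative overlap areas, Eq. (14)."""
        (x1, x2), (y1, y2) = rect
        h = 0.0
        for (u1, u2), (v1, v2) in cells:
            overlap = (max(0.0, min(x2, u2) - max(x1, u1))
                       * max(0.0, min(y2, v2) - max(y1, v1)))
            p = overlap / ((x2 - x1) * (y2 - y1))
            if p > 0.0:
                h -= p * log(p)
        return h

    # a uniform 6 x 6 grid as a stand-in for the measurement partition of Fig. 6
    grid = [((i / 6, (i + 1) / 6), (j / 6, (j + 1) / 6))
            for i in range(6) for j in range(6)]
    print(entropy(((0.0, 1.0), (0.0, 1.0)), grid))      # maximal entropy log(36)
    print(entropy(((0.0, 1 / 6), (0.0, 1 / 6)), grid))  # R fits one cell: entropy 0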

By applying Eqs. (13) and (14) to the trajectories of rectangles of our parsing dynamics, using an appropriately chosen measurement partition that is shown in Fig. 6, we obtain the time course of the parsing entropies presented in Fig. 7.

Fig. 6. Measurement partition of the unit square for determining parsing entropies.

^2 Similarly, Kawamoto [1993] has modeled the access to an alternative meaning of a lexically ambiguous word by the habituation of synaptic weights in a Hopfield neural network, entailing a bifurcation of the network's energy landscape.

Fig. 7. Time course of parsing entropies due to the subject preference strategy from the dynamical systems according to the measurement partition shown in Fig. 6. The colors correspond to the ERP components and event-related entropies presented in [Frisch et al., 2004]. Blue denotes the string so, green os, where the revision of the subject preference strategy evokes a P600 ERP (the trajectory is shown in Fig. 5).

For a better visualization, Fig. 6 displays the mesh density of the measurement partition on a logarithmic scale. From Eqs. (13) and (14) one easily recognizes that the probability p_i = 1 when the rectangle R representing the parser's state is completely covered by one cell B_i of the measurement partition. This is either the case when R is too small or when the grid of the partition is too coarse, such that R drops through the grid, leading to zero entropy. This almost happens by construction of the measurement partition at the domain (1, 0) where the garden path interpretation of our dynamical model is situated. On the other hand, when R is completely covered by N cells B_i of equal size, the overlap between one of these cells and R becomes area(R)/N, entailing p_i = 1/N and thus maximal entropy.

As Fig. 7 reveals, the measurement partition was constructed in such a way that the detection of the garden path analysis in processing the string according to the inappropriate strategy leads to a decrease in parsing entropy, as is the case in proper language processing [Frisch et al., 2004]. Therefore our toy-model of language processing may offer a possible interface between dynamical and computational modeling on the one hand and psycholinguistic experiments and nonlinear data analysis on the other hand.

3. Theory of Language Processing by Dynamical Systems

This section is devoted to a formal theory of dynamical language processing that has already been outlined by beim Graben et al. [2000a]. After mentioning some basic concepts of dynamical system theory and symbolic dynamics, we will quote the fundamental theorem of Moore [1990, 1991b] on mapping Turing machines onto dynamical systems at the unit square. Then we shall apply Moore's proof to construct dynamical systems from pushdown automata. First, we recall the concepts of discrete time dynamical systems in one and more dimensions.

3.1. Symbolic dynamics of one-dimensional systems

A discrete time deterministic dynamical system is a pair (X, Φr), where X is the phase space while the flow^3 Φr : X → X is an invertible map assigning to a state x_t at a certain time t its successor x_{t+1} at time t + 1, occasionally depending on a control parameter r. A given state x_{t_0} at a certain time t_0 is called an initial condition of the dynamics. The set of states {x_t | x_t = Φr^t(x_{t_0}), t ∈ Z} generated by the flow from the initial condition x_{t_0} is called a trajectory of the dynamical system. The powers of the map Φr^t are recursively defined by iterating the map: Φr^t = Φr ∘ Φr^{t−1}.

^3 For discrete time systems the flow is sometimes called a cascade [Anosov & Arnol'd, 1988].

A special class of dynamical systems is provided by the unit interval [0, 1] as their phase space and by a nonlinear function Φr mapping the unit interval onto itself. These systems are known as one-dimensional dynamical systems [Collet & Eckmann, 1980]. The most simple one-dimensional dynamics are defined by piecewise (affine) linear maps at the unit interval. In the following, we shall consider the Bernoulli map [Schuster, 1989] as an intriguing example that shares many common properties with our parsing models discussed in Sec. 2.4. The Bernoulli map is defined by Φ(x) = 2x mod 1, i.e. the map is obtained by the linear function y = f_0(x) = 2x at the subinterval [0, 0.5] and by the affine linear function y = f_1(x) = 2x − 1 at the subinterval ]0.5, 1]. Though being piecewise linear, the map is nonlinear at the whole phase space [0, 1] and exhibits the typical stretching and folding properties of chaotic dynamical systems. The Bernoulli map does not depend on a control parameter.

A binary expansion of the states of the Bernoulli map provides insight into its dynamics: for any number in the unit interval a representation by a proper binary fraction exists. Numbers smaller than 0.5 start with their most significant digit "0" while numbers greater than 0.5 begin with the binary digit "1". Let us consider the initial condition x_0 = 0.1101 as an example. Its decimal expansion is given by

x_0 = Σ_{i=1}^{∞} a_i · 2^{−i} = 1 × 2^{−1} + 1 × 2^{−2} + 0 × 2^{−3} + 1 × 2^{−4} = 0.8125 ,    (15)

where the a_i are the binary digits, vanishing for nonperiodic fractions beyond some i. What are the successors of x_0 lying at a trajectory of the Bernoulli system? In the decimal number system, we obtain x_1 = Φ(x_0) = 0.6250, x_2 = Φ(x_1) = Φ^2(x_0) = 0.2500, x_3 = Φ(x_2) = Φ^3(x_0) = 0.5000, x_4 = Φ(x_3) = Φ^4(x_0) = 0.0. However, using the binary number system yields the trajectory x_1 = Φ(x_0) = 0.101, x_2 = Φ(x_1) = 0.01, x_3 = Φ(x_2) = 0.1, x_4 = Φ(x_3) = 0.0. One easily recognizes that the Bernoulli map acts on the binary numbers as a shift to the left, discarding the 2^0 position.
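This shift action is easy to verify numerically; in the following small sketch of ours, the helper binary merely renders a number as a proper binary fraction for inspection:

    def bernoulli(x):
        """The Bernoulli map Phi(x) = 2x mod 1."""
        return (2.0 * x) % 1.0

    def binary(x, digits=8):
        """Proper binary fraction of x in [0, 1), e.g. 0.8125 -> '0.1101'."""
        out = '0.'
        for _ in range(digits):
            x *= 2.0
            out += str(int(x))
            x %= 1.0
        return out.rstrip('0').rstrip('.') or '0'

    x = 0.8125                       # = 0.1101 in binary, the example of Eq. (15)
    for t in range(4):
        x = bernoulli(x)
        print(t + 1, x, binary(x))   # 0.625 '0.101', 0.25 '0.01', 0.5 '0.1', 0.0 '0'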

The binary descriptions, e.g. "1101", of the states x, together with the representation of the map Φ given by the left shift σ, are called the symbolic dynamics of the Bernoulli map [Schuster, 1989]. Here, the binary digits "0", "1" are regarded as symbols from a finite alphabet A. States of the dynamical system are mapped onto sequences s = a_{i_1} a_{i_2} a_{i_3} … of symbols stemming from A. Sequences of finite length are called words. The set of all sequences of (one-sided) infinite length s = a_{i_1} a_{i_2} a_{i_3} … assumes the notation A^N. As mentioned above, the most significant digit a_{i_1} of a binary fraction decides whether the state is taken from the lower or from the upper half interval: a_{i_1} = "0" means x_0 ∈ [0, 0.5] whereas a_{i_1} = "1" means x_0 ∈ ]0.5, 1]. Hence, the most significant digit a_{i_1} partitions the system's phase space into disjoint subsets. The symbols "0" and "1" are assigned to the cells A_0 = [0, 0.5] and A_1 = ]0.5, 1], respectively, where the indices of the subsets of the partition are taken as symbols of the alphabet. In the same manner, the first two symbols a_{i_1} a_{i_2} of a sequence decide where the initial condition x_0 and its first iterate x_1 = Φ(x_0) are situated. In the example discussed above, "11" means: x_0 ∈ A_1 and also x_1 ∈ A_1. The latter expression is equivalent to Φ(x_0) ∈ A_1 and can be reformulated to x_0 ∈ Φ^{−1}(A_1), where Φ^{−1} denotes the preimage of a set. It is always defined even if the map itself is not invertible, such as the Bernoulli map. By generalizing this expression, we obtain

s = a_{i_1} a_{i_2} a_{i_3} … ⇒ x_0 ∈ ∩_{k=0}^{∞} Φ^{−k}(A_{i_{k+1}}) .    (16)

When we apply this construction to the Bernoulli map we find that the first two symbols describe a partition of the unit interval into quarters: "00" means [0, 0.25], "01" means ]0.25, 0.5], and so on. In general, by fixing the first n binary digits of a sequence s = a_{i_1} a_{i_2} a_{i_3} … we refer to the interval [0.a_{i_1} … a_{i_n} 00000…, 0.a_{i_1} … a_{i_n} 11111…]. These sets of symbol strings beginning with a fixed prefix have been introduced in Eq. (6) to define cylinder sets. A cylinder of length n is therefore a set of all sequences coinciding in the first n letters of the alphabet.

3.2. Symbolic dynamics of high-dimensional systems

Next, we shall introduce symbolic dynamics of general time discrete dynamical systems depending on some control parameters r ∈ R^p, the phase spaces of which are subsets of some R^m. A coarse-grained description of the dynamics is gained by a partition covering the phase space of the dynamical system [Beck & Schlögl, 1993; Badii & Politi, 1997]. Let {A_i | i = 1, 2, …, I} be a family of I pairwise disjoint subsets covering the whole phase space X, i.e. ∪_{i=1}^{I} A_i = X, A_i ∩ A_j = ∅ for i ≠ j. The index set A = {1, 2, …, I} of the partition is interpreted as a (finite) alphabet of letters a_i = i. In the case of an invertible deterministic dynamics we are able to determine the system's past as well as its future by iterating the inverse flow Φr^{−1} and the flow Φr, respectively. By deciding which cell A_i of the partition is visited by a state x_t at time t = …, −1, 0, 1, …, we assign a symbol …, a_{i_{−1}}, a_{i_0}, a_{i_1}, … at each instance of time. Thus, we obtain a bi-infinite sequence of letters s = … a_{i_{−1}} a_{i_0} a_{i_1} a_{i_2} … from A. The expression A^Z refers to the set of all these bi-infinite strings of symbols. Given a time discrete and invertible deterministic dynamics Φr, we construct a map π : X → A^Z that assigns initial conditions x_0 ∈ X to bi-infinite symbol strings s ∈ A^Z by the rule π(x_0) = s iff x_t = Φr^t(x_0) ∈ A_{i_t}, t ∈ Z. Thus, π maps initial conditions x_0 in the state space onto symbolic strings regarded as coarse-grained trajectories starting at x_0. By doing so, the flow Φr is mapped onto the left shift σ, as in the case of the Bernoulli map. The shift is hence a map σ : A^Z → A^Z acting according to σ(… a_{i_{−1}} â_{i_0} a_{i_1} a_{i_2} …) = … a_{i_{−1}} a_{i_0} â_{i_1} a_{i_2} …, where the hat denotes the current state under observation. The map π mediates between the quantitative states x ∈ X ⊂ R^m and the states of the symbolic dynamics s ∈ A^Z by

π(Φr(x_t)) = σ(π(x_t)) .    (17)

The map π might not be invertible. If π is invertible, the partition is called generating, and every string of symbols corresponds to exactly one initial condition generating it [Beck & Schlögl, 1993].

Nevertheless, if π is not invertible, we can apply the map π^{−1} to subsets of A^Z, namely to strings of finite length, looking for their preimages in X. In order to do this we generalize the notion of cylinder sets. Let t ∈ Z, n ∈ N and a_{i_1}, …, a_{i_n} ∈ A. The set

[a_{i_1}, …, a_{i_n}]_t = {s ∈ A^Z | s_{t+k−1} = a_{i_k}, k = 1, …, n}    (18)

is called an n-cylinder at time t. For further references see [Badii & Politi, 1997; Beck & Schlögl, 1993; beim Graben et al., 2000b]. Since cylinders are subsets of A^Z of infinite strings coinciding in a sequence of time points {t, t + 1, …, t + n − 1}, we can determine their preimages under the deterministic dynamics Φr:

π^{−1}([a_{i_1}, …, a_{i_n}]_t) = ∩_{k=0}^{n−1} Φ^{−k}(A_{i_{k+1}}) .    (19)

This is almost the same formula as Eq. (16). As for the Bernoulli map, any general symbolic dynamics can be mapped back onto a quantitative dynamical system by performing a b-adic expansion. To achieve this, the symbols of the alphabet A must be encoded by integers. If I is the cardinality of the alphabet we need I digits 0, 1, 2, …, I − 1 representing the letters a_1, a_2, a_3, …, a_I. We know the assignment g(a_i) = i − 1 as a Gödel encoding from Sec. 2.3. Given a finite or bi-infinite symbolic sequence s = … a_{i_{−1}} â_{i_0} a_{i_1} a_{i_2} …, where the hat denotes the current state again, we expand the left-half sequence s^− = … a_{i_{−3}} a_{i_{−2}} a_{i_{−1}} a_{i_0} into the I-adic fraction x = … + g(a_{i_{−3}}) I^{−4} + g(a_{i_{−2}}) I^{−3} + g(a_{i_{−1}}) I^{−2} + g(a_{i_0}) I^{−1} and the right-half sequence s^+ = a_{i_1} a_{i_2} a_{i_3} … into the I-adic fraction y = g(a_{i_1}) I^{−1} + g(a_{i_2}) I^{−2} + g(a_{i_3}) I^{−3} + … . The sequence s is thus mapped onto a pair (x, y),

x = Σ_{k=0}^{∞} g(a_{i_{−k}}) I^{−k−1}    (20)

y = Σ_{k=1}^{∞} g(a_{i_k}) I^{−k}    (21)

of real numbers lying in the unit square [0, 1] × [0, 1].

Furthermore, as cylinder sets of the Bernoulli map were represented by subintervals of the unit interval, we shall demonstrate that cylinder sets of an arbitrary symbolic dynamics [Eq. (18)] are rectangles in the unit square. To this aim let us consider a cylinder of length n + l + 1 at time −l: [a_{i_1}, …, a_{i_{n+l+1}}]_{−l} = {s ∈ A^Z | s_{−l} = a_{i_1}, s_{−l+1} = a_{i_2}, …, s_0 = a_{i_{l+1}}, s_1 = a_{i_{l+2}}, …, s_n = a_{i_{n+l+1}}}. We split the elements of this set into two half-sequences, s^− = … s_{−l} s_{−l+1} … s_0 and s^+ = s_1 s_2 … s_n … . According to our discussion in Sec. 2.3 we therefore obtain an interval of real numbers for the expansion of s^− and s^+, respectively. The infimum of the interval constituted by the expansions of s^− is

x_1 = Σ_{i=0}^{−l} g(s_i) I^{i−1} .    (22)

For the supremum of this interval we find

x_2 = x_1 + Σ_{i=l+2}^{∞} (I − 1) I^{−i} = x_1 + I^{−l−1} ,    (23)

where we have made use of the limit of the geometric series, thus proving Eq. (7). Accordingly, we obtain the interval [y_1, y_1 + I^{−n}] with y_1 = Σ_{i=1}^{n} g(s_i) I^{−i} for the right-half sequence s^+. Hence, the cylinder set [a_{i_1}, …, a_{i_{n+l+1}}]_{−l} is represented by the rectangle [x_1, x_2] × [y_1, y_2], with y_2 = y_1 + I^{−n} [cf. Eq. (8)].

3.3. Local ambiguity of context-free grammars

In this section, we wish to investigate the properties of locally ambiguous grammars as informally presented in Sec. 2.1 in terms of some standard notions from the theory of formal languages. In particular, we are interested in constructing a deterministic top-down recognizer from an arbitrary locally unambiguous grammar, as described in Sec. 2.2.

Page 14: LANGUAGE PROCESSING BY DYNAMICAL SYSTEMSkaskade.dwds.de/~moocow/mirror/pubs/beimGrabenJurishEA2004.pdf · Using a theorem proven by Moore, we map these automata onto discrete time

March 4, 2004 15:24 00932

612 P. b. Graben et al.

We will then go on to show that any context-free grammar may be partitioned into a finite set of locally unambiguous grammars, as suggested in Sec. 2.1. We begin by presenting some basic concepts from the theory of formal languages.

A context-free grammar G is defined as a 4-tuple, G = (N, T, P, S), where

(24) N is a finite set of n nonterminal symbols,
(25) T is a finite set of m terminal symbols such that N ∩ T = ∅,
(26) P ⊆ N → (N ∪ T)^∗ is a finite set of production rules,^4 and
(27) S ∈ N is the distinguished start symbol.

Below, unless otherwise specified, we will assume we are dealing with a context-free grammar G = (N, T, P, S).

A pushdown automaton (or "PDA") is a 7-tuple M = (Q, T, Γ, δ, q_0, Z_0, F), where

(28) Q is a finite set of states,
(29) T is a finite input alphabet,
(30) Γ is a finite stack alphabet,
(31) δ : Q × (T ∪ {ε}) × Γ → 2^{Q×Γ^∗} is a partial transition function,
(32) q_0 ∈ Q is the distinguished initial state,
(33) Z_0 ∈ Γ is the initial stack symbol, and
(34) F ⊆ Q is the set of final states.

In this paper, we are concerned with pushdown automata with no final states: F = ∅. Such automata are said to accept by empty stack. The operation of such automata is described in more detail in Sec. 3.4. Given a context-free grammar G, we can construct a pushdown automaton which accepts the language generated by G by simulating rule expansions. Such an automaton is called a top-down recognizer. Formally, if G = (N, T, P, S) is a context-free grammar, then the pushdown automaton M = ({q}, T, (N ∪ T), δ, q, S, ∅) is a top-down recognizer for G, where δ is defined as follows for all A ∈ N and all a ∈ T:

δ(q, ε, A) = {(q, α) | A → α ∈ P}   ("ε move")    (35)

δ(q, a, a) = {(q, ε)}   ("non-ε move") .    (36)

See [Hopcroft & Ullman, 1969; Aho & Ullman, 1972] for a more complete discussion.
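A direct transcription of this construction into Python might look as follows (a sketch under our own conventions: ε is represented by the empty string, the single state by the literal 'q', and P maps each nonterminal to a list of right-hand sides):

    def top_down_recognizer(N, T, P):
        """Transition function delta of the top-down recognizer for G = (N, T, P, S),
        following Eqs. (35) and (36)."""
        delta = {}
        for A in N:
            delta[('q', '', A)] = {('q', alpha) for alpha in P.get(A, [])}  # eps moves
        for a in T:
            delta[('q', a, a)] = {('q', '')}                            # non-eps moves
        return delta

    # the locally ambiguous toy-grammar of Sec. 2: S -> so and S -> os
    delta = top_down_recognizer({'S'}, {'s', 'o'}, {'S': ['so', 'os']})
    print(delta[('q', '', 'S')])   # two ε moves for S: the source of nondeterminism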

Our concern in this paper is with deterministic pushdown automata. Informally, a deterministic automaton is simply one whose transition function allows at most one move for each possible configuration.5 Formally, a PDA M as above is said to be deterministic (a "DPDA") if for each q ∈ Q and each Z ∈ Γ, either (37) or (38) holds for all a ∈ T:

δ(q, ε, Z) = ∅ and δ(q, a, Z) contains at most one element,   (37)

δ(q, a, Z) = ∅ and δ(q, ε, Z) contains at most one element.   (38)

When dealing with deterministic PDAs, the transition function may be redefined: δ: Q × (T ∪ {ε}) × Γ → Q × Γ∗.
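Conditions (37) and (38) translate into a straightforward check over the dict encoding above; again a hedged sketch with names of our own choosing.

```python
# Determinism check per (37)/(38): for each stack symbol Z, either no eps
# move exists and each input symbol allows at most one move, or there is at
# most one eps move and no input-driven move.
def is_deterministic(G, delta):
    for Z in G.N | G.T:
        eps_moves = delta.get((EPS, Z), set())
        input_moves = [delta.get((a, Z), set()) for a in G.T]
        if eps_moves:
            if len(eps_moves) > 1 or any(input_moves):
                return False   # violates (38)
        elif any(len(m) > 1 for m in input_moves):
            return False       # violates (37)
    return True
```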

We now turn our attention to locally ambiguous grammars as informally presented in Sec. 2.1. Formally, we say for a context-free grammar G = (N, T, P, S) and a nonterminal symbol A ∈ N that G is locally ambiguous with respect to A if and only if there exist productions A → β, A → γ ∈ P such that β ≠ γ. We say that G is locally ambiguous just in case there is some nonterminal symbol A ∈ N such that G is locally ambiguous with respect to A. G is said to be locally unambiguous just in case G is not locally ambiguous.
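This definition transcribes directly into code against the encoding above:

```python
# G is locally ambiguous w.r.t. A iff A has two distinct expansions in P.
def locally_ambiguous_wrt(G, A):
    return len({alpha for (B, alpha) in G.P if B == A}) > 1

def locally_ambiguous(G):
    return any(locally_ambiguous_wrt(G, A) for A in G.N)

print(locally_ambiguous(G))   # True for the a^n b^n example above
```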

It should be noted that our definition of "local ambiguity" differs from other notions of ambiguity to be found elsewhere in the literature. When considered in terms of top-down recognizers, however, our notion can be seen to resemble the concept of local ambiguity for LR parsers with graph-structured stacks described by Tomita [1987]. Traditionally though, "ambiguity" is defined globally for a grammar as that property which holds when some sentence of the grammar has more than one left derivation with respect to that grammar (see Sec. 1). Clearly, a grammar can only be ambiguous in the traditional sense if it is also locally ambiguous. The reverse does not hold, since there are globally unambiguous grammars which are indeed locally ambiguous, such as our toy-grammar from Sec. 2.

Given our notion of local ambiguity, it is easy to see that every locally unambiguous grammar G = (N, T, P, S) has a deterministic top-down recognizer M = ({q}, T, (N ∪ T), δ, q, S, ∅). Only Eq. (35) in the construction given above is capable of introducing nondeterminism into M. For a locally unambiguous grammar G, there is at most one rule expanding each nonterminal symbol A ∈ N, so that δ(q, ε, A) has at most one element, and M is therefore deterministic.

4 Here and elsewhere in this section, we use the Kleene star to indicate the reflexive and transitive closure of the alphabetic concatenation operation; thus V∗ is the set of all strings over the alphabet V, including the empty string ε.
5 A configuration for a pushdown automaton is a triple (q, w, α) ∈ Q × T∗ × Γ∗.

Having shown that all locally unambiguous grammars have deterministic top-down recognizers, we now present a method by which an arbitrary context-free grammar may be broken down into a finite set of locally unambiguous grammars. Let G = (N, T, P, S) be a context-free grammar. We call a finite set of context-free grammars 𝒢 = {G1, . . . , Gp} a partitioning of G just in case P = ⋃_{i=1}^{p} P_i and G_i = (N, T, P_i, S), for 1 ≤ i ≤ p. If 𝒢 is a partitioning of some context-free grammar G, we call 𝒢 a partitioned context-free grammar, and say that G is the synthesis grammar of 𝒢. We wish to use local ambiguity as a criterion on which to base grammar partitionings. For this purpose, we first define a local disambiguation function LocDis, which produces for a context-free grammar G = (N, T, P, S) and a nonterminal symbol A ∈ N a unique partitioning 𝒢 of G such that each element of 𝒢 is locally unambiguous with respect to A. Let $P_A = \{A \to \alpha \mid A \to \alpha \in P\}$ and let $\bar{P}_A = P - P_A$; then

$$\mathrm{LocDis}(A, G) = \begin{cases} \{G\} & \text{if } P_A = \emptyset \\[4pt] \bigcup_{r \in P_A} \{(N, T, \bar{P}_A \cup \{r\}, S)\} & \text{otherwise.} \end{cases} \tag{39}$$

To see that the elements of LocDis(A, G) are locally unambiguous with respect to A, we must consider several cases. If G is not locally ambiguous with respect to A, then either P contains no expansions for A, in which case P_A = ∅, or P contains exactly one rule A → α expanding A, in which case P_A = {A → α}; in both of these cases, LocDis(A, G) = {G}. Otherwise, G is locally ambiguous with respect to A, so there are rules A → α_1, . . . , A → α_q ∈ P, and P_A = {A → α_1, . . . , A → α_q}. Since P_A is nonempty, each G′ ∈ LocDis(A, G) contains exactly one rule from P_A, and is therefore locally unambiguous with respect to A.

We can extend the function LocDis to handle a partitioned context-free grammar 𝒢 as its second argument in the obvious manner:

$$\mathrm{LocDis}(A, \mathcal{G}) = \bigcup_{G' \in \mathcal{G}} \mathrm{LocDis}(A, G') . \tag{40}$$

By iteratively applying the extended local disambiguation function LocDis, it is possible to partition an arbitrary context-free grammar G = (N, T, P, S) into a finite set 𝒢 of context-free grammars such that each G′ ∈ 𝒢 is locally unambiguous. We say in this case that 𝒢 is a locally unambiguous partitioning of G. Let {A1, . . . , An} = N be an enumeration of the nonterminal symbols of the grammar G, and construct the partitioning 𝒢_n of G:

$$\mathcal{G}_0 = \{G\}, \qquad \mathcal{G}_{i+1} = \mathrm{LocDis}(A_{i+1}, \mathcal{G}_i) .$$

As shown above, each iteration in the construction of 𝒢_n removes the local ambiguities with respect to a single nonterminal symbol from the partitioned grammar produced by the preceding iteration. Further, no new local ambiguities are introduced by any application of LocDis, since LocDis is monotonic in the sense that it can only remove rules from an argument grammar. Since G has exactly n nonterminals, every element of 𝒢_n must be locally unambiguous.
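The following sketch transcribes Eq. (39) and the iteration 𝒢_0, …, 𝒢_n against the CFG encoding above; the function names are ours.

```python
# LocDis [Eq. (39)]: split off one expansion of A per part.
def loc_dis(A, G):
    P_A = {r for r in G.P if r[0] == A}
    if not P_A:
        return {G}
    P_rest = G.P - P_A
    return {CFG(G.N, G.T, frozenset(P_rest | {r}), G.S) for r in P_A}

# G_0 = {G}, G_{i+1} = LocDis(A_{i+1}, G_i): iterate over all nonterminals.
def unambiguous_partitioning(G):
    parts = {G}
    for A in G.N:
        parts = set().union(*(loc_dis(A, Gp) for Gp in parts))
    return parts

# The a^n b^n example splits into two grammars with one S rule each.
print(len(unambiguous_partitioning(G)))   # -> 2
```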

We have shown that every context-free grammar can be partitioned into a finite set of locally unambiguous grammars, and that each element of this set is associated with a deterministic top-down recognizer. Although we will concern ourselves in the rest of this paper with modeling the recognizers associated with locally unambiguous grammars, the use of a control parameter as described in Sec. 2.5 should allow the extension of our modeling strategy to the recognition of arbitrary context-free languages by means of the locally unambiguous partitionings of the respective grammars.

3.4. Symbolic dynamics of automata

In order to describe Turing machines as dynamical systems, Moore [1990, 1991b] introduced the concept of generalized shifts. Like the shifts discussed in Sec. 3.2, generalized shifts are maps defined on the set of bi-infinite symbolic strings $A^{\mathbb{Z}}$. Let $s = \ldots a_{i_{-1}} \hat{a}_{i_0} a_{i_1} a_{i_2} \ldots$ be such a sequence, where the hat denotes the current state. In Moore's construction the hat indicates the tape position of the head of a Turing machine. A generalized shift is characterized by a range of d symbols called the domain of dependence. The generalized shift acts on this word w of length d by first replacing it by a further word w′ of length d′ and then by performing a shift of k symbols to the left or to the right [Moore, 1990, 1991b; Badii & Politi, 1997]. Moore has proven that these systems are computationally equivalent to Turing machines.

The b-adic expansion algorithm mapping symbolic sequences into points of the unit square [Eqs. (20) and (21)] can be applied to states of the generalized shift in quite the same manner. It is therefore possible to map any Turing machine onto a time-discrete nonlinear dynamics living at the unit square. The corresponding map has been shown to be piecewise affine linear, just as the Bernoulli map discussed in Sec. 3.1 [Moore, 1990, 1991b].

In this section we shall justify our construction of dynamical systems from top-down parsers for the processing of context-free languages supplied in Sec. 2.4. Because formal languages accepted by pushdown automata belong to a lower class in the Chomsky hierarchy than the recursively enumerable languages accepted by Turing machines [Hopcroft & Ullman, 1979; Badii & Politi, 1997], any pushdown automaton can be simulated by a Turing machine. Hence, Moore's proof holds for pushdown automata too.6

Here, we shall assume G to be locally unambiguous, belonging, e.g. to a locally unambiguous partitioning of some locally ambiguous grammar which can be constructed by the algorithm described in Sec. 3.3. Then, as we have also demonstrated in Sec. 3.3, the language L(G) can be processed by a deterministic top-down recognizer M = ({q}, T, Γ, δ, q, S, ∅) with Γ = N ∪ T. The state descriptions of this automaton are provided by the actual pairs (γ, w) ∈ Γ∗ × T∗, where $\gamma = \gamma_{i_0} \gamma_{i_1} \ldots \gamma_{i_{k-1}}$ is the content of the parser's stack while $w = w_{j_1} w_{j_2} \ldots w_{j_l}$ is some finite word at the input tape. From the definition of the transition function δ it follows that the automaton has access only to the top of the stack $\gamma_{i_0}$ and to the first symbol of the input tape $w_{j_1}$ at each instance of time. Thus, we define a map τ: Γ × T → Γ∗ × T∗ acting on actual pairs of the topmost symbols of stack Z ∈ Γ and input tape a ∈ T, respectively, by the transition function δ:

$$\tau(Z, a) = (\gamma, a) \quad \text{if } \delta(q, \varepsilon, Z) = (q, \gamma) \qquad \text{(ε moves)},$$
$$\tau(a, a) = (\varepsilon, \varepsilon) \quad \text{if } \delta(q, a, a) = (q, \varepsilon) \qquad \text{(non-ε moves)}, \tag{41}$$

where $\gamma = \gamma_{i_0} \ldots \gamma_{i_{k-1}}$ if Z ∈ N and if P contains a rule Z → γ. We see that the ε moves correspond to the predict operations while the non-ε moves correspond to the attach operations described in Sec. 2.2.
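A single application of τ can be sketched as follows, assuming a locally unambiguous grammar in the CFG encoding above; stack and tape are tuples with their leftmost element on top, and the function name is ours.

```python
# One parser step, Eq. (41): predict rewrites the stack top by its unique
# expansion (eps move); attach cancels matching stack and tape symbols.
def tau_step(G, stack, tape):
    Z, a = stack[0], tape[0]
    if Z in G.N:                                   # eps move (predict)
        (gamma,) = {al for (B, al) in G.P if B == Z}
        return gamma + stack[1:], tape
    if Z == a:                                     # non-eps move (attach)
        return stack[1:], tape[1:]
    raise ValueError('no move applicable: garden path')
```

Iterating tau_step until stack and tape are both empty models acceptance by empty stack; reaching the ValueError branch models a garden path under the chosen strategy.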

In the following we establish the relation between deterministic top-down parsers and piecewise affine linear maps at the unit square that has already been used in Sec. 2.4 for the construction of our toy-model of language processing. To this aim we shall first reorder the content of the stack, $\gamma'_{i_{-k}} = \gamma_{i_k}$, obtaining the reversed sequence $\gamma' = \gamma'_{i_{-k+1}} \ldots \gamma'_{i_{-1}} \gamma'_{i_0}$. Next, we concatenate the transformed stack with the input tape, yielding a two-sided (but finite) string

$$s = \gamma'_{i_{-k+1}} \ldots \gamma'_{i_{-1}} \gamma'_{i_0} w_{j_1} w_{j_2} \ldots w_{j_l} . \tag{42}$$

Then, we could apply the b-adic expansion to the left- and right-half sequences s− and s+ after introducing a Gödel encoding of the set Γ = N ∪ T in order to map the actual pair to a point in the unit square. But this is not what we will do. To avoid gaps in the unit square, we decided to use two different encodings: one, g1, of the set Γ for the stack, and another, g2, of the terminals T for the input tape. Thus, we perform a b1 = (n + m)-adic expansion of the stack and a b2 = m-adic expansion of the input tape. This yields a partition of the unit square into rectangles of equal size (n + m)−1 × m−1 according to Eqs. (20) and (21). The two b-adic expansions lead to a point (x, y) in the unit square with coordinates

$$x = \sum_{h=0}^{k-1} g_1(\gamma'_{i_{-h}})(m + n)^{-h-1} \tag{43}$$

$$y = \sum_{h=1}^{l} g_2(w_{j_h})\, m^{-h} . \tag{44}$$

By this construction the most significant digits $\gamma'_{i_0}$ and $w_{j_1}$ determine the rectangle in which the point (x, y) will be found. We have therefore obtained a coarse-grained description of the actual pairs, represented by states of a quantitative dynamical system.

6 Moore has also proven that pushdown automata are equivalent to nondeterministic one-sided generalized shifts [Moore, 1991a, 1991b]. However, we do not pursue this approach here since we are interested in deterministic dynamical systems.
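Equations (43) and (44) amount to the following encoding sketch; g1 and g2 are Gödel encodings supplied as dicts, and the function name is our own.

```python
# Map an actual pair (stack, tape) to a point (x, y) of the unit square,
# Eqs. (43) and (44); stack[0] and tape[0] are the most significant digits.
def actual_pair_to_point(stack, tape, g1, g2, n, m):
    x = sum(g1[Z] * (n + m) ** (-h - 1) for h, Z in enumerate(stack))
    y = sum(g2[w] * m ** (-h - 1) for h, w in enumerate(tape))
    return x, y
```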


To proceed further, we need some consideration of the empty word ε. In formal language theory the set A∗ of all words of finite length formed by letters of the alphabet A constitutes a semi-group with respect to the concatenation "·" of words: u · (v · w) = (u · v) · w for all u, v, w ∈ A∗ and w · ε = ε · w = w. The empty word ε is the neutral element of this semi-group. In order to complete our construction we have to determine the image of ε in the unit interval when we consider one-sided sequences $s \in A^{\mathbb{N}}$. This can be done by using Eq. (19), which states the relation between cylinder sets of length n and their preimages in the phase space of a dynamical system. Decomposing the left-hand side of Eq. (19) into concatenation factors and the right-hand side into intersection factors leads to

$$\pi^{-1}([(a_{i_1}, \ldots, a_{i_m}) \cdot (a_{i_{m+1}}, \ldots, a_{i_n})]_1) = \left( \bigcap_{k=0}^{m-1} \Phi^{-k}(A_{i_{k+1}}) \right) \cap \left( \bigcap_{k=m}^{n-1} \Phi^{-k}(A_{i_{k+1}}) \right) \tag{45}$$

or in shorthand notation π−1(u · v) = π−1(u) ∩ π−1(v) for words u, v regarded as cylinder sets. Hence, π−1 is a semi-group homomorphism mapping the concatenation of words onto the intersection of sets. It is well known from set theory that the neutral element with respect to set intersection is the base set. In our case, this is just the unit interval [0, 1], which therefore has to be identified with the empty word. Correspondingly, the goal of the top-down parser, the actual pair (ε, ε), is represented by the whole unit square [0, 1]². A further consequence of this reasoning is that the content of the stack as well as the content of the input tape might be considered as one-sided infinite strings that are committed to finite words at the very beginning, as we have argued in Sec. 2.4. Due to this interpretation we allow uncertainty about forthcoming and already processed input, respectively. Now, we are able to extend the symbol sequence from Eq. (42) towards plus and minus infinity:

$$s = \ldots \gamma'_{i_{-k+1}} \ldots \gamma'_{i_{-1}} \gamma'_{i_0} w_{j_1} w_{j_2} \ldots w_{j_l} \ldots . \tag{46}$$

The actual pair (γ, w) of the parser comes out to be the set of all bi-infinite sequences s coinciding in the symbols $\gamma'_{i_{-k+1}} \ldots \gamma'_{i_{-1}} \gamma'_{i_0}$ at the left-half side and coinciding in the symbols $w_{j_1} w_{j_2} \ldots w_{j_l}$ at the right-half side, that is, a cylinder set. Using the b-adic expansions [Eqs. (43) and (44)] of the parser's state, any actual pair is mapped onto a rectangle in the unit square.

Our remaining job is to establish the parser's dynamics by a generalized shift and finally by a piecewise affine linear map at the unit square. This is achieved by the definition of the map τ introduced in Eq. (41). Since τ acts only on actual pairs of symbols (Z, a) and not on strings, the Gödel numbers of these symbols provide the domains of definition in Moore's construction with d = 2. Considering Z and a as the most significant digits of the b-adic representations of stack and input tape, they define a partition of the unit square into basic rectangles with labels i = g1(Z), j = g2(a). We therefore construct a map Φ: [0, 1]² → [0, 1]² which is affine linear at these rectangles (i, j), thus yielding Eq. (12).
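Combining the two previous sketches shows the resulting dynamics in symbolic form: applying τ and then re-expanding realizes one step of the piecewise affine map on the rectangle representatives. This is a conceptual stand-in of ours for defining Φ directly on coordinates as in Eq. (12), not the paper's construction itself.

```python
# One step of the unit-square dynamics, realized symbolically: the branch of
# the map is selected by the most significant digits (g1(Z), g2(a)), i.e. by
# the basic rectangle in which the current point lies.
def phi_step(G, stack, tape, g1, g2, n, m):
    stack2, tape2 = tau_step(G, stack, tape)
    return actual_pair_to_point(stack2, tape2, g1, g2, n, m)
```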

4. Discussion

We have presented a formal approach for modeling language processing by nonlinear dynamical systems. This has been illustrated by a toy-model of the psycholinguistic experiment using event-related brain potentials and event-related cylinder entropies reported by Frisch et al. [2004]. In this experiment, it was tested how the syntactic ambiguity of a pronoun in German is resolved in the course of sentence processing. For a proper discussion, see Frisch et al. [2004]. The construction of our model comprises eight steps: (1) describing the stimulus material of the experiment by a rather crude formal language, hereby using the grammatical roles (subject and object) of noun phrases; (2) representing the cognitive conflicts by a locally ambiguous context-free grammar generating this language; (3) decomposing the grammar into unambiguous parts, with each subgrammar being interpreted as one processing strategy; (4) constructing appropriate pushdown automata as recognizers of these sublanguages, where the parsing process is represented by a "trajectory" of actual pairs; (5) employing a Gödel encoding on the stack and the input tape of the automata separately, thus mapping the parser onto a piecewise affine linear function at the unit square due to Moore's proof [Moore, 1990]; (6) merging the different parsing maps corresponding to the different processing strategies into one dynamical system depending on a control parameter, and including "repair" maps for intermediate values of this parameter; (7) randomly preparing an ensemble of points in the unit square within the initial actual pair of the parser as initial conditions, and iterating the map to obtain sets of trajectories that represent further states of the parser, where invariant sets correspond to garden path interpretations and bifurcations serve as conflict resolutions; (8) determining the entropy of the parsing states with respect to a measurement partition as a model of event-related cylinder entropies.
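As an informal illustration of steps (7) and (8), the sketch below prepares a random ensemble inside an initial rectangle and measures its entropy over a grid-shaped measurement partition; grid resolution, ensemble size, and the example rectangle are our illustrative choices rather than values fixed by the model.

```python
import math, random

# Shannon entropy (in bits) of an ensemble over a cells x cells grid partition.
def ensemble_entropy(points, cells=8):
    counts = {}
    for (x, y) in points:
        key = (min(int(x * cells), cells - 1), min(int(y * cells), cells - 1))
        counts[key] = counts.get(key, 0) + 1
    n = len(points)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Step (7): random ensemble in an initial rectangle (here [0, 0.25) x [0, 0.5)).
ensemble = [(random.uniform(0, 0.25), random.uniform(0, 0.5)) for _ in range(1000)]
# Step (8): entropy of the initial state; re-evaluate after each parsing step.
print(ensemble_entropy(ensemble))
```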

The construction and formal justification of the model essentially rely on findings of computational linguistics, automata theory and dynamical systems theory, especially on symbolic dynamics. Of course, we are fully aware that our model cannot seriously explain findings from psycholinguistics or cognitive science — as yet. There are some problems and difficulties on the one hand, but on the other hand, given one initial success, our modeling approach might be worth pursuing further. We shall discuss some of these issues and their possible solutions in the concluding paragraphs.

Firstly, we address the psycholinguistic requirements of more realistic modeling. One point is that context-free grammars are not well suited to describe natural languages [Shieber, 1985]. However, natural languages share many properties with context-free languages, chiefly that they can be described by hierarchical tree structures (see [Saddy & Uriagereka, 2004] and [Frisch et al., 2004]). Thus, even if natural languages as a whole are not context-free, this property might hold for certain subsets of speech. The trees discussed by Frisch et al. [2004] serve as a good example: they describe the linguistic properties of the stimulus material of the ERP experiment in the framework of Chomsky's Government and Binding (GB) theory [Haegeman, 1994], which is also commonly used by researchers in the field of psycholinguistics. Nevertheless, once the trees have been written down, one could forget everything about GB theory and sketch a context-free toy-grammar using only the labels on the nodes of the GB trees to identify the production rules of the grammar. This would be a first attempt at the construction of a slightly more realistic model, which we are working on.

We have used very simple parsing devices, namely deterministic top-down recognizers. Though there is linguistic evidence that the human parser is deterministic (otherwise garden path effects would not appear), it does not pursue a top-down strategy, but rather follows a left-corner strategy7 [Hemforth, 1993]. Staudacher [1990] has shown that the psycholinguistically motivated Marcus-parser [Marcus, 1980] is in fact a deterministic left-corner parser. It is therefore desirable to consider these automata for appropriate modeling.

In Sec. 3.4 we pointed out that Moore's proof of the formal equivalence of automata and nonlinear dynamical systems at the unit square holds not only for simple top-down parsers but for any Turing machine. Adopting the Church–Turing thesis that any possible computation can be achieved by a Turing machine [Hopcroft & Ullman, 1979; Hofstadter, 1979], and further adopting that parsing is computation, we conclude that the model's present lack of cognitive appropriateness is not a fundamental objection, because one could imagine a natural language parser emulated by a Turing machine which can then be mapped onto a dynamical system using Moore's construction. However, Turing machines are not really interesting for computational linguistics. One reason for this is their freely accessible memory tape, whereas cognitive scientists are dealing with the limited capacities of the human computational system [Mitchell, 1994]. Good candidates for natural language grammars seem to be, e.g. the so-called extended right linear indexed grammars [Michaelis & Wartena, 1999], which allow for dependencies across the structure trees. Although the languages generated by such grammars do belong to a higher class of the Chomsky hierarchy,8 they can be processed by automata with less power than Turing machines, since they are highly restricted context-sensitive languages [Michaelis & Wartena, 1999] that can be processed by tape-limited automata. Provided the state descriptions and the machine tables of these automata were available, one could use the idea of Moore's proof again to build a nonlinear dynamics from such automata. We are currently investigating these systems and their relations to Moore's generalized shifts.

From a psycholinguistic perspective, there is evidence that the human parser acknowledges the presence of syntactic alternatives, in that syntactically ambiguous arguments take longer to read than their unambiguous counterparts [Schlesewsky et al., 2000]. In the ERP, this enhanced cost of processing ambiguous arguments is reflected in a P600 component [Frisch et al., 2002]. This finding is crucial for theories of language processing insofar as it does not support serial parsing, that is, the assumption that the parser initially computes only the structurally simplest analysis independently of any alternative reading (cf. [Mitchell, 1994]). Under that assumption, the presence of an alternative reading should not induce enhanced processing cost at the ambiguous item itself, but should come into play only if the initial preference is incompatible with forthcoming, unambiguous information. That an ambiguous argument is more difficult to process than an unambiguous one shows that the parser takes several possible analyses into account from the beginning. However, it was consistently found that later, namely on the disambiguating element, not all continuations are equally easy to process: disambiguation towards object-before-subject is more difficult to process than disambiguation towards subject-before-object (see above; [Frisch et al., 2002, 2004; Schlesewsky et al., 2000]). This shows that the ambiguity is of course acknowledged but that, after that, the parser prefers one alternative, here subject-before-object, at the expense of others.

7 In this paper we have dealt only with predicting top-down parsers. Other parsing strategies are known, such as bottom-up parsing, which is data driven, or left-corner parsing, which is a mixture of a top-down and a bottom-up parser [Aho & Ullman, 1972].
8 The Chomsky hierarchy is a system of formal languages classified by the computational complexity of the automata required to process them [Hopcroft & Ullman, 1979].

The paper of Frisch et al. [2004] tested only the effects on the disambiguating element, not in the ambiguous region. Nevertheless, the parsing device developed in the present paper should be able to account for the P600 response due to a syntactic ambiguity found by Frisch et al. [2002] in order to approach psychological reality. How could this be accomplished? In the model presented here, a cognitive control unit intervenes in the parsing dynamics by tuning the control parameter in order to account for which rules of the toy-grammar are chosen initially or for reanalysis. This intervention differs from the automatic processes of deterministic parsing, where a particular rule has to be applied in order to predict the next word. Therefore, the intervention must be regarded as more controlled than parsing. In the case of an unambiguous argument, tuning the control parameter is cognitively less costly (since there is only one possibility) compared to an ambiguous argument. It is plausible that in the latter case, the cognitive control unit acknowledges that more than one possibility exists and that a choice has to be made. Assuming that this choice consumes additional cognitive resources could account for the finding of the P600 ambiguity effect in the study of Frisch et al. [2002], seeing that the P600 was shown to reflect controlled processing [Hahne & Friederici, 1999].

This last assumption points to another issue that our approach to modeling highlights, namely how to associate behavioral markers with differences in cognitive processing load. It is an established tradition in cognitive psychology and related disciplines that time equals cost: the more difficult or complex a process is, the more time it will take. In the ERP literature the elicitation of components is interpreted as reflecting differential processing loads between one experimental condition and another. In our dynamical model of parsing, transitions of the parser are mapped onto changes in phase space as determined by the calculation of cylinder entropies. For the sake of the toy-grammar we have restricted ourselves to the change in phase space associated with the voltage-averaged P600 component elicited under so-called reanalysis conditions. However, the P600 and related positivities can be elicited by a variety of experimental conditions that are not obviously closely related in processing terms [Frisch et al., 2002]. A precise understanding of the relation between ERP components and the variety of conditions that can elicit them is still an active research goal. Interestingly, the calculation of cylinder entropies associated with experimental conditions standardly reveals more changes in phase space than there are elicited voltage-averaged components. That is to say, changes in phase space that do not correspond to ERP components do occur. At present there is no good theory of cognitive behavior that predicts such contrasts. Saddy and beim Graben [2002] note that the observed phase space changes occur in time ranges that match well with Friederici's model [Friederici, 1999] and offer a broad interpretation of such changes in system-level terms. One obvious route to pursue in hopes of gaining insight into the distribution and nature of the entire suite of ERP components and phase space changes associated with psycholinguistic manipulations is the development and extension of computational models such as those presented here.

The parsers we are dealing with are simple recognizers. Such automata reach their accepting state when both the stack and the input are completely discarded. Unfortunately, one then loses all information about the processed input and about its structural representation. This is a problem for our model, insofar as a structural representation of the input is necessary for further linguistic and cognitive analysis, e.g. for semantic and pragmatic understanding. A structural representation is also indispensable for psycholinguistic modeling, since reanalysis implies changes to the sentence structures that have been built up during the parsing process. We are developing automata that generate and store a left-derivation of the input in the stack memory. In order to achieve this, one has to introduce a "stack pointer" with random access to a limited part of the stack. This will lead to generalized shifts with larger domains of definition in the concatenation of stack and input tape. We will also be able to address the "costs of reanalysis" in such models when the repair maps need access to deeper parts of the structures in the stack, thus increasing the complexity of the flow at the unit square. And finally, such devices would agree much better with the essential ideas of symbolic dynamics: namely, that dynamics generates information (expressed by the Kolmogorov–Sinai entropy of the system) during transient evolution. This is also compatible with the dynamic understanding of information in the sense of von Weizsäcker [1974, p. 352]: "Information is only what generates information" (see also the contribution of Jirsa [2004]).

The parameterized maps emulating the parsers are deterministic dynamical systems whose behavior is completely determined by the initial conditions. These encode, by construction, the strings to be processed. Hence, any garden path analysis caused by applying the wrong strategy (an inappropriate value of the control parameter) is in principle predictable at the very beginning of the dynamics. This is not consistent with psycholinguistic reasoning. It is also inconsistent with the design of psycholinguistic experiments, where stimuli are presented item-by-item. We are working on a model that fits reality better by splitting the input tape into a space-limited "working memory" belonging to the automaton and a second part belonging to the "outer world" that is not accessible to the model. From this outer-world tape new symbols are read in by the system after each attach operation. Thus, the experimental stimulation is simulated by a forced dynamical system which is disturbed in its evolution by the external input.

There are many other questions raised by our approach. From the natural science point of view: How might a time-discrete piecewise affine linear map at the unit square be implemented by living neural networks? How might the connection to the measured time-continuous ERP data be construed? Let us outline a possible solution we are still pursuing. The discrete time map at the unit square can be regarded as a Poincaré section of a three-dimensional flow. It then follows from the properties of the parsing map that its invariant sets are sets of limit cycle trajectories. That is, the appropriate continuous-time dynamics of the embedded parser are nonlinear oscillators [Guckenheimer & Holmes, 1983]. And it is well known from neural modeling that limit cycle oscillators can indeed be constituted by neural networks [Wilson & Cowan, 1972; Freeman, 2000; Nunez, 1995]. Provided that such an oscillator model of the parsing dynamics is possible, it would also entail an interface for the analysis of real-time EEG and ERP measurements, which we are investigating by using symbolic dynamics.

There is evidence, e.g. from the results of Hutt [2004], that ERP components are fixed points of the brain dynamics. If this were generally true, it should be possible to find a two-dimensional embedding of ERP time series in which ERP components appear as separated clusters in the phase space. Then one might attempt to capture these clusters by the cells of a partition and to examine the symbolic dynamics thus obtained. The challenging question is then whether this symbolic dynamics can be interpreted in terms of automata theory.

Although our parsing dynamics is a function that maps points in the unit square onto other points in the unit square, we considered ensembles of these states in order to interpret them as actual pairs of the parser. This appears cognitively and physiologically not very plausible, since one would expect that a brain state is a point in its phase space and not an ensemble of randomly distributed states. A possible solution for this problem could be provided by stochastic dynamics: one prepares the parser with only one randomly chosen initial condition belonging to the rectangle which corresponds to the first actual pair. That is, one initializes the dynamics with a delta-shaped probability distribution function. Due to the impact of the stochastic forces, the time evolution of the initial probability distribution function is governed by a Fokker–Planck equation. Then, the initial probability distribution will spread out over the unit square as the system evolves, thus leading to an ensemble interpretation of the dynamics, as desired. However, one has to ensure that the stochastic forces are compatible with the partition of the phase space in order to maintain the parsing interpretation.

Finally, let us address a philosophical aspect of our model. In Sec. 2.6 we computed the information content of the states of the parser with respect to an arbitrary measurement partition. That is a crucial point of symbolic dynamics and information theory: there are many ways of partitioning the phase space of a dynamical system (unless a generating partition exists); and as C. F. von Weizsäcker states: "An 'absolute' notion of information does not make any sense; information exists only with respect to another notion, namely relative to two 'semantic levels'" [von Weizsäcker, 1988, p. 172]. "Semantic levels" in the sense of von Weizsäcker are, e.g. macro- and microstates of a dynamical system, where microstates are considered as points in the phase space, whereas macrostates are equivalence classes of points all leading to the same measurement of an observable, such as the energy for the microcanonical ensemble. That is, macrostates provide a partition of the system's state space and hence a symbolic dynamics. On the other hand, two partitions of the state space may be complementary in the sense of Bohr's Principle of Complementarity in quantum physics [Bohr, 1948; Atmanspacher et al., 2002; Atmanspacher, 2003]. The relative arbitrariness of a symbolic dynamics has a strong impact on the interpretation of our model. Consider once more the unit square in Fig. 1 without knowing its partition into the six rectangles which establish the domains of definition of the parser's map due to the Gödel encoding. Simulating the deterministic dynamics of the system then shows only clouds of points hopping through the unit square at the lower "semantic level". However, after introducing the particular partition we have used in our construction, the dynamics of the system becomes interpretable as language processing at the higher "semantic level", i.e. as cognitive computation. We shall discuss this issue more thoroughly in a forthcoming paper.

Acknowledgments

This work has been supported by the Deutsche Forschungsgemeinschaft (research group "Conflicting Rules in Cognitive Systems"). We want to thank Peter Staudacher, Jens Michaelis, Günter Troll, Harald Atmanspacher, Udo Schwarz and Viktor Jirsa for stimulating and helpful discussions. Bryan Jurish acknowledges support from the project "Collocations in the German Language of the 20th Century" of the Berlin-Brandenburg Academy of Sciences.

References

Aho, A. V. & Ullman, J. D. [1972] The Theory of Parsing, Translation and Compiling, Vol. I: Parsing (Prentice-Hall, Englewood Cliffs, NJ).

Allen, J. [1987] Natural Language Understanding, Benjamin/Cummings Series in Computer Science (Benjamin/Cummings Publishing Company, Menlo Park, CA).

Anosov, D. V. & Arnol'd, V. I. (eds.) [1988] Dynamical Systems, Encyclopaedia of Mathematical Sciences, Vol. 1 (Springer, Berlin).

Atmanspacher, H., Römer, H. & Walach, H. [2002] "Weak quantum theory: Complementarity and entanglement in physics and beyond," Found. Phys. 32, 379–406.

Atmanspacher, H. [2003] "Mind and matter as asymptotically disjoint, inequivalent representations with broken time-reversal symmetry," BioSystems 68, 19–30.

Başar, E. [1980] EEG–Brain Dynamics. Relations between EEG and Brain Evoked Potentials (Elsevier/North-Holland Biomedical Press, Amsterdam).

Badii, R. & Politi, A. [1997] Complexity. Hierarchical Structures and Scaling in Physics, Cambridge Nonlinear Science Series, Vol. 6 (Cambridge University Press, Cambridge, UK).

Beck, C. & Schlögl, F. [1993] Thermodynamics of Chaotic Systems. An Introduction, Cambridge Nonlinear Science Series, Vol. 4 (Cambridge University Press, Cambridge, UK).

beim Graben, P., Liebscher, T. & Saddy, J. D. [2000a] "Parsing ambiguous context-free languages by dynamical systems: Disambiguation and phase transitions in neural networks with evidence from event-related brain potentials (ERP)," in Learning to Behave: Internalising Knowledge, TWLT 18, Vol. II, eds. Jokinen, K., Heylen, D. & Nijholt, A. (Universiteit Twente, Enschede), pp. 119–135.

beim Graben, P., Saddy, J. D., Schlesewsky, M. & Kurths, J. [2000b] "Symbolic dynamics of event-related brain potentials," Phys. Rev. E62, 5518–5541.


Bennett, C. H. [1982] "The thermodynamics of computation — a review," Int. J. Th. Phys. 21, 907–940.

Bohr, N. [1948] "On the notions of causality and complementarity," Dialectica 2, 312–319.

Collet, P. & Eckmann, J.-P. [1980] Iterated Maps on the Interval as Dynamical Systems (Birkhäuser, Boston).

Crutchfield, J. P. [1994] "The calculi of emergence: Computation, dynamics and induction," Physica D75, 11–54.

Engbert, R., Scheffczyk, C., Krampe, R. T., Rosenblum, M., Kurths, J. & Kliegl, R. [1997] "Tempo-induced transitions in polyrhythmic hand movements," Phys. Rev. E56, 5823–5833.

Fodor, J. D. & Ferreira, F. (eds.) [1999] Reanalysis in Sentence Processing (Kluwer, Dordrecht).

Freeman, W. J. [2000] Neurodynamics: An Exploration in Mesoscopic Brain Dynamics, Perspectives in Neural Computing (Springer-Verlag, London).

Friederici, A. D. [1999] "The neurobiology of language comprehension," in Language Comprehension: A Biological Perspective, ed. Friederici, A. D. (Springer, Berlin), pp. 265–304.

Frisch, S., beim Graben, P. & Schlesewsky, M. [2004] "Parallelizing grammatical functions: P600 and P345 reflect different cost of reanalysis," Int. J. Bifurcation and Chaos 14, 531–549.

Frisch, S., Schlesewsky, M., Saddy, D. & Alpermann, A. [2002] "The P600 as an indicator of syntactic ambiguity," Cognition 85, B83–B92.

Gödel, K. [1931] "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I," Monatshefte für Mathematik und Physik 38, 173–198.

Guckenheimer, J. & Holmes, P. [1983] Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, Springer Series of Applied Mathematical Sciences, Vol. 42 (Springer, NY).

Haegeman, L. [1994] Introduction to Government & Binding Theory, Blackwell Textbooks in Linguistics, Vol. 1 (Blackwell Publishers, Oxford).

Hahne, A. & Friederici, A. D. [1999] "Electrophysiological evidence for two steps in syntactic analysis: Early automatic and late controlled processes," J. Cogn. Neurosci. 11, 194–205.

Hemforth, B. [1993] Kognitives Parsing: Repräsentation und Verarbeitung kognitiven Wissens (Infix, Sankt Augustin).

Hofstadter, D. R. [1979] Gödel, Escher, Bach: An Eternal Golden Braid (Basic Books, NY).

Hopcroft, J. E. & Ullman, J. D. [1969] Formal Languages and Their Relation to Automata (Addison-Wesley, Reading, MA).

Hopcroft, J. E. & Ullman, J. D. [1979] Introduction to Automata Theory, Languages, and Computation (Addison-Wesley, Menlo Park, CA).

Hutt, A. [2004] "An analytical framework for modeling evoked and event-related potentials," Int. J. Bifurcation and Chaos 14, 653–666.

Jirsa, V. K. [2004] "Information processing in brain and behavior displayed in large-scale scalp topographies such as EEG and MEG," Int. J. Bifurcation and Chaos 14, 679–692.

Kawamoto, A. H. [1993] "Nonlinear dynamics in the resolution of lexical ambiguity: A parallel distributed processing account," J. Memory and Language 32, 474–516.

Kelso, J. A. S., Bressler, S. L., Buchanan, S., DeGuzman, G. C., Ding, M., Fuchs, A. & Holroyd, T. [1992] "A phase transition in human brain and behavior," Phys. Lett. A196, 134–144.

Marcus, G. F. [2001] The Algebraic Mind. Integrating Connectionism and Cognitive Science, Learning, Development, and Conceptual Change (MIT Press, Cambridge, MA).

Marcus, M. [1980] A Theory of Syntactic Recognition for Natural Language (MIT Press, Cambridge, MA).

Michaelis, J. & Wartena, C. [1999] "LIGs with reduced derivation sets," in Constraints and Resources in Natural Language Syntax and Semantics, eds. Bouma, G., Kruijff, G.-J., Hinrichs, E. & Oehrle, R. T., Studies in Constraint-Based Lexicalism, Vol. II (CSLI Publications, Stanford, CA), pp. 263–279.

Mitchell, D. C. [1994] "Sentence parsing," in Handbook of Psycholinguistics, ed. Gernsbacher, M. A. (Academic Press, San Diego), pp. 375–409.

Moore, C. [1990] "Unpredictability and undecidability in dynamical systems," Phys. Rev. Lett. 64, 2354–2357.

Moore, C. [1991a] "Generalized one-sided shifts and maps of the interval," Nonlinearity 4, 727–745.

Moore, C. [1991b] "Generalized shifts: Unpredictability and undecidability in dynamical systems," Nonlinearity 4, 199–230.

Nunez, P. L. (ed.) [1995] Neocortical Dynamics and Human EEG Rhythms (Oxford University Press, NY).

Ott, E. [1993] Chaos in Dynamical Systems (Cambridge University Press, NY).

Rączasek, J., Tuller, B., Shapiro, L. P., Case, P. & Kelso, S. [1999] "Categorization of ambiguous sentences as a function of a changing prosodic parameter: A dynamical approach," J. Psycholing. Res. 28, 367–393.

Saddy, D. & Uriagereka, J. [2004] "Measuring language," Int. J. Bifurcation and Chaos 14, 383–404.

Saddy, J. D. & beim Graben, P. [2002] "Measuring the neural dynamics of language comprehension processes," in Basic Functions of Language, Reading and Reading Disorder, eds. Witruk, Friederici, A. D. & Lachmann, T. (Kluwer Academic Publishers, Boston), pp. 41–60.

Schlesewsky, M., Fanselow, G., Kliegl, R. & Krems, J. [2000] "Preferences for grammatical functions in the processing of locally ambiguous wh-questions in German," in Cognitive Parsing in German, eds. Hemforth, B. & Konieczny, L. (Kluwer, Dordrecht), pp. 65–93.

Schuster, H. G. [1989] Deterministic Chaos (VCH, Weinheim).

Shannon, C. E. & Weaver, W. [1949] The Mathematical Theory of Communication (University of Illinois Press, Urbana).

Shieber, S. M. [1985] "Evidence against the context-freeness of natural language," Ling. Philos. 8, 333–343.

Staudacher, P. [1990] "Ansätze und Probleme prinzipienorientierten Parsens," in Sprache und Wissen, eds. Felix, S. W., Kanngießer, S. & Rickheit, G. (Westdeutscher Verlag, Opladen), pp. 151–189.

Tomita, M. [1987] "An efficient augmented-context-free parsing algorithm," Comput. Ling. 13, 31–46.

van Gelder, T. [1998] "The dynamical hypothesis in cognitive science," Behav. Brain Sci. 21, 615–628.

von Weizsäcker, C. F. [1988] Aufbau der Physik (DTV, München).

von Weizsäcker, C. F. [1974] Die Einheit der Natur (DTV, München).

Wilson, H. R. & Cowan, J. D. [1972] "Excitatory and inhibitory interactions in localized populations of model neurons," Biophys. J. 12, 1–24.

Wright, J., Rennie, C., Lees, G., Robinson, P., Bourke, P., Chapman, C., Gordon, E. & Rowe, D. [2004] "Simulated electrocortical activity at microscopic, mesoscopic and global scales," Int. J. Bifurcation and Chaos 14, 853–872.