
Minimalism and Merge Grammars

Matilde Marcolli

MAT1509HS: Mathematical and Computational Linguistics

University of Toronto, Winter 2019, T 4-6 and W 4, BA6180

MAT1509HS Win2019: Linguistics Merge Grammars

Main References:

E.P. Stabler, Computational perspectives on minimalism, in “Oxford Handbook of Linguistic Minimalism”, Oxford University Press, 2010, 616–641.

K. Vijay-Shanker, D. Weir, The equivalence of four extensions of context free grammar formalisms, Mathematical Systems Theory, 27 (1994) 511–545.

P. beim Graben, S. Gerth, Geometric representations for minimalist grammars, arXiv:1101.5076

T. Hunter, C. Dyer, Distributions on Minimalist Grammar Derivations, Proc. 13th Meeting of the Mathematics of Language (MoL 13), Association for Computational Linguistics, 2013, pp. 1–11.

R.C. Berwick, M. Marcolli, Linguistic merge and the formalism of renormalization, work in preparation.

Extend the Context-Free Class to Mild Context Sensitivity

limited cross-serial dependencies

polynomial time parsing

semilinearity

Semilinearity

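• a subset $V \subset \mathbb{Z}_+^k$ is semilinear if it is a finite union of sets of the form

\[ \Big\{ c + \sum_{w \in P} \lambda_w w \;\Big|\; c \in C,\ \lambda_w \in \mathbb{N} \Big\} \]

for some finite sets $C, P \subset \mathbb{N}^k$, with nonnegative integer scalars $\lambda_w$

• a language $L \subset A^*$ over an alphabet with $\#A = k$ is semilinear iff for any monoid homomorphism

\[ \varphi : (A^*, \star) \to (\mathbb{Z}_+^k, +) \]

the image $\varphi(L) \subset \mathbb{N}^k$ is a semilinear set

• context-free and tree-adjoining grammars have the semilinear property (Joshi and Yokomori, 1983)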
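As a small illustration of the definition, the following Python sketch computes the letter-counting homomorphism (the Parikh image) of words of the context-free language {a^n b^n} and checks by brute force that the images lie in the linear set with C = {(0, 0)} and P = {(1, 1)}. The function names `parikh` and `in_linear_set` and the search bound are assumptions made for this example only.

```python
from itertools import product

def parikh(word, alphabet):
    """Letter-counting homomorphism A* -> N^k (alphabet order fixes the coordinates)."""
    return tuple(word.count(a) for a in alphabet)

def in_linear_set(v, c, periods, bound=20):
    """Brute-force test whether v = c + sum(lam_w * w) with integers 0 <= lam_w <= bound."""
    for lams in product(range(bound + 1), repeat=len(periods)):
        candidate = tuple(
            ci + sum(lam * w[i] for lam, w in zip(lams, periods))
            for i, ci in enumerate(c)
        )
        if candidate == v:
            return True
    return False

# Words a^n b^n for small n, and their images in N^2.
words = ["a" * n + "b" * n for n in range(6)]
images = [parikh(w, "ab") for w in words]
all_in_set = all(in_linear_set(v, (0, 0), [(1, 1)]) for v in images)
```

The bounded search only works for small vectors; it is meant to make the shape of a linear set concrete, not to decide semilinearity in general.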

Multiple Context Free Grammars (MCFG)

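• introduced by H. Seki, T. Matsumura, M. Fujii, T. Kasami, 1990

• Example: $L = \{a_1^n a_2^n \cdots a_{2m}^n \mid n \ge 0\}$ is generated by an m-MCFG

\[ G = (N = \{A, S\},\ T = \{a_i\}_{i=1}^{2m},\ O = \cup_{k=1}^m (T^*)^k,\ \{f, g\},\ P,\ S) \]

with functions

\[ f(x_1, x_2, \dots, x_m) = (a_1 x_1 a_2,\ a_3 x_2 a_4,\ \dots,\ a_{2m-1} x_m a_{2m}) \]

\[ g(x_1, x_2, \dots, x_m) = x_1 x_2 \cdots x_m \]

and production rules P

\[ A \to (\varepsilon, \varepsilon, \dots, \varepsilon), \qquad A \to f[A], \qquad S \to g[A] \]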
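The derivations of this example grammar can be traced with a few lines of Python. This is a minimal sketch: the terminals a_1, ..., a_2m are passed as a string `letters` of single characters, and the helper name `derive` is hypothetical.

```python
def f(xs, letters):
    """f(x_1, ..., x_m): wrap the i-th component as a_{2i-1} x_i a_{2i}."""
    return tuple(letters[2 * i] + x + letters[2 * i + 1] for i, x in enumerate(xs))

def g(xs):
    """g(x_1, ..., x_m): concatenate the components."""
    return "".join(xs)

def derive(n, m, letters):
    """Apply A -> (eps, ..., eps), then A -> f[A] n times, then S -> g[A]."""
    xs = ("",) * m
    for _ in range(n):
        xs = f(xs, letters)
    return g(xs)

example = derive(2, 2, "abcd")  # m = 2, terminals a, b, c, d
```

With m = 2 and two applications of f, the m-tuple is ("aabb", "ccdd") before g concatenates it, which shows how the components grow in parallel.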

MCFG: general definition $G = (N, T, O, F, P, S)$

$O = \cup_{k=1}^m (T^*)^k$

finite set $F$ of (partial) functions $f : O^{a(f)} \to O$, for some $a(f) \in \mathbb{N}$

each $f \in F$ is a function of $a(f)$ variables: there are $0 \le r(f), d_k(f) \le m$, for $k = 1, \dots, a(f)$, with

\[ f : \prod_{k=1}^{a(f)} (T^*)^{d_k(f)} \to (T^*)^{r(f)} \]

the functions $f(x_1, \dots, x_{a(f)})$ are concatenations of constant strings in $T^*$ and variables in $X = \{x_{kj} \mid k = 1, \dots, a(f),\ j = 1, \dots, d_k(f)\}$, with each $x_{kj}$ occurring at most once

$d : N \to \mathbb{N}$ with $d(S) = 1$; if $A \to f(A_1, \dots, A_{a(f)})$ is in $P$, then $r(f) = d(A)$ and $d_k(f) = d(A_k)$

Properties:

$m$-MCFG $\subsetneq$ $(m+1)$-MCFG

MCFGs are semilinear (Vijay-Shanker, Weir, Joshi, 1987)

tree adjoining grammars sit between CFG and 2-MCFG

CFG = 1-MCFG $\subsetneq$ TAG $\subsetneq$ 2-MCFG

recognition $w \in L_G$ is polynomially decidable (but inclusion $L_{G_1} \subseteq L_{G_2}$ is undecidable)

MCFGs can be made stochastic as CFGs

Merge Grammars or Minimalist Grammars (MG)

• introduced in

Edward P. Stabler and Edward L. Keenan, Structural similarity within and among languages, Theoretical Computer Science, 293 (2003) 345–363.

• formalizing the derivations within Chomsky’s Minimalist Model in the setting of formal languages

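• Minimalist Grammar $G = (A, Sel, Lic, Lex, c)$

A: finite alphabet

Lic (licensing types) and Sel (selecting types): disjoint finite sets

Syn: set of syntactic features, namely selectors and selectees (built from Sel) together with licensors and licensees (built from Lic)

lexicon: finite subset $Lex \subset A^* \times (\text{selectors} \cup \text{licensors})^* \times \text{selectees} \times \text{licensees}^*$

$c \in Sel$: type for completed expressions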
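The shape of a lexical feature string (selectors and licensors first, then exactly one selectee, then licensees) can be checked mechanically. The sketch below assumes a purely illustrative textual encoding, with `=X` for selectors, `+x` for licensors, a bare name for the selectee and `-x` for licensees, separated by single spaces; the sample lexicon is made up for the example.

```python
import re

# Assumed encoding (hypothetical): "=X" selector, "+x" licensor,
# "X" selectee, "-x" licensee, features separated by single spaces.
FEATURE_STRING = re.compile(r"((=|\+)\w+ )*\w+( -\w+)*")

def is_lexical_feature_string(features):
    """True if features has the form (selectors | licensors)* selectee licensees*."""
    return FEATURE_STRING.fullmatch(features) is not None

# A toy lexicon: pairs (string over the alphabet, feature string).
toy_lexicon = [("the", "=N D -case"), ("cat", "N"), ("", "=V +wh C")]
well_formed = [is_lexical_feature_string(fs) for _, fs in toy_lexicon]
```

A regular expression suffices here because the feature part of a lexical item is itself described by a regular pattern.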

Examples of minimalist lexicon items in Lex

lexical categories: adjective A, adjective phrase AP, adverb Adv, adverb phrase AdvP, noun N, noun phrase NP, verb V, verb phrase VP, etc.

functional categories: coordinate conjunction C, determiner D, negation Neg, particle Par, preposition P, prepositional phrase PP, subordinate conjunction Sub, tense T, tense phrase TP, etc.

selection: =X selection of an X phrase

licensees: -X requirements forcing movement

licensors: features that satisfy licensee requirements, like +wh, +case, etc.

Operations: instead of production rules, only two fixed kinds of operations in Minimalist Grammars

1 MERGE

2 MOVE

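Merge and Move

• more formal description of MERGE and MOVE

• Merge: $(\alpha, \beta) \mapsto \{\alpha, \{\alpha, \beta\}\}$ or $\{\beta, \{\alpha, \beta\}\}$

• iterations: $(\gamma, \{\alpha, \{\alpha, \beta\}\}) \mapsto \{\gamma, \{\gamma, \{\alpha, \{\alpha, \beta\}\}\}\}$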
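The set-theoretic form of Merge can be written down directly with nested frozensets. In this sketch the function name `merge` and the explicit `head` argument (which of the two inputs labels the result) are assumptions of the example; atoms are plain strings.

```python
def merge(x, y, head):
    """Return {head, {x, y}}; head is the label that projects."""
    return frozenset({head, frozenset({x, y})})

# (alpha, beta) -> {alpha, {alpha, beta}}
ab = merge("alpha", "beta", head="alpha")

# iteration: (gamma, {alpha, {alpha, beta}}) -> {gamma, {gamma, {alpha, {alpha, beta}}}}
gab = merge("gamma", ab, head="gamma")
```

Because sets are unordered, `merge("alpha", "beta", head="alpha")` and `merge("beta", "alpha", head="alpha")` give the same object, which reflects that no linear order is fixed at this stage.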

Example of derivation in Minimalist Grammars (embedded question)

MG = MCFG (Theorem 1 of Stabler 2010)

Main idea of how to transform an MG into an MCFG

• transform trees into tuples of strings (subscript 0: non-lexical expressions; subscript 1: lexical expressions; also :: and : for lexical and derived)

• these tuples of strings give the production rules of an MCFG

• start symbol of the MCFG is $\langle c \rangle_0$

Example: previous derivation in terms of tuples of strings

External and Internal Merge Operations

• MG operations of MERGE and MOVE unified as two aspects of the same merge operation

1 external merge

2 internal merge

under the shortest move constraint (SMC): exactly one head in the tree has $-x$ as first feature; $t^M$: maximal projection

Head and Projection

labels > and < of merge identify where the head of the tree is: here leaf vertex number 8

a maximal projection in T is a subtree of T that is not a proper subtree of any larger subtree with the same head

leaves {2, 3, 4} determine a subtree with head vertex the leaf numbered 3

any larger subtree in T would have a different head: this is a maximal projection

also the subtree determined by leaves {5, 6}

Notation about features

in addition to labels {>, <} of merge operations, finite set of syntactic feature labels $X \in \{N, V, A, P, C, T, D, \dots\}$

selector features denoted by the symbol $\sigma X$ for a head selecting a phrase XP (Note: the usual notation for a selector is =X rather than $\sigma X$)

$T[\alpha]$: tree where the head is labelled by an ordered set of syntactic features starting with $\alpha$

operation $T[\alpha] \mapsto T$ removes the $\alpha$-feature from the head vertex

Examples

• external merge

• internal merge

Merge and the Origin of Language

Robert C. Berwick, Noam Chomsky, Why only us? MIT Press, 2015.

• proposal of a single significant evolutionary change leading to the structure of human languages, in the form of a single computational operation capable of generating recursive structures: the merge operation

• is there a way to characterize merge as a fundamental structure of recursion in a mathematical sense?

Merge as a Universal Structure: Recursion and Renormalization

• planar rooted trees

rooted tree T: simply connected (no loops) finite graph, with vertex set V, distinguished element $v_r \in V$ (root), set of edges E oriented away from the root, source and target maps $s, t : E \to V$; leaves are the univalent vertices

a planar embedding determines a linear ordering of the leaves

vertex-decorated: $L_V : V \to D_V$ with $D_V$ a finite set; edge-decorated: $L_E : E \to D_E$ with $D_E$ a finite set of decorations; assume vertex-decorated

binary: leaves univalent; root of valence two; all other internal vertices of valence three (all binary splittings from root to leaves)

Merge trees

• set of labels will correspond to syntactic features, as well as symbols < and > for merge operations

[Figure: the tree with root labelled < and leaves α, β stands for the merge tree with root labelled α and leaves α, β]

• planar or non-planar trees: planar trees assume that the linear ordering of the result of merge is already determined (Stabler); non-planar trees correspond to versions of linguistic minimalism in which merge is performed first and the linear order is determined later in the derivation

Loday–Ronco Hopf algebra of planar rooted trees

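vector space $V_k$ spanned by planar binary rooted trees T with k internal vertices (hence k + 1 leaves)

\[ \dim V_k = (\#D_V)^k \, \frac{(2k)!}{k!\,(k+1)!} \]

$\#D_V$: cardinality of the set $D_V$ of vertex labels

graded vector space $V = \oplus_{k \ge 0} V_k$ with $V_0 = \mathbb{Q}$

given a label $d \in D_V$, grafting operator $\wedge_d$

\[ \wedge_d : V \otimes V \to V, \qquad T_1 \otimes T_2 \mapsto T = T_1 \wedge_d T_2 \]

with $\wedge_d : V_k \otimes V_\ell \to V_{k+\ell+1}$ (the new root adds one internal vertex), attaching the two roots $v_{r_1}$ of $T_1$ and $v_{r_2}$ of $T_2$ to a single root vertex v labelled by $d \in D_V$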
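The dimension formula can be checked for small k by enumerating trees. In this sketch a planar binary tree is encoded as a nested pair `(left, right)`, with `None` for the tree without internal vertices; this encoding and the function names are assumptions of the example, and vertex labels are only counted, not stored.

```python
from math import factorial

def binary_trees(k):
    """All planar binary rooted trees with k internal vertices, as nested pairs."""
    if k == 0:
        return [None]
    trees = []
    for i in range(k):  # i internal vertices in the left subtree
        for left in binary_trees(i):
            for right in binary_trees(k - 1 - i):
                trees.append((left, right))
    return trees

def dim_V(k, num_labels=1):
    """(#D_V)^k * (2k)! / (k! (k+1)!): one label choice per internal vertex."""
    return num_labels ** k * factorial(2 * k) // (factorial(k) * factorial(k + 1))

counts = [len(binary_trees(k)) for k in range(7)]
```

The unlabelled counts are the Catalan numbers; with a label set of size #D_V each of the k internal vertices carries an independent label, which gives the extra factor.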

associative concatenation operations on planar binary rooted trees

given S and T, the tree $S \backslash T$ (S under T) is obtained by grafting the root of T to the rightmost leaf of S

the tree $T / S$ (T over S) is obtained by grafting the root of T to the leftmost leaf of S

grafting operation and concatenations

grafting operation obtained from concatenations

\[ T_1 \wedge_d T_2 = T_1 / S \backslash T_2, \]

with S the tree with a single internal vertex (the root, with two leaves) decorated by $d \in D_V$

each planar binary rooted tree is a grafting $T = T_\ell \wedge_d T_r$ of the trees stemming to the left and right of the root vertex
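The relation between grafting and the two concatenations can be verified on small trees. This is a minimal sketch for unlabelled trees, encoded as nested pairs with `None` for the tree without internal vertices; the names `under`, `over` and `graft` are chosen for this example.

```python
LEAF = None  # tree with no internal vertex

def graft(left, right):
    """Attach the roots of left and right to a new common root."""
    return (left, right)

def under(s, t):
    """s \\ t: graft the root of t onto the rightmost leaf of s."""
    if s is LEAF:
        return t
    left, right = s
    return (left, under(right, t))

def over(t, s):
    """t / s: graft the root of t onto the leftmost leaf of s."""
    if s is LEAF:
        return t
    left, right = s
    return (over(t, left), right)

Y = graft(LEAF, LEAF)   # single internal vertex with two leaves
T1 = graft(Y, LEAF)
T2 = graft(LEAF, Y)

# T1 ∧ T2 = T1 / Y \ T2, with either bracketing of the right-hand side
lhs = graft(T1, T2)
rhs_a = under(over(T1, Y), T2)
rhs_b = over(T1, under(Y, T2))
```

Both bracketings agree because the two concatenations act on opposite sides of the middle tree.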

Loday–Ronco Hopf algebra HLR

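vector space $V = \oplus_{k \ge 0} V_k$ with $V_0 = \mathbb{Q}$

multiplication and comultiplication defined inductively by degree

for trees $T = T_\ell \wedge T_r$ and $T' = T'_\ell \wedge T'_r$, the product is

\[ T \star T' = T_\ell \wedge (T_r \star T') + (T \star T'_\ell) \wedge T'_r \]

coproduct

\[ \Delta(T) = \sum_{j,k} (T_{\ell,j} \star T_{r,k}) \otimes (T'_{\ell,n-j} \wedge T'_{r,m-k}) + T \otimes \bullet \]

with $T = T_\ell \wedge T_r$ and $\Delta(T_\ell) = \sum_j T_{\ell,j} \otimes T'_{\ell,n-j}$ and $\Delta(T_r) = \sum_k T_{r,k} \otimes T'_{r,m-k}$, for $T_\ell \in V_n$ and $T_r \in V_m$

antipode defined inductively on graded bialgebras

\[ S(X) = -X - \sum S(X') X'' \]

for $\Delta(X) = X \otimes 1 + 1 \otimes X + \sum X' \otimes X''$, with $X'$, $X''$ of lower degree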
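The inductive product formula translates directly into a recursion. The sketch below works with unlabelled trees (nested pairs, `None` for the unit) and represents linear combinations as `Counter` objects mapping trees to integer coefficients; these representation choices and the function names are assumptions of the example.

```python
from collections import Counter

LEAF = None  # tree with no internal vertex; unit of the product

def tree_product(s, t):
    """Product of two single trees, as a Counter {tree: coefficient}."""
    if s is LEAF:
        return Counter({t: 1})
    if t is LEAF:
        return Counter({s: 1})
    (s_left, s_right), (t_left, t_right) = s, t
    result = Counter()
    for u, c in tree_product(s_right, t).items():  # T_l ∧ (T_r * T')
        result[(s_left, u)] += c
    for u, c in tree_product(s, t_left).items():   # (T * T'_l) ∧ T'_r
        result[(u, t_right)] += c
    return result

def product(x, y):
    """Bilinear extension to linear combinations of trees."""
    result = Counter()
    for s, a in x.items():
        for t, b in y.items():
            for u, c in tree_product(s, t).items():
                result[u] += a * b * c
    return result

Y = (LEAF, LEAF)
YY = tree_product(Y, Y)          # sum of the two trees with 2 internal vertices
one_Y = Counter({Y: 1})
left_assoc = product(YY, one_Y)  # (Y * Y) * Y
right_assoc = product(one_Y, YY) # Y * (Y * Y)
```

On this small example the two bracketings of the triple product agree, as associativity requires.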

External Merge in HLR

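• external merge $T = \mathrm{em}(T_1[\sigma X], T_2[X])$ of trees $T_1[\sigma X]$ and $T_2[X]$

if $T_1[\sigma X]$ is a single root vertex labelled by the feature $\sigma X$: T is the tree with root labelled <, left subtree $T_1$ and right subtree $T_2$

in all other cases: T is the tree with root labelled >, left subtree $T_2$ and right subtree $T_1$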

• labelled grafting (with domain)

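$X = X_0 X_1 \cdots X_r$ or $X = \sigma X_0 X_1 \cdots X_r$: string of syntactic features, with $\sigma$ the selector

domain

\[ \mathrm{Dom}(\wedge) = \{ (T_1[X], T_2[Y]) \mid X = \sigma X_0 X_1 \cdots X_r \text{ and } Y = X_0 Y_1 \cdots Y_s \} \]

external merge as labelled grafting

\[ T_1[X] \wedge T_2[Y] = \begin{cases} T_1[X_1 \cdots X_r] \wedge_< T_2[Y_1 \cdots Y_s] & |T_1| = 1 \\ T_2[Y_1 \cdots Y_s] \wedge_> T_1[X_1 \cdots X_r] & |T_1| > 1 \end{cases} \]

notation $\hat X$ for X with the first feature erased

\[ T_1[X] \wedge T_2[Y] = \begin{cases} T_1[\hat X] \wedge_< T_2[\hat Y] & |T_1| = 1 \\ T_2[\hat Y] \wedge_> T_1[\hat X] & |T_1| > 1 \end{cases} \]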
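External merge as labelled grafting can be sketched on feature-decorated trees. The classes `Leaf` and `Node`, the string encoding of features (`"=D"` for a selector of D) and the toy lexical items are all assumptions made for this illustration, not part of the formalism above.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Leaf:
    word: str
    features: Tuple[str, ...]

@dataclass(frozen=True)
class Node:
    label: str      # "<": head is in the left subtree, ">": head is in the right subtree
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def head(t):
    """Follow the < and > labels down to the head leaf."""
    while isinstance(t, Node):
        t = t.left if t.label == "<" else t.right
    return t

def drop_first(t):
    """Erase the first feature of the head of t."""
    if isinstance(t, Leaf):
        return Leaf(t.word, t.features[1:])
    if t.label == "<":
        return Node("<", drop_first(t.left), t.right)
    return Node(">", t.left, drop_first(t.right))

def external_merge(t1, t2):
    f1, f2 = head(t1).features, head(t2).features
    if not (f1 and f2 and f1[0] == "=" + f2[0]):
        raise ValueError("pair is not in the domain of external merge")
    if isinstance(t1, Leaf):                           # |T1| = 1
        return Node("<", drop_first(t1), drop_first(t2))
    return Node(">", drop_first(t2), drop_first(t1))   # |T1| > 1

likes = Leaf("likes", ("=D", "=D", "V"))
mary = Leaf("Mary", ("D",))
john = Leaf("John", ("D",))
vp = external_merge(likes, mary)   # lexical selector: complement on the right
s = external_merge(vp, john)       # derived selector: selected phrase on the left
```

The second merge places the selected phrase to the left because the selecting tree is no longer a single vertex, matching the two cases of the labelled grafting.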

Internal Merge in HLR

given a binary rooted tree T with subtree $T_1$ and another binary rooted tree $T_2$, define $T\{T_1 \to T_2\}$ as the planar binary rooted tree obtained by removing $T_1$ from T and replacing it with $T_2$

notation: $\sigma$ for the feature selector (=), and $\omega$ and $\bar\omega$ for “licensor” and “licensee” (±)

domain of internal merge

\[ \mathrm{Dom}(I) = \{ T[X] \mid \exists\, T_1[Y] \subset T[X], \text{ with } Y = \bar\omega X_0 \hat Y,\ X = \omega X_0 \hat X \} \]

internal merge

\[ I(T[X]) = T_1^M[Y] \wedge_> T\{T_1[Y]^M \to \emptyset\} \]

Admissible Cuts on Trees

(planar) rooted tree T

admissible cut C of T: selection of a number of edges of T such that every oriented path from the root to one of the leaves contains at most one of the selected edges

removing the edges in C gives a disjoint union $\rho_C(T) \sqcup \pi_C(T)$ of

a (planar) tree $\rho_C(T)$ containing the root vertex

a disjoint union $\pi_C(T) = \cup_i T_i$ of (planar) trees, each with a unique source vertex (root)

elementary admissible cut: cut consisting of a single edge
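Admissible cuts can be enumerated recursively: for each edge below a vertex, either cut it (the whole subtree below goes to the pruned forest) or keep it and recurse. In this sketch a planar rooted tree is encoded as the tuple of its child subtrees, so a single vertex is `()`; the encoding and the function name are assumptions of the example, and the empty cut is included in the enumeration.

```python
from itertools import product as cartesian

def admissible_cuts(tree):
    """Yield (pi_C, rho_C) for every admissible cut C of tree, including the empty cut.

    pi_C is the list of pruned subtrees, rho_C the tree containing the root.
    """
    options_per_child = []
    for child in tree:
        options = [([child], None)]            # cut the edge above this child
        options.extend(admissible_cuts(child)) # or keep it and cut further down
        options_per_child.append(options)
    for choice in cartesian(*options_per_child):
        forest = [t for pruned, _ in choice for t in pruned]
        trunk = tuple(rest for _, rest in choice if rest is not None)
        yield forest, trunk

chain = (((),),)        # path with 3 vertices and 2 edges
corolla = ((), (), ())  # root with 3 leaves
chain_cuts = list(admissible_cuts(chain))
corolla_cuts = list(admissible_cuts(corolla))
```

On a path only single edges (or no edge) can be cut, while on a corolla every subset of edges is admissible, which the two counts below reflect.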

Internal Merge via Admissible Cuts

the tree $T\{T_1 \to \emptyset\}$ is the same as the tree $\rho_C(T)$ of a single elementary cut

internal merge

\[ I(T[X]) = T_1^M[Y] \wedge_> T\{T_1[Y]^M \to \emptyset\} = \pi_C(T) \wedge_> \rho_C(T) \]

C is the elementary admissible cut specified by the subtree $T_1^M$

the merge operations are modelled by the three combinatorial operations

\[ T \mapsto \bullet \wedge T, \qquad T_1 \otimes T_2 \mapsto T_2 \wedge T_1, \]

\[ T \mapsto \rho_C(T) \otimes \pi_C(T) \mapsto \pi_C(T) \wedge \rho_C(T) \]
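The cut-and-regraft description of internal merge can be sketched on bare binary trees. Here trees are nested pairs with strings as leaves, the subtree to move is addressed by a path of `"L"`/`"R"` steps, and features and the labels < and > are ignored; all of this, and the choice to contract the parent vertex so that the remaining tree stays binary, are assumptions of the example.

```python
def elementary_cut(tree, path):
    """Cut the edge above the subtree at `path`; return (pi_C, rho_C)."""
    if not path or not isinstance(tree, tuple):
        raise ValueError("path does not address a proper subtree")
    left, right = tree
    step, rest = path[0], path[1:]
    if not rest:
        # assumption: the parent vertex is contracted, the sibling takes its place
        return (left, right) if step == "L" else (right, left)
    if step == "L":
        pruned, new_left = elementary_cut(left, rest)
        return pruned, (new_left, right)
    pruned, new_right = elementary_cut(right, rest)
    return pruned, (left, new_right)

def internal_merge(tree, path):
    """T -> rho_C(T) ⊗ pi_C(T) -> pi_C(T) ∧ rho_C(T) for an elementary cut C."""
    pruned, remainder = elementary_cut(tree, path)
    return (pruned, remainder)

T = ("C", ("John", ("saw", "what")))
moved = internal_merge(T, "RRR")
```

In the toy tree the leaf "what" is extracted and regrafted at the root, the pattern of an embedded question.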

Iterations of Internal Merge

matching labels conditions for domains

N-th iterate of internal merge: admissible cut C of the tree T with number of cut branches $\#C = N$

\[ I^{\#C}(T[X]) = \wedge^{1+\#C} \big( \pi_C(T)[\mathbf{Y}] \; \rho_C(T)[X_N] \big) \]

notation $\pi_C(T)[\mathbf{Y}]$ for the forest $\pi_C(T) = T_N^M \cdots T_1^M$, where the label $[\mathbf{Y}]$ means

\[ \pi_C(T)[\mathbf{Y}] = T_N^M[Y^{(N)}] \cdots T_1^M[Y^{(1)}] \]

label $[X_N]$ of the tree $\rho_C(T)$: what remains of the original label X after the initial terms $\omega X_0\, \omega X_1 \cdots \omega X_{N-1}$ are removed

Connes–Kreimer Hopf algebra of rooted trees

polynomial algebra generated by the planar rooted trees T

coproduct: sum over all admissible cuts

\[ \Delta(T) = T \otimes 1 + 1 \otimes T + \sum_C \pi_C(T) \otimes \rho_C(T) \]

grading by span of the planar rooted trees with k internal vertices

antipode defined inductively on graded bialgebras

used as a reformulation of the Connes–Kreimer Hopf algebra of Feynman graphs in perturbative QFT

Comparison between Hopf algebras

there is a map of Hopf algebras $\phi : H_{CK} \to H_{LR}$

maps the unit $1 \in H_{CK}$ (empty tree) to the binary tree consisting of a single root vertex •

maps the single vertex tree • in $H_{CK}$ to the binary tree with a single internal vertex (one root and two leaves)

otherwise maps

\[ \phi(T) = \phi(F(T)) / \phi(\bullet) \]

with $F(T)$ the forest obtained by removing the root of T, and / the concatenation operation grafting the root of $\phi(F(T))$ to the left leaf of $\phi(\bullet)$

for a forest $F = T_1 \cdots T_n$ in $H_{CK}$ the image is

\[ \phi(F) = \phi(T_1) \backslash \phi(T_2) \backslash \cdots \backslash \phi(T_n) \]

with \ the other concatenation operation, grafting the root of $\phi(T_{i+1})$ to the rightmost leaf of $\phi(T_i)$

this map is compatible with product, coproduct, and antipode
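The recursive definition of the map can be followed step by step on trees. In this sketch a rooted tree in the Connes–Kreimer algebra is encoded as the tuple of its child subtrees (a single vertex is `()`), a binary tree as a nested pair with `None` for the single-vertex binary tree, and only the map on trees and forests is implemented, not its compatibility with the Hopf algebra structure; the function names are assumptions of the example.

```python
LEAF = None        # binary tree with a single vertex: image of the unit
Y = (LEAF, LEAF)   # binary tree with one internal vertex: image of the one-vertex tree

def under(s, t):
    """s \\ t: graft the root of t onto the rightmost leaf of s."""
    return t if s is LEAF else (s[0], under(s[1], t))

def over(t, s):
    """t / s: graft the root of t onto the leftmost leaf of s."""
    return t if s is LEAF else (over(t, s[0]), s[1])

def phi_forest(forest):
    """phi(T_1 ... T_n) = phi(T_1) \\ phi(T_2) \\ ... \\ phi(T_n); empty forest -> LEAF."""
    result = LEAF
    for tree in forest:
        result = under(result, phi(tree))
    return result

def phi(tree):
    """phi(T) = phi(F(T)) / phi(single vertex), with F(T) the children of the root."""
    return over(phi_forest(tree), Y)

def vertices(tree):
    return 1 + sum(vertices(child) for child in tree)

def internal_vertices(binary):
    return 0 if binary is LEAF else 1 + internal_vertices(binary[0]) + internal_vertices(binary[1])

sample = (((), ()), ())  # a rooted tree with 5 vertices
```

In these examples each vertex of the rooted tree contributes exactly one internal vertex of its binary image.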

Example

References

M. Aguiar, F. Sottile, Structure of the Loday–Ronco Hopf algebra of trees, Journal of Algebra, Vol. 295 (2006) 473–511

A. Connes, D. Kreimer, Hopf algebras, Renormalization and Noncommutative geometry, Comm. Math. Phys. 199 (1998) 203–242

J.L. Loday, M. Ronco, Hopf algebra of the planar binary trees, Adv. Math. 139 (1998) N.2, 293–309

Recursive Structures and Dyson–Schwinger Equations in QFT

in the Connes–Kreimer setting

in perturbative QFT, solve the equations of motion by a recursive combinatorial equation in the Feynman graphs: the Dyson–Schwinger equation

operators $B^+_d : H \to H$, with $d \in D_V$ a vertex decoration

\[ B^+_d(T_1 \cdot \cdots \cdot T_m) = T \]

with T obtained by grafting the roots $v_{r_1}, \dots, v_{r_m}$ of the trees $T_1, \dots, T_m$ to a common vertex $\bullet_d$ labelled by $d \in D_V$

satisfies the Hochschild 1-cocycle condition

\[ \Delta(B^+_d(X)) = B^+_d(X) \otimes 1 + (\mathrm{Id} \otimes B^+_d) \circ \Delta(X) \]

Dyson–Schwinger Equation

\[ X = B^+(P(X)) \]

fixed point of the nonlinear transformation $X \mapsto B^+(P(X))$, for some polynomial or formal power series $P(t) = \sum_{k \ge 0} a_k t^k$ with $a_0 = 1$

unique solution $X = \sum_{k \ge 1} x_k$

\[ x_{n+1} = \sum_{k=1}^{n} \sum_{j_1 + \cdots + j_k = n} a_k B^+(x_{j_1} \cdots x_{j_k}) \]

initial step $x_1 = B^+(1)$
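The recursion for the solution can be run degree by degree. The sketch below makes several assumptions for illustration: a rooted tree is the sorted tuple of its child subtrees (so forests commute, as in a polynomial algebra), a single unlabelled operator B+ is used, each x_n is a `Counter` mapping trees to integer coefficients, and the coefficients a_k are supplied as a function `a`.

```python
from collections import Counter
from itertools import product as cartesian

def compositions(n, k):
    """All k-tuples of positive integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def solve_dse(a, order):
    """Return {n: x_n} for n = 1..order, with x_1 = B+(1) the single vertex."""
    x = {1: Counter({(): 1})}
    for n in range(1, order):
        x_next = Counter()
        for k in range(1, n + 1):
            for js in compositions(n, k):
                # expand the product x_{j_1} ... x_{j_k} term by term
                for combo in cartesian(*(x[j].items() for j in js)):
                    coeff = a(k)
                    for _, c in combo:
                        coeff *= c
                    tree = tuple(sorted(t for t, _ in combo))  # B+ of the forest
                    x_next[tree] += coeff
        x[n + 1] = x_next
    return x

# P(t) = 1 + t + t^2 + ... , i.e. a_k = 1 for all k
x = solve_dse(lambda k: 1, 5)
coefficient_sums = [sum(x[n].values()) for n in range(1, 6)]
```

With all a_k = 1 the coefficient of a tree counts its planar embeddings, so the coefficient sums in each degree are Catalan numbers.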

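more general form with vertex labels: variables $X = (X_\delta)_{\delta \in D_V}$

\[ F_\delta(X) = \sum_{k_1, \dots, k_N} a^{(\delta)}_{k_1, \dots, k_N} X_{\delta_1}^{k_1} \cdots X_{\delta_N}^{k_N} \]

Dyson–Schwinger Equation

\[ X_\delta = B^+_\delta(F_\delta(X)) \]

unique solution $X_\delta = \sum_\tau x_\tau \tau$

\[ x_\tau = \Big( \prod_{k=1}^{N} \frac{(\sum_{l=1}^{m_k} p_{\delta,l})!}{\prod_{l=1}^{m_k} p_{\delta,l}!} \Big)\, a^{(\delta)}_{\sum_{k=1}^{N} p_{1,k}, \dots, \sum_{k=1}^{N} p_{N,k}}\, x_{\tau_{1,1}}^{p_{1,1}} \cdots x_{\tau_{N,m_N}}^{p_{N,m_N}} \]

where

\[ \tau = B^+(\tau_{1,1}^{p_{1,1}} \cdots \tau_{1,m_1}^{p_{1,m_1}} \cdots \tau_{N,1}^{p_{N,1}} \cdots \tau_{N,m_N}^{p_{N,m_N}}) \]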

Dyson–Schwinger Equation in Loday–Ronco setting

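the Dyson–Schwinger equation $X = B^+(P(X))$ becomes $\phi(X) = \phi(B^+(P(X)))$

for a forest $F = T_1 \cdots T_k$ the image is $\phi(B^+(F)) = \phi(F)/S$ with $S = \phi(\bullet)$

general form of the solution

\[ \phi(x_{n+1}) = \sum_{k=1}^{n} \sum_{j_1 + \cdots + j_k = n} a_k\, \phi(x_{j_1} \cdots x_{j_k}) / S = \sum_{k=1}^{n} \sum_{j_1 + \cdots + j_k = n} a_k\, (\phi(x_{j_1}) \backslash \cdots \backslash \phi(x_{j_k})) / S \]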

References about Dyson–Schwinger equations and Hopf algebras:

C. Bergbauer, D. Kreimer, Hopf algebras in renormalization theory: locality and Dyson-Schwinger equations from Hochschild cohomology, in “Physics and Number Theory”, pp. 133–164, IRMA Lect. Math. Theor. Phys. 10, Eur. Math. Soc., 2006

C. Delaney, M. Marcolli, Dyson-Schwinger equations in the theory of computation, in “Feynman amplitudes, periods and motives”, pp. 79–107, Contemp. Math. 648, Amer. Math. Soc., 2015.

L. Foissy, Classification of systems of Dyson–Schwinger equations in the Hopf algebra of decorated rooted trees, Advances in Math. 224 (2010) 2094–2150

K. Yeats, Rearranging Dyson-Schwinger Equations, Memoirs of the American Mathematical Society, 211, American Mathematical Society, 2011.

Dyson–Schwinger Equations and Generative Processes

generative process that identifies a family of planar binary rooted trees obtained recursively through the application of the operations

\[ \phi(x_{j_1}), \dots, \phi(x_{j_k}) \mapsto (\phi(x_{j_1}) \backslash \cdots \backslash \phi(x_{j_k})) / \phi(\bullet) \]

the linear combinations and the coefficients $a_k$ are additional data that keep track of weights assigned to the trees, so that each $\phi(x_{j_\ell})$ is itself a weighted combination of binary trees

general problem of how to make MGs probabilistic (going via MCFGs is not suitable)

with this iterative equation method one can assign the coefficients $a_k$ and use them to assign probabilities consistently

Question: can one view derivations in minimalist linguistics as recursive solutions of Dyson–Schwinger type equations? Is this what is universal about merge?
