Grammar Induction

So what did we have?
[Figure: the corpus graph built from the sentences "Is that a dog?", "Is that a cat?", "Where is the dog?" and "And is that a horse?" – each word is a vertex (numbered 101–104) and each sentence is a path from BEGIN to END.]
The Model: Graph representation with words as vertices and sentences as paths.
Detecting significant patterns

Identifying patterns becomes easier on a graph: sub-paths are automatically aligned.
Initialization

[Figure: panel A shows a search path through the graph (begin → … → end) with the traversed edges labeled e1–e6; panel B plots the left- and right-moving probabilities along that path:
P_R(e1) = 4/41, P_R(e2|e1) = 3/4, P_R(e3|e1e2) = 1, P_R(e4|e1e2e3) = 1, P_R(e5|e1e2e3e4) = 1/3
P_L(e4) = 6/41, P_L(e3|e4) = 5/6, P_L(e2|e3e4) = 1, P_L(e1|e2e3e4) = 3/5
A drop in these probabilities marks a significant pattern.]
Motif EXtraction (MEX)
Pattern significance

Say we found a potential pattern-edge from nodes 1 to n. Define:
m – the number of paths from 1 to n
r – the number of paths from 1 to n+1
Because it's a pattern edge, we know that P_{n+1} = r/m < P_n.

Let's suppose that the true probability of n+1 given 1 through n is P*_{n+1}. r/m is our best estimate, but just an estimate. What are the odds of getting r and m but still having P*_{n+1} ≥ P_n?

Pattern significance

Assume P*_{n+1} = P_n. The odds of getting result r and m or better are then given by

B(r, m, P_n) = Σ_{i=0}^{r} (m choose i) P_n^i (1 − P_n)^{m−i}

If this is smaller than a predetermined α, we say the pattern-edge candidate is significant.
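As an illustration only (a sketch of the binomial test above, not the actual ADIOS code; the class and method names are made up), the significance check can be coded directly:

```java
// Sketch of the pattern-edge significance test: B(r, m, P) is the
// probability of seeing r or fewer continuations out of m paths when the
// true continuation probability equals P (the binomial lower tail).
class PatternSignificance {

    // Binomial coefficient C(m, i), computed multiplicatively.
    static double choose(int m, int i) {
        double c = 1.0;
        for (int k = 1; k <= i; k++) {
            c *= (double) (m - i + k) / k;
        }
        return c;
    }

    // B(r, m, P) = sum_{i=0}^{r} C(m, i) P^i (1 - P)^(m - i)
    static double binomialTail(int r, int m, double p) {
        double sum = 0.0;
        for (int i = 0; i <= r; i++) {
            sum += choose(m, i) * Math.pow(p, i) * Math.pow(1 - p, m - i);
        }
        return sum;
    }

    // A pattern-edge candidate is declared significant when the tail
    // probability falls below a predetermined alpha.
    static boolean isSignificant(int r, int m, double pN, double alpha) {
        return binomialTail(r, m, pN) < alpha;
    }
}
```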
[Figure C: rewiring. The sub-path covered by pattern P1 (edges e2, e3, e4) is collapsed into a new vertex, and all paths that traverse it are redirected through the new vertex.]
Rewiring the graph

Once a pattern is identified as significant, the sub-paths it subsumes are merged into a new vertex and the graph is rewired accordingly. Repeating this process leads to the formation of complex, hierarchically structured patterns.
Evaluating performance

Define:
Recall – the probability of ADIOS recognizing an unseen grammatical sentence
Precision – the proportion of grammatical ADIOS productions

Recall can be assessed by leaving out some of the training corpus. Precision is trickier, unless we're learning a known CFG.
Determining L

Involves a tradeoff:
A larger L will demand more context sensitivity in the inference, which will hamper generalization.
A smaller L will detect more patterns, but many might be spurious.
The effects of context window width

[Figure: precision vs. recall for ADIOS runs, panels A–F, varying the context window width (L = 3, 4, 5, 6) and the corpus size (10,000, 40,000, and 120,000 sentences), with annotations marking the over-generalization and low-productivity regimes.]
An ADIOS drawback

ADIOS is inherently a heuristic and greedy algorithm. Once a pattern is created it remains forever, so errors accumulate. Sentence ordering affects the outcome: running ADIOS with different orderings gives patterns that 'cover' different parts of the grammar.
An ad-hoc solution

Train multiple learners on the corpus, each on a different sentence ordering, creating a 'forest' of learners.
To create a new sentence: pick one learner at random and use it to produce the sentence.
To check grammaticality of a given sentence: if any learner accepts the sentence, declare it grammatical.
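The forest logic above can be sketched as follows (the Learner interface and all names here are hypothetical, not the actual ADIOS code):

```java
import java.util.List;
import java.util.Random;

// Each learner was trained on a different ordering of the corpus.
interface Learner {
    boolean accepts(String sentence);   // grammaticality under this learner
    String produce();                   // generate one sentence
}

class LearnerForest {
    private final List<Learner> learners;
    private final Random rng = new Random();

    LearnerForest(List<Learner> learners) {
        this.learners = learners;
    }

    // To create a new sentence: pick one learner at random and use it.
    String produce() {
        return learners.get(rng.nextInt(learners.size())).produce();
    }

    // A sentence is declared grammatical if ANY learner accepts it.
    boolean isGrammatical(String sentence) {
        for (Learner l : learners) {
            if (l.accepts(sentence)) return true;
        }
        return false;
    }
}
```

The disjunctive acceptance rule trades precision for recall: the forest accepts the union of the individual grammars.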
The ATIS experiments

ATIS-NL is a 13,043-sentence corpus of natural language: transcribed phone calls to an airline reservation service.
ADIOS was trained on 12,700 sentences of ATIS-NL; the remaining 343 sentences were used to assess recall.
Precision was determined with the help of 8 graduate students from Cornell University.
The ATIS experiments

ADIOS' performance scores (40 learners):
Recall – 40%, Precision – 70%
For comparison, ATIS-CFG reached:
Recall – 45%, Precision – <1% (!)
ADIOS/ATIS-N comparison

[Figure: bar chart comparing the precision of ADIOS and ATIS-N.]
Meta-analysis of ADIOS results

Define a pattern spectrum as the histogram of pattern types for an individual learner. A pattern type is determined by its contents, e.g. TT, TET, EE, PE…
A single ADIOS learner was trained with each of 6 translations of the bible.
Pattern spectra

[Figure: pattern spectra – relative frequencies (0–0.35) of pattern types (TT, TE, TP, ET, EE, EP, PT, PE, PP, TTT, TTE, …) for the English, Spanish, Swedish, Chinese, Danish and French bible translations; additional panels A–E.]
Language dendrogram

[Figure: dendrogram grouping the six languages (Chinese, Spanish, French, English, Swedish, Danish) by the similarity of their pattern spectra.]
Our experience

ADIOS does nicely on: ATIS-N, Childes, artificial CFGs.
Fails miserably on almost anything else: the Wall Street Journal, children's literature, the Bible.

Results

CHILDES – very high recall + precision (the ESL test)
ATIS-N – up to 70% recall (with 700 learners); superior language model
Children's lit – very few patterns are detected
Some example sentences

Childes:
baby go ing to go up the ladder ?
the dog won 't sit in the chaise lounge .
take the lady for a ride

Atis-n:
i would like one coach reservation for may ninth from pittsburgh to atlanta leaving pittsburgh before ten o'clock in the morning
where is the stopover of american airlines flight five four five nine
what are the flights from boston to washington on october fifteenth nineteen ninety one
Some example sentences

Children's lit:
The Tin Woodman and the Scarecrow didn ' t mind the dark at all , but Woot the Wanderer felt worried to be left in this strange place in this strange manner , without being able to see any danger that might threaten .
I know that some of you have been waiting for this story of the Tin Woodman , because many of my correspondents have asked me , time and again what ever became of the " pretty Munchkin girl " whom Nick Chopper was engaged to marry before the Wicked Witch enchanted his axe and he traded his flesh for tin .
Some corpus statistics

Corpus          Word types   #sentences   Avg. sentence length
CHILDES         14,401       320,000      ~6 words
ATIS-N          1,153        12,700       ~10 words
Children's lit  52,180       41,129       ~52 words
Possible causes for failure I

Sentence complexity and structural diversity:
CHILDES and ATIS-N have very few sentence 'types', most of which are simple, single-clause sentences.
Children's lit has many complex sentences with multiple clauses.
Types of complex sentences

Complementary clauses:
Peter promised that he would come
Sue wants Peter to leave

Relative clauses:
Sally bought the bike that was on sale
Is that the driver causing the accidents?

Adverbial clauses:
He arrived when Mary was just about to leave
She left the door open to hear the baby

Coordinate clauses:
He tried hard, but he failed
That example again
I know that some of you have been waiting for this story of the Tin Woodman , because many of my correspondents have asked me , time and again what ever became of the " pretty Munchkin girl " whom Nick Chopper was engaged to marry before the Wicked Witch enchanted his axe and he traded his flesh for tin .
Possible causes for failure

Sentence complexity and structural diversity:
CHILDES and ATIS-N have very few sentence 'types', most of which are simple, single-clause sentences.
Children's lit has many complex sentences with multiple clauses.

The music lesson
Possible remedies

How do children do it? Incremental learning – on the importance of starting small.
How might we mimic that? Sorting sentences according to complexity; starting out with a simpler corpus.
The problem of the growing lexicon
Generalizing patterns

New sentence: I like the cow

P1: I like the _E1
_E1 = {cat, dog, horse}

P1: I like the _E1
_E1 = {cat, dog, horse, cow}

May cause overgeneralization

P1: I like the _E1
_E1 = {cat, dog, horse}

New sentence: I like the finer things in life

P1: I like the _E1
_E1 = {cat, dog, horse, finer}
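A minimal sketch of this generalization step (the single-slot class and all names are my own simplification, not the ADIOS code) shows how "finer" slips into the class:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// A pattern with one equivalence-class slot absorbs the word that fills
// the slot in a new sentence.
class SlotPattern {
    final List<String> prefix;   // e.g. [i, like, the]
    final Set<String> slot;      // the equivalence class _E1

    SlotPattern(List<String> prefix, Set<String> slot) {
        this.prefix = prefix;
        this.slot = slot;
    }

    // If the sentence starts with the prefix, absorb the next word into
    // _E1. As the slide notes, this overgeneralizes: "i like the finer
    // things in life" adds "finer" to the class.
    boolean tryGeneralize(List<String> sentence) {
        if (sentence.size() <= prefix.size()) return false;
        if (!sentence.subList(0, prefix.size()).equals(prefix)) return false;
        slot.add(sentence.get(prefix.size()));
        return true;
    }
}
```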
Allowing gaps

New sentence: I like the red dog

P1: I like the _E1
_E1 = {cat, dog, horse}

P2: I like the red _E1
_E1 = {cat, dog, horse}
Another approach

Two-phase learning:
Split complex sentences into simple clauses.
Learn the simple clauses.
Combine the results back into complex sentences and resume learning.
Sidesteps the problem of the growing lexicon, but introduces the problem of identifying clause boundaries.
That example again
I know that some of you have been waiting for this story of the Tin Woodman , because many of my correspondents have asked me , time and again what ever became of the " pretty Munchkin girl " whom Nick Chopper was engaged to marry before the Wicked Witch enchanted his axe and he traded his flesh for tin .
Possible causes for failure II

Sentence complexity and structural diversity
Lexicon size vs. #sentences: a large lexicon might curtail the alignments necessary for generalization.
Possible remedies

How do children do it? They have access to semantic information, which may be used for alignment.
How can we mimic it? Introducing pre-existing ECs: WordNet, distributional clustering, semantic tagging?
An aside - bootstrapping

Used for very small corpora. Iteratively:
Train a set of learners on the current corpus.
Generate sentences.
Replace the corpus with the generated sentences.
Problematic for large corpora, where it must be performed by transforming the existing sentences.
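The iteration above could be sketched as follows (BootLearner and all names here are hypothetical, not the actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal bootstrapping loop for a small corpus: each round retrains the
// learners on the current corpus, then replaces the corpus with sentences
// the learners generate.
interface BootLearner {
    void train(List<String> corpus);
    String generate();
}

class Bootstrapper {
    static List<String> iterate(List<BootLearner> learners,
                                List<String> corpus, int rounds) {
        for (int round = 0; round < rounds; round++) {
            List<String> next = new ArrayList<>();
            for (BootLearner l : learners) {
                l.train(corpus);          // train on the current corpus
                next.add(l.generate());   // collect generated sentences
            }
            corpus = next;                // replace corpus with generations
        }
        return corpus;
    }
}
```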
A little on Java classes

Similar to struct in C, but also allowing the definition of class-specific functions.
Data members may be:
Private – only accessible to class functions
Public – accessible to everyone
Protected – like private, for most of our purposes
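A toy class illustrating the three access levels (the class and member names are made up for this example):

```java
// Demonstrates Java access modifiers on data members.
class AccessDemo {
    private String label;        // private: only AccessDemo's own methods see it
    protected int visitCount;    // protected: also visible to subclasses
    public boolean marked;       // public: visible to everyone

    AccessDemo(String label) {
        this.label = label;
    }

    // A class-specific function may read private members freely.
    String getLabel() {
        return label;
    }
}
```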
The code

Consists of three packages:
com.ADIOS.Model – contains classes defining the graph (Graph.java, Node.java, Edge.java, etc.)
com.ADIOS.Algorithm – the 'brains' of the implementation (most importantly contains MarkovMatrix.java and Trainer.java)
com.ADIOS.Helpers – various helper classes
The algorithm

Trainer
MarkovMatrix – also finds new equivalence classes
Generator – calculates recall and generates new sentences
The main package

Main – processes command line arguments (context window width, corpus file name, etc.)
Finals – a repository of constants used throughout the code
The Model – Node.java

Data members: label, inEdges, outEdges
Nontrivial functions:
getOutEdges(Vector inEdges) – returns the edges going out of this node that come from inEdges
getInEdges(Vector outEdges) – same, only in the other direction
The model – EquivalenceClass.java

Inherits from Node. Additional data members: nodes
Nontrivial functions:
getOutEdges(), getOutEdges(Vector inEdges) – same as in Node, only summed over all constituent nodes
The model – Pattern.java

Inherits from Node. Additional data members: id, path (the pattern specification)
The model – Path.java

Data members: id, nodes
Nontrivial functions:
Init(StringTokenizer st) – initializes the path according to a line of text
Squeeze(Pattern p, int, int) – finds the instances of p in the path and replaces them by the single node p. Does not rewire the graph!
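Conceptually, Squeeze behaves like this sketch over a list of node labels (a simplification of my own; the real method also takes index bounds and operates on Node objects):

```java
import java.util.ArrayList;
import java.util.List;

// Replace each occurrence of a pattern's node sequence in a path with the
// single pattern node. Like Squeeze, this only edits the path: it does
// NOT rewire the graph.
class PathSqueeze {
    static List<String> squeeze(List<String> path, List<String> pattern,
                                String patternNode) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < path.size()) {
            // Does the pattern start at position i?
            if (i + pattern.size() <= path.size()
                    && path.subList(i, i + pattern.size()).equals(pattern)) {
                out.add(patternNode);   // collapse the whole occurrence
                i += pattern.size();
            } else {
                out.add(path.get(i));
                i++;
            }
        }
        return out;
    }
}
```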
The model – Edge.java

Data members: fromNode, toNode, prevEdge, nextEdge, path
No nontrivial functions.
The model – Graph.java

Main data members: nodes, edges, paths, equivalenceClasses, patterns
Nontrivial functions:
addPattern(Pattern p) – rewires the graph
Print functions – print various data to files
The algorithm – MarkovMatrix.java

Main data members: path, matrix, pathsCountMatrix, winSize, winIndex, wildcardIndex, ec
Nontrivial functions:
findWildcardCandidate() – generates the new equivalence class in the wildcard position
initMarkovMatrix() – calculates the matrix
The algorithm – Trainer.java

Main data members: leftCandidates, rightCandidates, patterns
Nontrivial functions:
trainSinglePath – runs MEX on a single (maybe generalized) path
createTrialPathMatrix – finds the subpaths that go through the context window
alignment – generalizes the search path and searches for patterns
getPatterns – intersects candidates
The algorithm – Generator.java

Main functions:
getRecall – gets a file name and returns how many sentences were accepted by the current grammar
generatePathsFile – creates new sentences using the current grammar and stores them in a file
Helpers – Parser.java

Main function:
loadCorpusFile – loads the corpus into the graph, initializing everything
com.ADIOS

Finals.java – contains important constants used throughout the code
Main.java – processes command line arguments and calls the appropriate functions of the appropriate objects