Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing
Generalized Hebbian Algorithm for Dimensionality Reduction in
Natural Language Processing
Genevieve Gorrell
5th June 2007
Introduction

Think of datapoints plotted in hyperspace: imagine a space in which each word has its own dimension.

              big  bad
"big big bad" [ 2    1 ]
"big bad"     [ 1    1 ]
"bad"         [ 0    1 ]

We can compare these passages using vector representations in this space.

[Figure: the three passages plotted against an "axis of bigness" and an "axis of badness".]
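The comparison can be illustrated with a short Python sketch. Cosine similarity is used here as one common way of comparing vector directions (the slides do not name a specific measure, so this choice is an assumption):

```python
import numpy as np

# Passage vectors in a space with one dimension per word: (big, bad).
passages = {
    "big big bad": np.array([2.0, 1.0]),
    "big bad":     np.array([1.0, 1.0]),
    "bad":         np.array([0.0, 1.0]),
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# "big big bad" points in a direction closer to "big bad" than to "bad".
```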
Dimensionality Reduction

Do we really need two dimensions to describe the relationship between these datapoints?

[Figure: the same three passages plotted against the "axis of bigness" and "axis of badness".]
Rotation
Imagine the data look like this ...
More Rotation

Or even like this ... because if these were the dimensions, we would know which were the most important.

We could then describe as much of the data as possible using a smaller number of dimensions: approximation, compression, generalisation.
Eigen Decomposition

The key lies in rotating the data into the most efficient orientation. Eigen decomposition will give us a set of axes (eigenvectors) of a new space in which our data might be represented more efficiently.
Eigen Decomposition

Eigen decomposition is a vector space technique that provides a useful way to automatically reduce data dimensionality. This technique is of interest in natural language processing, for example in Latent Semantic Indexing.

Given a dataset in a given space, eigen decomposition can be used to create a nearest approximation in a space with fewer dimensions. For example, document vectors represented as bags of words, in a space with one dimension per word, can be mapped to a space with fewer dimensions than one per word.

Mv = λv
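The defining property Mv = λv can be checked numerically. A minimal NumPy sketch on a small symmetric matrix (the matrix values here are purely illustrative):

```python
import numpy as np

# A small symmetric matrix M (e.g. a word co-occurrence matrix).
M = np.array([[5.0, 3.0],
              [3.0, 3.0]])

# Eigen decomposition: M v = lambda v for each eigenvector v.
# eigh is the NumPy routine for symmetric matrices; eigenvalues ascend.
eigvals, eigvecs = np.linalg.eigh(M)

# The strongest eigenvector/eigenvalue pair.
v = eigvecs[:, -1]
lam = eigvals[-1]

# Keeping only this pair gives the best rank-1 approximation of M.
M1 = lam * np.outer(v, v)
```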
A real-world example: eigenfaces

Each new dimension captures something important about the data. The original observation can be recreated from a combination of these components.
Eigen Faces 2

Each eigenface captures as much information in the dataset as possible (eigenvectors are orthogonal to each other). This is much more efficient than the original representation.
More Eigen Face Convergence

Eigenfaces with high eigenvalues capture important generalisations in the corpus. These generalisations might well apply to unseen data ...
We have been using this in natural language processing ...
Corpus-driven language modelling suffers from problems with data sparsity
We can use eigen decomposition to make generalisations that might apply to unseen data
But language corpora are very large ...
Problems with eigen decomposition

Existing algorithms often:
• require all the data to be available at once (batch processing)
• produce all the component vectors simultaneously, even though they may not all be necessary, and it takes longer to compute all of them
• are very computationally expensive, and may therefore exceed the capabilities of the computer for larger corpora: large RAM requirement, and an exponential relationship between time/RAM requirement and dataset size
Generalized Hebbian Algorithm (Sanger 1989)

• Based on Hebbian learning
• Simple localised technique for deriving an eigen decomposition
• Requires very little memory
• Learns from single observations (for example, document vectors) presented serially, so it is no problem to add more data
• In fact, the entire matrix need never be simultaneously available
• The eigenvectors with the greatest eigenvalues are produced first
GHA Algorithm

c += (c . x) x

c is the eigenvector; x is the training datum.

Initialise the eigenvector randomly
While the eigenvector is not converged {
    Dot-product each training vector with the eigenvector
    Multiply the result by the training vector
    Add the resulting vector to the eigenvector
}

The dot-product is a measure of the similarity of direction of one vector with another, and produces a scalar. There are various ways in which one might assess convergence.
GHA Algorithm Continued

Or in other words: train by adding each datum to the eigenvector in proportion to the extent to which it already resembles it. Train subsequent eigenvectors by removing the stronger eigenvectors from the data before training, so that the algorithm doesn't find those ones again.
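The loop above can be sketched in Python. This is a hypothetical minimal implementation, not the author's code: it uses the slide's update rule c += (c . x) x, with a learning rate and per-step normalisation added for numerical stability (the slides do not specify either), and projects out already-found eigenvectors as described:

```python
import numpy as np

def gha(data, n_vectors, n_epochs=100, lr=0.01, seed=0):
    """Estimate the top eigenvectors of the data's covariance with GHA.

    Eigenvectors are trained one at a time; stronger eigenvectors are
    removed from each datum before training the next, so later vectors
    converge to weaker components.
    """
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    eigvecs = []
    for _k in range(n_vectors):
        c = rng.normal(size=dim)
        c /= np.linalg.norm(c)
        for _ in range(n_epochs):
            for x in data:
                # Remove the already-found eigenvectors from the datum.
                for v in eigvecs:
                    x = x - (v @ x) * v
                # Hebbian update: c += (c . x) x, scaled by a learning rate.
                c = c + lr * (c @ x) * x
                c /= np.linalg.norm(c)  # keep the vector unit length
        eigvecs.append(c)
    return np.array(eigvecs)
```

Convergence testing is omitted here (the slides note there are various ways to assess it); a fixed number of epochs stands in for it.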
GHA as a neural net

dp = Σ (x = 1 ... n) Input_x · Weight_x

Weight_x += dp · Input_x   (for each x = 1 ... n)

[Figure: a single-layer linear network with inputs Input_1 ... Input_n connected by weights Weight_1 ... Weight_n to one output unit computing dp.]

• Can be extended to learn many eigenvectors
Singular Value Decomposition

Extends eigen decomposition to paired data.

Word co-occurrence counts for the passages "bad", "big bad", "big big bad":

        big  bad
big   [  5    3 ]
bad   [  3    3 ]

Word bigram counts (big:1, bad:1 index the first word in the bigram; big:2, bad:2 the second):

        big:2  bad:2
big:1 [   1      2  ]
bad:1 [   0      0  ]
Asymmetrical GHA (Gorrell 2006)
Extends GHA to asymmetrical datasets, allowing us to work with n-grams, for example
Retains the features of GHA
Asymmetrical GHA Algorithm

ca += (cb . xb) xa
cb += (ca . xa) xb

Train singular vectors on data presented as a series of vector pairs: dot the left training datum with the left singular vector and scale the right singular vector by the resulting scalar, and vice versa.

For example, the first word in a bigram might be vector xa, and the second, xb.
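The paired update can be sketched for the first singular vector pair. As with the GHA sketch, this is a hypothetical minimal implementation with a learning rate and normalisation added for stability, which the slides do not specify:

```python
import numpy as np

def agha_first_pair(pairs, n_epochs=50, lr=0.05, seed=0):
    """Estimate the first singular vector pair from a stream of
    (xa, xb) vector pairs, using the asymmetric GHA updates."""
    rng = np.random.default_rng(seed)
    ca = rng.normal(size=pairs[0][0].shape[0])
    cb = rng.normal(size=pairs[0][1].shape[0])
    ca /= np.linalg.norm(ca)
    cb /= np.linalg.norm(cb)
    for _ in range(n_epochs):
        for xa, xb in pairs:
            # ca += (cb . xb) xa  and  cb += (ca . xa) xb,
            # scaled by a learning rate, then normalised.
            ca = ca + lr * (cb @ xb) * xa
            cb = cb + lr * (ca @ xa) * xb
            ca /= np.linalg.norm(ca)
            cb /= np.linalg.norm(cb)
    return ca, cb
```

For a bigram stream, xa would be the one-hot (or count) vector of the first word and xb that of the second, as the slide describes.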
Asymmetrical GHA Performance (20,000 NL bigrams)

• RAM requirement is linear in dimensionality and in the number of singular vectors required
• Time per training step is linear in dimensionality
• This is a big improvement on conventional approaches for larger corpora/dimensionalities ...
• But don't forget, the algorithm needs to be allowed to converge
N-Gram Language Model Smoothing

Modelling language as a string of n-grams is a highly successful approach, but we will always have problems with data sparsity: zero probabilities are bad news.

[Figure: a Zipf curve.]
N-gram Language Modelling: An Example Corpus

A man hits the ball at the dog. The man hits the ball at the house. The man takes the dog to the ball. A man takes the ball to the house. The dog takes the ball to the house. The dog takes the ball to the man. The man hits the ball to the dog. The man walks the dog to the house. The man walks the dog. The dog walks to the man. A dog hits a ball. The man walks in the house. The man hits the dog. A ball hits the dog. The man walks. A ball hits. Every ball hits. Every dog walks. Every man walks. A man walks. A small man walks. Every nice dog barks.
An Example Corpus as Normalised Bigram Matrix

        man  hits the  ball at   dog  house takes to   walks a    in   small nice barks
a       0.03 0.0  0.0  0.03 0.0  0.01 0.0   0.0   0.0  0.0   0.0  0.0  0.01  0.0  0.0
man     0.0  0.04 0.0  0.0  0.0  0.0  0.0   0.02  0.0  0.07  0.0  0.0  0.0   0.0  0.0
hits    0.0  0.0  0.05 0.0  0.0  0.0  0.0   0.0   0.0  0.0   0.01 0.0  0.0   0.0  0.0
the     0.1  0.0  0.0  0.07 0.0  0.1  0.05  0.0   0.0  0.0   0.0  0.0  0.0   0.0  0.0
ball    0.0  0.03 0.0  0.0  0.02 0.0  0.0   0.0   0.04 0.0   0.0  0.0  0.0   0.0  0.0
at      0.0  0.0  0.02 0.0  0.0  0.0  0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0  0.0
takes   0.0  0.0  0.04 0.0  0.0  0.0  0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0  0.0
dog     0.0  0.01 0.0  0.0  0.0  0.0  0.0   0.02  0.02 0.02  0.0  0.0  0.0   0.0  0.01
to      0.0  0.0  0.07 0.0  0.0  0.0  0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0  0.0
walks   0.0  0.0  0.02 0.0  0.0  0.0  0.0   0.0   0.01 0.0   0.0  0.01 0.0   0.0  0.0
in      0.0  0.0  0.01 0.0  0.0  0.0  0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0  0.0
every   0.01 0.0  0.0  0.01 0.0  0.01 0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.01 0.0
small   0.01 0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0  0.0
nice    0.0  0.0  0.0  0.0  0.0  0.01 0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0  0.0
First Singular Vector Pair

        man  hits the  ball at   dog  house takes to   walks a    in   small nice barks
a       0.02 0.00 0.00 0.02 0.00 0.02 0.01  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
man     0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
hits    0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
the     0.10 0.00 0.00 0.07 0.00 0.10 0.05  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
ball    0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
at      0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
takes   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
dog     0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
to      0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
walks   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
in      0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
every   0.01 0.00 0.00 0.01 0.00 0.01 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
small   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
nice    0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
Second Singular Vector Pair

        man  hits the  ball at   dog  house takes to   walks a    in   small nice barks
a       0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
man     0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
hits    0.00 0.00 0.05 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
the     0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
ball    0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
at      0.00 0.00 0.02 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
takes   0.00 0.00 0.04 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
dog     0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
to      0.00 0.00 0.07 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
walks   0.00 0.00 0.02 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
in      0.00 0.00 0.01 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
every   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
small   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
nice    0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
Third Singular Vector Pair

        man  hits the  ball at   dog  house takes to   walks a    in   small nice barks
a       0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
man     0.00 0.04 0.00 0.00 0.01 0.00 0.00  0.02  0.02 0.06  0.00 0.00 0.00  0.00 0.00
hits    0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
the     0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
ball    0.00 0.01 0.00 0.00 0.00 0.00 0.00  0.01  0.01 0.02  0.00 0.00 0.00  0.00 0.00
at      0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
takes   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
dog     0.00 0.02 0.00 0.00 0.00 0.00 0.00  0.01  0.01 0.02  0.00 0.00 0.00  0.00 0.00
to      0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
walks   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
in      0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
every   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
small   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
nice    0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
Language Models from Eigen N-Grams

• Add k singular vector pairs ("eigen n-grams") together
• Remove all the negative cell values
• Normalise row-wise to get probabilities
• Include a smoothing approach to remove zeros

        man  hits the  ball at   dog  house takes to   walks a    in   small nice barks
a       0.02 0.00 0.00 0.02 0.00 0.02 0.01  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
man     0.00 0.04 0.00 0.00 0.01 0.00 0.00  0.02  0.02 0.06  0.00 0.00 0.00  0.00 0.00
hits    0.00 0.00 0.05 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
the     0.10 0.00 0.00 0.07 0.00 0.10 0.05  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
ball    0.00 0.01 0.00 0.00 0.00 0.00 0.00  0.01  0.01 0.02  0.00 0.00 0.00  0.00 0.00
at      0.00 0.00 0.02 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
takes   0.00 0.00 0.04 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
dog     0.00 0.02 0.00 0.00 0.00 0.00 0.00  0.01  0.01 0.02  0.00 0.00 0.00  0.00 0.00
to      0.00 0.00 0.07 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
walks   0.00 0.00 0.02 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
in      0.00 0.00 0.01 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
every   0.01 0.00 0.00 0.01 0.00 0.01 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
small   0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
nice    0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.00  0.00 0.00  0.00 0.00 0.00  0.00 0.00
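The four steps above might be sketched as follows. This is a hypothetical illustration, not the author's implementation: it assumes the singular vector pairs are already scaled by their singular values, and uses a simple additive constant for smoothing (the slides do not specify the smoothing approach):

```python
import numpy as np

def svdlm_probs(left_vecs, right_vecs, k, epsilon=1e-4):
    """Rebuild an approximate bigram matrix from the first k singular
    vector pairs, then turn it into smoothed row-wise probabilities."""
    # Add k singular vector pairs ("eigen n-grams") together.
    approx = sum(np.outer(l, r) for l, r in zip(left_vecs[:k], right_vecs[:k]))
    approx = np.clip(approx, 0.0, None)   # remove all the negative cell values
    approx = approx + epsilon             # smoothing: no zero probabilities
    return approx / approx.sum(axis=1, keepdims=True)  # normalise row-wise
```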
What do we hope to see?

The theory is that the reduced-dimensionality representation describes the unseen test corpus better than the original representation. As k increases, perplexity should decrease until the optimum is reached; perplexity should then begin to increase as the optimum is passed and too much data is included. We hope for a U-shaped curve.
Some results ...

• Perplexity is a measure of the quality of the language model
• k is the number of dimensions (eigen n-grams)
• Times are how long it took to calculate the dimensions
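For reference, perplexity can be computed from the model's probability for each test item; a minimal sketch:

```python
import math

def perplexity(probs):
    """Perplexity of a test sequence given the model's probability for
    each item: the geometric mean of the inverse probabilities.
    Lower is better; a uniform model over V outcomes scores V."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))
```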
Some specifics about this experiment

• The corpus comprises five newsgroups from CMU's newsgroup corpus
• The training corpus contains over a million items
• The unseen test corpus comprises over 100,000 items
• I used AGHA to calculate the decomposition
• I used simple heuristically-chosen smoothing constants and single-order language models
Maybe k is too low?

200,000 trigrams; LAS2 algorithm
Full rank decomposition

20,000 bigrams

Furthermore, perplexity in each case never reaches the baseline perplexity of the original n-gram model.
Linear interpolation may generate an interesting result

k    Weight  SVDLM perp.    N-gram perp.   Comb. perp.
25   1       7.071990e+02   4.647004e+02   3.891952e+02
10   1       8.884950e+02   4.647004e+02   3.695157e+02
10   0.7     8.884950e+02   4.647004e+02   3.705559e+02
5    1       1.156845e+03   4.647004e+02   3.788119e+02

The best result is 370: an overall improvement of 20% is demonstrated (however, this involved tuning on the test corpus).
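The slides do not give the exact interpolation formula behind the "Weight" column. The standard form of linear interpolation between two language models, shown here purely as an assumption about what is being computed, is:

```python
def interpolated_prob(p_svdlm, p_ngram, lam):
    """Standard linear interpolation of two language models:
    p = lam * p_svdlm + (1 - lam) * p_ngram, with 0 <= lam <= 1.
    (The mapping from the slides' integer 'Weight' to lam is an
    assumption and is not specified in the slides.)"""
    return lam * p_svdlm + (1.0 - lam) * p_ngram
```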
200,000 Trigram Corpus

k    Weight  SVDLM perp.    N-gram perp.   Comb. perp.
100  1       1.003399e+03   4.057236e+02   3.196404e+02
50   1       1.220449e+03   4.057236e+02   3.008804e+02
25   1       1.508873e+03   4.057236e+02   2.834632e+02
10   1       2.188041e+03   4.057236e+02   2.898518e+02

The improvement on the baseline n-gram is even greater on this medium-sized corpus (30%).
1 Million Trigram Corpus

k   Weight    SVDLM perp.    N-gram perp.   Comb. perp.
25  1         4.237069e+04   3.730947e+02   3.729931e+02
25  2         4.237069e+04   3.730947e+02   3.728907e+02
25  10        4.237069e+04   3.730947e+02   3.721338e+02
25  100       4.237069e+04   3.730947e+02   3.663666e+02
25  1000      4.237069e+04   3.730947e+02   3.442525e+02
25  10000     4.237069e+04   3.730947e+02   2.980755e+02
25  100000    4.237069e+04   3.730947e+02   2.422045e+02
25  1000000   4.237069e+04   3.730947e+02   2.187968e+02
25  10000000  4.237069e+04   3.730947e+02   2.741027e+02

This is a big dataset for SVD! The weighting on the SVDLM needed to be increased a lot to get a good result.
Fine-Tuning k

k   Weight   SVDLM perp.    N-gram perp.   Comb. perp.
25  1000000  4.237069e+04   3.730947e+02   2.192305e+02
20  1000000  4.249082e+04   3.730947e+02   2.174188e+02
15  1000000  4.266386e+04   3.730947e+02   2.100715e+02
10  1000000  4.290579e+04   3.730947e+02   2.102029e+02

Tuning k results in a best perplexity improvement of over 40%. A low optimal k is a good thing, because many algorithms for calculating SVD produce singular vectors one at a time, starting with the largest.
Tractability

The biggest challenge with the SVDLM is tractability: calculating the SVD is computationally demanding. But the optimal k is low, and I have also developed an algorithm that helps with tractability.

Usability of the resulting SVDLM is also an issue: the SVDLM is much larger than a regular n-gram model, but its size can be minimised by discarding low values, with minimal impact on performance.
Backoff SVDLM

Improving on n-gram language modelling is interesting work. However, no improvement on the state of the art has been demonstrated yet! The next steps involve the creation of a backoff SVDLM.

Interpolating with lower-order n-grams is standard; backoff models have much superior performance.
Similar Work

Jerome Bellegarda developed the LSA language model, which uses longer-span eigen decomposition information to access semantic information. Others have since developed the work.

Saul and Pereira demonstrated an approach based on Markov models. This again demonstrates that some form of dimensionality reduction is beneficial.
Summary

• The GHA-based algorithm allows large datasets to be decomposed
• The asymmetrical formulation allows data such as n-grams to be decomposed
• Promising initial results in n-gram language model smoothing have been presented
Thanks!
Gorrell, G. 2006. "Generalized Hebbian Algorithm for Incremental Singular Value Decomposition." Proceedings of EACL 2006.
Gorrell, G. and Webb, B. 2005. "Generalized Hebbian Algorithm for Incremental Latent Semantic Analysis." Proceedings of Interspeech 2005.
Sanger, T. 1989. "Optimal Unsupervised Learning in a Single-Layer Linear Feedforward Network." Neural Networks, 2, 459-473.