Latent Semantic Analysis: A Model of Inductive Knowledge Acquisition
Paul Fillmore & Stefanie Wong

Transcript of the slide presentation (56 slides):

Page 1:

Latent Semantic Analysis:

A Model of Inductive Knowledge Acquisition

Paul Fillmore & Stefanie Wong

Page 2:

Overview

- The question of interest
- The Problem
- The Proposed Solution: LSA
- Latent Semantic Analysis: What is it? What can it do? How does it do it?
- Evaluation of the model
- Additional considerations
- Demonstrations of LSA

Page 3:

The Problem of Induction

Plato's problem, the poverty of the stimulus: how do people acquire as much knowledge as they do based on the little information they get?

Example: language acquisition
- Chomsky (1991): observing adult language is insufficient for children's development of grammar or a typical lexicon
- Pinker (1994): language learning must be innate, a "language instinct"

Page 4:

The problem of induction in cognitive terms...

Problem of categorization: what is the mechanism by which concepts (cheetahs, tigers) come to be treated as the same for some purpose (predators that will eat me)?

Problem of similarity: how does experience combine disparate things into a feature identity ("wing" is different for a bird, an insect, and a bat)?

Page 5:

Latent Semantic Analysis: What is it?

"Latent Semantic Analysis (LSA) is a mathematical/statistical technique for extracting and representing the similarity of meaning of words and passages by analysis of large bodies of text."

More simply, it is a computer model of human associative learning through experience.

It does not embody human knowledge beyond its general learning mechanism.

Page 6:

What can LSA do?

- Performance on standard vocabulary and subject-matter tests comparable to humans
- Demonstrates a similar mechanism for word sorting and category judgments
- Reproduces word-word and passage-word lexical priming data
- Accurately estimates passage coherence, the learnability of passages by individual students, and the quality and quantity of knowledge contained in essays
- Performs humanlike generalizations based on learning that is not dependent upon primitive perceptual relations/representations

Page 7:

How does LSA work?

Definitions: semantic space, singular value decomposition (SVD), dimensionality

Procedure:
1) Matrix input
2) Cell transformation
3) Singular value decomposition
4) Dimension reduction

Page 8:

Semantic Space

A semantic space is a mathematical representation of a large body of text (e.g., encyclopedias, psychology texts).

Each term or combination of terms has its own high-dimensional vector representation within the semantic space.

Similarity between the vectors for words and contexts is measured by the cosine of the angle between them. Note: terms can only be compared within a semantic space, not directly between semantic spaces.

If the vectors were projected onto a sphere surrounding the semantic space, points close together would have closer semantic relations.
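To make the cosine measure concrete, here is a minimal Python/NumPy sketch; the vectors are invented stand-ins for term vectors, not output from a trained semantic space.

```python
import numpy as np

def cosine(u, v):
    # Cosine of the angle between two term vectors: 1.0 means the vectors
    # point the same way; values near 0 mean the terms are unrelated.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 4-dimensional vectors (real LSA spaces use 100-500 dimensions).
doctor    = np.array([0.8, 0.3, 0.1, 0.0])
physician = np.array([0.7, 0.4, 0.2, 0.1])
tree      = np.array([0.0, 0.1, 0.9, 0.4])

print(cosine(doctor, physician))  # high: near-synonyms lie close together
print(cosine(doctor, tree))       # low: semantically distant terms
```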

Page 9:

Example of similarities within a semantic space

- Submitting a term or short text and receiving a list of the terms nearest to it in the semantic space
- Matrix comparison of multiple terms
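Such a nearest-terms query is just a cosine ranking over every vocabulary vector. A minimal sketch, assuming a `term_vectors` matrix of per-word vectors (random placeholders here, standing in for rows of a trained space):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["human", "interface", "computer", "user", "system", "tree", "graph"]
term_vectors = rng.normal(size=(len(vocab), 100))  # placeholder vectors

def nearest_terms(query_vec, k=3):
    # Rank every vocabulary term by cosine similarity to the query vector.
    sims = term_vectors @ query_vec
    sims /= np.linalg.norm(term_vectors, axis=1) * np.linalg.norm(query_vec)
    top = np.argsort(sims)[::-1][:k]
    return [(vocab[i], round(float(sims[i]), 3)) for i in top]

# The query term itself ranks first with cosine 1.0.
print(nearest_terms(term_vectors[vocab.index("human")]))
```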

Page 10:

Singular Value Decomposition

A mathematical matrix decomposition technique (the general case of factor analysis) that condenses a large matrix of word-by-context data into a smaller matrix.

The smaller matrix typically has a 100-500 dimensional representation.

The right number of dimensions is critical for optimal simulation.

Page 11:

Dimensionality

Knowing the appropriate dimensionality improves estimates.

Example: three separate houses A, B, and C are arranged as follows: A is 5 units from both B and C, and B and C are separated by 8 units. Oh, also: all three are on the same straight, flat road.

In two dimensions these distances are consistent (an isosceles triangle), but on a one-dimensional road they cannot all be right: A must lie between B and C, so AB + AC has to equal BC, and the estimates must be revised accordingly.

[Figure: the houses drawn as a triangle in two dimensions, and as collinear points on the road in one dimension.]
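A small sketch (my own illustration, not from the slides) of how the one-dimensional constraint forces the three distance estimates to be revised together:

```python
import numpy as np

# Measured distances: AB = 5, AC = 5, BC = 8. Consistent as a triangle in
# two dimensions, impossible on a one-dimensional road, where A between
# B and C forces AB + AC = BC.
# Least squares over p = AB, q = AC with BC modeled as p + q:
#   minimize (p - 5)^2 + (q - 5)^2 + (p + q - 8)^2.
# Setting the gradient to zero gives the linear system below.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([13.0, 13.0])
p, q = np.linalg.solve(A, b)
print(p, q, p + q)  # ~4.33, 4.33, 8.67: revised, mutually consistent estimates
```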

Page 12:

Procedure: Matrix Input

- Rows = individual word types
- Columns = meaning-bearing passages (i.e., sentences or paragraphs)
- Cells = frequency with which a word occurs in a passage
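A minimal sketch of this input step, using naive whitespace tokenization and three of the example titles that appear later in the talk:

```python
import numpy as np

passages = [
    "human machine interface for abc computer applications",
    "a survey of user opinion of computer system response time",
    "the intersection graph of paths in trees",
]

# Rows = word types, columns = passages, cells = occurrence counts.
vocab = sorted({w for p in passages for w in p.split()})
X = np.zeros((len(vocab), len(passages)))
for j, passage in enumerate(passages):
    for word in passage.split():
        X[vocab.index(word), j] += 1

print(X.shape)                   # (number of word types, number of passages)
print(X[vocab.index("of"), 1])   # "of" occurs twice in the second passage
```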

Page 13:

Procedure: Cell Transformation

Transformation 1: approximates the standard empirical growth functions of simple learning. Each cell's raw appearance frequency f is replaced by log(f + 1).

Transformation 2: makes the primary association better represent the informative relation between the entities rather than mere co-occurrence. Each transformed cell is divided by the word's entropy over its contexts, -Σ p log(p), where p is the cell frequency divided by the word's total frequency.
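A sketch of both transformations, assuming the common log-entropy reading of the slides (log(f + 1) divided by the word's entropy over contexts); real implementations differ in details such as how zero-entropy rows are handled:

```python
import numpy as np

def log_entropy_transform(X):
    # Transformation 1: log(f + 1) approximates the empirical growth
    # functions of simple learning.
    logged = np.log(X + 1.0)

    # Transformation 2: divide each row by the word's entropy over its
    # contexts, -sum(p log p) with p = f / row total, so that words spread
    # indiscriminately across contexts are down-weighted.
    totals = X.sum(axis=1, keepdims=True)
    p = np.divide(X, totals, out=np.zeros_like(X), where=totals > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    entropy = -plogp.sum(axis=1, keepdims=True)
    # Guard: a word seen in a single context has zero entropy; real
    # implementations handle that case specially.
    return logged / np.maximum(entropy, 1e-12)

X = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
print(log_entropy_transform(X))
```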

Page 14:

Procedure: SVD & Dimension Reduction

SVD: X[i x j] = W[i x k] S[k x k] C'[k x j], in which W and C have orthonormal columns, S is a diagonal matrix of singular values, and k <= min(i, j).

Dimension reduction: all but the d largest singular values are set to zero, where d is the number of dimensions to be retained.
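The same two steps in NumPy; `truncate_svd` is a hypothetical helper name, and keeping only the top d singular values yields the best least-squares rank-d approximation of X:

```python
import numpy as np

def truncate_svd(X, d):
    # Full SVD: X = W @ diag(s) @ C', with orthonormal columns in W and C.
    W, s, Ct = np.linalg.svd(X, full_matrices=False)
    # Dimension reduction: zero all but the d largest singular values.
    s = s.copy()
    s[d:] = 0.0
    return W @ np.diag(s) @ Ct

X = np.random.default_rng(1).normal(size=(12, 9))
print(np.linalg.matrix_rank(truncate_svd(X, d=2)))  # 2
```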

Page 15:

[Figure: the word (w) x context (c) matrix X shown decomposed into the orthonormal matrices W and C' and the diagonal matrix S; the m columns of W and the m rows of C' are linearly independent.]

Page 16:

LSA Example

c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi ordering
m4: Graph minors: A survey

Page 17: (image-only slide)

Page 18:

r(human, user) = 0.94 after reduction to two dimensions, even though "human" and "user" never co-occur in any passage (their correlation in the raw data is negative).
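A sketch reconstructing that figure end-to-end from the nine titles above (my own reconstruction of the classic Deerwester et al. example; the exact value depends on preprocessing, but it lands near .94):

```python
import numpy as np

titles = {
    "c1": "human machine interface for abc computer applications",
    "c2": "a survey of user opinion of computer system response time",
    "c3": "the eps user interface management system",
    "c4": "system and human system engineering testing of eps",
    "c5": "relation of user perceived response time to error measurement",
    "m1": "the generation of random binary ordered trees",
    "m2": "the intersection graph of paths in trees",
    "m3": "graph minors iv widths of trees and well-quasi ordering",
    "m4": "graph minors a survey",
}
# Deerwester et al. index only the content words that occur in 2+ titles.
terms = ["human", "interface", "computer", "user", "system", "response",
         "time", "eps", "survey", "trees", "graph", "minors"]

# Word-by-context count matrix.
X = np.array([[t.split().count(term) for t in titles.values()]
              for term in terms], dtype=float)

# SVD, then keep d = 2 dimensions and reconstruct.
W, s, Ct = np.linalg.svd(X, full_matrices=False)
X2 = W[:, :2] @ np.diag(s[:2]) @ Ct[:2, :]

human, user = terms.index("human"), terms.index("user")
print(np.corrcoef(X[human], X[user])[0, 1])    # negative: never co-occur
print(np.corrcoef(X2[human], X2[user])[0, 1])  # ~0.94 after reduction
```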

Page 19:

Evaluating the Model

Four questions to keep in mind:
1. Can a simple linear model acquire knowledge of humanlike word-meaning similarities given sufficient input?
2. If successful, is it dependent upon the dimensionality of the representation?
3. Is the rate of acquisition comparable to a human's?
4. What degree of this knowledge comes from indirect inferences that combine information across samples?

Page 20:

Is It Acquiring Knowledge?

The model's knowledge was tested with a standard multiple-choice synonym test.

After training on approx. 2,000 pages of English text, LSA scored as well as average test-takers on the synonym portion of the TOEFL.

The acquired knowledge is attributed to indirect inference rather than direct co-occurrence relations.

Page 21:

Two explanations…

1) A substantial portion of the information needed to answer common vocabulary questions could be inferred from the contextual statistics of usage alone.

2) The model employs a means of induction, dimension matching, that amplifies its learning ability, resulting in correct inference of similarity relations that are only implicit in the temporal correlations of experience.

Page 22:

Is dimensionality a factor?

Varied the number of dimensions retained. Note what happens when there is no dimensionality reduction at all.

Choosing the optimal dimensionality approximately triples the number of words learned.

Page 23:

Comparable rate?

Learning is comparable to the rate at which school-aged children improve their performance on similar tests as a result of reading.

The rate of acquisition for the late elementary and high school years is estimated at 3,000-5,400 words per year (10-15 per day).

Page 24:

Calculating a Comparable Rate: Direct & Indirect Effects

LSA simulations consider:
- the average number of contexts in which a test word appeared (a model parameter)
- the total number of other contexts, those containing no words from the synonym test items

These were varied by randomly replacing test words with nonsense words and choosing random subsamples of the total text, giving the joint effects of direct and indirect textual experience.

Page 25:

LSA simulation of total vocabulary gain

A model was fit to the data: z = a(log bT)(log cS), where T is the total number of text samples analyzed and S is the number of text samples containing the stem word; the fit gave r = .89.

For every word, estimates were made of:
- the probability that a word of its frequency appears in the next sample
- the number of times an individual would have encountered the word previously
- the expected increase in z from adding a passage containing the word
- the expected increase in z from adding a passage that does not contain it

z was converted to probability correct and multiplied by the corresponding frequencies; the gains in number correct were then cumulated over all individual words in the language to get the total vocabulary gain from reading a single text sample.
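To illustrate the functional form only: the constants below are invented for demonstration, since the slides report the form and the fit quality (r = .89) but not a, b, or c. The point is that z rises even when only T grows, i.e., indirect exposure alone helps:

```python
import math

# Invented constants for illustration; not the fitted values.
a, b, c = 0.02, 1.0, 2.0

def z_score(T, S):
    # T: total text samples analyzed; S: samples containing the stem word.
    return a * math.log(b * T) * math.log(c * S)

# Indirect learning: a passage that does NOT contain the word still raises
# z, because it increases T.
print(z_score(T=20_000, S=10))
print(z_score(T=25_000, S=10))  # larger, with no new direct exposures
```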

Page 26:

Conclusions from Vocabulary Simulations

LSA learns meaning similarities of words from text, in an amount equivalent to the test scores of moderately competent English readers.

Three-fourths of LSA's knowledge is a product of indirect induction (exposure to text not containing the word).

This expresses the hypothesis that word meanings grow continuously and that correct performance is a stochastic event governed by individual differences in experience, i.e., word meanings are constantly in flux.

Page 27:

Other Considerations

Neurocognitive & psychological plausibility: neural net models, similarity to biological models, parallels with memory

Meaning: independent of word order?

Contextual disambiguation: in LSA, words have only one vector representation, and thus only one meaning

Page 28:

Mathematical Machine Analogy: a three-layered neural net

LAYER 1: WORD TYPE
LAYER 2: CONCEPTUAL REPRESENTATIONS
LAYER 3: TEXT WINDOW

Page 29:

Neural Net Analogy

The network is symmetrical; it can run in either direction.

Different computations are made to assess similarity between two episodes, two event types, or an episode and an event type.

Page 30:

Similarity to Biological Models

Interneuronal communication: vector multiplication between axons, dendrites, and cell bodies; excitation is proportional to the dot product of a neuron's output and the sensitivities of surrounding neurons.

Single-cell recordings: population effects described as vector averages of individual direction representations.

Page 31:

Word-versus-context difference: Analogy to Episodic & Semantic Memories

Word representations are semantic: meanings abstracted and averaged from many experiences.

Context representations are episodic: unique combinations that occurred only once ever.

Both words and episodes are represented along the same defining dimensions, and their relation to one another is retained.

Page 32:

Word-versus-context difference: Analogy to Explicit & Implicit Memories

Retrieving a context vector brings a past happening to mind: explicit memory.

Retrieving a word vector instantiates an abstraction over many happenings brought together: implicit memory.

Page 33:

Meaning: independent of word order?

Text segments are treated as "bags of words": LSA makes no use of word order, syntax, or grammar.

Despite assertions that "scrambled sentences would be worthless context for vocabulary instruction" (Durkin, 1983), LSA acquires 100% of its knowledge via "scrambled sentences" and still performs relatively well at deciphering meaning.
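A trivial sketch of why scrambling cannot matter to LSA: its input is the word-count vector, which is permutation-invariant:

```python
import random
from collections import Counter

sentence = "the user interface of the system"
words = sentence.split()
scrambled = random.Random(0).sample(words, len(words))

# LSA's input is the word-count vector for the passage, which is identical
# for the original and the scrambled word order.
print(Counter(words) == Counter(scrambled))  # True
```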

Page 34:

Expertise

LSA's account of knowledge brings a new perspective on expertise: a simulated expert learns four times more about an item per exposure than a simulated novice.

LSA suggests that great masses of knowledge contribute to superior performance through:
- direct application of stored knowledge to a problem
- greater ability to add new knowledge to long-term memory
- the ability to infer indirect relations among bits of knowledge and to generalize from instances and experience

Page 35:

Contextual Disambiguation

A word's vector is a frequency-weighted average of its predicted usages.

This is acceptable for words that generate only one or a few closely related meanings (the majority of words), but balanced homographs such as "bear" result in an LSA vector that doesn't resemble any of their major meanings.

While LSA's single-vector representation can't account for multiple word-meaning phenomena at this stage, this is not a fatal flaw (local context will aid in disambiguation).

Page 36:

Text Comprehension: An LSA Interpretation of Construction-Integration Theory

Research in which individual word senses aren't represented, but the overall meaning of phrases, sentences, and paragraphs is constructed from a linear combination of their words.

The vector average reflects the overall topic or meaning of a passage.

Page 37:

Criticisms / Further Issues

Remember: SVD is just one possible, simple case for a model.

Assumption: all necessary semantic information can be gleaned from a word's contexts (e.g., "love").

Linguistic structures (i.e., syntax) that show obvious importance for the derivation of meaning should be incorporated.

Page 38:

Educational Applications of LSA

- Performance on college exams
- Scoring the content of an essay
- Selecting the most appropriate text for learners with different levels of background knowledge
- Assisting students in summarizing material

Page 39:

Performance on College Exams

Page 40:

Essay Grading

Page 41:

Demonstrations: Write to Learn

Promotes writing skills and reading comprehension

Pages 42-45: (image-only slides)

Page 46:

Demonstrations: Intelligent Essay Assessor (IEA)

Assesses and critiques electronically submitted essays

Provides assessment and feedback

Page 47: (image-only slide)

Page 48:

Demonstration: Summary Street

Web-based reading comprehension and writing instruction tool

Compares student summaries to each section of text and provides feedback

Pages 49-51: (image-only slides)

Page 52:

Demonstration: Super Manual

Program that allows one to identify, develop, and test better ways to organize and present information customized to individual maintainers' level of expertise

Page 53:

Educational Text Selection

Predicts how much readers will learn from texts based on estimated conceptual knowledge of topic and information present in the text they read

Page 54:

Demonstration: State the Essence!

LSA provides evaluations of student summaries of text, guiding students toward the content that experts consider most significant.

A way to measure reading comprehension: summary writing requires the construction of mental representations that join elements of text information with each other and with elements of prior knowledge.

Page 55:

Summary

- People appear to know significantly more than they could have learned from temporally local experiences.
- The proposed induction method depends on reconstructing a system of multiple similarity relations in a high-dimensional space.
- Dimensionality-optimizing induction was implemented through SVD matrix decomposition.
- The model scored as well as the mean scores of foreign students on the TOEFL exam.
- The model learned at a rate similar to school children's, largely through induction from data about other words.
- Because LSA had no access to word-similarity information based on spoken language, morphology, syntax, logic, or perceptual world knowledge, the authors concluded that the induction method is sufficient to account for Plato's paradox, at least in the domain of knowledge measured by synonym tests.

Page 56: