CS246 Topic-Based Models. Motivation Q: For query “car”, will a document with the word...

CS246

Topic-Based Models

Motivation

Q: For the query “car”, will a document with the word “automobile” be returned as a result under the TF-IDF vector model?

Q: Is it desirable?

Q: What can we do?

Topic-Based Models

Index documents based on “topics”, not by individual terms
Return a document if it shares the same topic with the query
  We can return a document with “automobile” for the query “car”
Far fewer “topics” than “terms”
  A topic-based index can be more compact than a term-based index

Example (1)

Two topics: “Car”, “Movie”
Four terms: car, automobile, movie, theater

Topic-term matrix

Topic     car   automobile   movie   theater
“Car”     1     0.9          0       0
“Movie”   0     0            1       0.8

Document-topic matrix

        “Car”   “Movie”
doc1    0       1
doc2    1       0
doc3    0.8     0.2

Example (2)

But what we have is the document-term matrix!
How are the three matrices related?

Document-term matrix

        car   automobile   movie   theater
doc1    0     0            1       0.8
doc2    1     0.9          0       0
doc3    0.8   0.72         0.2     0.16

Linearity Assumption

A document is generated as a topic-weighted linear combination of topic-term vectors
  A simplifying assumption on document generation

doc1 = 0 (1, 0.9, 0, 0) + 1 (0, 0, 1, 0.8) = (0, 0, 1, 0.8)
doc3 = 0.8 (1, 0.9, 0, 0) + 0.2 (0, 0, 1, 0.8) = (0.8, 0.72, 0.2, 0.16)

Topic-term matrix

Topic     car   automobile   movie   theater
“Car”     1     0.9          0       0
“Movie”   0     0            1       0.8

Document-term matrix

        car   automobile   movie   theater
doc1    0     0            1       0.8
doc2    1     0.9          0       0
doc3    0.8   0.72         0.2     0.16

Document-topic matrix

        “Car”   “Movie”
doc1    0       1
doc2    1       0
doc3    0.8     0.2
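The linearity assumption can be checked numerically. The sketch below assumes NumPy and uses the numbers from the example; the variable names are illustrative only. Each document row is a topic-weighted sum of the topic rows, which is the same thing as a matrix product.

```python
import numpy as np

# Topic-term matrix: rows are topics, columns are the terms
# (car, automobile, movie, theater).
topic_term = np.array([
    [1.0, 0.9, 0.0, 0.0],   # "Car"
    [0.0, 0.0, 1.0, 0.8],   # "Movie"
])

# Document-topic matrix: rows are doc1..doc3, columns are topics.
doc_topic = np.array([
    [0.0, 1.0],
    [1.0, 0.0],
    [0.8, 0.2],
])

# Topic-weighted linear combination of topic rows == matrix product.
doc_term = doc_topic @ topic_term
print(np.round(doc_term, 2))
```

The printed rows should match the document-term matrix of the example, including doc3 = (0.8, 0.72, 0.2, 0.16).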

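Topic-Based Index as Matrix Decomposition

$$
\underbrace{\begin{pmatrix} 0 & 0 & 1 & 0.8 \\ 1 & 0.9 & 0 & 0 \\ 0.8 & 0.72 & 0.2 & 0.16 \end{pmatrix}}_{\text{doc} \times \text{term}}
=
\underbrace{\begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0.8 & 0.2 \end{pmatrix}}_{\text{doc} \times \text{topic}}
\underbrace{\begin{pmatrix} 1 & 0.9 & 0 & 0 \\ 0 & 0 & 1 & 0.8 \end{pmatrix}}_{\text{topic} \times \text{term}}
$$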

Topic-Based Index as Matrix Decomposition

# topics << # terms, # topics << # docs
Decompose the (doc-term) matrix into two matrices of rank K (K: # topics)
Of course, the decomposition will be approximate for real data

[Figure: (doc x term) matrix = (doc x topic) matrix X (topic x term) matrix]

Topic-Based Index as Rank-K Approximation

Q: How to choose the two decomposed matrices?
  What is the “best” decomposition?
Latent Semantic Indexing (LSI)
  Find the decomposition that is the “closest” to the original matrix
Singular-Value Decomposition (SVD)
  A decomposition method that leads to the best rank-K approximation
We will spend the next few hours learning about SVD and its meaning
  A basic understanding of linear algebra will be very useful for both IR and data mining

A Brief Review of Linear Algebra

Vector as a list of numbers
  Addition
  Scalar multiplication
  Dot product
  Dot product as a projection
  Q: (1, 0) vs (0, 1). Are they the same vectors?
  A: Choice of basis determines the “meaning” of the numbers
Matrix
  Matrix multiplication
    Four ways to look at matrix multiplication
  Matrix as vector transformation

Change of Coordinates (1)

Two coordinate systems
  First: $(1, 0),\ (0, 1)$
  Second: $(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}),\ (-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$
Q: What are the coordinates of (2,0) under the second coordinate system?
Q: What about (1,1)?
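One way to answer the two questions is to project each vector onto the new basis vectors with dot products. The sketch below assumes NumPy and assumes the second coordinate system is (1/√2, 1/√2), (−1/√2, 1/√2); the matrix name `Q` is illustrative.

```python
import numpy as np

s = 1 / np.sqrt(2)

# Rows of Q are the (assumed) basis vectors of the second coordinate system.
Q = np.array([
    [ s, s],
    [-s, s],
])

# The i-th new coordinate is the dot product of the vector with basis vector i,
# so all new coordinates come from one matrix-vector product.
new_2_0 = Q @ np.array([2.0, 0.0])   # (sqrt(2), -sqrt(2))
new_1_1 = Q @ np.array([1.0, 1.0])   # (sqrt(2), 0)

# Because the basis is orthonormal, Q Q^T = I and Q^T maps back.
back = Q.T @ new_2_0                 # (2, 0) again
print(new_2_0, new_1_1, back)
```

The vector (1,1) lies exactly along the first new basis vector, which is why its second new coordinate is 0.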

Change of Coordinates (2)

In general, we get the new coordinates of a vector under the new basis vectors $b_1, b_2, \ldots, b_n$ by multiplying the original coordinates with the following matrix

$$
Q = \begin{pmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_n^T \end{pmatrix}
$$

  Verify with previous example
Q: What does the above matrix look like? How can we identify a coordinate-change matrix?

Matrix and Change of Coordinates

$b_1, b_2, \ldots, b_n$ vectors are orthonormal to each other
  $Q Q^T = I$
Orthonormal matrix: $Q^T Q = Q Q^T = I$
An orthonormal matrix can be interpreted as a change-of-coordinate transformation
  The rows of the matrix Q are the new basis vectors

Linear Transformation

Linear transformation
  $T(ax + by) = aT(x) + bT(y)$
Every linear transformation can be represented as a matrix
  By selecting appropriate basis vectors
The matrix form of a linear transformation can be obtained simply by learning how the basis vectors transform

$$
M = \begin{pmatrix} | & | & & | \\ T(b_1) & T(b_2) & \cdots & T(b_n) \\ | & | & & | \end{pmatrix}
$$

  Verify with 45 degree rotation.
What transformations are possible for linear transformation?
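The 45 degree verification can be sketched as follows, assuming NumPy. The helper `rotate_45` is a hypothetical function that rotates a vector geometrically (via polar coordinates) without using any matrix; the matrix M is then built only from what happens to the two basis vectors.

```python
import numpy as np

def rotate_45(v):
    """Rotate a 2D vector counterclockwise by 45 degrees using polar coordinates."""
    radius = np.hypot(v[0], v[1])
    angle = np.arctan2(v[1], v[0]) + np.pi / 4
    return np.array([radius * np.cos(angle), radius * np.sin(angle)])

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# Columns of M are the transformed basis vectors T(e1) and T(e2).
M = np.column_stack([rotate_45(e1), rotate_45(e2)])

# By linearity, M must now agree with the transformation on every vector.
v = np.array([2.0, 1.0])
print(M)
print(M @ v, rotate_45(v))
```

Both printed vectors on the last line should agree, since v = 2 e1 + 1 e2 and the transformation is linear.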

Linear Transformation that We Know

Rotation
Stretching
Anything else?

Claim: Any linear transformation is a stretching followed by a rotation
  “Meaning” of singular value decomposition
  An important result of linear algebra
  Let us learn why this is the case

Rotation

Matrix form of rotation? What property will it have?
  Remember

$$
\begin{pmatrix} | & | & & | \\ T(b_1) & T(b_2) & \cdots & T(b_n) \\ | & | & & | \end{pmatrix}
$$

Rotation matrix R <=> Orthonormal matrix
  $R R^T = I$
  $T(b_i)$’s are unit basis vectors as well
Orthonormal matrix
  Change of coordinates
  Rotation
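As a worked case in 2D, the orthonormality of a rotation can be checked by hand. The angle $\theta$ is an illustrative symbol that does not appear on the slide; the columns of $R$ are the rotated basis vectors $T(b_1) = (\cos\theta, \sin\theta)$ and $T(b_2) = (-\sin\theta, \cos\theta)$.

```latex
R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
R R^T
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
  \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
= \begin{pmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \sin^2\theta + \cos^2\theta \end{pmatrix}
= I
```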

Stretching (1)

Q: Matrix form of stretching by 3 along x, y, z axes in 3D?
Q: Matrix form of stretching by 3 along x axis and by 2 along y axis in 3D.
Q: Stretching matrix <=> diagonal matrix?
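For the first two questions, one possible answer is sketched below. It assumes that "stretching along an axis" scales that coordinate and that the z coordinate is left unchanged (factor 1) in the second case; the names $S_1$ and $S_2$ are illustrative.

```latex
S_1 = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix},
\qquad
S_2 = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
S_2 \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3x \\ 2y \\ z \end{pmatrix}
```

Both are diagonal because the stretching axes coincide with the coordinate axes.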

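Stretching (2)

Q: Matrix form of stretching by 3 along $(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$ and by 2 along $(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$?

$$
\begin{pmatrix} 5/2 & 1/2 \\ 1/2 & 5/2 \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
$$

  Verify by transforming (1,1) and (-1, 1)
Decomposition of $T = Q T' Q^T$ shows the transformation in a different coordinate system
  Under the matrix form, the simplicity of the stretching transformation may not be obvious
Q: What if we chose $(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$ and $(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$ as the basis?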
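The suggested verification can be sketched numerically, assuming NumPy and assuming the stretching axes are (1/√2, 1/√2) and (−1/√2, 1/√2). Here the columns of `Q` are the stretching axes, so `Q.T` changes coordinates, `D` stretches, and `Q` changes back.

```python
import numpy as np

s = 1 / np.sqrt(2)

# Columns of Q are the (assumed) stretching axes.
Q = np.array([
    [s, -s],
    [s,  s],
])
D = np.diag([3.0, 2.0])   # stretch by 3 along the first axis, by 2 along the second

T = Q @ D @ Q.T           # [[2.5, 0.5], [0.5, 2.5]]

# (1, 1) lies on the first axis, so it is scaled by 3.
along_first = T @ np.array([1.0, 1.0])     # (3, 3)
# (-1, 1) lies on the second axis, so it is scaled by 2.
along_second = T @ np.array([-1.0, 1.0])   # (-2, 2)
print(T, along_first, along_second)
```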

Stretching (3)

Under a good choice of basis vectors, an orthogonal-stretching transformation can always be represented as a diagonal matrix
Q: How can we tell whether a matrix corresponds to an orthogonal-stretching transformation?

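Stretching – Orthogonal Stretching (1)

$$
\begin{pmatrix} 5/2 & 1/2 \\ 1/2 & 5/2 \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
$$

Remember that this is orthogonal stretching along $(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$ and $(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$
If a transformation is orthogonal stretching, we should always be able to represent it as $QDQ^T$ for some Q, where Q shows the stretching axes
Q: What is the matrix form of the transformation that stretches by 5 along (4/5, 3/5) and by 4 along (-3/5, 4/5)?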
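The last question can be worked out by building $QDQ^T$ directly. This is a minimal sketch assuming NumPy; `orthogonal_stretch` is a hypothetical helper name, not something from the lecture.

```python
import numpy as np

def orthogonal_stretch(axes, factors):
    """Build Q D Q^T from orthonormal stretching axes and their stretch factors."""
    Q = np.column_stack(axes)                      # columns are the stretching axes
    if not np.allclose(Q.T @ Q, np.eye(len(axes))):
        raise ValueError("stretching axes must be orthonormal")
    return Q @ np.diag(factors) @ Q.T

axes = [np.array([4 / 5, 3 / 5]), np.array([-3 / 5, 4 / 5])]
T = orthogonal_stretch(axes, [5.0, 4.0])
print(T)                  # approximately [[4.64, 0.48], [0.48, 4.36]]

# Sanity check: the first axis is scaled by 5, giving (4, 3).
print(T @ axes[0])
```

The result is symmetric, which is the property the following slides connect to orthogonal stretching.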

Stretching – Orthogonal Stretching (2)

Q: Given a matrix, how do we know whether it is orthogonal stretching?
A: When it can be decomposed to $T = QDQ^T$
A: Spectral Theorem
  Any symmetric matrix T can always be decomposed into $T = QDQ^T$
  Symmetric matrix <=> orthogonal stretching
Q: How can we decompose T to $QDQ^T$?
A: If T stretches along X, then $TX = \lambda X$ for some $\lambda$.
  X: eigenvector of T
  $\lambda$: eigenvalue of T
  Solve the equation for $\lambda$ and X

Eigenvalues, Eigenvectors and Orthogonal Stretching

Eigenvector: stretching axis
Eigenvalue: stretching factor
All eigenvectors are orthogonal <=> Orthogonal stretching <=> Symmetric matrix (spectral theorem)
Example

$$
\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}
$$

  Q: What transformation is this?
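The example can be explored with an eigendecomposition. The sketch below assumes NumPy and assumes the example matrix is [[3, 1], [1, 3]]; `np.linalg.eigh` is NumPy's routine for symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

T = np.array([
    [3.0, 1.0],
    [1.0, 3.0],
])

# Columns of Q are unit eigenvectors (stretching axes);
# eigenvalues are the corresponding stretching factors.
eigenvalues, Q = np.linalg.eigh(T)
print(eigenvalues)   # stretching factors 2 and 4
print(Q)             # axes along (1, -1)/sqrt(2) and (1, 1)/sqrt(2), up to sign

# Spectral theorem: the symmetric matrix is recovered as Q D Q^T.
reconstructed = Q @ np.diag(eigenvalues) @ Q.T
```

Under this reading, the transformation stretches by 4 along the diagonal direction (1, 1) and by 2 along the perpendicular direction.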

Singular Value Decomposition (SVD)

Any linear transformation T can be decomposed to T = R S (R: rotation, S: orthogonal stretching)
  One of the basic results of linear algebra
In matrix form, any matrix T can be decomposed to

$$
T = Q_R Q_S D Q_S^T = Q_1 D Q_2^T
$$

  Diagonal entries in D: singular values
Example

$$
\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} 4/5 & 3/5 \\ -3/5 & 4/5 \end{pmatrix}
$$

  Q: What transformation is this?
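A quick numerical check of the decomposition, assuming NumPy. The factor matrices below are an assumed reading of the example (a 45 degree rotation, stretch factors 3 and 2, and axes (4/5, 3/5), (−3/5, 4/5)); the names `Q1`, `D`, `Q2T` are illustrative.

```python
import numpy as np

s = 1 / np.sqrt(2)
Q1 = np.array([[s, -s], [s, s]])               # assumed rotation part
D = np.diag([3.0, 2.0])                        # singular values
Q2T = np.array([[0.8, 0.6], [-0.6, 0.8]])      # rows: assumed stretching axes

T = Q1 @ D @ Q2T

# np.linalg.svd returns U, the singular values (descending), and V^T.
U, singular_values, Vt = np.linalg.svd(T)
print(singular_values)                         # 3 and 2

# The factors may differ from Q1 and Q2T by signs, but the product is the same.
reconstructed = U @ np.diag(singular_values) @ Vt
```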

Singular Value Decomposition (2)

Q: For an (n x m) matrix T, what will be the dimensions of the three matrices after SVD?
Q: What is the meaning of a non-square diagonal matrix?
  The diagonal matrix is also responsible for projection (or dimension padding).
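The shapes can be inspected directly. This sketch assumes NumPy and uses an arbitrary 3 x 5 matrix chosen only for illustration; with `full_matrices=True`, `np.linalg.svd` returns square orthonormal factors and the singular values as a 1-D array, so the non-square diagonal matrix has to be assembled by hand.

```python
import numpy as np

n, m = 3, 5
T = np.arange(n * m, dtype=float).reshape(n, m)   # arbitrary illustrative matrix

U, s, Vt = np.linalg.svd(T, full_matrices=True)
print(U.shape, s.shape, Vt.shape)                 # (3, 3) (3,) (5, 5)

# Non-square "diagonal" matrix: singular values on the diagonal, zeros elsewhere.
# With n < m it discards the last m - n coordinates (a projection).
D = np.zeros((n, m))
D[:len(s), :len(s)] = np.diag(s)

reconstructed = U @ D @ Vt
```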

Singular Values vs Eigenvalues

$$
T = Q_1 D Q_2^T
$$

$$
T T^T = (Q_1 D Q_2^T)(Q_1 D Q_2^T)^T = Q_1 D^2 Q_1^T
$$

Q: What is this transformation?
A: $Q_1$: eigenvectors of $TT^T$
  D: square root of eigenvalues of $TT^T$.
Similarly, $Q_2$: eigenvectors of $T^TT$
  D: square root of eigenvalues of $T^TT$.
SVD can be done by computing eigenvalues and eigenvectors of $TT^T$ and $T^TT$
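The relationship between singular values and eigenvalues can be checked numerically. The sketch assumes NumPy; the 2 x 3 matrix is a hypothetical example chosen only to make the two products have different sizes.

```python
import numpy as np

T = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 3.0, 0.0],
])  # hypothetical matrix, not from the lecture

singular_values = np.linalg.svd(T, compute_uv=False)   # descending order

# eigvalsh handles symmetric matrices and returns ascending eigenvalues.
eig_TTt = np.linalg.eigvalsh(T @ T.T)   # 2 eigenvalues
eig_TtT = np.linalg.eigvalsh(T.T @ T)   # 3 eigenvalues, one of them (near) zero

from_TTt = np.sqrt(eig_TTt[::-1])
# Clip tiny negative round-off before the square root, keep the two largest.
from_TtT = np.sqrt(np.clip(eig_TtT[::-1], 0.0, None))[:2]

print(singular_values, from_TTt, from_TtT)
```

All three printed arrays should agree up to floating-point error.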

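SVD as Matrix Approximation

$$
T =
\begin{pmatrix} 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 100 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 0.1 \end{pmatrix}
\begin{pmatrix} 3/5 & -4/5 & 0 \\ 4/5 & 3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

Q: If we want to reduce the rank of T to 2, what will be a good choice?
The best rank-k approximation of any matrix T is to keep the first k entries of its SVD.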
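Keeping the first k entries of the SVD can be sketched as below, assuming NumPy. The function name `rank_k_approximation` is illustrative, and the orthonormal factors are an assumed reading of the example with singular values 100, 10 and 0.1.

```python
import numpy as np

def rank_k_approximation(T, k):
    """Keep the k largest singular values and the matching singular vectors."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

r = 1 / np.sqrt(2)
Q1 = np.array([[r, 0.0, -r], [r, 0.0, r], [0.0, 1.0, 0.0]])          # assumed
D = np.diag([100.0, 10.0, 0.1])
Q2T = np.array([[0.6, -0.8, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]])  # assumed

T = Q1 @ D @ Q2T
T2 = rank_k_approximation(T, 2)

# Dropping the smallest singular value changes T by only 0.1 in spectral norm.
error = np.linalg.norm(T - T2, 2)
print(error)
```

Dropping 0.1 rather than 100 or 10 is the good choice here because the approximation error equals the largest singular value that was discarded.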

SVD Approximation

Example: 1000 x 1000 matrix with entries in (0…255)

62 60 58 57 58 57 55 53 55 54
61 60 58 57 57 57 55 53 55 54
61 59 58 57 57 56 55 54 55 55
59 59 58 57 57 56 55 54 56 55
58 58 58 57 56 55 55 55 56 55
57 58 58 57 56 55 55 56 56 55
56 57 58 57 55 54 55 56 56 56
56 57 58 57 55 54 55 56 56 56
59 58 57 56 55 56 56 57 59 57
58 58 57 57 56 56 56 56 58 57
57 57 57 57 57 57 56 56 57 56
56 57 57 58 58 57 56 55 56 56

[Figure: image of original matrix, 1000x1000]

[Figure: SVD, rank 1 approximation]

[Figure: original vs rank 100 approximation]

Q: How many numbers do we keep for each?
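A rough count for the last question, under the assumption that a rank-k approximation stores k left singular vectors, k right singular vectors, and k singular values. The helper `numbers_kept` is a hypothetical name.

```python
def numbers_kept(n_rows, n_cols, k=None):
    """Count stored numbers: the full matrix, or a rank-k SVD approximation."""
    if k is None:
        return n_rows * n_cols
    # k columns of length n_rows, k rows of length n_cols, k singular values.
    return k * (n_rows + n_cols + 1)

original = numbers_kept(1000, 1000)        # 1,000,000
rank_1 = numbers_kept(1000, 1000, k=1)     # 2,001
rank_100 = numbers_kept(1000, 1000, k=100) # 200,100
print(original, rank_1, rank_100)
```

Under this counting, the rank 100 approximation keeps about a fifth of the numbers in the original matrix.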

Back to LSI

LSI: decompose the (doc-term) matrix into two matrices of rank K
Our goal is to find the “best” rank-K approximation
Apply SVD and keep the top-K singular values, meaning that we keep the first K columns of the first matrix and the first K rows of the third matrix after SVD.

[Figure: (doc x term) matrix = (doc x topic) matrix X (topic x term) matrix]
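A small end-to-end sketch of this procedure, assuming NumPy and reusing the document-term matrix from the earlier example. The extra document containing only “automobile” is hypothetical, and projecting term vectors onto the top-K right singular vectors is a simplified way of mapping them into topic space, not necessarily the exact formulation used in the lecture.

```python
import numpy as np

# Columns: car, automobile, movie, theater. Rows: doc1..doc3.
doc_term = np.array([
    [0.0, 0.0,  1.0, 0.8 ],
    [1.0, 0.9,  0.0, 0.0 ],
    [0.8, 0.72, 0.2, 0.16],
])

K = 2
U, s, Vt = np.linalg.svd(doc_term, full_matrices=False)
Uk, sk, Vtk = U[:, :K], s[:K], Vt[:K, :]   # first K columns / values / rows

def to_topic_space(term_vector):
    """Project a term-space vector onto the K topic directions."""
    return Vtk @ term_vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([1.0, 0.0, 0.0, 0.0])      # query "car"
new_doc = np.array([0.0, 1.0, 0.0, 0.0])    # hypothetical doc with only "automobile"

term_sim = cosine(query, new_doc)                                    # no shared term
topic_sim = cosine(to_topic_space(query), to_topic_space(new_doc))   # same topic
print(term_sim, topic_sim)
```

In term space the query and the document share no term, so their similarity is 0; in topic space both fall on the “Car” direction, so the document is returned for the query.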

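LSI and SVD

[Figure: the LSI decomposition, (doc x term) matrix = (doc x topic) matrix X (topic x term) matrix, shown alongside the SVD of the same (doc x term) matrix]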

LSI and SVD

LSI summary
  Formulate the topic-based indexing problem as a rank-K matrix approximation problem
  Use SVD to find the best rank-K approximation
When applied to real data, 10-20% improvement reported
  Using LSI was the road to fame for Excite in its early days

Limitations of LSI

Q: Any problems with LSI?
Problems with LSI
  Scalability
    SVD is known to be difficult to perform on large data
  Interpretability
    Extracted document-topic matrix is impossible to interpret
    Difficult to understand why we get good/bad results from LSI for some queries
Q: Any way to develop more interpretable topic-based indexing?
  Topic for next lecture

Summary

Topic-based indexing
  Synonym and polysemy problem
  Index documents by topic, not by terms
Latent Semantic Indexing (LSI)
  A document is a linear combination of its topic vector and the topic-term vectors
  Formulate the problem as a rank-K matrix approximation problem
  Use SVD to find the best approximation
Basic linear algebra
  Linear transformation, matrix, stretching and rotation
  Orthogonal stretching, diagonal matrix, symmetric matrix, eigenvalues and eigenvectors
  Rotation, change of coordinates, and orthonormal matrix
  SVD and its implication as a linear transformation
