Near Duplicate Image Detection: min-Hash and tf-idf weighting
CS246
Topic-Based Models
Motivation
Q: For the query “car”, will a document containing the word “automobile” be returned as a result under the TF-IDF vector model?
Q: Is it desirable?
Q: What can we do?
Topic-Based Models
Index documents by “topics”, not by individual terms
Return a document if it shares a topic with the query
  We can return a document containing “automobile” for the query “car”
There are far fewer “topics” than “terms”
  A topic-based index can be more compact than a term-based index
Example (1)
Two topics: “Car”, “Movie”
Four terms: car, automobile, movie, theater
Topic-term matrix
Topic     car   automobile   movie   theater
“Car”     1     0.9          0       0
“Movie”   0     0            1       0.8
Document-topic matrix
        “Car”   “Movie”
doc1    0       1
doc2    1       0
doc3    0.8     0.2
Example (2)
But what we have is the document-term matrix!
How are the three matrices related?
        car   automobile   movie   theater
doc1    0     0            1       0.8
doc2    1     0.9          0       0
doc3    0.8   0.72         0.2     0.16
Linearity Assumption
A document is generated as a topic-weighted linear combination of topic-term vectors
  A simplifying assumption on document generation
doc1 = 0 (1, 0.9, 0, 0) + 1 (0, 0, 1, 0.8) = (0, 0, 1, 0.8)
doc3 = 0.8 (1, 0.9, 0, 0) + 0.2 (0, 0, 1, 0.8) = (0.8, 0.72, 0.2, 0.16)
Topic-term matrix
Topic     car   automobile   movie   theater
“Car”     1     0.9          0       0
“Movie”   0     0            1       0.8
Document-term matrix
        car   automobile   movie   theater
doc1    0     0            1       0.8
doc2    1     0.9          0       0
doc3    0.8   0.72         0.2     0.16
Document-topic matrix
        “Car”   “Movie”
doc1    0       1
doc2    1       0
doc3    0.8     0.2
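Topic-Based Index as Matrix Decomposition
$$
\underbrace{\begin{pmatrix} 0 & 0 & 1 & 0.8 \\ 1 & 0.9 & 0 & 0 \\ 0.8 & 0.72 & 0.2 & 0.16 \end{pmatrix}}_{\text{doc} \times \text{term}}
=
\underbrace{\begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0.8 & 0.2 \end{pmatrix}}_{\text{doc} \times \text{topic}}
\underbrace{\begin{pmatrix} 1 & 0.9 & 0 & 0 \\ 0 & 0 & 1 & 0.8 \end{pmatrix}}_{\text{topic} \times \text{term}}
$$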
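The decomposition in the running example can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names `doc_topic`, `topic_term`, and `doc_term` are illustrative and not part of the lecture.

```python
import numpy as np

# Rows: doc1, doc2, doc3; columns: topics "Car", "Movie".
doc_topic = np.array([[0.0, 1.0],
                      [1.0, 0.0],
                      [0.8, 0.2]])

# Rows: topics "Car", "Movie"; columns: car, automobile, movie, theater.
topic_term = np.array([[1.0, 0.9, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.8]])

# Under the linearity assumption, each document row is a
# topic-weighted combination of the topic-term rows.
doc_term = doc_topic @ topic_term
print(doc_term)
```

Because the product has only two topics behind it, the resulting 3 x 4 document-term matrix has rank 2.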
Topic-Based Index as Matrix Decomposition
# topics << # terms, # topics << # docs
Decompose the (doc-term) matrix into two matrices of rank K (K: # topics)
[Figure: (doc × term) = (doc × topic) X (topic × term)]
Of course, the decomposition will only be approximate for real data
Topic-Based Index as Rank-K Approximation
Q: How do we choose the two decomposed matrices? What is the “best” decomposition?
Latent Semantic Index (LSI)
  Find the decomposition that is “closest” to the original matrix
Singular-Value Decomposition (SVD)
  A decomposition method that leads to the best rank-K approximation
We will spend the next few hours learning about SVD and its meaning
  A basic understanding of linear algebra is very useful for both IR and data mining
A Brief Review of Linear Algebra
Vector as a list of numbers
  Addition
  Scalar multiplication
  Dot product
  Dot product as a projection
Q: (1, 0) vs (0, 1). Are they the same vectors?
A: The choice of basis determines the “meaning” of the numbers
Matrix
  Matrix multiplication
  Four ways to look at matrix multiplication
  Matrix as vector transformation
Change of Coordinates (1)
Two coordinate systems
  $(1, 0),\ (0, 1)$
  $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right),\ \left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$
Q: What are the coordinates of (2, 0) under the second coordinate system?
Q: What about (1, 1)?
Change of Coordinates (2)
In general, we get the new coordinates of a vector under the new basis vectors $b_1, b_2, \dots, b_n$ by multiplying the original coordinates by the following matrix
$$
Q^T = \begin{pmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_n^T \end{pmatrix}
$$
Verify with the previous example
Q: What does the above matrix look like? How can we identify a coordinate-change matrix?
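The two questions from the previous slide can be worked through numerically. This is a minimal sketch, assuming the second coordinate system is the pair (1/√2, 1/√2) and (-1/√2, 1/√2); the name `Qt` is illustrative.

```python
import numpy as np

s = 1 / np.sqrt(2)

# Each row is one of the new (orthonormal) basis vectors b1, b2.
Qt = np.array([[ s, s],
               [-s, s]])

for v in (np.array([2.0, 0.0]), np.array([1.0, 1.0])):
    new_coords = Qt @ v            # dot product of v with each basis vector
    rebuilt = Qt.T @ new_coords    # new_coords[0] * b1 + new_coords[1] * b2
    print(v, "->", new_coords, "rebuilt:", rebuilt)
```

Under this assumption (2, 0) becomes (√2, -√2) and (1, 1) becomes (√2, 0); rebuilding from the new coordinates returns the original vector.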
Matrix and Change of Coordinates
$b_1, b_2, \dots, b_n$ are orthonormal to each other: $Q Q^T = I$
Orthonormal matrix: $Q Q^T = Q^T Q = I$
An orthonormal matrix can be interpreted as a change-of-coordinate transformation
  The rows of the matrix Q are the new basis vectors
Linear Transformation
Linear transformation: $T(ax + by) = a\,T(x) + b\,T(y)$
Every linear transformation can be represented as a matrix
  By selecting appropriate basis vectors
The matrix form of a linear transformation can be obtained simply by learning how the basis vectors transform
$$
M = \begin{pmatrix} | & | & & | \\ T(b_1) & T(b_2) & \cdots & T(b_n) \\ | & | & & | \end{pmatrix}
$$
Verify with a 45-degree rotation.
Q: What transformations are possible for a linear transformation?
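The 45-degree rotation check can be sketched as follows. The helper `rotate45` is a hypothetical stand-in for the transformation T; it rotates by complex multiplication so that the matrix M is built only from how the basis vectors transform.

```python
import numpy as np

def rotate45(v):
    """Rotate a 2D vector counterclockwise by 45 degrees via complex multiplication."""
    z = complex(v[0], v[1]) * np.exp(1j * np.pi / 4)
    return np.array([z.real, z.imag])

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# Columns of M are the images of the basis vectors: T(e1), T(e2).
M = np.column_stack([rotate45(e1), rotate45(e2)])

v = np.array([3.0, -2.0])
print(M)
print(M @ v, rotate45(v))   # the two results should agree
```

The columns of M come out as (1/√2, 1/√2) and (-1/√2, 1/√2), and M is orthonormal.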
Linear Transformations that We Know
Rotation
Stretching
Anything else?
Claim: Any linear transformation is a stretching followed by a rotation
  The “meaning” of singular value decomposition
  An important result of linear algebra
  Let us learn why this is the case
Rotation
Q: Matrix form of rotation? What property will it have?
Remember
$$
M = \begin{pmatrix} | & | & & | \\ T(b_1) & T(b_2) & \cdots & T(b_n) \\ | & | & & | \end{pmatrix}
$$
The $T(b_i)$'s are unit basis vectors as well
Rotation matrix R <=> Orthonormal matrix: $R R^T = I$
Orthonormal matrix <=> Change of coordinates <=> Rotation
Stretching (1)
Q: Matrix form of stretching by 3 along the x, y, and z axes in 3D?
Q: Matrix form of stretching by 3 along the x axis and by 2 along the y axis in 3D?
Q: Stretching matrix <=> diagonal matrix?
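Stretching (2)
Q: Matrix form of stretching by 3 along $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$ and by 2 along $\left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$?
$$
\begin{pmatrix} 5/2 & 1/2 \\ 1/2 & 5/2 \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
$$
Verify by transforming (1, 1) and (-1, 1)
The decomposition $T = Q T' Q^T$ shows the transformation in a different coordinate system
Under the matrix form, the simplicity of the stretching transformation may not be obvious
Q: What if we chose $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$ and $\left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$ as the basis?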
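The suggested verification with (1, 1) and (-1, 1) might look like this. It is a minimal sketch, assuming the stretching axes are (1/√2, 1/√2) and (-1/√2, 1/√2) with factors 3 and 2.

```python
import numpy as np

s = 1 / np.sqrt(2)

Q = np.array([[s, -s],
              [s,  s]])        # columns are the stretching axes
D = np.diag([3.0, 2.0])        # stretching factors along those axes
T = Q @ D @ Q.T                # the same transformation in the original coordinates

print(T)
print(T @ np.array([ 1.0, 1.0]))   # lies on the first axis, so it is scaled by 3
print(T @ np.array([-1.0, 1.0]))   # lies on the second axis, so it is scaled by 2
```

The non-diagonal T hides the fact that the transformation is a plain stretch; the diagonal D makes it explicit once the axes are used as the basis.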
Stretching (3)
Under a good choice of basis vectors, an orthogonal-stretching transformation can always be represented as a diagonal matrix
Q: How can we tell whether a matrix corresponds to an orthogonal-stretching transformation?
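Stretching – Orthogonal Stretching (1)
$$
\begin{pmatrix} 5/2 & 1/2 \\ 1/2 & 5/2 \end{pmatrix}
=
\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
$$
Remember that this is orthogonal stretching along $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$ and $\left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$
If a transformation is orthogonal stretching, we should always be able to represent it as $Q D Q^T$ for some Q, where Q shows the stretching axes
Q: What is the matrix form of the transformation that stretches by 5 along (4/5, 3/5) and by 4 along (-3/5, 4/5)?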
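One way to answer the last question is to build $Q D Q^T$ directly. In this sketch the helper name `orthogonal_stretch` is hypothetical, introduced only for illustration.

```python
import numpy as np

def orthogonal_stretch(axes, factors):
    """Build Q D Q^T, where the columns of Q are orthonormal stretching axes."""
    Q = np.column_stack(axes)
    D = np.diag(factors)
    return Q @ D @ Q.T

axis1 = np.array([4/5, 3/5])
axis2 = np.array([-3/5, 4/5])
T = orthogonal_stretch([axis1, axis2], [5.0, 4.0])

print(T)            # symmetric, as expected for orthogonal stretching
print(T @ axis1)    # 5 * axis1
print(T @ axis2)    # 4 * axis2
```

The result is the symmetric matrix with rows (4.64, 0.48) and (0.48, 4.36).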
Stretching – Orthogonal Stretching (2)
Q: Given a matrix, how do we know whether it is orthogonal stretching?
A: When it can be decomposed into $T = Q D Q^T$
A: Spectral Theorem
  Any symmetric matrix T can always be decomposed into $T = Q D Q^T$
  Symmetric matrix <=> orthogonal stretching
Q: How can we decompose T into $Q D Q^T$?
A: If T stretches along X, then $TX = \lambda X$ for some $\lambda$
  X: eigenvector of T
  $\lambda$: eigenvalue of T
  Solve the equation for $\lambda$ and X
Eigenvalues, Eigenvectors and Orthogonal Stretching
Eigenvector: stretching axis
Eigenvalue: stretching factor
All eigenvectors are orthogonal <=> Orthogonal stretching <=> Symmetric matrix (spectral theorem)
Example
$$
\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}
$$
Q: What transformation is this?
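The example can be explored with an eigendecomposition. This is a minimal sketch using `numpy.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order; it assumes the example matrix has rows (3, 1) and (1, 3).

```python
import numpy as np

T = np.array([[3.0, 1.0],
              [1.0, 3.0]])

eigenvalues, Q = np.linalg.eigh(T)   # columns of Q are orthonormal eigenvectors
D = np.diag(eigenvalues)

print(eigenvalues)      # stretching factors
print(Q)                # stretching axes (the sign of each column is arbitrary)
print(Q @ D @ Q.T)      # reproduces T
```

The eigenvalues are 2 and 4: the matrix stretches by 4 along the direction (1, 1) and by 2 along (-1, 1).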
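Singular Value Decomposition (SVD)
Any linear transformation T can be decomposed into T = R S (R: rotation, S: orthogonal stretching)
  One of the basic results of linear algebra
In matrix form, any matrix T can be decomposed into
$$
T = R S = Q_R\, Q_S D Q_S^T = Q_1 D Q_2^T
$$
Diagonal entries in D: singular values
Example
$$
\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} 4/5 & 3/5 \\ -3/5 & 4/5 \end{pmatrix}
$$
Q: What transformation is this?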
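The decomposition can be reproduced with `numpy.linalg.svd`. This is a minimal sketch in which the two rotation matrices are assumptions chosen to match the bases used on the earlier slides; the signs inside them are not confirmed by the original example.

```python
import numpy as np

r = 1 / np.sqrt(2)

Q1 = np.array([[r, -r],
               [r,  r]])              # assumed rotation
D = np.diag([3.0, 2.0])               # singular values
Q2t = np.array([[ 4/5, 3/5],
                [-3/5, 4/5]])         # assumed rotation, already transposed
T = Q1 @ D @ Q2t

U, singular_values, Vt = np.linalg.svd(T)
print(singular_values)                        # largest first
print(U @ np.diag(singular_values) @ Vt)      # reproduces T
```

NumPy may return U and Vt with flipped signs compared to Q1 and Q2t, since the decomposition is only unique up to such sign choices; the singular values and the product are unaffected.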
Singular Value Decomposition (2)
Q: For an (n x m) matrix T, what will be the dimensions of the three matrices after SVD?
Q: What is the meaning of a non-square diagonal matrix?
  The diagonal matrix is also responsible for projection (or dimension padding).
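The dimension question can be checked on a small case. This is a minimal sketch with an arbitrary random 4 x 3 matrix; NumPy returns the singular values as a vector, so the non-square diagonal matrix is rebuilt by hand.

```python
import numpy as np

n, m = 4, 3
rng = np.random.default_rng(0)
T = rng.standard_normal((n, m))       # arbitrary example data

U, s, Vt = np.linalg.svd(T, full_matrices=True)

# Place the singular values on the diagonal of an n x m matrix.
D = np.zeros((n, m))
D[:len(s), :len(s)] = np.diag(s)

print(U.shape, D.shape, Vt.shape)     # (n, n), (n, m), (m, m)
print(np.allclose(U @ D @ Vt, T))
```

The extra all-zero row of D is where the dimension change between the m-dimensional input and the n-dimensional output happens.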
Singular Values vs Eigenvalues
$$
T = Q_1 D Q_2^T
$$
$$
T T^T = (Q_1 D Q_2^T)(Q_1 D Q_2^T)^T = Q_1 D Q_2^T Q_2 D Q_1^T = Q_1 D^2 Q_1^T
$$
Q: What is this transformation?
A: $Q_1$ – eigenvectors of $T T^T$
   D – square root of eigenvalues of $T T^T$
Similarly, $Q_2$ – eigenvectors of $T^T T$
   D – square root of eigenvalues of $T^T T$
SVD can be done by computing the eigenvalues and eigenvectors of $T T^T$ and $T^T T$
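The relation between singular values and eigenvalues can be confirmed numerically. This is a minimal sketch on an arbitrary 2 x 3 matrix chosen for illustration.

```python
import numpy as np

T = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]])       # arbitrary example

_, singular_values, _ = np.linalg.svd(T)

# eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse to descending.
eig_TTt = np.linalg.eigvalsh(T @ T.T)[::-1]   # 2 eigenvalues
eig_TtT = np.linalg.eigvalsh(T.T @ T)[::-1]   # 3 eigenvalues, the last one is ~0

print(singular_values ** 2)
print(eig_TTt)
print(eig_TtT)
```

The squared singular values match the eigenvalues of $T T^T$, and $T^T T$ has the same nonzero eigenvalues plus an extra zero.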
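SVD as Matrix Approximation
$$
T =
\begin{pmatrix} 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 100 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 0.1 \end{pmatrix}
\begin{pmatrix} 3/5 & -4/5 & 0 \\ 4/5 & 3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$
Q: If we want to reduce the rank of T to 2, what will be a good choice?
The best rank-k approximation of any matrix T is obtained by keeping the k largest singular values of its SVD, together with the corresponding columns of the first matrix and rows of the third.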
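A rank-k truncation can be sketched as follows. The function name `rank_k_approximation` is hypothetical, and the 3 x 3 matrix is an assumed example with singular values 100, 10, and 0.1; the orthonormal factors are illustrative choices.

```python
import numpy as np

def rank_k_approximation(T, k):
    """Keep only the k largest singular values of T and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

r = 1 / np.sqrt(2)
Q1 = np.array([[r, 0.0, -r],
               [r, 0.0,  r],
               [0.0, 1.0, 0.0]])          # assumed orthonormal matrix
Q2t = np.array([[3/5, -4/5, 0.0],
                [4/5,  3/5, 0.0],
                [0.0,  0.0, 1.0]])        # assumed orthonormal matrix
T = Q1 @ np.diag([100.0, 10.0, 0.1]) @ Q2t

T2 = rank_k_approximation(T, 2)
error = np.linalg.norm(T - T2)            # Frobenius norm of what was dropped
print(error)
```

Dropping the smallest singular value changes the matrix by exactly that value in Frobenius norm, which is why discarding 0.1 is the natural choice here.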
SVD Approximation Example: 1000 x 1000 matrix with values in (0…255)
62 60 58 57 58 57 55 53 55 54
61 60 58 57 57 57 55 53 55 54
61 59 58 57 57 56 55 54 55 55
59 59 58 57 57 56 55 54 56 55
58 58 58 57 56 55 55 55 56 55
57 58 58 57 56 55 55 56 56 55
56 57 58 57 55 54 55 56 56 56
56 57 58 57 55 54 55 56 56 56
59 58 57 56 55 56 56 57 59 57
58 58 57 57 56 56 56 56 58 57
57 57 57 57 57 57 56 56 57 56
56 57 57 58 58 57 56 55 56 56
[Figure: Image of the original 1000 x 1000 matrix]
[Figure: SVD, rank-1 approximation]
[Figure: SVD, rank-10 approximation]
[Figure: SVD, rank-100 approximation]
[Figure: Original vs rank-100 approximation]
Q: How many numbers do we keep for each?
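The storage question can be answered by counting. This sketch assumes a rank-k SVD is stored as k columns of the first matrix, k singular values, and k rows of the third matrix; the function name `numbers_kept` is illustrative.

```python
def numbers_kept(n_rows, n_cols, k):
    """Values stored by a rank-k SVD: k columns of U, k singular values, k rows of V^T."""
    return k * (n_rows + n_cols + 1)

original = 1000 * 1000
print("original:", original)
for k in (1, 10, 100):
    print("rank", k, ":", numbers_kept(1000, 1000, k))
```

Under this assumption, the rank-100 approximation keeps 200,100 numbers against 1,000,000 for the original matrix.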
Back to LSI
LSI: decompose the (doc-term) matrix into two matrices of rank K
Our goal is to find the “best” rank-K approximation
Apply SVD and keep the top-K singular values, meaning that we keep the first K columns of the first matrix and the first K rows of the third matrix after SVD.
[Figure: (doc × term) = (doc × topic) X (topic × term)]
LSI and SVD
[Figure: the LSI decomposition (doc × term) = (doc × topic) X (topic × term), shown alongside the SVD of the (doc × term) matrix]
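The motivating “car” vs “automobile” question can be revisited with the running example. This is a simplified sketch, not a full LSI implementation: it projects term vectors onto the top-K right singular vectors without any singular-value scaling, and the “automobile”-only document is hypothetical.

```python
import numpy as np

# Columns: car, automobile, movie, theater (the document-term matrix from the example).
doc_term = np.array([[0.0, 0.0,  1.0, 0.8],
                     [1.0, 0.9,  0.0, 0.0],
                     [0.8, 0.72, 0.2, 0.16]])

K = 2
U, s, Vt = np.linalg.svd(doc_term, full_matrices=False)
Vt_k = Vt[:K, :]                      # top-K "topic" directions in term space

def to_topic_space(term_vector):
    """Coordinates of a term vector along the K topic directions."""
    return Vt_k @ term_vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([1.0, 0.0, 0.0, 0.0])      # the query "car"
new_doc = np.array([0.0, 1.0, 0.0, 0.0])    # hypothetical document containing only "automobile"

cos_term = cosine(query, new_doc)
cos_topic = cosine(to_topic_space(query), to_topic_space(new_doc))
print(cos_term, cos_topic)
```

In term space the query and the document share no terms, so their cosine similarity is 0; in the two-dimensional topic space they point in the same direction, so the document would be returned.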
LSI and SVD
LSI summary
  Formulate the topic-based indexing problem as a rank-K matrix approximation problem
  Use SVD to find the best rank-K approximation
When applied to real data, a 10-20% improvement has been reported
  Using LSI was the road to fame for Excite in its early days
Limitations of LSI
Q: Any problems with LSI?
Problems with LSI
  Scalability
    SVD is known to be difficult to perform on large data
  Interpretability
    The extracted document-topic matrix is impossible to interpret
    It is difficult to understand why we get good/bad results from LSI for some queries
Q: Is there any way to develop more interpretable topic-based indexing?
  Topic for next lecture
Summary
Topic-based indexing
  Synonym and polysemy problem
  Index documents by topic, not by terms
Latent Semantic Index (LSI)
  A document is a linear combination of its topic vector and the topic-term vectors
  Formulate the problem as a rank-K matrix approximation problem
  Use SVD to find the best approximation
Basic linear algebra
  Linear transformation, matrix, stretching and rotation
  Orthogonal stretching, diagonal matrix, symmetric matrix, eigenvalues and eigenvectors
  Rotation, change of coordinates, and orthonormal matrix
  SVD and its implication as a linear transformation