Homework Define a loss function that compares two matrices (say mean square error) b = svd(bellcore)...
Homework
• Define a loss function that compares two matrices (say, mean squared error)
• b = svd(bellcore)
• b2 = b$u[,1:2] %*% diag(b$d[1:2]) %*% t(b$v[,1:2])
• b3 = b$u[,1:3] %*% diag(b$d[1:3]) %*% t(b$v[,1:3])
• More generally, for all possible r
– Let b.r = b$u[, 1:r, drop = FALSE] %*% diag(b$d[1:r], nrow = r) %*% t(b$v[, 1:r, drop = FALSE]) (drop = FALSE and nrow = r keep r = 1 working; plain diag(b$d[1]) would return an identity matrix rather than a 1 x 1 matrix)
• Compute the loss between bellcore and b.r as a function of r
• Plot the loss as a function of r
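A minimal R sketch of the homework, assuming mean squared error as the loss. The helper names mse and rank_r are illustrative, not from the slides; the matrix is the bellcore data from the Term by Document Matrix slide, repeated here so the block runs on its own.

```r
# bellcore: 12 terms x 9 documents, filled column by column (one group of 12 per document)
terms <- c("human", "interface", "computer", "user", "system", "response",
           "time", "EPS", "survey", "trees", "graph", "minors")
docs <- c("c1", "c2", "c3", "c4", "c5", "m1", "m2", "m3", "m4")
bellcore <- matrix(c(
  1,1,1,0,0,0,0,0,0,0,0,0,  0,0,1,1,1,1,1,0,1,0,0,0,  0,1,0,1,1,0,0,1,0,0,0,0,
  1,0,0,0,2,0,0,1,0,0,0,0,  0,0,0,1,0,1,1,0,0,0,0,0,  0,0,0,0,0,0,0,0,0,1,0,0,
  0,0,0,0,0,0,0,0,0,1,1,0,  0,0,0,0,0,0,0,0,0,1,1,1,  0,0,0,0,0,0,0,0,1,0,1,1),
  nrow = 12, dimnames = list(terms, docs))

# Loss (hypothetical helper): mean squared error between two same-shaped matrices
mse <- function(a, b) mean((a - b)^2)

# b.r = U_r D_r V_r^T; drop = FALSE and nrow = r keep the r = 1 case a matrix product
rank_r <- function(s, r) {
  s$u[, 1:r, drop = FALSE] %*% diag(s$d[1:r], nrow = r) %*%
    t(s$v[, 1:r, drop = FALSE])
}

b <- svd(bellcore)
rs <- seq_along(b$d)   # every possible r: 1, ..., min(12, 9)
loss <- sapply(rs, function(r) mse(bellcore, rank_r(b, r)))

plot(rs, loss, type = "b", xlab = "r", ylab = "MSE(bellcore, b.r)")
```

The curve should reach (numerically) zero at r = 9. As a check, the Eckart–Young theorem gives the same values in closed form: loss[r] equals sum(b$d[-(1:r)]^2) / length(bellcore), the discarded squared singular values averaged over the 108 entries.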
IR Models
• Keywords (and Boolean combinations thereof)
• Vector-Space “Model” (Salton, chap 10.1)
– Represent the query and the documents as V-dimensional vectors
– Sort vectors by
$$\mathrm{sim}(x, y) = \cos(x, y) = \frac{\sum_i x_i \cdot y_i}{|x| \cdot |y|}$$
• Probabilistic Retrieval Model (Salton, chap 10.3)
– Sort documents by
$$\mathrm{score}(d) = \prod_{w \in d} \frac{\Pr(w \mid \mathrm{rel})}{\Pr(w \mid \overline{\mathrm{rel}})}$$
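Both ranking rules can be sketched in R. This is an illustration, not code from the slides: the vectors and the probabilities p_rel and p_nonrel are made-up inputs, and the product is computed as a sum of logs, which keeps the document ordering while avoiding underflow.

```r
# Cosine similarity between two term vectors
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# log score(d) = sum over words w in d of log Pr(w | rel) - log Pr(w | not rel)
# p_rel and p_nonrel: named vectors of per-word probabilities (hypothetical inputs)
log_score <- function(doc_words, p_rel, p_nonrel) {
  sum(log(p_rel[doc_words]) - log(p_nonrel[doc_words]))
}

# Toy vectors over a made-up three-word vocabulary
q  <- c(human = 1, computer = 1, graph = 0)
d1 <- c(human = 1, computer = 1, graph = 0)
d2 <- c(human = 0, computer = 0, graph = 1)
cosine(q, d1)   # 1
cosine(q, d2)   # 0

p_rel    <- c(human = 0.4, computer = 0.4, graph = 0.1)
p_nonrel <- c(human = 0.1, computer = 0.2, graph = 0.4)
log_score(c("human", "computer"), p_rel, p_nonrel)   # log(4 * 2) = log(8)
```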
Information Retrieval and Web Search
Alternative IR models
Instructor: Rada Mihalcea
Some of the slides were adapted from a course taught at Cornell University by William Y. Arms.
Latent Semantic Indexing
Objective
Replace indexes that use sets of index terms with indexes that use concepts.
Approach
Map the term vector space into a lower-dimensional space using singular value decomposition (SVD).
Each dimension in the new space corresponds to a latent concept in the original data.
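One way to make the mapping concrete: project each document (a column of the term-by-document matrix) onto the first k left singular vectors. The 4 x 3 matrix X and the choice k = 2 are made up for illustration; the sketch also checks that these coordinates equal $D_k V_k^T$.

```r
# Made-up term-by-document matrix: 4 terms x 3 documents
X <- matrix(c(1, 1, 0, 0,
              1, 0, 1, 0,
              0, 0, 1, 1), nrow = 4,
            dimnames = list(paste0("t", 1:4), paste0("d", 1:3)))

k <- 2                                   # number of latent concepts kept (assumed)
s <- svd(X)
Uk <- s$u[, 1:k, drop = FALSE]           # first k left singular vectors

# Each document becomes a k-dimensional concept vector (one column per document)
doc_concepts <- t(Uk) %*% X

# The same coordinates from the other SVD factors: D_k V_k^T
same <- diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k, drop = FALSE])
all.equal(unname(doc_concepts), same)    # TRUE
```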
Deficiencies with Conventional Automatic Indexing
Synonymy: Various words and phrases refer to the same concept (lowers recall).
Polysemy: Individual words have more than one meaning (lowers precision).
Independence: No significance is given to two terms that frequently appear together.
Latent semantic indexing addresses the first of these (synonymy) and the third (independence).
Bellcore’s Example
http://en.wikipedia.org/wiki/Latent_semantic_analysis
c1 Human machine interface for Lab ABC computer applications
c2 A survey of user opinion of computer system response time
c3 The EPS user interface management system
c4 System and human system engineering testing of EPS
c5 Relation of user-perceived response time to error measurement
m1 The generation of random, binary, unordered trees
m2 The intersection graph of paths in trees
m3 Graph minors IV: Widths of trees and well-quasi-ordering
m4 Graph minors: A survey
Term by Document Matrix
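The slides show the matrix but not how it was built. Assuming simple tokenization (lower-case, split on non-letters) and raw counts of the 12 index terms, this R sketch derives it from the nine titles; titles, tokens and tdm are illustrative names.

```r
titles <- c(
  c1 = "Human machine interface for Lab ABC computer applications",
  c2 = "A survey of user opinion of computer system response time",
  c3 = "The EPS user interface management system",
  c4 = "System and human system engineering testing of EPS",
  c5 = "Relation of user-perceived response time to error measurement",
  m1 = "The generation of random, binary, unordered trees",
  m2 = "The intersection graph of paths in trees",
  m3 = "Graph minors IV: Widths of trees and well-quasi-ordering",
  m4 = "Graph minors: A survey")

terms <- c("human", "interface", "computer", "user", "system", "response",
           "time", "EPS", "survey", "trees", "graph", "minors")

# Assumed tokenization: lower-case each title and split on runs of non-letters
tokens <- lapply(titles, function(s) strsplit(tolower(s), "[^a-z]+")[[1]])

# Count each index term in each title (rows = terms, columns = documents)
tdm <- sapply(tokens, function(tok) sapply(tolower(terms), function(w) sum(tok == w)))
rownames(tdm) <- terms

tdm["system", "c4"]   # 2
```

With this tokenization, "user-perceived" in c5 contributes one occurrence of user, which agrees with the matrix on the slides.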
bellcore <- matrix(
  c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,   # c1
    0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,   # c2
    0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,   # c3
    1, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0,   # c4
    0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,   # c5
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,   # m1
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,   # m2
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,   # m3
    0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1),  # m4
  nrow = 12, ncol = 9,
  dimnames = list(
    c("human", "interface", "computer", "user", "system", "response",
      "time", "EPS", "survey", "trees", "graph", "minors"),
    c("c1", "c2", "c3", "c4", "c5", "m1", "m2", "m3", "m4")))
help(dump)     # write an object to a file as R code
help(source)   # read such a file back in
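A minimal sketch of the save-and-reload round trip that dump and source provide. A tiny made-up matrix toy stands in for bellcore, and sys.source() (a base R relative of source()) reads the file into a fresh environment so the copy can be compared with the original.

```r
# Tiny stand-in for bellcore (made-up values)
toy <- matrix(c(1, 0, 2, 1), nrow = 2,
              dimnames = list(c("human", "trees"), c("c1", "m1")))

f <- tempfile(fileext = ".R")
dump("toy", file = f, envir = environment())   # writes R code that recreates toy
cat(readLines(f), sep = "\n")                  # inspect the generated code

# Re-create the object in a separate environment and compare
e <- new.env()
sys.source(f, envir = e)
all.equal(e$toy, toy)   # TRUE
```

Calling source(f) instead would recreate toy in the global environment, overwriting any existing object of that name.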
Query Expansion
Query:
Find documents relevant to human computer interaction
Simple Term Matching:
Matches c1, c2, and c4
Misses c3 and c5
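Simple term matching, and the rank-2 matrix b2 from the Dimension Reduction in R slide, can both be sketched in R. This assumes the query is reduced to the words it shares with the 12-term vocabulary ("interaction" is not one of them). Scoring each document by summing its b2 weights for those terms is an illustrative rule, not the slides' method.

```r
terms <- c("human", "interface", "computer", "user", "system", "response",
           "time", "EPS", "survey", "trees", "graph", "minors")
docs <- c("c1", "c2", "c3", "c4", "c5", "m1", "m2", "m3", "m4")
bellcore <- matrix(c(
  1,1,1,0,0,0,0,0,0,0,0,0,  0,0,1,1,1,1,1,0,1,0,0,0,  0,1,0,1,1,0,0,1,0,0,0,0,
  1,0,0,0,2,0,0,1,0,0,0,0,  0,0,0,1,0,1,1,0,0,0,0,0,  0,0,0,0,0,0,0,0,0,1,0,0,
  0,0,0,0,0,0,0,0,0,1,1,0,  0,0,0,0,0,0,0,0,0,1,1,1,  0,0,0,0,0,0,0,0,1,0,1,1),
  nrow = 12, dimnames = list(terms, docs))

# Keep only query words that are index terms: "interaction" drops out
q_terms <- intersect(c("human", "computer", "interaction"), rownames(bellcore))

# Simple term matching: a document matches if it contains any query term
matched <- colnames(bellcore)[colSums(bellcore[q_terms, , drop = FALSE]) > 0]
matched   # "c1" "c2" "c4"

# Rank-2 reconstruction, as on the Dimension Reduction in R slide
b <- svd(bellcore)
b2 <- b$u[, 1:2] %*% diag(b$d[1:2]) %*% t(b$v[, 1:2])
dimnames(b2) <- dimnames(bellcore)

# Illustrative score (assumption): summed rank-2 weights of the query terms
round(colSums(b2[q_terms, ]), 2)
```

In b2, c3 and c5 pick up positive weight for human and computer even though neither word occurs in them.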
Large Correlations
Correlations: Too Large to Ignore
How to compute correlations
> round(100 * cor(bellcore))
    c1  c2  c3  c4  c5  m1  m2  m3  m4
c1 100 -19   0   0 -33 -17 -26 -33 -33
c2 -19 100   0   0  58 -30 -45 -58 -19
c3   0   0 100  47   0 -21 -32 -41 -41
c4   0   0  47 100 -31 -16 -24 -31 -31
c5 -33  58   0 -31 100 -17 -26 -33 -33
m1 -17 -30 -21 -16 -17 100  67  52 -17
m2 -26 -45 -32 -24 -26  67 100  77  26
m3 -33 -58 -41 -31 -33  52  77 100  56
m4 -33 -19 -41 -31 -33 -17  26  56 100
> round(100 * cor(t(bellcore)))
          human interface computer user system response time EPS survey trees graph minors
human       100        36       36  -38     43      -29  -29  36    -29   -38   -38    -29
interface    36       100       36   19      4      -29  -29  36    -29   -38   -38    -29
computer     36        36      100   19      4       36   36 -29     36   -38   -38    -29
user        -38        19       19  100     23       76   76  19     19   -50   -50    -38
system       43         4        4   23    100        4    4  82      4   -46   -46    -35
response    -29       -29       36   76      4      100  100 -29     36   -38   -38    -29
time        -29       -29       36   76      4      100  100 -29     36   -38   -38    -29
EPS          36        36      -29   19     82      -29  -29 100    -29   -38   -38    -29
survey      -29       -29       36   19      4       36   36 -29    100   -38    19     36
trees       -38       -38      -38  -50    -46      -38  -38 -38    -38   100    50     19
graph       -38       -38      -38  -50    -46      -38  -38 -38     19    50   100     76
minors      -29       -29      -29  -38    -35      -29  -29 -29     36    19    76    100
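To read the strongest term-term correlations off the matrix rather than by eye, a small R sketch can list every pair above a threshold. The 0.7 cut-off is an arbitrary choice for illustration.

```r
terms <- c("human", "interface", "computer", "user", "system", "response",
           "time", "EPS", "survey", "trees", "graph", "minors")
docs <- c("c1", "c2", "c3", "c4", "c5", "m1", "m2", "m3", "m4")
bellcore <- matrix(c(
  1,1,1,0,0,0,0,0,0,0,0,0,  0,0,1,1,1,1,1,0,1,0,0,0,  0,1,0,1,1,0,0,1,0,0,0,0,
  1,0,0,0,2,0,0,1,0,0,0,0,  0,0,0,1,0,1,1,0,0,0,0,0,  0,0,0,0,0,0,0,0,0,1,0,0,
  0,0,0,0,0,0,0,0,0,1,1,0,  0,0,0,0,0,0,0,0,0,1,1,1,  0,0,0,0,0,0,0,0,1,0,1,1),
  nrow = 12, dimnames = list(terms, docs))

r <- cor(t(bellcore))   # 12 x 12 term-term correlations

# Upper triangle only, so each pair appears once; 0.7 is an arbitrary threshold
idx <- which(upper.tri(r) & r > 0.7, arr.ind = TRUE)
pairs <- data.frame(term1 = rownames(r)[idx[, 1]],
                    term2 = colnames(r)[idx[, 2]],
                    r = round(100 * r[idx]))
pairs[order(-pairs$r), ]
```

This lists the five pairs that stand out in the table: response/time (100), system/EPS (82), and user/response, user/time and graph/minors (76 each).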
plot(hclust(as.dist(-cor(t(bellcore)))))
plot(hclust(as.dist(-cor(bellcore))))
Correcting for Large Correlations
Thesaurus
Term by Doc Matrix: Before & After Thesaurus
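Singular Value Decomposition (SVD)
$$\underset{t \times d}{X} = \underset{t \times m}{U}\;\underset{m \times m}{D}\;\underset{m \times d}{V^T}$$
• m is the rank of X, so m ≤ min(t, d)
• D is diagonal, holding the singular values
– The squared diagonal entries of D are the nonzero eigenvalues of $X X^T$ (equivalently of $X^T X$), sorted in descending order
• $U^T U = I$ and $V^T V = I$
– Columns of U are eigenvectors of $X X^T$
– Columns of V are eigenvectors of $X^T X$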
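The properties listed above can be checked numerically in R. A sketch on a made-up 5 x 4 matrix; note that R's svd() returns the thin decomposition with min(t, d) columns, so U here is 5 x 4 and $U U^T$ is not the identity.

```r
# Made-up 5 x 4 matrix (t = 5 terms, d = 4 documents)
X <- matrix(c(1, 1, 0, 0, 1,
              0, 1, 1, 0, 0,
              0, 0, 1, 1, 1,
              2, 0, 0, 1, 0), nrow = 5)
s <- svd(X)   # s$u: 5 x 4, s$d: length 4 (descending), s$v: 4 x 4

# X = U D V^T
all.equal(X, s$u %*% diag(s$d) %*% t(s$v))
# Orthonormal columns: U^T U = I and V^T V = I
all.equal(crossprod(s$u), diag(4))
all.equal(crossprod(s$v), diag(4))
# Squared singular values are the eigenvalues of X^T X, in descending order
all.equal(s$d^2, eigen(crossprod(X), symmetric = TRUE)$values)
# Columns of V are eigenvectors of X^T X: (X^T X) v_i = d_i^2 v_i
all.equal(crossprod(X) %*% s$v, s$v %*% diag(s$d^2))
```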
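Dimensionality Reduction
$$\underset{t \times d}{\hat{X}} = \underset{t \times k}{U_k}\;\underset{k \times k}{D_k}\;\underset{k \times d}{V_k^T}$$
Keep the k largest singular values and the matching columns of U and V; k is the number of latent concepts (typically 300 to 500).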
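A sketch of the truncation in R on a made-up 5 x 4 matrix. The helper rank_k is illustrative; the loop checks two standard facts: the approximation has rank k, and its squared Frobenius error equals the sum of the discarded squared singular values (Eckart–Young).

```r
# X_hat = U_k D_k V_k^T; drop = FALSE and nrow = k keep k = 1 working
rank_k <- function(X, k) {
  s <- svd(X)
  s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*%
    t(s$v[, 1:k, drop = FALSE])
}

# Made-up 5 x 4 matrix of rank 4
X <- matrix(c(1, 1, 0, 0, 1,
              0, 1, 1, 0, 0,
              0, 0, 1, 1, 1,
              2, 0, 0, 1, 0), nrow = 5)
d <- svd(X)$d

for (k in 1:3) {
  Xk <- rank_k(X, k)
  cat("k =", k,
      " rank =", qr(Xk)$rank,
      " squared error =", round(sum((X - Xk)^2), 4),
      " discarded d^2 =", round(sum(d[-(1:k)]^2), 4), "\n")
}
```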
Dimension Reduction in R
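b = svd(bellcore)                                   # b$d, b$u, b$v
b2 = b$u[,1:2] %*% diag(b$d[1:2]) %*% t(b$v[,1:2])  # rank-2 approximation
dimnames(b2) = dimnames(bellcore)                   # keep term/doc labels
par(mfrow=c(2,2))                                   # 2 x 2 grid of plots
plot(hclust(as.dist(-cor(bellcore))))               # documents, original
plot(hclust(as.dist(-cor(t(bellcore)))))            # terms, original
plot(hclust(as.dist(-cor(b2))))                     # documents, rank 2
plot(hclust(as.dist(-cor(t(b2)))))                  # terms, rank 2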
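The dendrograms can also be cut into groups programmatically. A minimal sketch, assuming two groups: it uses cutree() and 1 - cor as the distance, which is nonnegative and, being only a constant shift of the slides' -cor, gives the same merge order under hclust's default complete linkage.

```r
terms <- c("human", "interface", "computer", "user", "system", "response",
           "time", "EPS", "survey", "trees", "graph", "minors")
docs <- c("c1", "c2", "c3", "c4", "c5", "m1", "m2", "m3", "m4")
bellcore <- matrix(c(
  1,1,1,0,0,0,0,0,0,0,0,0,  0,0,1,1,1,1,1,0,1,0,0,0,  0,1,0,1,1,0,0,1,0,0,0,0,
  1,0,0,0,2,0,0,1,0,0,0,0,  0,0,0,1,0,1,1,0,0,0,0,0,  0,0,0,0,0,0,0,0,0,1,0,0,
  0,0,0,0,0,0,0,0,0,1,1,0,  0,0,0,0,0,0,0,0,0,1,1,1,  0,0,0,0,0,0,0,0,1,0,1,1),
  nrow = 12, dimnames = list(terms, docs))

b <- svd(bellcore)
b2 <- b$u[, 1:2] %*% diag(b$d[1:2]) %*% t(b$v[, 1:2])
dimnames(b2) <- dimnames(bellcore)

# Two document groups (k = 2 is an assumption) before and after rank-2 reduction
cutree(hclust(as.dist(1 - cor(bellcore))), k = 2)
doc_groups <- cutree(hclust(as.dist(1 - cor(b2))), k = 2)
doc_groups
```

With b2, the split should be c1 to c5 versus m1 to m4, following the block structure of round(100*cor(b2)): every c-c correlation is at least 81 and every c-m correlation is negative.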
SVD
$$B B^T = U D^2 U^T$$
$$B^T B = V D^2 V^T$$
[Figure: Latent, Term, Doc]
Dimension Reduction Block Structure
> round(100*cor(bellcore))
    c1  c2  c3  c4  c5  m1  m2  m3  m4
c1 100 -19   0   0 -33 -17 -26 -33 -33
c2 -19 100   0   0  58 -30 -45 -58 -19
c3   0   0 100  47   0 -21 -32 -41 -41
c4   0   0  47 100 -31 -16 -24 -31 -31
c5 -33  58   0 -31 100 -17 -26 -33 -33
m1 -17 -30 -21 -16 -17 100  67  52 -17
m2 -26 -45 -32 -24 -26  67 100  77  26
m3 -33 -58 -41 -31 -33  52  77 100  56
m4 -33 -19 -41 -31 -33 -17  26  56 100
> round(100*cor(b2))
    c1  c2  c3  c4  c5  m1  m2  m3  m4
c1 100  91 100 100  84 -86 -85 -85 -81
c2  91 100  91  88  99 -57 -56 -56 -50
c3 100  91 100 100  84 -86 -85 -85 -81
c4 100  88 100 100  81 -89 -88 -88 -84
c5  84  99  84  81 100 -44 -44 -43 -37
m1 -86 -57 -86 -89 -44 100 100 100 100
m2 -85 -56 -85 -88 -44 100 100 100 100
m3 -85 -56 -85 -88 -43 100 100 100 100
m4 -81 -50 -81 -84 -37 100 100 100 100
Dimension Reduction Block Structure
> round(100*cor(t(bellcore)))
          human interface computer user system response time EPS survey trees graph minors
human       100        36       36  -38     43      -29  -29  36    -29   -38   -38    -29
interface    36       100       36   19      4      -29  -29  36    -29   -38   -38    -29
computer     36        36      100   19      4       36   36 -29     36   -38   -38    -29
user        -38        19       19  100     23       76   76  19     19   -50   -50    -38
system       43         4        4   23    100        4    4  82      4   -46   -46    -35
response    -29       -29       36   76      4      100  100 -29     36   -38   -38    -29
time        -29       -29       36   76      4      100  100 -29     36   -38   -38    -29
EPS          36        36      -29   19     82      -29  -29 100    -29   -38   -38    -29
survey      -29       -29       36   19      4       36   36 -29    100   -38    19     36
trees       -38       -38      -38  -50    -46      -38  -38 -38    -38   100    50     19
graph       -38       -38      -38  -50    -46      -38  -38 -38     19    50   100     76
minors      -29       -29      -29  -38    -35      -29  -29 -29     36    19    76    100
> round(100*cor(t(b2)))
          human interface computer user system response time EPS survey trees graph minors
human       100       100       93   94     99       82   82 100    -12   -85   -84    -83
interface   100       100       95   96    100       85   85 100     -7   -82   -80    -80
computer     93        95      100  100     96       98   98  93     26   -59   -57    -56
user         94        96      100  100     97       97   97  94     23   -62   -60    -59
system       99       100       96   97    100       88   88 100     -2   -79   -78    -77
response     82        85       98   97     88      100  100  83     46   -40   -38    -37
time         82        85       98   97     88      100  100  83     46   -40   -38    -37
EPS         100       100       93   94    100       83   83 100    -11   -84   -83    -82
survey      -12        -7       26   23     -2       46   46 -11    100    63    65     66
trees       -85       -82      -59  -62    -79      -40  -40 -84     63   100   100    100
graph       -84       -80      -57  -60    -78      -38  -38 -83     65   100   100    100
minors      -83       -80      -56  -59    -77      -37  -37 -82     66   100   100    100
The term vector space
[Figure: term axes t1, t2, t3; vectors d1, d2]
The space has as many dimensions as there are terms in the word list.
Latent concept vector space
[Figure legend: • term, document, query; --- cosine > 0.9]
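A sketch of placing the query in the latent concept space. The slides do not show the exact procedure; this assumes k = 2 and projects both the query and every document with $U_k^T$ (so document coordinates equal $D_k V_k^T$) before taking cosines. The query vector q is built from the two query words that are index terms.

```r
terms <- c("human", "interface", "computer", "user", "system", "response",
           "time", "EPS", "survey", "trees", "graph", "minors")
docs <- c("c1", "c2", "c3", "c4", "c5", "m1", "m2", "m3", "m4")
bellcore <- matrix(c(
  1,1,1,0,0,0,0,0,0,0,0,0,  0,0,1,1,1,1,1,0,1,0,0,0,  0,1,0,1,1,0,0,1,0,0,0,0,
  1,0,0,0,2,0,0,1,0,0,0,0,  0,0,0,1,0,1,1,0,0,0,0,0,  0,0,0,0,0,0,0,0,0,1,0,0,
  0,0,0,0,0,0,0,0,0,1,1,0,  0,0,0,0,0,0,0,0,0,1,1,1,  0,0,0,0,0,0,0,0,1,0,1,1),
  nrow = 12, dimnames = list(terms, docs))

k <- 2                                 # assumed number of latent concepts
Uk <- svd(bellcore)$u[, 1:k]

# Query "human computer interaction": only human and computer are index terms
q <- setNames(numeric(nrow(bellcore)), rownames(bellcore))
q[c("human", "computer")] <- 1

# Assumption: project documents and query the same way, with U_k^T
docs_k <- t(Uk) %*% bellcore           # 2 x 9 document coordinates
q_k <- drop(t(Uk) %*% q)               # query coordinates, length 2

cos_k <- apply(docs_k, 2, function(v) sum(v * q_k) / sqrt(sum(v^2) * sum(q_k^2)))
round(sort(cos_k, decreasing = TRUE), 2)
names(cos_k)[cos_k > 0.9]              # documents inside the cosine > 0.9 region
```

Because U_k is shared by the query and the documents, flipping the sign of a singular vector (which svd() may do) leaves every cosine unchanged.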