Utrecht University - Universiteit Utrecht
Transcript of Utrecht University - Universiteit Utrecht
![Page 1: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/1.jpg)
Recommendation Systems
Unsupervised Machine Learning
Prof. Yannis VelegrakisUtrecht University
[email protected]://velgias.github.io
![Page 2: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/2.jpg)
2
Disclaimer
The following set of slides originates from a number of different presentations and courses of the following people. l Yannis Velegrakis (Utrecht University)l Jeff Ullman (Stanford University)l Bill Howe (U of Washington)l Martin Fouler (Thought Works) l Ekaterini Ioannou (Tilburg University)l Themis Palpanas (U of Paris-Descartes)l Yannis Velegrakis (Utrecht University)l Copyright stays with the authors.
No distribution is allowed without prior permission by the authors.
![Page 3: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/3.jpg)
3
Example: Recommender Systems
l Customer Xn Buys Metallica CDn Buys Megadeth CD
l Customer Yn Does search on Metallican Recommender system
suggests Megadeth from data collected about customer X
![Page 4: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/4.jpg)
4
Recommendations
Items
Search Recommendations
Products, web sites, blogs, news items, …
4
Examples:
![Page 5: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/5.jpg)
5
From Scarcity to Abundance
l Shelf space is a scarce commodity for traditional retailers n Also: TV networks, movie theaters,…
l Web enables near-zero-cost dissemination of information about productsn From scarcity to abundance
l More choice necessitates better filtersn Recommendation enginesn How Into Thin Air made Touching the Void
a bestseller: http://www.wired.com/wired/archive/12.10/tail.html
![Page 6: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/6.jpg)
6
Sidenote: The Long Tail
Source: Chris Anderson (2004)
6
![Page 7: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/7.jpg)
7
Physical vs. Online
Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!
![Page 8: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/8.jpg)
8
Types of Recommendations
l Editorial and hand curatedn List of favoritesn Lists of “essential” items
l Simple aggregatesn Top 10, Most Popular, Recent Uploads
l Tailored to individual usersn Amazon, Netflix, …
8
![Page 9: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/9.jpg)
9
Formal Model
lX = set of CustomerslS = set of Items
lUtility function u: X × S à RnR = set of ratingsnR is a totally ordered setne.g., 0-5 stars, real number in [0,1]
9
![Page 10: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/10.jpg)
10
Utility Matrix
0.410.2
0.30.50.21
Avatar LOTR Matrix Pirates
Alice
Bob
Carol
David
10
![Page 11: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/11.jpg)
11
Key Problems
l (1) Gathering “known” ratings for matrixn How to collect the data in the utility matrix
l (2) Extrapolate unknown ratings from the known onesn Mainly interested in high unknown ratings
uWe are not interested in knowing what you don’t like but what you like
l (3) Evaluating extrapolation methodsn How to measure success/performance of
recommendation methods
11
![Page 12: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/12.jpg)
12
(1) Gathering Ratings
l Explicitn Ask people to rate itemsn Doesn’t work well in practice – people
can’t be bothered
l Implicitn Learn ratings from user actions
uE.g., purchase implies high ratingn What about low ratings?
12
![Page 13: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/13.jpg)
13
(2) Extrapolating Utilities
l Key problem: Utility matrix U is sparsen Most people have not rated most itemsn Cold start:
uNew items have no ratingsuNew users have no history
l Two approaches to recommender systems:n 1) Content-basedn 2) Collaborative
![Page 14: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/14.jpg)
Content-based Recommender Systems
![Page 15: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/15.jpg)
15
Content-based Recommendations
l Main idea: Recommend items to customer x similar to previous items rated highly by x
Example:l Movie recommendations
n Recommend movies with same actor(s), director, genre, …
l Websites, blogs, newsn Recommend other sites with “similar” content
15
![Page 16: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/16.jpg)
16
Plan of Action
likes
Item profiles
RedCircles
Triangles
User profile
match
recommendbuild
16
![Page 17: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/17.jpg)
17
Item Profiles
l For each item, create an item profile
l Profile is a set (vector) of featuresn Movies: author, title, actor, director,…n Text: Set of “important” words in document
l How to pick important features?n Usual heuristic from text mining is TF-IDF
(Term frequency * Inverse Doc Frequency)uTerm … FeatureuDocument … Item
17
![Page 18: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/18.jpg)
18
Sidenote: TF-IDF
fij = frequency of term (feature) i in doc (item) j
ni = number of docs that mention term iN = total number of docs
TF-IDF score: wij = TFij × IDFi
Doc profile = set of words with highest TF-IDF scores, together with their scores
18
Note: we normalize TFto discount for “longer”
documents
![Page 19: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/19.jpg)
19
User Profiles
![Page 20: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/20.jpg)
20
Example 1: Boolean Utility Matrix
![Page 21: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/21.jpg)
21
Example 2: Star Ratings
![Page 22: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/22.jpg)
22
Making Predictions
![Page 23: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/23.jpg)
23
User Profiles and Prediction
l User profile possibilities:n Weighted average of rated item profilesn Variation: weight by difference from average
rating for itemn …
l Prediction heuristic:n Given user profile x and item profile i, estimate 𝑢(𝒙, 𝒊) =
cos(𝒙, 𝒊) = 𝒙·𝒊| 𝒙 |⋅| 𝒊 |
![Page 24: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/24.jpg)
24
Pros: Content-based Approach
l +: No need for data on other usersn No cold-start or sparsity problems
l +: Able to recommend to users with unique tastes
l +: Able to recommend new & unpopular itemsn No first-rater problem
l +: Able to provide explanationsn Can provide explanations of recommended items by listing
content-features that caused an item to be recommended
![Page 25: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/25.jpg)
25
Cons: Content-based Approach
l –: Finding the appropriate features is hardn E.g., images, movies, music
l –: Recommendations for new usersn How to build a user profile?
l –: Overspecializationn Never recommends items outside user’s
content profilen People might have multiple interestsn Unable to exploit quality judgments of other users
![Page 26: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/26.jpg)
Collaborative Filtering
nHarnessing the judgment of other users
![Page 27: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/27.jpg)
27
Collaborative Filtering
l Consider user x
l Find set N of other users whose ratings are “similar” to x’s ratings
l Estimate x’s ratings based on ratings of users in N
x
N
![Page 28: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/28.jpg)
28
![Page 29: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/29.jpg)
29
Option 1: Jaccard Similarity
![Page 30: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/30.jpg)
30
Option 2: Cosine Similarity
![Page 31: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/31.jpg)
31
Option 3: Centered Cosine
![Page 32: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/32.jpg)
32
Centered Cosine Similarity (2)
![Page 33: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/33.jpg)
33
Rating Predictions
![Page 34: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/34.jpg)
34
Item-Item Collaborative Filtering
l So far: User-user collaborative filteringl Another view: Item-item
n For item i, find other similar itemsn Estimate rating for item i based
on ratings for similar itemsn Can use same similarity metrics and
prediction functions as in user-user model
åå
Î
Î×
=);(
);(
xiNj ij
xiNj xjijxi s
rsr
sij… similarity of items i and jrxj…rating of user u on item jN(i;x)… set items rated by x similar to i
![Page 35: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/35.jpg)
35
Item-Item CF (|N|=2)
121110987654321
455311
3124452
534321423
245424
5224345
423316
usersm
ovi
es
- unknown rating - rating between 1 to 5
![Page 36: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/36.jpg)
36
Item-Item CF (|N|=2)
121110987654321
455? 311
3124452
534321423
245424
5224345
423316
users
- estimate rating of movie 1 by user 5
mo
vies
![Page 37: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/37.jpg)
37
Item-Item CF (|N|=2)
121110987654321
455? 311
3124452
534321423
245424
5224345
423316
users
Neighbor selection:Identify movies similar to movie 1, rated by user 5
mo
vies
1.00
-0.18
0.41
-0.10
-0.31
0.59
sim(1,m)
Here we use Pearson correlation as similarity:1) Subtract mean rating mi from each movie i
m1 = (1+3+5+5+4)/5 = 3.6row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]2) Compute cosine similarities between rows
![Page 38: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/38.jpg)
38
Item-Item CF (|N|=2)
121110987654321
455? 311
3124452
534321423
245424
5224345
423316
users
Compute similarity weights:s1,3=0.41, s1,6=0.59
mo
vies
1.00
-0.18
0.41
-0.10
-0.31
0.59
sim(1,m)
![Page 39: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/39.jpg)
39
Item-Item CF (|N|=2)
121110987654321
4552.6311
3124452
534321423
245424
5224345
423316
users
Predict by taking weighted average:
r1.5 = (0.41*2 + 0.59*3) / (0.41+0.59) = 2.6
mo
vies
![Page 40: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/40.jpg)
40
Item-Item vs. User-User
0.418.010.90.30.5
0.81Avatar LOTR Matrix Pirates
Alice
Bob
Carol
David
¡ In practice, it has been observed that item-itemoften works better than user-user
¡ Why? Items are simpler, users have multiple tastes
![Page 41: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/41.jpg)
41
Implementing Collaborative Filtering
![Page 42: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/42.jpg)
42
Collaborative Filtering: Complexity
![Page 43: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/43.jpg)
43
Pros/Cons of Collaborative Filtering
l + Works for any kind of itemn No feature selection needed
l - Cold Start:n Need enough users in the system to find a match
l - Sparsity: n The user/ratings matrix is sparsen Hard to find users that have rated the same items
l - First rater: n Cannot recommend an item that has not been
previously ratedn New items, Esoteric items
l - Popularity bias: n Cannot recommend items to someone with
unique taste n Tends to recommend popular items
![Page 44: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/44.jpg)
44
Hybrid Methods
l Implement two or more different recommenders and combine predictionsn Perhaps using a linear model
l Add content-based methods to collaborative filteringn Item profiles for new item problemn Demographics to deal with new user problem
![Page 45: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/45.jpg)
45
Global Baseline Estimate
![Page 46: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/46.jpg)
46
Combining Global Baseline Estimate with CF
![Page 47: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/47.jpg)
47
Evaluation
1 3 4
3 5 5
4 5 5
3
3
2 2 2
5
2 1 1
3 3
1
480,000 users
17,700 movies
Matrix R
![Page 48: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/48.jpg)
48
Evaluation
1 3 4
3 5 5
4 5 5
3
3
2 ? ?
?
2 1 ?
3 ?
1
Test Data Set
RMSE = ./
∑(1,2)∈4 �̂�21 − 𝑟21 8
480,000 users
17,700 movies
Predicted rating
True rating of user x on item i
𝒓𝟑,𝟔
Matrix R
Training Data Set
![Page 49: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/49.jpg)
49
Problems with Error Measures
l Narrow focus on accuracy sometimes misses the pointn Prediction Diversityn Prediction Contextn Order of predictions
l In practice, we care only to predict high ratings:n RMSE might penalize a method that does well
for high ratings and badly for others
![Page 50: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/50.jpg)
50
Collaborative Filtering: Complexity
l Expensive step is finding k most similar customers: O(|X|) l Too expensive to do at runtime
n Could pre-compute
l Naïve pre-computation takes time O(k ·|X|)v X … set of customers
l We already know how to do this!n Near-neighbor search in high dimensions (LSH)n Clusteringn Dimensionality reduction
![Page 51: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/51.jpg)
51
Tip: Add Data
l Leverage all the datan Don’t try to reduce data size in an
effort to make fancy algorithms workn Simple methods on large data do best
l Add more datan e.g., add IMDB data on genres
l More data beats better algorithmshttp://anand.typepad.com/datawocky/2008/03/more-data-usual.html
![Page 52: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/52.jpg)
52
Grand Prize: 0.8563
Netflix: 0.9514
Movie average: 1.0533
User average: 1.0651
Global average: 1.1296
Performance of Various Methods
Basic Collaborative filtering: 0.94
Latent factors: 0.90
Latent factors+Biases: 0.89
Collaborative filtering++: 0.91
Latent factors+Biases+Time: 0.876
When no prize…LGetting desperate.
Try a “kitchen sink” approach!
![Page 53: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/53.jpg)
Dimensionality Reduction
![Page 54: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/54.jpg)
54
Dimensionality Reduction
l Assumption: Data lies on or near a low d-dimensional subspace
l Axes of this subspace are effective representation of the data
![Page 55: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/55.jpg)
55
Dimensionality Reduction
l Compress / reduce dimensionality:n 106 rows; 103 columns; no updatesn Random access to any cell(s); small error: OK
The above matrix is really “2-dimensional.” All rows can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1]
![Page 56: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/56.jpg)
56
Rank is “Dimensionality”
l Cloud of points 3D space:n Think of point positions
as a matrix:
l We can rewrite coordinates more efficiently!n Old basis vectors: [1 0 0] [0 1 0] [0 0 1]n New basis vectors: [1 2 1] [-2 -3 1]n Then A has new coordinates: [1 0]. B: [0 1], C: [1 1]
uNotice: We reduced the number of coordinates!
1 row per point:
ABC A
![Page 57: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/57.jpg)
57
Dimensionality Reduction
l Goal of dimensionality reduction is to discover the axis of data!
Rather than representingevery point with 2 coordinateswe represent each point with
1 coordinate (corresponding tothe position of the point on
the red line).
By doing this we incur a bit oferror as the points do not
exactly lie on the line
![Page 58: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/58.jpg)
58
Why Reduce Dimensions?
Why reduce dimensions?l Discover hidden correlations/topics
n Words that occur commonly together
l Remove redundant and noisy featuresn Not all words are useful
l Interpretation and visualizationl Easier storage and processing of the data
58
![Page 59: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/59.jpg)
59
SVD - Definition
A[m x n] = U[m x r] S [ r x r] (V[n x r])T
l A: Input data matrixn m x n matrix (e.g., m documents, n terms)
l U: Left singular vectors n m x r matrix (m documents, r concepts)
l S: Singular valuesn r x r diagonal matrix (strength of each ‘concept’)
(r : rank of the matrix A)l V: Right singular vectors
n n x r matrix (n terms, r concepts)
![Page 60: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/60.jpg)
60
SVD - Properties
It is always possible to decompose a real matrix A into A = U S VT , where
l U, S, V: uniquel U, V: column orthonormal
n UT U = I; VT V = I (I: identity matrix)n (Columns are orthogonal unit vectors)
l S: diagonaln Entries (singular values) are positive,
and sorted in decreasing order (σ1 ³ σ2 ³ ... ³ 0)
![Page 61: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/61.jpg)
61
SVD – Example: Users-to-Movies
l A = U S VT - example: Users to Movies
=SciFi
Romnce
x x
Mat
rix
Alie
nSer
enity
Cas
abla
nca
Am
elie
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32
12.4 0 00 9.5 00 0 1.3
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09
![Page 62: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/62.jpg)
62
SVD – Example: Users-to-Movies
l A = U S VT - example: Users to MoviesSciFi-concept
Romance-concept
=SciFi
Romnce
x x
Mat
rix
Alie
nSer
enity
Cas
abla
nca
Am
elie
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32
12.4 0 00 9.5 0
0 0 1.3
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09
![Page 63: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/63.jpg)
63
SVD – Example: Users-to-Movies
l A = U S VT - example:
Romance-concept
U is “user-to-concept” similarity matrix
SciFi-concept
=SciFi
Romnce
x x
Mat
rix
Alie
nSer
enity
Cas
abla
nca
Am
elie
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32
12.4 0 00 9.5 0
0 0 1.3
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09
![Page 64: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/64.jpg)
64
SVD – Example: Users-to-Movies
lA = U S VT - example:
SciFi
Romnce
SciFi-concept
“strength” of the SciFi-concept
=SciFi
Romnce
x x
Mat
rix
Alie
nSer
enity
Cas
abla
nca
Am
elie
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32
12.4 0 00 9.5 0
0 0 1.3
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09
![Page 65: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/65.jpg)
65
SVD – Example: Users-to-Movies
l A = U S VT - example:
SciFi-concept
V is “movie-to-concept”similarity matrix
SciFi-concept
=SciFi
Romnce
x x
Mat
rix
Alie
nSer
enity
Cas
abla
nca
Am
elie
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32
12.4 0 00 9.5 0
0 0 1.3
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09
![Page 66: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/66.jpg)
Dimensionality Reduction with SVD
![Page 67: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/67.jpg)
67
SVD – Dimensionality Reduction
l Goal: Minimize the sumof reconstruction errors:
uwhere are the “old” and are the “new” coordinates
l SVD gives ‘best’ axis to project on:n ‘best’ = minimizing the reconstruction errors
l In other words, minimum reconstruction error
v1
first right singular vector
Movie 1 rating
Mov
ie 2
ratin
g
![Page 68: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/68.jpg)
68
SVD - Interpretation #2
More detailsl Q: How exactly is dim. reduction done?
= x x
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32
12.4 0 00 9.5 0
0 0 1.3
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09
![Page 69: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/69.jpg)
69
SVD - Interpretation #2
More detailsl Q: How exactly is dim. reduction done?l A: Set smallest singular values to zero
= x x
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32
12.4 0 00 9.5 0
0 0 1.3
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09
![Page 70: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/70.jpg)
70
SVD - Interpretation #2
More detailsl Q: How exactly is dim. reduction done?l A: Set smallest singular values to zero
x x
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32
12.4 0 00 9.5 0
0 0 1.3
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09
»
![Page 71: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/71.jpg)
71
SVD - Interpretation #2
More detailsl Q: How exactly is dim. reduction done?l A: Set smallest singular values to zero
x x
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32
12.4 0 00 9.5 0
0 0 1.3
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09
»
![Page 72: Utrecht University - Universiteit Utrecht](https://reader031.fdocuments.us/reader031/viewer/2022021305/6207349998e7a9366012c9cc/html5/thumbnails/72.jpg)
72
SVD - Interpretation #2
More detailsl Q: How exactly is dim. reduction done?l A: Set smallest singular values to zero
» x x
1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2
0.13 0.020.41 0.070.55 0.090.68 0.110.15 -0.590.07 -0.730.07 -0.29
12.4 0 0 9.5
0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.69