G54DMT – Data Mining Techniques and Applications
http://www.cs.nott.ac.uk/~jqb/G54DMT
Dr. Jaume Bacardit ([email protected])
Topic 4: Applications
Lecture 1: The Netflix Challenge
Some material taken from http://en.wikipedia.org/wiki/Netflix_Prize, http://www.flickr.com/photos/chef_ele/3791293142/sizes/o/in/set-72157621825510293/, http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf and http://arxiv.org/abs/0911.0460
Outline
• The challenge and its assessment
• Timeline of progress
• Recommendation methods
• Matrix Factorisation techniques
• Ensemble methods
• Lessons learnt
• Resources
The Netflix Challenge
• Netflix is an online video rental company
• One of its most relevant components is its movie recommendation system
– Suggests movies to users based on their past ratings
• In 2006 Netflix made its recommendation database public
• Challenged the community to produce a new recommender that was 10% better than their own method
• Winner would get $1M
Training data
• Movie ratings collected from 1998 to 2005
• 100,480,507 ratings that 480,189 users gave to 17,770 movies
• Training data divided into:
– Training set (99,072,112 ratings)
– Probe set (1,408,395 ratings)
• Each rating was a quadruplet <user,movie,date of rating,rating>
• Very sparse data: the number of ratings is a very small fraction of users x movies
Test data
• Qualifying data were triplets <user,movie,date of rating>
• Qualifying set (2,817,131 ratings) consisting of:
– Test set (1,408,789 ratings), used to determine winners
– Quiz set (1,408,342 ratings), used to calculate leaderboard scores
• Participants did not know which instances were part of the test set and which of the quiz set
• The test, quiz and probe sets were created to have similar statistical properties
Assessment
• Error on the quiz and test set was computed as Root Mean Squared Error (RMSE), rounded to 4 digits
• RMSE of the Cinematch system (Netflix's own predictor) = 0.9525
– Target RMSE = 0.8572
• Once a participant improves on the target RMSE, a “last call” period of 30 days starts
• At the end of the 30 days, the participant with lowest test RMSE is declared the winner
• In case of ties, the prize goes to the earliest entry
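As a concrete sketch of the scoring rule, RMSE with the 4-digit rounding used on the leaderboard can be computed as below; the ratings in the example are invented:

```python
import math

def rmse(predicted, actual):
    """Root Mean Squared Error, rounded to 4 digits as on the leaderboard."""
    assert len(predicted) == len(actual)
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return round(math.sqrt(mse), 4)

# Hypothetical predictions against true ratings on the 1-5 star scale
print(rmse([3.5, 4.1, 2.0], [4, 4, 3]))  # prints 0.6481
```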
Progress in the challenge
• Data released on October 2nd, 2006
• On October 8th a participant already had a better RMSE than Cinematch
• The 2007 progress prize was awarded to BellKor with an improvement of 8.43%
• The 2008 progress prize was awarded to “BellKor in BigChaos” with an improvement of 9.44%
• On June 26th, 2009, the team "BellKor's Pragmatic Chaos" achieved an improvement of 10.05%. The “last call” period started
Progress: Last call period
• On July 25, 2009 the team "The Ensemble", a merger of the teams "Grand Prize Team" and "Opera Solutions and Vandelay United", achieved a 10.09% improvement
• After the last call period ended, two teams led the quiz leaderboard:
– "The Ensemble" with a 10.10% improvement
– "BellKor's Pragmatic Chaos" with a 10.09% improvement
• On the test set both teams were tied with an improvement of 10.06%
• BellKor's Pragmatic Chaos was declared the winner because they had submitted their entry 20 minutes before The Ensemble
Recommender systems: Content Filtering
• Collect background information from users and movies to generate a profile of each of them
– Users: demographic information
– Movies: genre, actors, box office results
• Produce recommendations by matching the profiles of users and movies
• Costly, as this information is often difficult to collect or simply not available
Recommender systems: Collaborative Filtering
• Generate predictions of ratings only based on the past behavior of the users
• No background domain knowledge required
• Easier to generate the models
• Faces cold-start difficulties when not enough ratings are available
Collaborative filtering: neighbourhood methods
• Compute relationships between items or users
• Identify which movies are similar to each other, based on receiving similar ratings from the same user
• Hierarchical clustering showing the similarities of 5000 movies
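A minimal sketch of the item-item idea, assuming numpy: movies are compared by the ratings they receive from the same users (cosine similarity between rating columns). The ratings matrix here is a tiny invented example:

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = movies, 0 = unrated.
# All numbers are invented for illustration.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between movie columns (unrated entries count as 0)."""
    norms = np.linalg.norm(R, axis=0)
    return (R.T @ R) / np.outer(norms, norms)

S = item_similarity(R)
# Movies 0 and 1 receive similar ratings from the same users, so their
# similarity is high; movies 0 and 3 get opposite ratings, so it is low.
print(S[0, 1] > S[0, 3])  # prints True
```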
Collaborative filtering: latent factor models
• Automatically map users and movies into a new space of factors (same for both of them)
Matrix Factorisation methods
• Most successful of the latent factor methods
• These methods generate a vector q_i for each item and a vector p_u for each user, each with f factors
• A prediction is the inner product of both vectors: r̂_ui = q_i^T p_u
• The problem of finding the vectors q and p for each movie and user is defined as the following optimisation problem:

min_{q,p} Σ_{(u,i)∈K} (r_{ui} − q_i^T p_u)² + λ(‖q_i‖² + ‖p_u‖²)

where K is the training set of known ratings, r_{ui} is the actual rating, q_i^T p_u the predicted rating, and the λ term is a regularisation term (to avoid overfitting)
Optimisation methods
• Stochastic gradient descent
– Iteratively samples training examples, computes the prediction errors and adjusts the vectors of the involved user and item accordingly
• Alternating least squares
– The original definition of the optimisation problem is not convex, and hence cannot be solved to optimality
– If either p or q is fixed, the problem is convex and can be solved using least squares methods
– This method alternates between two states; in each state it fixes either p or q
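The stochastic gradient descent variant can be sketched as below. This is a minimal toy implementation, not the winners' code: the data, factor count and hyperparameters are invented, and numpy is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (user, movie, rating) triplets; all numbers are invented.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
n_users, n_items, f = 3, 3, 2      # f = number of latent factors
lr, lam = 0.05, 0.02               # learning rate and regularisation λ

P = rng.normal(scale=0.1, size=(n_users, f))  # user vectors p_u
Q = rng.normal(scale=0.1, size=(n_items, f))  # item vectors q_i

for epoch in range(300):
    for u, i, r in ratings:
        err = r - Q[i] @ P[u]                    # e_ui = r_ui - q_i^T p_u
        p_old = P[u].copy()
        P[u] += lr * (err * Q[i] - lam * P[u])   # gradient step on p_u
        Q[i] += lr * (err * p_old - lam * Q[i])  # gradient step on q_i

# After training, predictions approach the observed ratings
print(round(float(Q[0] @ P[0]), 1))
```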
Bias in the models
• Not all movies receive the same distribution of ratings
– Some are more popular
• Not all users give the same distribution of ratings
– Some users are stricter than others
• Refinement of the model by introducing bias terms:
r̂_ui = μ + b_i + b_u + q_i^T p_u

where μ is the average overall rating, b_i the bias of item i and b_u the bias of user u
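As a sketch of how the bias terms can be estimated, the baselines μ, b_i and b_u can be taken as the overall mean rating and the average residuals per item and per user. The ratings below are invented:

```python
# Toy (user, movie, rating) triplets; all numbers are invented.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 2), (2, 0, 3)]

mu = sum(r for _, _, r in ratings) / len(ratings)  # average overall rating

def bias(key, pos):
    """Average residual (rating - mu) over ratings whose field `pos` equals key."""
    rs = [t[2] for t in ratings if t[pos] == key]
    return sum(rs) / len(rs) - mu

b_u = {u: bias(u, 0) for u, _, _ in ratings}  # per-user biases
b_i = {i: bias(i, 1) for _, i, _ in ratings}  # per-item biases

# Baseline prediction mu + b_i + b_u for user 2 on movie 1
print(round(mu + b_i[1] + b_u[2], 2))  # prints 2.4
```

A strict user rating an unpopular movie thus gets a baseline well below the global mean, before any factor model is applied.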
Additional input sources
• Implicit feedback
– Users show a preference for certain movies through which ones they choose to rate, as a user will not produce ratings for everything
• Demographic information– If available
Temporal dynamics
• Ratings change through time
• Users
– May change tastes
– May produce more/less strict ratings in different periods of time
• Movies– Blockbusters may fade in popularity– Cult movies may become more popular
Impact of all components of the model (BellKor)
Ensemble methods
• All top participants' methods combined (blended) the predictions of hundreds of models of many types
– Matrix Factorisation
– Neighbourhood methods
– Restricted Boltzmann Machines
• Many ways of combining the models
– Linear combinations
– Neural networks
– Regression trees
Basic linear regression method
• Need to optimise the vector of weights associated with each method
• Can use e.g. least squares method for this, optimizing over the probe set
• How to choose the models to include in the ensemble?
– Forward method: start with one, keep adding until the probe set error degrades
– Backward method: start with all, keep removing while the probe set error improves
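The weight-optimisation step can be sketched as an ordinary least-squares fit on the probe set (the forward/backward selection loop is omitted here). The model predictions and true ratings below are invented, and numpy is assumed:

```python
import numpy as np

# Probe-set predictions of three hypothetical models (one per column)
# and the true ratings; all numbers are invented.
G = np.array([[3.8, 4.2, 3.0],
              [2.1, 1.8, 2.5],
              [4.9, 4.6, 4.0],
              [3.2, 3.1, 3.5]])
r = np.array([4.0, 2.0, 5.0, 3.0])

# Least-squares blending weights w minimising ||G w - r||^2
w, *_ = np.linalg.lstsq(G, r, rcond=None)
blend = G @ w

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# The blend can do no worse than the best single model on the probe set,
# since using a single model is one particular choice of weights
best_single = min(rmse(G[:, j], r) for j in range(3))
print(rmse(blend, r) <= best_single + 1e-9)  # prints True
```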
Feature-Weighted Linear Stacking
• Method from “The Ensemble”
• Not all models are suitable for all kinds of movies/users
• Generate a set of “meta-features” for each instance that are used to calibrate the linear combination of weights specifically for each case
• The blended prediction is b(x) = Σ_{i,j} v_ij f_j(x) g_i(x), where:
– v_ij = weight associated with feature j for model i
– f_j(x) = value of feature j for instance x
– g_i(x) = prediction of model i for instance x
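Because b(x) is linear in the weights v_ij, feature-weighted stacking reduces to ordinary linear regression on the products f_j(x)·g_i(x). A minimal sketch with invented numbers (two models, two meta-features, numpy assumed):

```python
import numpy as np

# Toy probe set: g holds model predictions g_i(x) (one column per model),
# f holds meta-features f_j(x); the first meta-feature is a constant 1,
# so plain constant-weight blending is a special case. All numbers invented.
g = np.array([[3.9, 4.1], [2.2, 1.9], [4.8, 4.5], [3.1, 3.4], [1.2, 1.5]])
f = np.array([[1.0, 0.2], [1.0, 0.9], [1.0, 0.1], [1.0, 0.5], [1.0, 0.8]])
r = np.array([4.0, 2.0, 5.0, 3.0, 1.0])

# One regression column per (i, j) pair: f_j(x) * g_i(x).
X = np.einsum('ni,nj->nij', g, f).reshape(len(r), -1)

# Fit the weights v_ij by least squares; b(x) = sum_ij v_ij f_j(x) g_i(x)
v, *_ = np.linalg.lstsq(X, r, rcond=None)
b = X @ v
```

Since each single model (with constant weight 1) lies in the span of these product columns, the fitted stack can do no worse than any single model on the fitting set.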
Top 10 features (out of 25)
Lessons learnt from the challenge
• Well-defined competition (clear rules, instant feedback on progress, forums for discussion)
• Great collaboration between participants, sharing ideas and combining efforts
• Widened the awareness of statistics and machine learning in mainstream society
• It provided a big challenge to the ML community, and hence new science was done
Resources
• Challenge web page
• Very nice article about Matrix Factorisation
• Article on Feature-Weighted Linear Stacking
• Progress reports of:
– BellKor
– BigChaos
– PragmaticTheory
• Web page of “The Ensemble”