A Cognitive Psychologist's Approach to Data Mining
How I beat Netflix Cinematch
Maggie Xiong
April 22, 2014
General Outline
Parallel Frameworks: Cognitive psychology & data mining
Case Study: The Netflix Prize Project
Abstraction and Generalization
Categorization
  Prototype
  Exemplar
  Decision boundary
  Theory-based categories
Semantic space / LSA
Connectionism
Abstraction
Linguistic ideas (Bransford & Franks, 1971)
  “The ants in the kitchen ate the sweet jelly which was on the table.”
  “The ants in the kitchen ate the sweet jelly.”
  “The ants in the kitchen ate the jelly.”
  “The ants were in the kitchen.”
  Participants were more confident in “recognizing” fuller sentences.
Prototype (Posner & Keele, 1968)
  Participants studied instances generated from distortions of prototypes.
  In a later test, they showed the same accuracy and response time for never-seen prototypes as for memorized instances.
Categorization
Category structure (Collins & Quillian, 1969)
  Economy of organization
  Participants take longer to respond to statements across category levels.
  Typicality
Exemplar (Jacoby & Brooks, 1984)
Decision Boundary / Theory-based Categories
Decision boundary (Ashby & Gott, 1988)
Theory-based categories (Murphy & Medin, 1985)
  Categories organized around theories about the world.
  clean vs. unclean foods; apples and prime numbers
Semantic Space / Latent Semantic Analysis
Shepard, 1987
  Probability of generalization decays exponentially with distance.
Osgood, 1957
  Factor analysis
  Evaluative, potency, activity
Dumais et al., 1988
  SVD, cosine similarity
Landauer & Dumais, 1997
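Shepard's exponential decay can be written out as a one-line function. This is only a sketch: the function name and the decay rate `k` are assumptions for illustration, not values from the talk.

```python
import math

def generalization_probability(distance, k=1.0):
    """Exponential generalization gradient: exp(-k * distance).

    k is a hypothetical sensitivity parameter; larger k means
    generalization falls off faster with psychological distance.
    """
    return math.exp(-k * distance)

# Probability shrinks as two stimuli move apart in the similarity space.
for d in (0.0, 1.0, 2.0):
    print(d, round(generalization_probability(d), 3))
```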
Connectionism
Selfridge, 1958
  Pandemonium
Rumelhart, McClelland, & PDP Research Group, 1986
  Parallel Distributed Processing, 2 Vol Set
Connectionism
Rumelhart & Todd, 1993
Common Ground
Prototype: K-means
Exemplar: K-Nearest Neighbor
Theory-based categories: Collaborative filtering, decision tree
Decision boundary: Support Vector Machine
Semantic space / LSA
Connectionism: artificial neural net
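The prototype/exemplar contrast can be made concrete with a toy classifier. This sketch simplifies the mapping above: the prototype side uses labeled category means (a nearest-centroid rule) rather than running K-means, and the exemplar side is a plain K-Nearest Neighbor vote. The data points and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical 2-D training items, grouped by category label.
training = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.5), (5.5, 6.5)],
}

def prototype_classify(item):
    """Prototype view: compare the item to one abstracted mean per category."""
    centroids = {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in training.items()
    }
    return min(centroids, key=lambda label: math.dist(item, centroids[label]))

def exemplar_classify(item, k=3):
    """Exemplar view: compare the item to stored instances, majority vote of k nearest."""
    neighbors = sorted(
        (math.dist(item, point), label)
        for label, points in training.items()
        for point in points
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(prototype_classify((2.0, 2.0)), exemplar_classify((2.0, 2.0)))
```

Both rules rely on distance in a feature space; they differ only in whether the comparison is made against an abstraction or against remembered instances.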
How Cognitive Psychologists Analyze Data
Task completion rate:
          1 Cup      3 Cups
Morning   12,13,10   10,8,10
Evening   14,15,12   23,18,15
Main effect of coffee: avg(10,8,10,23,18,15) - avg(12,13,10,14,15,12)
Main effect of time-of-day: avg(14,15,12,23,18,15) - avg(12,13,10,10,8,10)
Interaction: [avg(23,18,15) - avg(14,15,12)] - [avg(10,8,10) - avg(12,13,10)]
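The three contrasts can be checked numerically. The sketch below uses the task completion rates from the 2 x 2 table; the dictionary layout and the helper name `pooled` are assumptions made for illustration.

```python
from statistics import mean

# Task completion rates per cell: (time of day, cups of coffee).
cells = {
    ("Morning", 1): [12, 13, 10],
    ("Morning", 3): [10, 8, 10],
    ("Evening", 1): [14, 15, 12],
    ("Evening", 3): [23, 18, 15],
}

def pooled(time=None, cups=None):
    """Collect all observations matching the given factor levels."""
    return [
        x
        for (t, c), xs in cells.items()
        if (time is None or t == time) and (cups is None or c == cups)
        for x in xs
    ]

# Main effects: difference between marginal means.
coffee_effect = mean(pooled(cups=3)) - mean(pooled(cups=1))
time_effect = mean(pooled(time="Evening")) - mean(pooled(time="Morning"))

# Interaction: the coffee effect in the evening minus the coffee effect in the morning.
interaction = (
    (mean(cells[("Evening", 3)]) - mean(cells[("Evening", 1)]))
    - (mean(cells[("Morning", 3)]) - mean(cells[("Morning", 1)]))
)

print(round(coffee_effect, 2), round(time_effect, 2), round(interaction, 2))
```

With these numbers the interaction is larger than either main effect: extra coffee helps in the evening and slightly hurts in the morning.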
Graph It
[Figure: main effects and interaction. Rate plotted against cups of coffee, with separate lines for Evening and Morning.]
The Netflix Prize Problem, 2006/10/02
Training set: 17,770 movies, 500K users, 100M ratings
  user_id, movie_id, rating, date_of_rating
  movie_id, title, year
Probe set (1.4M ratings)
Qualifying set (2.8M ratings)
  user_id, movie_id, date_of_rating
RMSE = sqrt( sum((X - X.pred)^2) / N )
  Cinematch: 0.9514
  0.8563 => $1M
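For reference, the RMSE score can be computed as in the following sketch. The ratings and predictions are made-up toy values, not Netflix data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: sqrt(sum((X - X.pred)^2) / N)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Hypothetical ratings and predictions for four user-movie pairs.
ratings = [4, 1, 5, 3]
predictions = [3.5, 2.0, 4.5, 3.0]
print(round(rmse(ratings, predictions), 4))
```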
Standard Deviation and RMSE
The Netflix Problem, Interpreted
Overall average movie rating: 3.620*
Main effect of movie:
  Miss Congeniality: avg(u1,u2,u3...)
  Mission Impossible: avg(u1,u2,u3...)
Main effect of user:
  Alex: avg(m1,m2,m3…)
  Brian: avg(m1,m2,m3…)
Interaction:
  Alex - Miss Congeniality, Mission Impossible, ...
  Brian - Miss Congeniality, Mission Impossible, ...
RMSE, Appreciated
Overall standard deviation: 1.0822*
“Trivial approach” (main effect of movie): 1.0540
Main effects of movie and user: 0.9889*
  R.pred = M.avg + U.dev
Cinematch: 0.9514
...
Prize: 0.8563
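The first three rungs of this ladder can be reproduced on a toy data set. The sketch below assumes hypothetical users, movies, and ratings; it shows that predicting the overall average gives an RMSE equal to the standard deviation, and that adding the movie and user main effects lowers the RMSE on the data they were fit to.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (user, movie) -> rating data; not Netflix data.
ratings = {
    ("alex", "mi"): 4, ("alex", "cu"): 1,
    ("brian", "mi"): 5, ("brian", "cu"): 4, ("brian", "mc"): 5,
    ("cara", "mc"): 4, ("cara", "mi"): 4,
}

def rmse(pairs):
    """RMSE over (actual, predicted) pairs."""
    return math.sqrt(mean([(actual - pred) ** 2 for actual, pred in pairs]))

global_avg = mean(list(ratings.values()))

# Main effect of movie: M.avg
by_movie = defaultdict(list)
for (user, movie), r in ratings.items():
    by_movie[movie].append(r)
movie_avg = {m: mean(rs) for m, rs in by_movie.items()}

# Main effect of user: U.dev, the user's mean deviation from M.avg
by_user = defaultdict(list)
for (user, movie), r in ratings.items():
    by_user[user].append(r - movie_avg[movie])
user_dev = {u: mean(ds) for u, ds in by_user.items()}

def predict(user, movie):
    """R.pred = M.avg + U.dev"""
    return movie_avg[movie] + user_dev[user]

rmse_global = rmse([(r, global_avg) for r in ratings.values()])
rmse_movie = rmse([(r, movie_avg[m]) for (u, m), r in ratings.items()])
rmse_both = rmse([(r, predict(u, m)) for (u, m), r in ratings.items()])

print(round(rmse_global, 4), round(rmse_movie, 4), round(rmse_both, 4))
```

Note that this toy comparison scores the same ratings it was fit on; the competition figures were measured on held-out ratings.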
The Arithmetic Approach
R = M.avg + U.dev + interaction
interaction = R - (M.avg + U.dev)
R.pred = M.avg + U.dev + w.avg(interaction * sim(M.p, M))
Alex                 R   M.avg   dev    interaction
Mission Impossible   4   4.3     -0.3   4 - [4.3 + (-1.4)] = 1.1
Coyote Ugly          1   3.5     -2.5   1 - [3.5 + (-1.4)] = -1.1
Miss Congeniality    ?   4.5
Alex U.dev = ((4 - 4.3) + (1 - 3.5)) / 2 = -1.4
sim(Miss Congeniality, Coyote Ugly) = 0.8
sim(Miss Congeniality, Mission Impossible) = 0.2
? = 4.5 + (-1.4) + (-1.1 * 0.8 + 1.1 * 0.2) / (|0.8| + |0.2|) = 2.44
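The worked example for Alex can be replayed in a few lines. The numbers come from the example above; the variable names are assumptions chosen for readability.

```python
# Values taken from the worked example for Alex.
movie_avg = {"Mission Impossible": 4.3, "Coyote Ugly": 3.5, "Miss Congeniality": 4.5}
alex_ratings = {"Mission Impossible": 4, "Coyote Ugly": 1}
target = "Miss Congeniality"
sim_to_target = {"Coyote Ugly": 0.8, "Mission Impossible": 0.2}

# U.dev: Alex's average deviation from the movie means.
user_dev = sum(r - movie_avg[m] for m, r in alex_ratings.items()) / len(alex_ratings)

# interaction = R - (M.avg + U.dev) for each movie Alex rated.
interaction = {m: r - (movie_avg[m] + user_dev) for m, r in alex_ratings.items()}

# Similarity-weighted average of the interactions.
weighted_interaction = (
    sum(interaction[m] * sim_to_target[m] for m in alex_ratings)
    / sum(abs(sim_to_target[m]) for m in alex_ratings)
)

# R.pred = M.avg + U.dev + w.avg(interaction * sim)
prediction = movie_avg[target] + user_dev + weighted_interaction
print(round(prediction, 2))
```

Because Miss Congeniality is much more similar to Coyote Ugly than to Mission Impossible, Alex's negative interaction with Coyote Ugly dominates and pulls the prediction below M.avg + U.dev = 3.1.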
Similarity Measures
Romesburg, 1984
  Shape difference vs. size displacement
Euclidean distance
Cosine similarity
Correlation coefficient
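The shape-versus-size distinction shows up clearly when the three measures are applied to two profiles that differ only by a constant offset. The vectors below are hypothetical, and the functions are a minimal sketch rather than a tuned implementation.

```python
import math

def euclidean(x, y):
    """Straight-line distance; sensitive to both shape and size displacement."""
    return math.dist(x, y)

def cosine(x, y):
    """Cosine of the angle between x and y; ignores vector length."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))

def correlation(x, y):
    """Pearson correlation: cosine similarity after centering each vector."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

# Same shape, shifted up by one point (a pure size displacement).
a = [1, 2, 3, 4]
b = [2, 3, 4, 5]

print(euclidean(a, b), round(cosine(a, b), 4), round(correlation(a, b), 4))
```

Here the correlation treats the two profiles as identical in shape, while the Euclidean distance still registers the offset.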
Movie Similarity
Similarity measures
  Co-occurrence count
    How often a person rented both movies.
  Correlation
    A function of the difference in ratings when a person rented both movies.
  Correlation weighted by probability (significance)
  Mean Euclidean distance of movie x user interactions
    interaction = R - (M.avg + U.dev)
Weighted by similarities in movie release times, rental frequencies, mean ratings
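Two of these measures can be sketched on toy data. The interaction values and user names are hypothetical, and the distance function is only one possible reading of "mean Euclidean distance": a root mean squared difference over users who rated both movies.

```python
import math

# Hypothetical interactions, R - (M.avg + U.dev), per movie and user.
interactions = {
    "Miss Congeniality": {"alex": -0.5, "brian": 0.4, "cara": 0.1},
    "Coyote Ugly": {"alex": -1.1, "brian": 0.6, "dana": 0.2},
    "Mission Impossible": {"alex": 1.1, "cara": -0.3, "dana": 0.0},
}

def co_raters(movie_a, movie_b):
    """Users who have an interaction value for both movies."""
    return sorted(set(interactions[movie_a]) & set(interactions[movie_b]))

def co_occurrence(movie_a, movie_b):
    """How many people rated both movies."""
    return len(co_raters(movie_a, movie_b))

def interaction_distance(movie_a, movie_b):
    """Assumed form: root mean squared interaction difference over co-raters."""
    users = co_raters(movie_a, movie_b)
    if not users:
        return math.inf  # no shared raters, so no evidence of similarity
    squared = [
        (interactions[movie_a][u] - interactions[movie_b][u]) ** 2 for u in users
    ]
    return math.sqrt(sum(squared) / len(users))

print(co_occurrence("Miss Congeniality", "Coyote Ugly"))
print(round(interaction_distance("Miss Congeniality", "Coyote Ugly"), 4))
print(round(interaction_distance("Miss Congeniality", "Mission Impossible"), 4))
```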
User Clusters
Differentiate movie mean rating and similarity
  R.pred = M.cluster_avg + U.cluster_dev + w.avg(interaction * sim_cluster(M, M.p))
By experience (number of movies rated)
  [2,80], [81,180], [181,240], [241,400], [401,3000]
By gender
  Inferred from preference for different movie clusters
By cluster analysis
  PCA, K-means
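The idea of a cluster-specific movie mean (M.cluster_avg) can be illustrated as follows. The cluster labels, users, and ratings are all hypothetical; in practice the clusters would come from one of the strategies listed above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical user-to-cluster assignment (e.g. by number of movies rated).
user_cluster = {"alex": "light", "dana": "light", "brian": "heavy", "cara": "heavy"}

# Hypothetical (user, movie, rating) triples.
ratings = [
    ("alex", "Coyote Ugly", 1),
    ("dana", "Coyote Ugly", 2),
    ("brian", "Coyote Ugly", 4),
    ("cara", "Coyote Ugly", 5),
]

# Group ratings by (movie, cluster) and average within each group.
by_cell = defaultdict(list)
for user, movie, rating in ratings:
    by_cell[(movie, user_cluster[user])].append(rating)
cluster_avg = {cell: mean(rs) for cell, rs in by_cell.items()}

overall_avg = mean(r for _, _, r in ratings)
print(overall_avg, cluster_avg)
```

In this toy case the overall mean of 3 describes neither group well, which is the motivation for computing movie means per user cluster.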
Blend It
Generate different sets of predictions using different movie similarity and user cluster strategies
Use linear regression to combine the sets of predictions into one final prediction
Weak learners are good too, as long as they provide unique information.
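A minimal sketch of the blending step, assuming just two prediction sets and a least-squares fit without an intercept (solved with the 2 x 2 normal equations). The ratings and predictions are made up, and a real blend would be fit on held-out ratings with many more prediction sets.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def fit_blend_weights(p1, p2, actual):
    """Least-squares weights (w1, w2) so that w1*p1 + w2*p2 approximates actual."""
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    t1 = sum(a * y for a, y in zip(p1, actual))
    t2 = sum(b * y for b, y in zip(p2, actual))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("prediction sets are collinear; nothing unique to blend")
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det

# Hypothetical ratings and two prediction sets from different strategies.
actual = [4, 1, 5, 3, 2]
preds_a = [3.5, 2.0, 4.5, 3.5, 2.5]
preds_b = [4.5, 1.5, 4.0, 2.5, 1.0]

w1, w2 = fit_blend_weights(preds_a, preds_b, actual)
blended = [w1 * a + w2 * b for a, b in zip(preds_a, preds_b)]

print(round(rmse(actual, preds_a), 4), round(rmse(actual, preds_b), 4),
      round(rmse(actual, blended), 4))
```

On the data used for fitting, the blend can never score worse than either input, since using one set alone is just a special case of the weights.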
RMSE, 2008/04/01
Overall standard deviation: 1.0822*
“Trivial approach” (main effect of movie): 1.0540
Main effects of movie and user: 0.9889*
  R.pred = M.avg + U.dev
Cinematch: 0.9514
...
Naga FX: 0.9063
...
Prize: 0.8563
Looking Back
Cognitive theories and data mining methods
  Prototype: K-means
  Exemplar: K-Nearest Neighbor
  Theory-based categories: Collaborative filtering, decision tree
  Decision boundary: Support Vector Machine
  Semantic space / LSA
  Connectionism: artificial neural net
Abstraction and generalization
  It’s all about similarity.
  Tversky, 1977
  Murphy & Medin, 1985