
A Cognitive Psychologist's Approach to Data Mining

How I beat Netflix Cinematch

Maggie Xiong
April 22, 2014

General Outline

Parallel Frameworks: Cognitive psychology & data mining

Case Study: The Netflix Prize Project

Abstraction and Generalization

Categorization: prototype, exemplar, decision boundary, theory-based categories

Semantic space / LSA

Connectionism

Abstraction

Linguistic ideas (Bransford & Franks, 1971)
“The ants in the kitchen ate the sweet jelly which was on the table.”
“The ants in the kitchen ate the sweet jelly.”
“The ants in the kitchen ate the jelly.”
“The ants were in the kitchen.”
Participants were more confident in “recognizing” fuller sentences.

Prototype (Posner & Keele, 1968)
Participants studied instances generated from distortions of prototypes.
They showed the same accuracy and response time for never-seen prototypes and memorized instances in a later test.

Categorization

Category structure (Collins & Quillian, 1969)
Economy of organization
Participants take longer to respond to statements across category levels.
Typicality

Exemplar (Jacoby & Brooks, 1984)

Decision Boundary, Theory-based Categories

Decision boundary (Ashby & Gott, 1988)

Theory-based categories (Murphy & Medin, 1985)
Categories organized around theories about the world.
clean vs unclean foods; apples and prime numbers

Semantic Space, Latent Semantic Analysis

Shepard, 1987
Probability of generalization decays exponentially with distance.

Osgood, 1957
Factor analysis
Evaluative, potency, activity

Dumais et al., 1988
SVD, cosine similarity
Landauer & Dumais, 1997

Connectionism

Selfridge, 1958
Pandemonium

Rumelhart, McClelland, & PDP Research Group, 1986
Parallel Distributed Processing, 2 Vol Set

Connectionism

Rumelhart & Todd, 1993

Common Ground

Prototype: K-means

Exemplar: K-nearest neighbor

Theory-based categories: Collaborative filtering, decision tree

Decision boundary: Support vector machine

Semantic space / LSA

Connectionism: Artificial neural net
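To make the first two pairings concrete, here is a minimal sketch of a prototype-style classifier (nearest category centroid, the idea behind K-means) next to an exemplar-style classifier (nearest stored instance, 1-nearest neighbor). The two-feature stimuli, the category names and the function names are all invented for illustration; they do not come from the cited studies.

```python
import math

def centroid(points):
    """Mean of a list of equal-length numeric tuples (the category 'prototype')."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify_by_prototype(item, categories):
    """Pick the category whose centroid is closest to the item."""
    return min(categories, key=lambda name: math.dist(item, centroid(categories[name])))

def classify_by_exemplar(item, categories):
    """Pick the category of the single closest stored instance (1-NN)."""
    distances = [
        (math.dist(item, exemplar), name)
        for name, exemplars in categories.items()
        for exemplar in exemplars
    ]
    return min(distances)[1]

# Hypothetical stimuli: category A is spread out, category B is compact.
categories = {
    "A": [(0.0, 0.0), (4.0, 0.0)],  # centroid (2.0, 0.0)
    "B": [(3.0, 3.0), (3.0, 4.0)],  # centroid (3.0, 3.5)
}
item = (4.0, 1.4)

print(classify_by_prototype(item, categories))  # B: closer to B's centroid
print(classify_by_exemplar(item, categories))   # A: closest instance is (4.0, 0.0)
```

The item sits near one instance of A but nearer the center of B, so the two strategies disagree on it.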

How Cognitive Psychologists Analyze Data

Task completion rate:

           1 Cup       3 Cups
Morning    12,13,10    10,8,10
Evening    14,15,12    23,18,15

Main effect of coffee: avg(10,8,10,23,18,15) - avg(12,13,10,14,15,12)

Main effect of time-of-day: avg(14,15,12,23,18,15) - avg(12,13,10,10,8,10)

Interaction: [avg(23,18,15) - avg(14,15,12)] - [avg(10,8,10) - avg(12,13,10)]
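The three quantities can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide: the dictionary layout and the helper name `pooled_mean` are assumptions made for illustration.

```python
from statistics import mean

# Task completion rates from the 2 x 2 table, keyed by (time of day, cups).
rates = {
    ("morning", 1): [12, 13, 10],
    ("morning", 3): [10, 8, 10],
    ("evening", 1): [14, 15, 12],
    ("evening", 3): [23, 18, 15],
}

def pooled_mean(time=None, cups=None):
    """Mean over every observation whose cell matches the given factor levels."""
    values = [
        x
        for (t, c), cell in rates.items()
        if (time is None or t == time) and (cups is None or c == cups)
        for x in cell
    ]
    return mean(values)

coffee_effect = pooled_mean(cups=3) - pooled_mean(cups=1)
time_effect = pooled_mean(time="evening") - pooled_mean(time="morning")
interaction = (
    (pooled_mean("evening", 3) - pooled_mean("evening", 1))
    - (pooled_mean("morning", 3) - pooled_mean("morning", 1))
)

print(round(coffee_effect, 2), round(time_effect, 2), round(interaction, 2))
# 1.33 5.67 7.33
```

The interaction is larger than either main effect: extra coffee helps in the evening and slightly hurts in the morning.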

Graph It

[Figure: main effects and interaction. Task completion rate plotted against cups of coffee, with separate lines for morning and evening.]

The Netflix Prize Problem, 2006/10/02

Training set: 17,770 movies, 500K users, 100M ratings
user_id, movie_id, rating, date_of_rating
movie_id, title, year

Probe set (1.4M ratings)

Qualifying set (2.8M ratings)
user_id, movie_id, date_of_rating

RMSE = sqrt( sum((X - X.pred)^2) / N )
Cinematch: 0.9514
0.8563 => $1M

Standard Deviation and RMSE
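One way to see the link between the two: if every rating is predicted to be the overall mean, the RMSE of that prediction equals the (population) standard deviation of the ratings. The sketch below checks this on a handful of hypothetical ratings invented for illustration; they are not Netflix data.

```python
import math
from statistics import mean, pstdev

def rmse(actual, predicted):
    """sqrt( sum((X - X.pred)^2) / N )"""
    n = len(actual)
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, predicted)) / n)

# Hypothetical ratings on a 1-5 scale.
ratings = [1, 3, 4, 4, 5, 2, 5, 3]

# Predict the overall mean for every rating.
constant_prediction = [mean(ratings)] * len(ratings)
baseline_rmse = rmse(ratings, constant_prediction)

print(baseline_rmse, pstdev(ratings))  # the two values agree
```

Any model worth keeping therefore has to score below the standard deviation of the ratings it is predicting.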

The Netflix Problem, Interpreted

Overall average movie rating: 3.620*

Main effect of movie:
Miss Congeniality: avg(u1,u2,u3...)
Mission Impossible: avg(u1,u2,u3...)

Main effect of user:
Alex: avg(m1,m2,m3…)
Brian: avg(m1,m2,m3…)

Interaction:
Alex - Miss Congeniality, Mission Impossible, ...
Brian - Miss Congeniality, Mission Impossible, ...

RMSE, Appreciated

Overall standard deviation: 1.0822*
“Trivial approach” (main effect of movie): 1.0540
Main effects of movie and user: 0.9889*

R.pred = M.avg + U.dev

Cinematch: 0.9514
...

...

Prize: 0.8563
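The main-effects baseline R.pred = M.avg + U.dev can be sketched in a few lines. The ratings below are hypothetical and the user "cara" is invented for illustration; U.dev is taken to be a user's mean deviation from the movie averages, which is how it is computed in the worked example later in the talk.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (user, movie, rating) triples.
ratings = [
    ("alex", "Mission Impossible", 4), ("alex", "Coyote Ugly", 1),
    ("brian", "Mission Impossible", 5), ("brian", "Coyote Ugly", 4),
    ("brian", "Miss Congeniality", 5), ("cara", "Miss Congeniality", 4),
]

# Main effect of movie: M.avg is the mean rating each movie received.
by_movie = defaultdict(list)
for user, movie, r in ratings:
    by_movie[movie].append(r)
movie_avg = {m: mean(rs) for m, rs in by_movie.items()}

# Main effect of user: U.dev is the user's mean deviation from M.avg.
by_user = defaultdict(list)
for user, movie, r in ratings:
    by_user[user].append(r - movie_avg[movie])
user_dev = {u: mean(ds) for u, ds in by_user.items()}

def predict(user, movie):
    """R.pred = M.avg + U.dev"""
    return movie_avg[movie] + user_dev[user]

print(predict("alex", "Miss Congeniality"))  # 4.5 + (-1.0) = 3.5
```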

The Arithmetic Approach

R = M.avg + U.dev + interaction
interaction = R - (M.avg + U.dev)
R.pred = M.avg + U.dev + w.avg(interaction * sim(M.p, M))

Alex                  R    M.avg   dev    interaction
Mission Impossible    4    4.3     -0.3   4 - [4.3 + (-1.4)] = 1.1
Coyote Ugly           1    3.5     -2.5   1 - [3.5 + (-1.4)] = -1.1
Miss Congeniality     ?    4.5

Alex U.dev = ((4 - 4.3) + (1 - 3.5)) / 2 = -1.4
sim(Miss Congeniality, Coyote Ugly) = 0.8
sim(Miss Congeniality, Mission Impossible) = 0.2
? = 4.5 + (-1.4) + (-1.1 * 0.8 + 1.1 * 0.2) / (|0.8| + |0.2|) = 2.44
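The worked example for Alex can be reproduced step by step. All numbers come from the slide; the variable names are chosen here for illustration, and the weighted average is normalized by the sum of absolute similarities, as in the last line of the example.

```python
# Movie averages and Alex's known ratings, from the worked example.
movie_avg = {"Mission Impossible": 4.3, "Coyote Ugly": 3.5, "Miss Congeniality": 4.5}
alex_ratings = {"Mission Impossible": 4, "Coyote Ugly": 1}

# Similarity of each rated movie to the target, Miss Congeniality.
sim_to_target = {"Mission Impossible": 0.2, "Coyote Ugly": 0.8}

# U.dev: Alex's mean deviation from the movie averages.
u_dev = sum(r - movie_avg[m] for m, r in alex_ratings.items()) / len(alex_ratings)

# interaction = R - (M.avg + U.dev) for every movie Alex has rated.
interaction = {m: r - (movie_avg[m] + u_dev) for m, r in alex_ratings.items()}

# Similarity-weighted average of the interactions.
weighted = sum(interaction[m] * sim_to_target[m] for m in interaction)
norm = sum(abs(s) for s in sim_to_target.values())

prediction = movie_avg["Miss Congeniality"] + u_dev + weighted / norm
print(round(prediction, 2))  # 2.44
```

Because Miss Congeniality is far more similar to Coyote Ugly, which Alex rated well below expectation, the prediction drops below the main-effects baseline of 3.1.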

Similarity Measures

Romesburg, 1984
Shape difference vs. size displacement

Euclidean distance
Cosine similarity
Correlation coefficient
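The shape-versus-size distinction shows up clearly on two profiles that have the same shape but are displaced by a constant. The sketch below uses plain-Python versions of the three measures; the rating profiles are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance; sensitive to size displacement."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine of the angle between two vectors; ignores overall scale."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pearson(a, b):
    """Correlation coefficient: cosine similarity of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

# Hypothetical rating profiles: identical shape, shifted up by one point.
low = [1, 2, 3, 2]
high = [2, 3, 4, 3]

print(euclidean(low, high))  # 2.0: the displacement registers as distance
print(pearson(low, high))    # approximately 1.0: the shapes match
print(cosine(low, high))     # slightly below 1: the shift changes the angle
```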

Movie Similarity

Similarity measures

Co-occurrence count
How often a person rented both movies.

Correlation
A function of the difference in ratings when a person rented both movies.

Correlation weighted by probability (significance)

Mean Euclidean distance of movie x user interactions
interaction = R - (M.avg + U.dev)

Weighted by similarities in movie release times, rental frequencies, mean ratings
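As an illustration of the simplest of these measures, the co-occurrence count can be sketched as follows. The data are hypothetical, the user "cara" is invented, and a rating is assumed to stand in for a rental.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, movie) pairs; a rating is taken as evidence of a rental.
rated = [
    ("alex", "Mission Impossible"), ("alex", "Coyote Ugly"),
    ("brian", "Mission Impossible"), ("brian", "Coyote Ugly"),
    ("brian", "Miss Congeniality"), ("cara", "Miss Congeniality"),
]

movies_by_user = defaultdict(set)
for user, movie in rated:
    movies_by_user[user].add(movie)

# For each unordered movie pair, count the users who have both.
co_occurrence = defaultdict(int)
for movies in movies_by_user.values():
    for pair in combinations(sorted(movies), 2):
        co_occurrence[pair] += 1

print(co_occurrence[("Coyote Ugly", "Mission Impossible")])  # 2
```

Sorting each user's movies before pairing keeps every pair under a single key regardless of rating order.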

User Clusters

Differentiate movie mean rating and similarity
R.pred = M.cluster_avg + U.cluster_dev + w.avg(interaction * sim_cluster(M, M.p))

By experience (number of movies rated)
[2,180], [81,180], [181,240], [240,400], [401,3000]

By gender
Inferred from preference for different movie clusters

By cluster analysis
PCA, K-means

Blend It

Generate different sets of predictions using different movie similarity and user cluster strategies

Use linear regression to combine the sets of predictions into one final prediction

Weak learners are good too, as long as they provide unique information.
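A minimal sketch of the blending step, assuming a held-out set of known ratings on which the regression is fit. The ratings and the two prediction sets are hypothetical, and the least-squares fit is written out by hand (normal equations plus Gaussian elimination) so the example needs no third-party library; it is not a reconstruction of the actual blending code.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_blend(prediction_sets, actual):
    """Least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + list(p) for p in zip(*prediction_sets)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, actual)) for i in range(k)]
    return solve(xtx, xty)

def blend(weights, prediction_sets):
    """Combine the prediction sets into one prediction per rating."""
    return [weights[0] + sum(w * p for w, p in zip(weights[1:], ps))
            for ps in zip(*prediction_sets)]

def rmse(actual, predicted):
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, predicted)) / len(actual))

# Hypothetical held-out ratings and two sets of predictions for them.
actual = [4, 1, 5, 3, 2, 4]
set_a = [3.6, 2.1, 4.2, 3.3, 2.8, 3.5]
set_b = [4.4, 1.2, 4.1, 2.2, 2.5, 4.6]

weights = fit_blend([set_a, set_b], actual)
blended = blend(weights, [set_a, set_b])
print(rmse(actual, set_a), rmse(actual, set_b), rmse(actual, blended))
```

On the data used for fitting, the blend can never score worse than any single input set, since using one set alone is itself a possible choice of weights; whether the gain carries over to new ratings depends on how much unique information each set provides.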

RMSE, 2008/04/01

Overall standard deviation: 1.0822*
“Trivial approach” (main effect of movie): 1.0540
Main effects of movie and user: 0.9889*

R.pred = M.avg + U.dev

Cinematch: 0.9514
...

Naga FX: 0.9063
...

Prize: 0.8563

Looking Back

Cognitive theories and data mining methods
Prototype: K-means
Exemplar: K-nearest neighbor
Theory-based categories: Collaborative filtering, decision tree
Decision boundary: Support vector machine
Semantic space / LSA
Connectionism: Artificial neural net

Abstraction and generalization
It’s all about similarity.

Tversky, 1977
Murphy & Medin, 1985