Hybrid RS

A Scalable, Accurate Hybrid Recommender SystemMustansarAliGhazanfarandAdamPrugel-BennettSchoolofElectronicsandComputerScienceUniversityofSouthamptonHigheldCampus,SO171BJ,UnitedKingdomEmail: {mag208r,adb}@ecs.soton.ac.ukAbstractRecommendersystemsapplymachinelearningtechniquesforlteringunseeninformationandcanpredictwhether a user would like a given resource. There are threemain types of recommender systems: collaborative lter-ing,content-basedltering,anddemographicrecommendersystems. Collaborative lteringrecommender systems rec-ommenditems bytakingintoaccount the taste (intermsof preferences of items) of users, under the assumptionthat userswill beinterestedinitemsthat userssimilartothem have rated highly. Content-based ltering recommendersystemsrecommenditemsbasedonthetextualinformationof an item, under the assumption that users will likesimilar items tothe ones they likedbefore. Demographicrecommender systems categorize users or items basedontheir personal attribute andmake recommendationbasedondemographiccategorizations. Thesesystemssufferfromscalability, datasparsity, andcold-start problemsresultingin poor quality recommendations and reduced coverage.Inthis paper, weproposeauniquecascadinghybridrec-ommendationapproachby combining the rating, feature,anddemographicinformationabout items. Weempiricallyshowthat our approachoutperforms the state of the artrecommender systemalgorithms, andeliminates recordedproblemswithrecommendersystems.Keywords-Recommender systems; collaborative ltering;content-basedltering;demographicrecommendersystem.I. INTRODUCTIONTherehasbeenanexponential increaseinthevolumeofavailabledigitalinformation, electronicresources, andon-line services in recent years. This information overloadhas createdapotential problem, whichis howtolterand efciently deliver relevant information to a user.This problem highlights a need for information extractionsystems that can lter unseen information and can predictwhether a user would like a given resource. Such systemsarecalledrecommendersystems.Let M= {m1, m2, , mx }betheset ofall users,N= {n1, n2, , ny }betheset ofall possibleitemsthat can be recommended, and rmi,njbe the rating of usermionitemnj. Let ubeautilityfunctionthatmeasurestheutilityofitemnjtousermi,i.e.u : M N R, (1)whereR is a totally ordered set. Now for each usermi M, theaimof arecommender systemistochoosethatitemn

j Nwhichmaximizestheusersutility[1]. Wecanspecifythisasfollows:n

jmi= argmaxnjNu(mi, nj): miM, (2)wheretheutilityofanitemisapplicationdependent.CollaborativeFiltering(CF) systemscanbeclassiedintotwosub-categories: memory-basedCFandmodel-based CF [1]. Memory-based approaches make a pre-diction by taking into account the entire collection ofprevious rateditems bya user; examples include user-based CF algorithms [2]. Model-based approaches learn amodel fromcollectionofratingsandusethismodel formakingprediction; examples includeitem-basedCF[3]andSingular ValueDecomposition(SVD) basedmodels[4]. Model-based approaches are more scalable than user-basedapproaches.There are two potential problems with the recommendersystems. Oneisthescalability, whichishowquicklyarecommender systemcangeneraterecommendation, andthesecondistoamelioratethequalityoftherecommen-dation for a customer. Pure CF recommender systemsproduces high quality recommendation than those of purecontent-based and demographic recommender systems,however, duetothesparsity1, theycannot ndsimilaritemsorusersusingratingcorrelation, resultinginpoorqualitypredictionsandreducedcoverage2. Furthermore,Allindividualsystemsfailincold-start[1]problems.In this paper, we propose a hybrid scheme (namelyBoostedRDF) which produces accurate and practical rec-ommendation and can be used in cold-start scenarios.Our proposed scheme is based on a cascading hybridrecommendation technique3[5] that build itemmodelsbased on items rating, feature, and demographic informa-tion; and generates more accurate prediction than availablestate of the art recommender algorithms. We evaluate ouralgorithmonMovieLens4andFilmTrust5datasets.II. BACKGROUNDA. Item-Based Collaborative Filtering Recommender Sys-tems:ItemRatingInformationItem-based CF recommender systems [3] build a modelof item similarities using an off-line stage. There are threemainstepsinthesesystemsasfollows:1The percentage of ratings assigned by users is very small as comparedtothepercentageofratingsthesystemhastopredict.2The percentage of the items that can be recommended fromallavailableitemsinthesystem.3Incascadinghybridrecommendersystems,arecommendationtech-niqueisappliedtoproduceacoarsecandidatelistofitemsforrecom-mendation that are rened by applying other recommendation techniques.4www.grouplens.org/node/73.5http://trust.mindswap.org/FilmTrust.2010 Third International Conference on Knowledge Discovery and Data Mining978-0-7695-3923-2/10 $26.00 2010 IEEEDOI 10.1109/WKDD.2010.11794TableIASAMPLEOF FUNCTIONS (F)WITH k= 20Function# Function(f) MAE MAE)(ML) (FT)1 RISim(Item-basedCF) 0.791 1.4422 FISim0.787 1.436............32 FISim +RDSim +DDSim0.736 1.379............83 RISim +FISim +DISimRDSim +FDSim +DDSim0.834 1.4511- Allitemsratedbyanactiveuser6areretrieved.2- Targetitem7ssimilarityiscomputedwiththesetofretrieveditems. Aset of kmost similaritems, alsoknownas neighbours of thetarget itemwiththeirsimilarities areselected.3- Predictionforthetargetitemismadebycomputingthe weighted sum of the active users rating on thekmostsimilaritems.B. Content-Based Recommender Systems: ItemFeatureInformationContent-based recommender systems [1] recommenditems based on the textual information of an item. In thesesystems, an item of interest is dened by its associated fea-tures, for instance, NewsWeeder [8], a newsgroup lteringsystemusesthewordsoftextasfeatures.We downloaded information about movies fromIMDB8, and applied TF IDF[8] approach for extract-ingthefeaturesfromtheinformationabout eachmovie.After stop word removal and stemming, we constructed avector of keywords, tags, directors, actors/actresses, anduser reviews giventoa movie inIMDB. Furthermore,weleverageWordNet usingJavaWordnet Interface9forovercoming the synonym problems between features whilendingthesimilarities among(text)features.C. Demographic Recommender Systems: Item Demo-graphicInformationDemographicrecommendersystems[6],[5]categorizeusers or items based on their personal attributes and makerecommendation based on demographic categorizations. Inour work, we used genre information about a movie as itsdemographic information and constructed a vector as usedin[7].III. EXPERIMENTAL EVALUATIONA. DatasetWe used MovieLens (ML) and FilmTrust (FT) datasetsforevaluatingouralgorithm.MovieLensdatasetcontains943 users, 1682 movies, and 100 000 ratings on an integerscaleof1(bad)to5(excellent). MovieLensdataset has6Theuserforwhomrecommendationaregenerated.7Theitemasystemwantstorecommend.8www.imdb.com.9http://projects.csail.mit.edu/jwi.been used in many research projects [3], [4], [7]. The spar-sity of this dataset is 93.7%

1nonzeroentriesallpossibleentries=11000009431682= 0.937

.we created the second dataset by crawling the FilmTrustwebsite. Thedataset retrieved(on10thof March2009)contains 1592users, 1930movies, and 28 645ratingsonaoatingpoint scaleof 1(bad) to10(excellent). Thesparsityofthisdatasetis 99.06%10.B. MetricsOur specic task in this paper is to predict scoresfor items that alreadyhave beenratedbyactual users,and to check howwell this prediction helps users inselectinghighqualityitems. Keepingthis intoaccount,weuseMeanAbsoluteError(MAE)andROC(ReceiverOperatingCharacteristic)sensitivity.MAEmeasurestheaverageabsolutedeviationbetweena recommender systems predicted rating and a true ratingassignedbytheuser.Itiscomputedasfollows:MAE =N

i=1|rpi rai|N,whererpiandraiarethepredictedandactualvaluesofaratingrespectively, andNisthetotalnumberofitemsthathavebeenrated.Ithasbeenusedin[2],[3].ROCsensitivitymeasurestheprobabilitywithwhichasystemaccept agooditem. TheROCsensitivityrangesfrom 1 (perfect) to 0 (imperfect). For MovieLens dataset,we consider an item good if the user rated it with a scoreof 4 or higher and bad otherwise. Similarly, for FilmTrustdataset, we consider an item good if the user rated it witha score of 7 or higher and bad otherwise. It has been usedin[9].Furthermore, we used coverage that measures howmany items a recommender system can make recommen-dationfor.Ithasbeenusedin[10].IV. PROPOSED ALGORITHMLet ma, nt, R, D, Fbe the active user, target item, user-item rating matrix [1], demographic vector for an item nj,and feature vector for an item njrespectively. Let RISim,DISim, and FISimrepresent the rating, demographic,andfeaturesimilarityamongtheitems. Furthermore, letRDSim, DDSim, andFDSimrepresent the rating similar-ityamongcandidateitemsfoundafter applyingthefea-turecorrelationamongall items, demographicsimilarityamongcandidateitems foundafter applyingthefeaturecorrelation among all items, and feature similarity amongcandidate items found after applying the rating correlationamongallitems.The proposed algorithm can be summarized as follows:Step-1: Computethesimilaritybetweenitemsusingratingdata,demographicdata,andfeaturedata,andstore10Both datasets can be downloaded from:https://sourceforge.net/projects/hybridrecommend.95TableIIASAMPLEOFPARAMETERSETWITH k= 20ParameterSet# MAE MAE(ML) (FT)1 0.1 0.1 0.8 0.739 1.381..................29 0.5 0.3 0.2 0.732 1.378..................35 0.7 0.2 0.1 0.738 1.37336 0.8 0.1 0.1 0.741 1.379thisinformation. Adjustedcosinesimilarity[3]be-tween two items is used for measuring the similarityoverratingdata. Vectorsimilarity[2]betweentwoitems is used for measuring the similarity usingdemographicandfeaturevectors.Step-2: Boosted similarity BoostedSim is dened by a func-tion, fmaxthat combines RISim, DISim, FISim,RDSim, DDSim, andFDSimover set of itemsinthetrainingset. Thisfunctionuses(5)formakingprediction, andtries tomaximize the utility. For-mally,itcanbespeciedasfollows:fmax = argmaxfFu(mi, nj): mi MT, nj NT.(3)Equation(3)tellsustochoosethatfunctionwhichmaximizestheutility(i.e. reducestheMAE)ofallusers (MT) over set of items (NT) inthe train-ingset. TableI givesthedifferent combinationoffunctions checkedover thetrainingset withtheirrespective lowest MAEobserved. It shows that acascading hybrid setting in which rating and demo-graphiccorrelationareappliedover thecandidateneighbour items found after applying the featurecorrelationgivestheminimumerror.Let Cnt={ c1, c2, , ck} be the set of kcandidateneighboursfoundafterapplyingthefea-ture similarity. We dene the boosted similarityBoostedSimby a linear combination of FISim,RDSim, andDDSimover theset of items inthetrainingsetasfollows:BoostedSim(nt, ci) = FISim + RDSim+ DDSim, (4)where, , , and parameters represent the relativeimpactofthreesimilarities. Weassume + + = 1withoutthelossofgenerality.Step-3: PredictingtheratingPma,ntforanactiveusermaontarget itemntismadebyusingthefollowingformula:Pma,nt =k

i=1

BoostedSim(nt, ci) rma,ci

k

i=1

|BoostedSim(nt, ci)|

.(5)10 20 30 40 50 60 70 80 90 1000.770.780.790.80.810.820.830.84Neighbourhood Size (k)Mean Absolute Error FISimRDSimDDSimFigure 1. Determiningthe optimal value of neighbourhoodsize, k(MovieLens).0 20 40 60 80 100 1200.720.740.760.780.80.820.840.860.880.9Neighbourhood Size (k)Mean Absolute Error Userbased CF with DVItembased CFIDemo4Naive BayesNaive hybridPazzaniPersonality diagnosisBoostedRDFFigure 2. MAE of the proposed algorithm with others, against variousneighbourhood sizes(MovieLens).Equation (5) is the same as used by [3] excepttheratingsimilarityfunctionhasbeenreplacedbyBoostedSim.V. RESULTAND DISCUSSIONWe randomly selected 20%ratings of each user asthetest set andusedtheremaining80%astrainingset.We further subdivided our training set into a test setandtrainingsetformeasuringtheparameterssensitivity.For learningtheparameters, weconducted5-foldcrossvalidation on the 80% training set, by randomly selectingthe different test and training set each time, and taking theaverageofresults.We comparedour algorithmwithsevendifferent al-gorithms: user-basedCFusingPearsoncorrelationwithdefault voting(DV) [2], item-basedCFusingadjusted-cosine similarity [3], a hybrid recommendation algorithm,IDemo4, proposed in [7], a Naive Bayes classicationapproachusingitemfeaturesinformation, anaivehybridapproachforgeneratingrecommendation11, thepersonal-itydiagnosisalgorithm[11]formakingprobabilisticrec-ommendations, andahybridrecommendationalgorithmused by Pazanni [6]. Furthermore, we tuned all algorithmsforthebestmentioningparameters.11We take average of the prediction generated by a pure content-basedandapure user-basedCF.96TableIIIACOMPARISONOFTHEPROPOSEDALGORITHMWITHEXISTINGINTERMSOFCOST (BASEDON [4], [11]),ACCURACYMETRICS,ANDCOVERAGEAlgorithm On-lineCost BestMAE BestMAE ROCSensitivity ROCSensitivity Coverage Coverage(ML) (FT) (ML) (FT) (ML) (FT)User-basedwithDV NM20.791 1.441 0.401 0.643 99.424 93.611Item-based N20.789 1.439 0.383 0.621 99.221 92.312IDemo4 N20.768 1.415 0.430 0.644 99.572 94.441BoostedRDFN20.725 1.362 0.562 0.756 100 99.195NaiveBayes M(NP) 0.831 1.470 0.622 0.835 100 99.997Naivehybrid NM2+M(NP) 0.822 1.462 0.526 0.726 100 99.991Pazzani NPM20.793 1.440 0.552 0.739 99.920 99.031Personalitydiagnosis NM 0.785 1.432 0.521 0.732 99.142 94.232A. Locating the Optimal Value of Neighbourhood Size (k)We varied the number of neighbours for an active user,from 0to 100andcomputedthecorrespondingMAEforFISim, RDSim, andDDSim. Theresults areshowninFig.1.Fig. 1 shows that for MovieLens datasetMAEis mini-mum fork = 2012. We choose the neighbourhood size tobe 20forfurtherexperiments.B. LearningtheOptimalValuesofParameters(, , )The 36 parameter sets were generated by producing allpossiblecombinationofparametersvalues, rangingfrom0.1to1.0withdifferences of 0.1. Table II presents asample of the parameter sets learned. The parameters sets = 0.5, = 0.3, = 0.2,and = 0.7, = 0.2, = 0.1gave the lowest MAE in the case of MovieLens andFilmtTrust dataset respectively. It is worth noting thatcombined similarity depends heavily on feature similarity,i.e. . Furthermore, thevalues of parameters arefounddifferent forMovieLensandFilmTrust dataset, whichisduetothefact that bothdataset havedifferent density,ratingdistribution,andratingscale.C. ComparisonoftheProposedAlgorithmwithOthers1) Performance Evaluation in Terms of MAE: Fig.2 shows that our algorithmsignicantly outperformsother algorithms13. Similar results were observed inthe case of FilmTrust dataset. We can conclude fromthe results that clusters of items found after applyingFISim, RDSim, DDSim have complementary role for pre-dictiongeneration.2) Comparison of MAE, ROCSensitivity, Coverage,and On-line Cost of the Proposed Algorithm with Others:TableIII shows theon-linecost (intheworst case) ofeach algorithm, with the corresponding lowest MAE, ROCsensitivity, and coverage. Here, Pis the number of featuresagainst a training example (i.e. features against a movie). Itis worth noting that for FilmTrust dataset, ROC sensitivityis higher, for all algorithms in general, as compared to theMovieLens dataset. We believe that it is due to the ratingdistribution14. Furthermore, the coverage of the algorithms12Thesizefoundtobebetween20 30forFilmTrustdataset.13In Fig. 2, the x-axis represents the number of neighbouring items inthecaseof item-basedCF, IDemo4, andBoostedDemoFeature; andthenumberofneighbouringusersotherwise.14InFilmTrust, majorityof theusers haveratedthepopular set ofmovies and their rating tends to match the average rating of the movies.TableIVPERFORMANCE EVALUATION UNDER NEW ITEM COLD-STARTPROBLEMAlgorithmMAE1 MAE2 MAE5(ML) (FT) (ML) (FT) (ML) (FT)User-basedCFwithDV 1.64 2.67 1.23 2.22 0.95 1.94Item-basedCF 1.34 2.51 1.18 2.13 0.91 1.58IDemo4 1.25 2.34 1.15 2.09 0.89 1.53BoostedRDF0.98 1.62 0.85 1.57 0.82 1.42ismuchlowerinthecaseofFilmTrustdataset, whichisduetothereasonthat it isverysparse(99%). Thetabledepicts that BoostedDemoFeature is scalable and practicalas its on-linecost is less or equal tothecost of otheralgorithms.3) PerformanceEvaluationUnderNewItemandNewUser Cold-Start Problems: When a new item is added tothe system, then it is not possible to get rating for that itemfrom signicant number of users, and consequently the CFrecommender system would not be able to recommend thatitem. This problem is called new itemcold-startproblem[1]. For testing our algorithm in this scenario, we selected1000 random samples of user/item pairs from the test set.While makingpredictionfor a target item, the numberof users in the training set who have rated the targetitemwere kept 1, 2, and5. The corresponding MAE;represented by MAE1, MAE2, MAE5, is shown in the tableIV. TableIVshowsthatproposedschemeworkswellinnew item cold-start problem scenario, as it does not solelydepend on the number of users who have rated the targetitemforndingthesimilarity.Fornewusercold-startproblem[2], wheretheproleof a user is incomplete, we use a linear regression modelfor ndinganapproximationof theactiveusers ratingfor an item. We use this rating instead of the active usersactualratingin(5)forpredictiongeneration.r

ma,ni =rma,ni : ifactiveuserratedmorethanJmoviesrreg: otherwise.(6)In 6, the choice ofJcame from the training set, whichfound to be 10 for MovieLens and 5 for FilmTrust dataset.Ratingrregisfoundbyalinearregressionmodel: Rs =1Rt+2, where Rs, and Rt are the vector of similar itemand vector of target item respectively15. Parameters 1 and15Thesevectoraremadeupofall userwhohaveratedthat iteminthetrainingset.Formoreinformation,referto[3].9799 99.1 99.2 99.3 99.4 99.5 99.6 99.7 99.81.41.61.822.22.42.62.833.2Sparsity (%)Mean Absolute Error UserbasedUserbased CF (DV)Itembased CFIDemo4Naive hybridBoostedRDFFigure3. Performanceof algorithms under different sparsitylevels(FilmTrust).2can found by both rating vectors. This model was usedin [3] for overcoming the misleading similarities betweenitemsintheitem-basedCF. Weshowthismodel makessense to apply only when we have incomplete user prole.The performance of the test set in the case of Given 2 andGiven516[2]protocolisgiveninthetableV.Table V shows that our algorithm with regression gavemore accurate results than others. User-based CF with DVgavegoodresults, asit assumessomedefault votesforitemsauserhasnotvotedon.4) Performance Evaluation Under Different SparsityLevels: Tochecktheeffectofsparsity, weincreasedthesparsitylevel of thetrainingset bydroppingsomeran-domly selected entries. Whereas, we kept the test set samefor each sparse training set. We checked the performanceoftheproposedalgorithmwiththoseofpureuser-basedCF, user-based CF with DV, item-based CF, IDemo4, anda naive hybrid recommendation algorithm. Fig. 317showsthat performancedoes not degrades rapidlyinthecaseofproposedalgorithm. It isbecause; featuresofanitemcanstill beusedfor ndingsimilar items. Furthermore,synonymdetectionalgorithmenrichitemproles whilendingthesimilarity betweentheitems.VI. CONCLUSIONIn this paper, we have proposed a cascading hybridrecommendation approach by combining the feature corre-lation with rating and demographic information of an item.Weshowedempiricallythat our approachoutperformedthestateoftheartalgorithms.ACKNOWLEDGMENTThe work reported in this paper has formed part of theInstant Knowledge Research Programme of Mobile VCE,(theVirtual Centreof ExcellenceinMobile&PersonalCommunications), www.mobilevce.com. Theprogramme16GivenNmeans,activeuserhasratedNitems.17ResultswerethesameforMovieLensdataset.WeshowresultsforFilmTrust toanalysetheperformanceofthealgorithmsunderextremesparsity(i.e.sparsity> 99%).TableVPERFORMANCEEVALUATIONUNDERNEWUSERCOLD-STARTPROBLEMAlgorithmMAE:Given2 MAE:Given5(ML) (FT) (ML) (FT)User-basedCFwithDV 1.06 2.03 0.86 1.48Item-basedCF 1.09 2.06 0.88 1.51IDemo4 1.07 2.05 0.85 1.48BoostedRDF1.06 2.01 0.83 1.47BoostedRDFReg1.01 1.93 0.79 1.41is co-funded by the UKTechnology Strategy BoardsCollaborative Research and Development programme. De-tailed technical reports on this research are available to allIndustrial Membersof MobileVCE. Thanks MartinforProvidingthedataset.REFERENCES[1] A. T. Gediminas Adomavicius, Toward the next generationof recommender systems: Asurveyof thestate-of-the-artand possible extensions, IEEE Transactions on KnowledgeandDataEngineering,vol.17,2005,pp.734749.[2] J. S. Breese, D. Heckerman, and C. Kadie, Empiricalanalysis of predictive algorithms for collaborative ltering,MorganKaufmann,1998,pp.4352.[3] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl, Item-based collaborative ltering recommendation algorithms, inProceedingsofthe10thinternational conferenceonWorldWide Web. ACM New York, NY, USA, 2001, pp. 285295.[4] M. Vozalis and K. Margaritis, Using SVD and demographicdatafor theenhancement of generalizedcollaborativel-tering,InformationSciences, vol. 177, no. 15, 2007, pp.30173037.[5] R. Burke, Hybrid recommender systems: Survey and exper-iments, UserModelingandUser-AdaptedInteraction,vol.12,no.4,2002,pp.331370.[6] M. J. Pazzani, A framework for collaborative, content-basedanddemographic ltering, Articial Intelligence Review,vol.13,no.56,1999,pp.393408.[7] M. Vozalis and K. Margaritis, On the enhancement of col-laborativelteringbydemographicdata,WebIntelligenceandAgentSystems,vol.4,no.2,2006,pp.117138.[8] K. Lang, Newsweeder: Learning to lter netnews, inProceedings of the Twelfth International Conference onMachineLearning,1995.[9] P. Melville, R. J. Mooney, and R. Nagarajan, Content-boostedcollaborative lteringfor improvedrecommenda-tions,inEighteenthNational ConferenceonArticial In-telligence,2002,pp.187192.[10] L. G. T. Jonathan L. Herlocker, Joseph A. Konstan and J. T.Riedl,Evaluatingcollaborativelteringrecommendersys-tems,ACMTransactionsonInformationSystems(TOIS)archive,vol.22,2004,pp.734749.[11] D. Pennock, E. Horvitz, S. Lawrence, and C. Giles,Collaborativelteringbypersonalitydiagnosis: Ahybridmemoryandmodel-basedapproach,inProceedingsofthe16thConference onUncertaintyinArticial Intelligence,2000,pp.473480.98

Hybrid RS

Documents

Transcript of Hybrid RS