
“Strength Lies in Differences” – Diversifying Friends for Recommendations through Subspace Clustering

Eirini Ntoutsi, LMU Munich, [email protected]

Kostas Stefanidis, ICS-FORTH, Heraklion, [email protected]

Katharina Rausch, LMU Munich, [email protected]

Hans-Peter Kriegel, LMU Munich, [email protected]

ABSTRACT

Nowadays, the WWW brings an overwhelming variety of choices to consumers. Recommendation systems facilitate the selection process by issuing recommendations to them. Recommendations for users, or groups, are determined by considering users similar to the users in question. Scanning the whole database for locating similar users, though, is expensive. Existing approaches build cluster models by employing full-dimensional clustering to find sets of similar users. As the datasets we deal with are high-dimensional and incomplete, full-dimensional clustering is not the best option. To this end, we explore the fault-tolerant subspace clustering approach. We extend the concept of fault tolerance to density-based subspace clustering, and to speed up our algorithms, we introduce the significance threshold for considering only promising dimensions for subspace extension. Moreover, as we potentially receive a multitude of users from subspace clustering, we propose a weighted ranking approach to refine the set of like-minded users. Our experiments on real movie datasets show that the diversification of the similar users that the subspace clustering approaches offer results in better recommendations compared to traditional collaborative filtering and full-dimensional clustering approaches.

1. INTRODUCTION

With the growing complexity of the WWW, users often find themselves overwhelmed by the mass of choices available. Shopping for DVDs, books or clothes online becomes more and more difficult, as the variety of offers increases rapidly and gets unmanageable. To facilitate users in their selection process, recommendation systems provide suggestions on items which might be interesting for the respective user. In particular, recommendation systems aim at giving recommendations to users or groups of users by estimating their item preferences and recommending those items featuring the maximal predicted preference. The prerequisite for determining such recommendations is historical information on the users' interests, e.g., the users' purchase history.


Typically, user recommendations are established by considering users sharing similar preferences with the query user. Scanning the whole database to find such like-minded users, though, is a costly process. More efficient approaches build user models for computing recommendations. For example, [14] applies full-dimensional clustering to organize users into clusters and employs these clusters, instead of a linear scan of the database, for predictions. Full-dimensional clustering is not the best option for the recommendation domain due to the high dimensionality of the data. Typically, there exist hundreds to thousands or even millions of items in a recommendation application. Feature reduction techniques, like PCA, tackle the high dimensionality problem by reducing the initial high-dimensional feature space into a smaller one and working upon the reduced feature space. Such a global reduction is not appropriate for cases where different dimensions are relevant for different clusters; for example, there might be a group of comedy fans, part of which might belong to another group of drama fans, but there might be no group of both comedy and drama fans. Due to high dimensionality, such cases are actually more common than finding, e.g., users similar with respect to the whole feature space.

To deal with the high dimensionality aspect of recommendations, we employ subspace clustering [11], which extracts both clusters of users and the dimensions, i.e., items, based on which users are grouped together. As users possibly belong to more than one subspace cluster (each cluster defined upon different items), this approach broadens our options for selecting like-minded users for a query user. Employing in the recommendation process users that differ qualitatively in terms of the items upon which their selection was made makes the set of like-minded users more diverse. Traditional subspace clustering methods [11] cannot be applied in our setting, as our data are characterized by sparsity and incompleteness (users rate only a few items, resulting in many missing values). To deal with these issues, the so-called fault tolerant subspace clustering has recently been proposed [7]. We extend the original grid-based fault tolerant subspace clustering algorithm and introduce two density-based fault tolerant approaches that, as we will show, perform better in terms of both quality and efficiency. To speed up the algorithms, we introduce the significance threshold, a heuristic for finding the most significant dimensions for extending subspace clusters instead of considering all dimensions. Subspace clustering might result in overlapping clusters; we propose a weighted ranking approach to combine these results and select the most prominent users for recommendations.

In brief, our contributions are as follows:


• We explore subspace clustering for recommendations and introduce two density-based fault tolerant subspace clustering approaches.

• We propose the significance threshold as a criterion to prune dimensions that are not promising for subspace extension.

• We refine, via weighted ranking, the selection of like-minded users that results from the combination of the subspace clusters a user belongs to.

• We experimentally show that employing fault tolerant subspace clustering with the weighted ranking of like-minded users results in predictions of higher quality, while the proposed significance threshold reduces the runtime of our algorithms.

The rest of the paper is organized as follows. Section 2 presents the basics of recommendations and the limitations of existing approaches. Section 3 overviews fault tolerant subspace clustering, while Section 4 extends fault tolerant subspace clustering from grid-based to density-based and introduces the significance threshold for reducing the search space. Section 5 describes the weighted ranking approach for exploiting the subspace clustering results for recommendations. Section 6 presents our experimental results. Related work is in Section 7, and conclusions are given in Section 8.

2. BACKGROUND AND LIMITATIONS

2.1 Background on Recommendations

Assume a recommendation system, where I is the set of items to be rated and U is the set of users in the system. A user u ∈ U might rate an item i ∈ I with a score rating(u, i) in [0.0, 1.0]; let R be the set of all ratings recorded in the system. Typically, the cardinality of the item set I is high and users rate only a few items. The subset of users that rated an item i ∈ I is denoted by U(i), whereas the subset of items rated by a user u ∈ U is denoted by I(u).

For the items unrated by the users, recommendation systems estimate a relevance score, denoted as relevance(u, i), u ∈ U, i ∈ I. There are different ways to estimate the relevance score of an item for a user. In the content-based approach (e.g., [13]), the estimation of the rating of an item is based on the ratings that the user has assigned to similar items, whereas in collaborative filtering systems (e.g., [9]), this rating is predicted using previous ratings of the item by similar users. In this work, we follow the collaborative filtering approach. Similar users are located via a similarity function simU(u, u′) that evaluates the proximity between u, u′ ∈ U by considering their shared dimensions. We use F_u to denote the set of the most similar users to u, hereafter referred to as the friends of u.

Definition 1. Let U be a set of users. The friends F_u of a user u ∈ U consist of all those users u′ ∈ U that are similar to u w.r.t. a similarity function simU(u, u′) and a threshold δ, i.e., F_u = {u′ ∈ U : simU(u, u′) ≥ δ}.

Given a user u and his friends F_u, if u has expressed no preference for an item i, the relevance of i for u is estimated as:

$$relevance_{F_u}(u, i) = \frac{\sum_{u' \in F_u} simU(u, u') \cdot rating(u', i)}{\sum_{u' \in F_u} simU(u, u')}$$

After estimating the relevance scores of all items unrated by the user, the top-k items are recommended to the user.
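To make the prediction step concrete, the following is a minimal sketch of this weighted-average estimator; the rating-dictionary layout and the function names are our own illustrative assumptions, not part of the paper.

```python
from typing import Dict

# ratings[user][item] = score in [0.0, 1.0]; unrated items are simply absent.
Ratings = Dict[str, Dict[str, float]]

def relevance(u: str, item: str, friends: Dict[str, float], ratings: Ratings) -> float:
    """Weighted average of the friends' ratings for `item`.

    `friends` maps each friend u' to simU(u, u'); friends who did not
    rate `item` are skipped, mirroring the sums over F_u above.
    """
    num, den = 0.0, 0.0
    for v, sim in friends.items():
        if item in ratings.get(v, {}):
            num += sim * ratings[v][item]
            den += sim
    return num / den if den > 0 else 0.0

def top_k(u: str, friends: Dict[str, float], ratings: Ratings, k: int = 10):
    """Rank the items u has not rated by their estimated relevance."""
    rated = set(ratings.get(u, {}))
    candidates = {i for v in friends for i in ratings.get(v, {})} - rated
    scored = {i: relevance(u, i, friends, ratings) for i in candidates}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```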

Most previous works focus on recommending items to individual users. Recently, group recommendations, which make recommendations to groups of users instead of single users (e.g., [3, 14]), have received considerable attention. Our goal is to test our methods for both user and group recommendations. Here, for group recommendations, we follow the approach of [14]: first, estimate the relevance scores of the unrated items for each user in the group, then aggregate these predictions to compute the suggestions for the group.

Definition 2. Let U be a set of users and I be a set of items. Given a group of users G, G ⊆ U, the group relevance of an item i ∈ I for G, such that ∀u ∈ G no rating(u, i) exists, is:

$$relevance(G, i) = Aggr_{u \in G}\left(relevance_{F_u}(u, i)\right)$$

As in [14], we employ three different designs regarding the aggregation method Aggr: (i) the least misery design, capturing cases where strong user preferences act as a veto (e.g., do not recommend steakhouses to a group when a member is vegetarian), (ii) the fair design, capturing more democratic cases where the majority of the group members is satisfied, and (iii) the most optimistic design, capturing cases where the most satisfied member of the group acts as the most influential one (e.g., recommend a movie to a group when a member is highly interested in it and the rest have reasonable satisfaction). In the least misery (resp., most optimistic) design, the predicted relevance score of an item for the group is equal to the minimum (resp., maximum) relevance score of the item over the members of the group, while the fair design, which assumes equal importance among all group members, returns the average score.
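A minimal sketch of the three aggregation designs, reusing the hypothetical `relevance` helper sketched above; the dictionary of aggregators and all names here are our own.

```python
from statistics import mean

AGGREGATORS = {
    "least_misery": min,      # strong dislikes act as a veto
    "fair": mean,             # every member counts equally
    "most_optimistic": max,   # the happiest member dominates
}

def group_relevance(group, item, friends_of, ratings, design="fair"):
    """Aggregate the members' individual predictions for `item`."""
    scores = [relevance(u, item, friends_of[u], ratings) for u in group]
    return AGGREGATORS[design](scores)
```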

2.2 Limitations of Existing Approaches

One of the key issues in collaborative filtering approaches is the identification of the friends set F_u of a user u ∈ U. Below we discuss two approaches towards this direction: (i) the naive approach that scans the whole database of users to select the most similar ones, and (ii) the full-dimensional clustering approach that partitions users into clusters and employs cluster members for recommendations.

2.2.1 Naive Approach

A straightforward approach for finding the set F_u for a user u ∈ U is to compute simU(u, u′), ∀u′ ∈ U, and select those with simU(u, u′) ≥ δ, where δ is the similarity threshold. Such an approach, though, would be inefficient in large systems, since it requires the online computation of the set of friends for each query user. The problem is aggravated in the case of group recommendations, where the sequential scan of the database has to be performed for each user in the query group. The execution time increases linearly with the number of group members; the larger the query group, the slower the approach.
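For illustration, a sketch of the naive friends computation of Definition 1; the Euclidean-over-common-items similarity used here is an assumption borrowed from the distance function described in Section 6, not a prescription of the paper.

```python
import math

def sim_u(u, v, ratings):
    """Similarity over the items both users rated (1 / (1 + Euclidean distance))."""
    common = set(ratings.get(u, {})) & set(ratings.get(v, {}))
    if not common:
        return 0.0
    dist = math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)

def naive_friends(u, ratings, delta):
    """Linear scan over all users: F_u = {u' : simU(u, u') >= delta}."""
    friends = {}
    for v in ratings:
        if v == u:
            continue
        s = sim_u(u, v, ratings)
        if s >= delta:
            friends[v] = s
    return friends
```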

2.2.2 Full-dimensional Clustering Approach

One way to overcome the limitations of the naive approach is to build a user model and directly employ this model for recommendations. Full-dimensional clustering has been used towards this direction to organize users into clusters of similar ones. The pre-computed clusters are then employed to speed up the recommendation process; the friends of a given user u correspond approximately to the users that belong to the same cluster as u.

In [14], a bottom-up hierarchical clustering algorithm has been employed to build the user model. The similarity between two clusters is defined in terms of complete linkage, i.e., as the minimum similarity between any two users of these clusters. The algorithm terminates when the similarity of the closest pair of clusters violates the user similarity threshold δ. Thus, the resulting clusters fulfill the similarity criterion of Definition 1, i.e., all users within a cluster have a similarity of at least δ. However, the set of friends might be incomplete, i.e., for a user u belonging to a cluster C, there might be users with whom u has a similarity of at least δ but who do not belong to C. The reason lies in the clustering process: at each step the two most similar clusters are merged, resulting in a larger cluster. The order of merging plays an important role in the final cluster setup. Although all members of a cluster have a similarity of at least δ, there might be users with similarity greater than or equal to δ that are assigned to different clusters.

The recommendation time in this case is reduced, since the clusters are pre-computed and the set of friends is easily deliverable; the decrease is rapid for group recommendations, especially as the group size increases. Concerning quality, the naive approach outperforms the full clustering approach, as it can locate the complete set of friends for each user, whereas in full clustering, the set of friends of a user consists of its cluster members only. The smaller the cluster, the more restricted the set of friends for a user within the cluster and, therefore, the lower the quality of recommendations for this user.

Both the naive and the full-dimensional clustering approaches "suffer" from the so-called curse of dimensionality, since both operate in the original high-dimensional item space. In high-dimensional spaces, the discriminative power of distance functions decreases and, moreover, it is harder to find similar users in the whole (high-dimensional) feature space, whereas it is easier to locate such users in a subspace of the dimensions. To deal with these issues, subspace clustering [11] tries to detect both the objects (users in our case) that belong to a cluster and the dimensions (items in our case) that define this cluster. Therefore, a user might belong to more than one subspace cluster, each defined upon a different subset of items. Employing subspace clustering for recommendations serves a three-fold purpose: (i) it improves the clustering quality by providing a better partitioning of the users based on different subsets of items, (ii) it expands the set of friends for a user by allowing users to belong to several clusters, and (iii) it diversifies the set of friends, as different friends might be chosen based on different items.

3. SUBSPACE CLUSTERING AND FAULT-TOLERANT SUBSPACE CLUSTERING

Subspace clustering approaches aim at detecting clusters embedded in subspaces of a high-dimensional dataset [11]. Clusters may be comprised of different combinations of dimensions, while the number of relevant dimensions may vary strongly. To restrict the search space, only axis-parallel subspaces are searched for clusters. A subspace S describes a subset of items, S ⊆ I; |S| is the subspace cardinality. A subspace cluster C is then described in terms of both its members U ⊆ U and the subspace of dimensions S ⊆ I upon which it is defined, i.e., C = (U, S).

The vast majority of subspace clustering algorithms work on complete datasets. However, our data is characterized by many missing values, since users rate only a few items. Recently, fault tolerant subspace clustering [7] has been proposed to handle sparse datasets. The main idea of this approach is that clusters including missing values can still be valid, as long as the amount of missing values does not have a negative influence on the cluster's grade of distinction.

To restrict the number of missing values in a subspace cluster, thresholds w.r.t. the number of missing items, the number of missing users and their combination are used. These thresholds are adapted from the original work [7] to the recommendation domain. Users featuring a missing value for item i ∈ I are included in U?(i) = {u ∈ U | rating(u, i) = ?}, whereas I?(u) = {i ∈ I | rating(u, i) = ?} holds those items having a missing value for user u ∈ U.

User Tolerance: Each user in a subspace cluster C = (U, S) must not have more than a certain number of missing item ratings within the subspace. That is, ∀u ∈ U : |S ∩ I?(u)| ≤ εu · |S|, where εu ∈ [0, 1] is the user tolerance threshold.

Item Tolerance: Each item in a subspace cluster should not have too many missing values among the cluster members. That is, ∀i ∈ S : |U ∩ U?(i)| ≤ εi · |U|, where εi ∈ [0, 1] is the item tolerance threshold.

Pattern Tolerance: The total number of missing values in a subspace cluster must not exceed the pattern tolerance threshold εg ∈ [0, 1]. That is, ∑_{u∈U} |S ∩ I?(u)| ≤ εg · |U| · |S|.

Thus, a cluster C = (U, S) is a valid fault tolerant subspace cluster if the number of missing items per user does not violate εu, the number of missing users per item does not violate εi, and the total number of missing values is bounded w.r.t. εg.
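A sketch of this validity check on a set-based missing-value representation; the function name and the data layout are our own illustrative choices, not the paper's implementation.

```python
def is_fault_tolerant(users, subspace, missing, eps_u, eps_i, eps_g):
    """Check the three tolerance constraints for a candidate cluster (users, subspace).

    `subspace` is a set of items S; `missing[u]` is the set of items for which
    user u has no rating, i.e., I?(u).
    """
    # User tolerance: missing items per user, restricted to the subspace.
    if any(len(subspace & missing[u]) > eps_u * len(subspace) for u in users):
        return False
    # Item tolerance: missing users per item, restricted to the cluster members.
    if any(sum(i in missing[u] for u in users) > eps_i * len(users) for i in subspace):
        return False
    # Pattern tolerance: total number of missing values in the cluster.
    total = sum(len(subspace & missing[u]) for u in users)
    return total <= eps_g * len(users) * len(subspace)
```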

Bottom-up subspace clustering approaches make use of the monotonicity property: if C = (U, S) is a subspace cluster, then in each subset S′ ⊆ S there exists a superset of the users that forms a subspace cluster as well. The fault tolerance model defined so far does not follow the monotonicity property. [7] suggests enclosing cluster approximations, which form supersets of the actual clusters, i.e., they include more users than the actual subspace clusters do. These approximations follow the monotonicity property. A subspace cluster can be extended by adding dimensions up to a maximal dimensionality mx. Thus, each fault tolerant subspace cluster C = (U, S), with |S| ≤ mx, is mx-approximated by a maximal fault tolerant subspace cluster A = (U_A, S), established by the relaxed thresholds ε̂u = min{εu · mx / |S|, 1} and ε̂i = ε̂g = 1. The rationale is to extend the subspace clusters by some dimensions in order to create enclosing approximations which fulfill these thresholds.
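As a concrete (hypothetical) instance of the relaxed user tolerance: with εu = 0.4, a current subspace of size |S| = 2, and a maximal dimensionality mx = 4,

$$\hat{\varepsilon}_u = \min\{0.4 \cdot 4 / 2,\; 1\} = \min\{0.8,\; 1\} = 0.8,$$

so up to 80% of a user's values may be missing in the 2-dimensional approximation, anticipating that the cluster may later grow to 4 dimensions.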

4. SUBSPACE CLUSTERING FOR USER RECOMMENDATIONS COMPUTATION

4.1 Grid-based Fault Tolerant Subspace Clustering (gridFTSC)

[7] introduces the grid-based fault tolerant subspace clustering algorithm FTSC by integrating the fault tolerance concepts into the grid-based subspace clustering algorithm CLIQUE [2]. As in CLIQUE, the data space is partitioned into non-overlapping rectangular grid cells by partitioning each item into g equal-length intervals and intersecting the intervals. Clusters consist of dense cells, i.e., cells containing more than minPts points, where minPts is a density threshold.

In FTSC, users with missing values/ratings might also be part of the clusters. To this end, an extra interval is allocated for each item, in which users with missing values for the respective item are placed. To generate cluster approximations, the item intervals for missing values are considered in addition to the item intervals with existing values. Thus, a cluster approximation consists of the users in the respective cluster cells plus the users obtained by considering the intersection with the missing-values intervals.
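A small sketch of the grid discretization step, assuming ratings in [0, 1]; the extra index g for the missing-value interval and the function name are illustrative assumptions.

```python
def interval_index(rating, g):
    """Map a rating in [0, 1] to one of g equal-length intervals.

    Missing ratings (None) go to the extra interval with index g,
    mirroring the dedicated missing-value interval of FTSC.
    """
    if rating is None:
        return g
    # min() keeps the maximal rating 1.0 inside the last regular interval.
    return min(int(rating * g), g - 1)

# Example: with g = 4, a rating of 0.8 falls into interval 3,
# and a missing rating falls into the missing-value interval 4.
assert interval_index(0.8, 4) == 3 and interval_index(None, 4) == 4
```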

For an efficient generation of cluster approximations, the user set U_A of an approximation is partitioned according to the number of missing values per user, i.e., (U_A, S) = ([U_A^0, U_A^1, …, U_A^x], S), where U_A^i consists of the users with exactly i missing values. To avoid analyzing all possible subspaces, the monotonicity of the approximations and the fault tolerance thresholds are exploited (Algorithm 1). To generate the actual clusters from the approximations, the approximation list of users is traversed and users are added to the cluster. Users with no missing values (at the top of the list) are added to the cluster first, whereas those with missing values are gradually added as long as they do not violate the fault tolerance thresholds.
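A sketch of this extraction step, reusing the hypothetical `is_fault_tolerant` check from Section 3; the traversal order (fewest missing values first) follows the approximation list described above.

```python
def extract_cluster(approx_users, subspace, missing, eps_u, eps_i, eps_g):
    """Turn a cluster approximation into an actual fault tolerant cluster.

    `approx_users` is assumed to be sorted by number of missing values
    in `subspace`, so complete users are admitted first.
    """
    cluster = []
    for u in approx_users:
        candidate = cluster + [u]
        if is_fault_tolerant(candidate, subspace, missing, eps_u, eps_i, eps_g):
            cluster = candidate
        # users that would violate a tolerance threshold are simply skipped
    return cluster
```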

Algorithm 1 Candidate Approximations (gridFTSC) [7]

 1: function getCandidate((U_A, S), d, I, I?)
 2:   if (firstRun) then
 3:     S′ ← d
 4:     U_B^0 ← I
 5:     U_B^1 ← I?
 6:   else
 7:     S′ ← S ∪ {d}
 8:     for (i from 0 to size(U_A)) do
 9:       existingList_i ← I ∩ U_A^i
10:       missingList_i ← I? ∩ U_A^i
11:     end for
12:     U_B^0 ← existingList_0
13:     n ← size(existingList)
14:     for (i from 1 to (n − 1)) do
15:       U_B^i ← existingList_i ∪ missingList_{i−1}
16:     end for
17:     U_B^n ← missingList_{n−1}
18:   end if
19:   return (U_B, S′)
20: end function

Figure 1 (left) shows an example with the 1-item intervals for item/dimension 1. There are 4 dense intervals/cells (red), whereas the users with missing values are allocated to their own interval (blue)¹. The candidate approximation is a list: users with no missing values are stored first (marked with (a)), followed by users with 1 missing value (marked with (b)). Figure 1 (right) shows the result when extending the candidate approximation based on item 2. For illustrative purposes, we choose the interval [0.75, 1] for item 2. The extension of the candidate approximation list is as follows: users with values in both items 1 and 2 in this range (marked with (1)) are assigned to the first position of the candidate approximation list. Users with missing values in either item 1 or 2 (marked with (2) or (3), resp.) are stored in the second position of the list. The last position of the list stores the interval marked with (4) for users with missing values in both items 1 and 2. This part can be directly pruned, since it contains only missing values.

Figure 1: Grid-based cluster approximations (g = 4, minPts = 3).

¹ For visualization reasons, we assume that missing values are represented by 0 in the respective item and that the lower bound for ratings is greater than 0.

4.2 Density-based Fault Tolerant Subspace Clustering

The quality of grid-based clustering heavily depends on the positioning of the grid in the data space. Density-based approaches are more flexible in detecting arbitrarily shaped clusters. Interestingly, we can adapt the construction of candidate approximations to density-based clusters instead of grid cells. The difficulty that emerges when transferring the fault tolerance concept to density-based subspace clustering is that density-based approaches use a distance function, so we need to evaluate distances between users even though they might contain missing values.

We propose two approaches for building candidate approximations for density-based fault tolerant subspace clustering: (i) a hybrid approach, called hybridFTSC, that combines grid- and density-based ideas, and (ii) a pure density-based approach inspired by the subspace clustering algorithm SUBCLU [8], called denFTSC.

4.2.1 Hybrid Fault Tolerant Subspace Clustering (hybridFTSC)

Instead of splitting the data space into intervals, as in gridFTSC, the hybrid approach determines 1-item density-based clusters for each item/dimension of the dataset. For the 1D clustering, we employ DBSCAN [6]. However, as DBSCAN cannot handle missing values, for each item, the users with missing values on it are "isolated" into a so-called pseudo cluster and DBSCAN is applied on the remaining users. Thus, for each item, we get one or several density-based clusters with users featuring no missing values and a pseudo cluster with the users having no ratings for the specific item. As in gridFTSC, we extend each of those 1-item candidate approximations with additional items to obtain broader subspaces, c.f., the getCandidate method (Algorithm 2).

We aim at generating candidate approximation lists sorted according to the users' numbers of missing values. Thus, in the first run of getCandidate, we assign each 1-item density-based cluster to the first position of the candidate approximation list (line 4). In the second position, we save the corresponding pseudo cluster with the users featuring missing values for the respective item (line 5). To generate an n-item candidate approximation, we combine all the users from the candidate approximation list generated in the (n−1)th run to get a set of users for clustering (lines 9-10). When examining this set of users, we filter out users with a missing value in item d and assign them to a pseudo cluster (line 13). Afterwards, we call a 1-item DBSCAN (considering item d only) on the remaining users (line 16). This DBSCAN call may result in one or several density-based clusters, which represent new candidates for extension. We discard the noise points and generate a new candidate approximation for each of the clusters. To do this, each resulting cluster is combined with all users from the pseudo cluster (line 18) and sorted ascendingly according to the users' numbers of missing values in the current subspace to generate the new candidate approximation list (lines 19-20). The algorithm then continues as the grid-based one. Generally, our focus is on creating density-based grid cells by executing 1-item DBSCAN runs in order to extend the subspaces to a higher dimensionality. Since the positioning of these grid cells is flexible and not static, we are able to find broader and tighter grid cells.
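As an illustration of this per-item step, the following sketch separates users with a missing rating for item d into a pseudo cluster and runs a 1-dimensional DBSCAN (via scikit-learn) on the rest; names such as `split_on_item` are our own, and the paper's getCandidate bookkeeping is omitted.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def split_on_item(users, ratings, d, eps=0.04, min_pts=3):
    """Return the 1-item density-based clusters for item d plus the pseudo cluster.

    `ratings[u]` is a dict of item -> score; users without a rating for d
    form the pseudo cluster, the others are clustered by DBSCAN on item d only.
    """
    pseudo = [u for u in users if d not in ratings[u]]
    existing = [u for u in users if d in ratings[u]]
    if not existing:
        return [], pseudo
    X = np.array([[ratings[u][d]] for u in existing])
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
    clusters = {}
    for u, label in zip(existing, labels):
        if label != -1:                      # -1 marks DBSCAN noise, which is discarded
            clusters.setdefault(label, []).append(u)
    return list(clusters.values()), pseudo
```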

Figure 2 (left) displays the 1-item density-based clusters for item 1. We consider the candidate approximation which includes the density-based cluster marked by (a), as well as the pseudo cluster, which contains the users with missing values for item 1 (marked by (b)). To extend this candidate approximation by item 2, i.e., to the subspace spanned by items 1 and 2, we combine the users of both clusters into one set. Afterwards, we filter out the users with missing values for item 2 (marked by (2) and (3) in the right part).


Algorithm 2 Candidate Approximations (hybridFTSC)

 1: require density-based 1-item cluster C from item d, pseudo cluster C? including missing values for item d
 2: function getCandidate((U_A, S), d, C, C?)
 3:   if (firstRun) then
 4:     S′ ← d
 5:     U_B^0 ← C
 6:     U_B^1 ← C?
 7:     add (U_B, S′) to candidateApproximations
 8:   else
 9:     S′ ← S ∪ {d}
10:     for i from 0 to size(U_A) do
11:       add U_A^i to allUsers
12:     end for
13:     missingList ← filterMissing(allUsers, d)
14:     existingList ← filterExisting(allUsers, d)
15:     // dist(): distance function based only on item d
16:     clusters ← DBSCAN(existingList, dist, d)
17:     for all cluster in clusters do
18:       users ← cluster ∪ missingList
19:       U_B ← sortByMissingValues(users, S′)
20:       add (U_B, S′) to candidateApproximations
21:     end for
22:   end if
23:   return candidateApproximations
24: end function

Figure 2: Hybrid-based cluster approximations (ε = 0.04, minPts = 3).

We then call DBSCAN on the remaining users and obtain two clusters, marked as (1a) and (1b). User o1 is included in cluster (1a), as we consider just the second item; it belongs to the ε-neighborhood of some of the users of cluster (1a) in item 2. User o2 belongs to cluster (1b). We create two new candidate approximations by combining each of the clusters with both (2) and (3). For each new candidate, we create a list which includes the users ordered according to their number of missing values within the current subspace. For example, the candidate approximation list based on (1a) holds (1a) at its first position, as it does not contain any missing values, (2) and user o1 at its second position, as they contain one missing value per user, and (3) at its last position, with only missing values in the current subspace.

4.2.2 Density-based Fault Tolerant Subspace Clustering (denFTSC)

SUBCLU [8] is a density-based bottom-up subspace clustering approach which builds upon DBSCAN [6]. In SUBCLU, the notions of neighborhood, reachability, connectivity and cluster from DBSCAN are related to a specific subspace. The DBSCAN parameters ε (for defining the neighborhood of a point) and minPts (for deciding on core points) are inherited. SUBCLU starts with 1-item subspaces and applies DBSCAN to every such subspace to generate 1-item clusters. Next, it checks for each cluster, in a bottom-up way, whether the cluster or part of it still exists in higher-dimensional subspaces. SUBCLU considers each k-item candidate subspace and selects those of the remaining k-item subspaces which share (k − 1) attributes with the former. The algorithm joins them in order to determine the (k + 1)-item candidate subspaces.

Figure 3: SUBCLU-based cluster approximations (ε = 0.04, minPts = 3).

The SUBCLU-based approach aims at determining density-based candidate approximations by focusing on the complete current subspace. Our goal is to transfer the SUBCLU clustering paradigm to fault tolerant clustering. Therefore, for computing distances for the candidate approximations, we employ a function which ignores items with missing values.

As in the hybrid approach, we generate 1-item density-based clusters by applying DBSCAN for each item. The users with missing values for the respective item are again assigned to a pseudo cluster. In the first run of getCandidate (Algorithm 3), every 1-item cluster is saved in the first position of the candidate approximation list (line 4), whereas the pseudo cluster of the item is assigned to the second position (line 5). For the nth run of getCandidate, we merge all users in the candidate approximation list of the (n−1)th run (lines 9-10). They form the basis for the DBSCAN call with respect to the current candidate's subspace (line 15). For calculating distances, we use a function that considers the objects in the current subspace and ignores items with missing values. Again, we receive one or several clusters and discard the noise points. We then create a candidate approximation list for each of the resulting clusters by sorting the clustered users according to their numbers of missing values in the current subspace (line 17). The rest of the algorithm's processing is similar to the grid-based approach.
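A minimal sketch of such a distance function, computed only over the dimensions of the current subspace for which both users have a rating; the fallback value for users sharing no rated dimension is our own assumption.

```python
import math

def subspace_distance(ratings_u, ratings_v, subspace):
    """Euclidean distance restricted to `subspace`, ignoring missing values."""
    shared = [d for d in subspace if d in ratings_u and d in ratings_v]
    if not shared:
        return float("inf")   # no shared evidence: treat the users as not reachable
    return math.sqrt(sum((ratings_u[d] - ratings_v[d]) ** 2 for d in shared))
```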

Figure 3 (left) depicts the state after the call of a 1-item DBSCAN on item 1. The algorithm retrieves 4 clusters and a pseudo cluster. To extend the candidate approximation consisting of (a) and (b) to the two-item subspace (items 1 and 2), we merge (a) and (b) in the second run of getCandidate. The call of DBSCAN on the 2-item subspace results in two new candidate approximations: (i) the combination of (1a), o1, o2, o3 and (2), and (ii) the combination of (1b), o2, o3, o4 and (2). Users o2 and o3 are assigned to both approximations, as we ignore items that do not exist in both users' feature vectors. The users included in (2) feature missing values in both items 1 and 2 and, therefore, it has not been determined yet whether they belong to one of the clusters, and if so, to which one. So, they are assigned to both approximations.

Again, the candidate approximation list is generated by building sets of users sorted according to their number of missing values in the current subspace.

In contrast to the hybrid approach, the SUBCLU-based approach does not assign the user to the right of cluster (1a) (marked with an arrow) to cluster (1a), because it considers the 2-item distance, which exceeds the value of ε. The hybrid approach, however, considers just the 1-item distance (based on item 2), so the user is assigned to cluster (1a), as he features a distance below ε to the nearest cluster object.

4.2.3 Reducing the search space through the significance threshold

The runtime of the algorithms highly depends on the dimensionality of the dataset, since we are looking for clusters in subspaces of the original high-dimensional feature space.


Algorithm 3 Candidate Approximations (denFTSC)

 1: require density-based 1-item cluster C from item d, pseudo cluster C? including missing values in item d
 2: function getCandidate((U_A, S), d, C, C?)
 3:   if (firstRun) then
 4:     S′ ← d
 5:     U_B^0 ← C
 6:     U_B^1 ← C?
 7:     add (U_B, S′) to candidateApproximations
 8:   else
 9:     S′ ← S ∪ {d}
10:     for i from 0 to size(U_A) do
11:       add U_A^i to allUsers
12:     end for
13:     // dist is a distance function ignoring missing values
14:     // dist is based on the current subspace S′
15:     clusters ← DBSCAN(allUsers, dist, S′)
16:     for all cluster in clusters do
17:       U_B ← sortByMissingValues(cluster, S′)
18:       add (U_B, S′) to candidateApproximations
19:     end for
20:   end if
21:   return candidateApproximations
22: end function

To reduce the runtime, we exploit the fact that we are only interested in extending our candidate subspaces and looking for broader clusters which are highly distinctive and contain significant information. For example, if most users rated a particular item in the same way (resulting in one big user cluster and noisy users for this item), the item does not need to be considered for extension. Following this rationale, for subspace extension, we consider only those items which contain at least clusterThreshold 1-item clusters, each featuring at least dataThreshold % of the overall number of users in the dataset. The value for clusterThreshold should be larger than 1 (i.e., an item should be part of at least two 1-item clusters) and approximately half of the number of possible rating values (in order to express significance relative to the characteristics of the dataset). The dataThreshold expresses how much of the population these clusters should cover.
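A sketch of this pruning rule; `significant_items`, the shape of the `one_item_clusters` mapping, and the reading of the β = 0.13, c = 2 setting of Table 1 as dataThreshold and clusterThreshold are our own assumptions.

```python
def significant_items(one_item_clusters, n_users, cluster_threshold=2, data_threshold=0.13):
    """Keep only items that are promising for subspace extension.

    `one_item_clusters[d]` is the list of 1-item clusters (lists of users) found
    for item d. An item qualifies if at least `cluster_threshold` of its clusters
    each cover at least `data_threshold` of all users in the dataset.
    """
    significant = []
    for d, clusters in one_item_clusters.items():
        big = [c for c in clusters if len(c) >= data_threshold * n_users]
        if len(big) >= cluster_threshold:
            significant.append(d)
    return significant
```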

5. WEIGHTED RANKING FOR LOCATING SIGNIFICANT FRIENDS

Through subspace clustering, we possibly receive many like-minded users for a user u, as u might be a member of several subspace clusters. Combining all users from the clusters u belongs to is advantageous, as we gain an extensive and diverse selection of like-minded people for u, since it is based on different subsets of items. Thus, we are able to reflect the different characteristics u might feature in order to calculate the most promising recommendations for him/her.

To enhance the quality of recommendations, we refine the set of like-minded users based on their common ratings with u. In particular, we order these users according to their full-dimensional distance (based on their common dimensions) to u. Using the full-dimensional distance instead of a subspace distance for user ranking allows us to capture the overall similarity of preferences between users. By employing a subspace distance function during clustering, we are able to detect like-minded users which might share just part of their interests with u. When ranking the like-minded users, however, we are interested in finding the most promising ones, i.e., those that also agree, or at least do not disagree too strongly, with u in the rest of their commonly rated items.

A distance based on a higher number of common items should be more significant than a distance based on considerably fewer or just one item; therefore, we weight the distance between u and his/her friends based on the number of their common items. Formally, the weighted distance between u and his/her friend v ∈ U is given by:

$$dist_{weighted}(u, v) = \frac{1}{c_{u,v}} \sqrt{\sum_{i \in I_{uv}} (u_i - v_i)^2}$$

where I_{uv} is the set of common items between u and v and c_{u,v} is their normalized number of shared items. To compute c_{u,v}, min-max normalization is employed:

$$\forall u, v \in U: \quad c_{u,v} = \frac{it_{u,v} - it_{min}}{it_{max} - it_{min}}$$

where it_{u,v} is the number of common ratings between u and v, and it_{min} (it_{max}) is the minimum (maximum) number of common ratings between u and the like-minded users of u.

Finally, the weighted top friends (shortly, friends) used for recommendations are those featuring a distance to u which is below a weighted distance threshold β. More formally:

Definition 3. Let U be a set of users and Θ = {θ1, . . . , θκ} the subspace clustering model upon U, such that Θ = ∪θi. The friends F_u of a user u ∈ U are the users u′ ∈ U that are members of the same clusters θi as u and whose weighted distance to u is below the weighted distance threshold β, i.e., F_u = {u′ ∈ U | ∃θi ∈ Θ : u, u′ ∈ θi, dist_weighted(u, u′) ≤ β}.
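A sketch of the weighted ranking step of Definition 3; the representation of the clustering model as a list of user sets and the handling of the degenerate min-max case are our own assumptions.

```python
import math

def weighted_friends(u, clusters, ratings, beta):
    """Select friends of u from its subspace clusters via the weighted distance."""
    candidates = set().union(*(c for c in clusters if u in c)) - {u}
    if not candidates:
        return {}
    shared = {v: len(set(ratings[u]) & set(ratings[v])) for v in candidates}
    it_min, it_max = min(shared.values()), max(shared.values())
    friends = {}
    for v in candidates:
        common = set(ratings[u]) & set(ratings[v])
        dist = math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in common))
        # min-max normalized number of shared items; guard the degenerate case
        c_uv = (shared[v] - it_min) / (it_max - it_min) if it_max > it_min else 1.0
        if c_uv > 0 and dist / c_uv <= beta:
            friends[v] = dist / c_uv
    return friends
```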

6. EXPERIMENTAL EVALUATION

We evaluate the efficiency and quality of the (i) naive, (ii) fullClu, (iii) gridFTSC, (iv) hybridFTSC, and (v) denFTSC approaches using two MovieLens datasets². The ML-100K dataset contains 100,000 ratings given by 983 users for 1,682 movie items. The ML-1M dataset includes 1,000,000 ratings of 6,040 users over 3,952 movie items. For efficiency, we study how the runtime of the algorithms is affected by different parameters. For quality, we use accuracy measures that directly compare the predicted user ratings with the actual ones. The Mean Absolute Error (MAE) is the average of the absolute errors of the predictions compared to the actual ratings of a user:

$$MAE = \frac{1}{n}\sum_{i=1}^{n} |predicted_i - actual_i|$$

The Root Mean Squared Error (RMSE) is the square root of the average of the squared errors of the predictions compared to the actual ratings of a user:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (predicted_i - actual_i)^2}$$

The smaller the values, the better the quality of recommendations. As there are ratings only for single users, for group recommendations we experiment with different characteristics of query groups, choosing from heterogeneous to homogeneous groups, and report the average MAE and RMSE over all group members. We also study the number and size of the generated clusters as an indirect measure of quality. Intuitively, when a user belongs to a very small cluster, his friends selection is limited and the recommendations are worse compared to a user that gets recommendations from a larger pool of friends.
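For completeness, a small sketch of both error measures; the zip-aligned prediction/actual lists are an assumption about the data layout.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error over aligned prediction/actual lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error over aligned prediction/actual lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```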

Experiments were run on a 2.5 GHz quad-core Intel i5-2450M machine with 8.00 GB RAM and a 64-bit operating system. The distance between two users is evaluated as the Euclidean distance over their commonly rated items. When a specific subspace is considered, the distance relies only on the items that comprise the subspace.

6.1 Parameter Settings and Efficiency

We study the execution time of our methods under different settings, and the number and dimensionality of the resulting clusters. We do not report on the naive approach here; according to [14], it takes 4 times longer than fullClu³.

² http://www.grouplens.org/node/73
³ Although there are more efficient methods for kNN acquisition, here we refer to the naive approach that does not use any special index structure, does not produce approximate results [5], and does not refer to a distributed environment [4].


Figure 4: ML-100K dataset: Runtime (in logarithmic scale) and #clusters for different parameter settings (c.f. Table 1). (a) Runtime, (b) # Clusters.

For fullClu, the smaller the number of clusters, the better, since this indicates clusters of large cardinality, which is what we need for a broad selection of friends. For subspace clustering, a good clustering features many subspace clusters, because each user potentially belongs to several clusters, offering a wider and more diverse selection of friends.

6.1.1 Dataset ML-100K

We experimented with different parameter settings (Table 1). The runtime and the number of generated clusters for each approach are depicted in Figure 4.

fullClu: The user similarity threshold δ has a strong impact on the algorithm's performance. The higher its value, the smaller the number of clusters and the bigger (on average) the clusters. For example, for δ = 0.2, there are 3-15 users per cluster, and 3 big clusters containing 27, 38 and 96 users, respectively. For δ = 0.7, the range is 3-28 users and there is one big cluster of 138 users. The execution time also depends on δ; the lower it is, the longer the algorithm takes. fullClu has the smallest runtime; however, its clustering might be problematic for determining users' friends, since it consists of a small number of clusters with imbalanced cluster cardinalities. This way, the selection of friends for a user of one of the (many) small clusters is very narrow and, therefore, the quality of the recommendations might be poor. The problem is not so severe for a user of a (usually one) big cluster, as his friends selection is broader.

gridFTSC: The lower the density threshold minPts, the larger the number of clusters and the higher-dimensional the clusters, since more grid cells are considered dense. Consequently, the runtime increases with a decreasing value of minPts, as the number of clusters and the number of items to be considered for extension increase. For instance, for minPts = 50, the algorithm detects 494 subspace clusters whose dimensionality lies in the [1-4] range. The higher-dimensional subspace clusters are based mostly on the same subset of items. This implies that the dataset features some prominent items which "derive" big clusters. When we lower the threshold, the number of clusters as well as the running time increase drastically, but the maximum subspace dimensionality of the clusters does not; the maximum dimensionality is 5 for both minPts = 40 and minPts = 50. Higher-dimensional subspace clusters are desirable, as they demonstrate a higher agreement in terms of the included users' preferences. Therefore, the choice of minPts is a trade-off between algorithm runtime and clustering quality. Compared to the other approaches, gridFTSC generates the largest number of clusters and is the slowest method.

hybridFTSC: Similarly to gridFTSC, with a decreasing value of minPts, the number of clusters and the dimensionality of the subspaces increase. However, due to the significance threshold, the runtime is only a fraction of gridFTSC's runtime, while a similar number of clusters is detected. For minPts = 40, the dimensionality of the detected clusters is in the [1-3] range, with mostly 1-item clusters. The large number of 1-item clusters comes from the comparatively high significance threshold used to speed up the algorithm.

Nevertheless, the quality of the recommendations, as we will see later, does not suffer from this. The algorithm also finds all the prominent items which have been determined by the grid-based approach. The dimensionality of the detected clusters increases for minPts = 30, while the increase in runtime is considerably low. For minPts = 20, the number of clusters increases significantly but, still, the runtime is far below that of gridFTSC (5 vs. 29 minutes). As the dimensionality of the subspace clusters ranges in [1-4], the algorithm is able to compete with gridFTSC in terms of clustering quality at a significantly lower runtime.

denFTSC: This approach is superior in both runtime and clustering quality. For all parameter settings, it determines high-dimensional subspace clusters, while its runtime is unequaled. For minPts = 35, the dimensionality of the subspaces lies in the [1-5] range. All prominent items retrieved by the previous approaches are detected by the extracted subspace clusters, but also new items are added to the subspace cluster definitions. The variety in the items of the subspace clusters of denFTSC is significantly higher compared to the other approaches. Without an observable increase in runtime, the algorithm determines a noticeably higher number of subspace clusters compared to hybridFTSC. For minPts = 20, even more clusters were detected in almost 1 minute, whereas comparable results by gridFTSC and hybridFTSC were achieved in approximately 30 and 6 minutes, respectively.

Figure 5: ML-1M dataset: Runtime (in logarithmic scale) and #clusters for the parameters of Table 2. (a) Runtime, (b) # Clusters.

6.1.2 Dataset ML-1M

The parameter settings are depicted in Table 2, whereas the results are shown in Figure 5.

fullClu: Compared to ML-100K, the runtime increases drastically. In general, the overall behavior is analogous to the observations made so far: we receive plenty of small clusters and a few clusters that are exceptionally big.

gridFTSC: Due to heap space limitations, we were not able to examine its performance on this dataset.

hybridFTSC: The performance deteriorates drastically. As it differs from denFTSC at the point where the users are divided into two sets, we conclude that this operation is highly expensive. The number of discovered subspace clusters is similar to denFTSC. The dimensionality of the resulting subspace clusters for minPts = 100 and a significance threshold of 0.17 ranges between 1 and 3 items.

denFTSC: Although there is an increase in runtime compared to the ML-100K dataset, it is not as drastic as for the other approaches. The performance is stable and superior to the other approaches in both efficiency and effectiveness. For minPts = 100 and a threshold of 0.15, the algorithm determines 2,894 subspace clusters in 23 minutes. The dimensionality of the subspace clusters ranges from 1 to 4 items.

Concluding, denFTSC is superior to the other approaches on both datasets, as it exhibits a runtime comparable to fullClu, while deriving many subspace clusters and the largest subspace variety among the subspace clustering methods.


Table 1: ML-100K dataset: Parameter settings and results

| Method | Parameter settings | Runtime | # Clusters | Setting ID |
| fullClu | δ = 0.2 | 39s 585ms | 141 | 1 |
| fullClu | δ = 0.5 | 41s 115ms | 87 | 2 |
| fullClu | δ = 0.7 | 41s 237ms | 83 | 3 |
| gridFTSC (*) | minPts = 50, grid = 3 | 13min 33s 55ms | 494 | 1 |
| gridFTSC (*) | minPts = 40, grid = 3 | 29min 53s 364ms | 1038 | 2 |
| gridFTSC (*) | minPts = 30, grid = 3 | 2h 3min 41s 655ms | 4308 | 3 |
| hybridFTSC (*)(+) | minPts = 40, ε = 0.1 | 1min 54s 555ms | 516 | 1 |
| hybridFTSC (*)(+) | minPts = 30, ε = 0.1 | 3min 0s 345ms | 754 | 2 |
| hybridFTSC (*)(+) | minPts = 20, ε = 0.1 | 5min 51s 895ms | 1265 | 3 |
| denFTSC (*)(+) | minPts = 35, ε = 0.1 | 42s 399ms | 667 | 1 |
| denFTSC (*)(+) | minPts = 30, ε = 0.1 | 48s 586ms | 819 | 2 |
| denFTSC (*)(+) | minPts = 20, ε = 0.1 | 1min 23s 15ms | 1349 | 3 |

(*) parameters for FTSC: εo = 0.4, εs = 0.3, εg = 0.4; (+) parameters for the significance threshold: β = 0.13, c = 2

Table 2: ML-1M dataset: Parameter settings and results

| Method | Parameter settings | Runtime | # Clusters |
| fullClu | δ = 0.5 | 3h 22min 57s 869ms | 384 |
| hybridFTSC (*) | minPts = 100, ε = 0.1, d = 0.17, c = 2 | 15h 43min 50s 50ms | 2946 |
| denFTSC (*) | minPts = 100, ε = 0.1, d = 0.15, c = 2 | 51min 5s 957ms | 2894 |

(*) parameters for FTSC: εo = 0.4, εs = 0.3, εg = 0.4

Figure 6: Top-10 user recommendation statistics (ML-100K dataset, Setting 1 from Table 1). (a) Runtime & #friends, (b) MAE & RMSE.

Both gridFTSC and hybridFTSC face major difficulties when it comes to clustering large datasets. hybridFTSC outperforms gridFTSC in runtime, while determining a clustering result that is qualitatively comparable. The fullClu approach, in general, is not a good option for determining user clusters, as it produces too many small clusters.

6.2 Quality of User Recommendations

For examining the qualitative differences of our approaches, we randomly choose users with different demographics (occupation, age and sex) and a number of ratings equal to the average number of ratings per user. We issue the 10 most promising recommendations to the users. For subspace clustering, we use β = 0.15, and, for the naive approach, we employ the distance threshold of fullClu.

ML-100K: Next, we present results for a 34-year-old educator with 70 ratings. According to his ratings, he seems to be interested in Drama, Action, Comedy and Romance movies (30, 18, 18 and 14 ratings), and not in Documentary, Fantasy, Horror or Western movies (0 ratings). For the calculations, we employed parameter setting 1 (Table 1), which gives the least promising clustering results for all approaches: fullClu results in the smallest clusters and, therefore, in a limited choice of friends, while the subspace clustering approaches result in the lowest number of generated subspace clusters and the lowest number of considered dimensions for these clusters. The results are shown in Figure 6.

The runtime is a great deal lower when employing user clusters, since naive scans the whole database to determine the friends of the query user. fullClu is the fastest method; however, its prediction quality suffers heavily from the small set of friends considered. The set of friends generated by the subspace clustering approaches is larger compared to fullClu, but still small compared to the naive approach. MAE and RMSE, however, show that the prediction quality of the subspace clustering approaches increases compared to naive, due to the careful selection of friends. gridFTSC is the slowest among the fault tolerant approaches, since it does not employ the significance threshold, while all subspace clustering approaches issue the same suggestions, though their ranking might differ.

Figure 7: Top-10 user recommendation statistics (ML-1M dataset; (a) Runtime & #friends, depicted in logarithmic scale, (b) MAE & RMSE).

ML-1M: Here, we present results for a female scientist at the age of 56. She has submitted 148 ratings, preferring movies from the genres Drama, Comedy, Romance and Thriller (95, 33, 24, 20 ratings), whereas she does not seem to be interested in Western, Documentary, Sci-Fi, Film Noir or Fantasy movies (1, 1, 2, 2, 2 ratings).

Calculating recommendations on this dataset further amplifies the effects already observed (Figure 7). The runtime increases, except for fullClu, which requires approximately the same time as before, since the cluster of our user contains only 9 users. Naive considers more than half of the overall users as friends, which is obviously too wide a selection. Thus, both fullClu and naive suffer from a poor selection of friends, as reflected in the corresponding MAE and RMSE scores. On the other hand, hybridFTSC and denFTSC perform quite well. Their runtime is smaller than the one required by naive, thanks to the significance threshold, while the quality of the recommendations does not suffer from this pruning heuristic, as MAE and RMSE show. hybridFTSC and denFTSC agree on all recommended items; there is only a slight difference in their rankings. Naive calculates similar recommendations and gives the same top-2 items as the density-based approaches. fullClu, though, totally disagrees with the other approaches in its recommendations. This confirms that a narrow selection of friends leads to poor recommendations.


[Figure 8: Runtime, avg runtime and avg quality values for a homogeneous query group (ML-100K dataset, Setting 1 from Table 1). Panel (a): runtime and avg runtime per user for naive, fullClu, gridFTSC, hybridFTSC and denFTSC, logarithmic scale; panel (b): avg MAE and avg RMSE per user.]

To conclude, the fault-tolerant subspace clustering approaches overcome the friends-selection problems of naive and fullClu at a reasonable runtime. Although subspace clustering leads to a larger selection of friends, the weighted ranking, based on the full-dimensional distance and the number of globally shared dimensions, refines this selection and leads to recommendations of higher quality compared to naive and fullClu.
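To illustrate how such a weighted ranking could refine a large pool of subspace friends, the sketch below combines a full-dimensional closeness with the normalized number of globally shared subspace dimensions; the score formula, the normalization and the names (rank_friends, shared_dims) are assumptions for illustration, not the paper's definitive formulation.

import math

def full_dim_distance(query_ratings, friend_ratings):
    # Euclidean distance over the items both users have rated; missing
    # ratings are simply skipped (assumed fault-tolerant comparison).
    common = set(query_ratings) & set(friend_ratings)
    if not common:
        return float("inf")
    return math.sqrt(sum((query_ratings[i] - friend_ratings[i]) ** 2 for i in common))

def rank_friends(query_ratings, candidates, shared_dims, beta=0.15):
    # Rank candidate friends by a weighted combination of (i) closeness in the
    # full-dimensional space and (ii) the number of subspace dimensions they
    # share with the query user across all clusters (shared_dims[u] is an
    # assumed set-valued structure).
    max_shared = max((len(d) for d in shared_dims.values()), default=1) or 1
    scores = {}
    for u, ratings in candidates.items():
        closeness = 1.0 / (1.0 + full_dim_distance(query_ratings, ratings))
        shared = len(shared_dims.get(u, set())) / max_shared
        scores[u] = beta * shared + (1.0 - beta) * closeness
    return sorted(scores, key=scores.get, reverse=True)

Under this assumed formulation, with β = 0.15 as used in the experiments, the full-dimensional closeness dominates the score, while the shared subspace dimensions act as a diversity-aware secondary criterion.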

6.3 Quality of Group Recommendations
Since there is no ground truth for group recommendations, we rely on the quality of the individual group members' recommendations for the quality evaluation. We report results for the fair design for groups with different user demographics. For the naive approach, we used the distance threshold of fullClu, whereas for subspace clustering, we used a weighted ranking with β = 0.15. Each group has 10 members, and we issue the 10 most promising recommendations in each experiment.
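As a concrete illustration of this evaluation protocol, the sketch below computes MAE and RMSE per group member and averages them over the group, mirroring the avg MAE per user and avg RMSE per user values reported in the figures; the function names and the dictionary-based structures are assumptions for illustration.

def mae_rmse(predicted, actual):
    # MAE and RMSE over the items a member has actually rated; predicted and
    # actual map item ids to ratings (assumed structure).
    errors = [predicted[i] - actual[i] for i in predicted if i in actual]
    if not errors:
        return float("nan"), float("nan")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
    return mae, rmse

def group_quality(member_predictions, member_ratings):
    # Average the per-member MAE/RMSE over all members of the query group.
    pairs = [mae_rmse(member_predictions[u], member_ratings[u])
             for u in member_predictions]
    avg_mae = sum(m for m, _ in pairs) / len(pairs)
    avg_rmse = sum(r for _, r in pairs) / len(pairs)
    return avg_mae, avg_rmse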

6.3.1 Homogeneous Query Group
We randomly generate query groups with users that exhibit homogeneity w.r.t. their demographics. In particular, we choose users sharing the same occupation, age range and sex, assuming that, due to these similar characteristics, their movie taste will also be similar.

ML-100K: For clustering, we used parameter setting 1 (Table 1). We choose 10 young male programmers aged between 28 and 30. Figure 8 displays the total runtime, the average runtime per user, and the MAE and RMSE scores averaged over all group members. The runtime of the naive approach increases rapidly, since a sequential scan of the database is required for each group member to detect the set of friends. denFTSC is the best among the subspace clustering approaches in terms of runtime, while the best overall runtime is achieved by fullClu. Regarding quality, fullClu is the worst and denFTSC the best. All subspace clustering approaches agree in their top-5 recommendations, although their rankings slightly differ; denFTSC and hybridFTSC even agree on their top-7 recommendations.

ML-1M: Here, we consider a group of 10 female homemakers aged between 35 and 44. Figure 9 displays the runtime, average runtime and quality scores per user. The observations are similar to those for ML-100K. The difference in runtimes is even greater now, due to the dataset size; naive requires double the time of denFTSC. fullClu is the fastest and the one with the worst quality, while denFTSC has the best quality. hybridFTSC and denFTSC agree in 8 out of 10 items; their rankings slightly differ.

Overall, denFTSC offers the best trade-off between runtime and quality. Although fullClu is the fastest, its quality is the worst, since it depends on the positioning of the group members in clusters; small clusters result in narrow friends sets and poor recommendations. denFTSC offers a wide selection of friends due to the different subspace clusters that a group member belongs to. Also, due to the weighted-ranking filtering of these friends, the resulting set of friends upon which the recommendations are based is of high quality.

[Figure 9: Runtime, avg runtime and avg quality values for a homogeneous query group (ML-1M dataset). Panel (a): runtime and avg runtime per user for naive, fullClu, hybridFTSC and denFTSC, logarithmic scale; panel (b): avg MAE and avg RMSE per user.]

[Figure 10: Runtime, avg runtime and avg quality values for a heterogeneous query group (ML-100K dataset, Setting 1 from Table 1). Panel (a): runtime and avg runtime per user for naive, fullClu, gridFTSC, hybridFTSC and denFTSC, logarithmic scale; panel (b): avg MAE and avg RMSE per user.]


6.3.2 Heterogeneous Query Group
Next, we examine groups of users that have been randomly selected by considering different demographics.
ML-100K: For the clustering approaches, we used parameter setting 1 (Table 1). The query group consists of 5 women (an executive, an artist, a student, an engineer and a retired lady) and 5 men (a librarian, two students, a scientist and an educator). Figure 10 displays the runtime, the average runtime and the quality measures per user. For efficiency, the conclusions drawn so far hold: the worst runtime is that of naive, while fullClu is the fastest with the worst quality. In general, subspace clustering offers good quality at acceptable runtimes; the best quality is achieved by denFTSC. Concerning the actual results, hybridFTSC and denFTSC agree in 8 out of 10 recommendations and in their top-3 items, while fullClu shares on average 6 out of 10 items with the subspace clustering approaches.

ML-1M: Here, the females are a grad student, a writer, a doctor, an executive, and a self-employed lady. The males are a programmer, an engineer, an artist, a tradesman, and a retired man. The results are shown in Figure 11. Again, naive is the slowest, and fullClu is the fastest with the lowest quality scores. hybridFTSC and denFTSC finish significantly faster than naive, and their quality scores are better than those of naive and fullClu; therefore, they offer a good trade-off. Regarding the actual recommendations, hybridFTSC and denFTSC agree in 9 out of 10 recommendations, and they even agree in their rankings.

To conclude, the effects we observed for single-user recommendations are even stronger in the case of groups. The runtime of naive increases rapidly, since the database needs to be scanned for each user in the group. The quality, which depends on the quality of the individual members' recommendations, relies on the selection of the friends set. Neither a very narrow friends selection, as in fullClu, nor a very wide one, as in naive, performs well. Our experiments show that a broad pool of diverse friends obtained by subspace clustering, combined with a careful selection among them based on the weighted ranking, offers the best recommendations at a fair runtime.


[Figure 11: Runtime, avg runtime and avg quality values for a heterogeneous query group (ML-1M dataset). Panel (a): runtime and avg runtime per user for naive, fullClu, hybridFTSC and denFTSC, logarithmic scale; panel (b): avg MAE and avg RMSE per user.]

7. RELATED WORK
Recommendation approaches are typically divided into content-based and collaborative filtering approaches. Content-based approaches recommend items similar to those the user previously preferred (e.g., [13]), while collaborative filtering approaches recommend items that users with similar preferences liked (e.g., [9]). Several extensions have been proposed, such as time-aware recommendations (e.g., [18, 17]) and group recommendations (e.g., [3, 14]). Recently, approaches for extending database queries with recommendations have also appeared [10, 16].

To facilitate the selection of users similar to a query user, clustering has been employed to pre-partition users into clusters of similar users and rely on the cluster members for recommendations. For example, [14] employs full-dimensional clustering; as explained, though, full-dimensional clustering is not the best option due to the high dimensionality and sparsity of the data. Dimensionality reduction techniques, like PCA, could be applied to reduce the dimensionality; however, clusters that exist in subspaces rather than in the original (or reduced) feature space would be missed.

[1] proposes a research paper recommender system that augments users through a subspace clustering algorithm. However, only binary ratings are considered, and therefore the problem is simplified, since typically ratings lie in a value range with higher values indicating stronger preferences. In contrast, we use the full range of ratings. [12] also uses subspace clustering to improve the diversity of recommendations; the dimensions considered, though, are not the items themselves but rather more general information extracted from these items (like movie genres). This way, neither the high dimensionality of the data nor the missing-values problem is addressed. In our approach, we use subspace clustering over item ratings to diversify the resulting set of friends and a fault-tolerant approach to deal with missing ratings. Initial ideas on employing subspace clustering for recommendations appeared in [15]. In this work, we proceed further by proposing two density-based fault-tolerant subspace clustering approaches which, as we show, perform better. Moreover, we introduce the significance-threshold pruning to reduce the runtime and the weighted ranking approach to combine subspace friends into a final friends set for recommendations. We also provide an extensive experimental evaluation for both users and groups of users.

8. CONCLUSIONS
In this work, we integrate subspace clustering into the recommendation process to improve its efficiency and effectiveness. We examine a fault-tolerant approach, which relaxes the cluster model so that missing ratings can be tolerated. We extend grid-based fault tolerance to the density-based clustering paradigm and propose the hybridFTSC and denFTSC approaches. To improve runtime, we introduce the notion of the significance threshold, which identifies prominent dimensions for subspace cluster extension. To improve the quality of predictions, we combine the wide selection of diverse friends offered by subspace clustering with a weighted ranking based on full-dimensional distances between users. Such a concurrent consideration of both local and global distances (in clustering and ranking, respectively) allows for a wide selection of friends and a fine selection of the best ones for computing recommendations. Our experiments show that the fault-tolerant subspace clustering approaches outperform both the naive and the fullClu approaches in terms of runtime and quality.

Acknowledgments
This work was partially supported by the European project DIACHRON (IP, FP7-ICT-2011.4.3, #601043).

9. REFERENCES
[1] N. Agarwal, E. Haque, H. Liu, and L. Parsons. Research paper recommender systems: A subspace clustering approach. In WAIM, 2005.
[2] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD, 1998.
[3] S. Amer-Yahia, S. B. Roy, A. Chawla, G. Das, and C. Yu. Group recommendation: Semantics and efficiency. PVLDB, 2(1):754–765, 2009.
[4] A. Boutet, A.-M. Kermarrec, D. A. Frey, R. Guerraoui, and A. Jegou. WhatsUp: A decentralized instant news recommender. In IPDPS, 2013.
[5] W. Dong, M. Charikar, and K. Li. Efficient k-nearest neighbor graph construction for generic similarity measures. In WWW, 2011.
[6] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 1996.
[7] S. Gunnemann, E. Muller, S. Raubach, and T. Seidl. Flexible fault tolerant subspace clustering for data with missing values. In ICDM, 2011.
[8] K. Kailing, H.-P. Kriegel, and P. Kroger. Density-connected subspace clustering for high-dimensional data. In SIAM SDM, 2004.
[9] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl. GroupLens: Applying collaborative filtering to Usenet news. Commun. ACM, 40(3):77–87, 1997.
[10] G. Koutrika, B. Bercovitz, and H. Garcia-Molina. FlexRecs: Expressing and combining flexible recommendations. In SIGMOD, 2009.
[11] H.-P. Kriegel, P. Kroger, and A. Zimek. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowl. Discov. Data, 3(1):1:1–1:58, Mar. 2009.
[12] X. Li and T. Murata. Using multidimensional clustering based collaborative filtering approach improving recommendation diversity. In Web Intelligence/IAT Workshops, 2012.
[13] R. J. Mooney and L. Roy. Content-based book recommending using learning for text categorization. In ACM DL, 2000.
[14] E. Ntoutsi, K. Stefanidis, K. Nørvag, and H.-P. Kriegel. Fast group recommendations by applying user clustering. In ER, 2012.
[15] K. Rausch, E. Ntoutsi, K. Stefanidis, and H.-P. Kriegel. Exploring subspace clustering for recommendations. In SSDBM, 2014.
[16] K. Stefanidis, M. Drosou, and E. Pitoura. "You May Also Like" results in relational databases. In PersDB, 2009.
[17] K. Stefanidis, E. Ntoutsi, M. Petropoulos, K. Nørvag, and H.-P. Kriegel. A framework for modeling, computing and presenting time-aware recommendations. T. Large-Scale Data- and Knowledge-Centered Systems, 10:146–172, 2013.
[18] L. Xiang, Q. Yuan, S. Zhao, L. Chen, X. Zhang, Q. Yang, J. Sun, and J. Sun. Temporal recommendation on graphs via long- and short-term preference fusion. In KDD, 2010.
