CLIQUE FINDER By Ryan Lange, Thomas Dvornik, Wesley Hamilton, and Bill Hess.

Post on 18-Jan-2018




Outline

• Intro
– Problem
– Solution
• Implementation
– Distance Algorithm
– Clustering Algorithm
• Validation
– Test Data Set
– Real Data Set
• Demo

Introduction

How Can We Group Friends?
• How can your friends be grouped logically?
• What are the important factors in people joining cliques?
– Shared interests, high school, family, college, work, etc.
• Differences between Facebook and real life?

How We Define A Clique

Desired Results
• High school friends, family, or co-workers will be grouped together as expected.
• Possibly form cliques or groups of people within your friends list that may not have been considered before.

Implementation

• Gather Data
• Distance Algorithm
• Clustering Algorithm
– Input: distance matrix
– Output: two-dimensional array of friends
• Test app
• Output

Distance Algorithm

• Problems
– Facebook limits
– Server limits
– Retrieving and processing over 30,000 photos can take 3 to 6 minutes
• Important information
– What information should be processed?
– Used photo tags and wall counts
• Data collected
– Average of 8,000 photos across all friends

Distance Algorithm (continued)

• Survey of 50 users
• 5 useful pieces of information
– Personal information, wall posts, photos, groups, and events

Distance Algorithm (continued)

• Facebook results
– One picture with 5 tags = 5 results
• Process results
– Turn into a list of friends with tagged photos
– Find a distance between each pair of friends
– Turn into a distance matrix
• Run time, worst case
– (number of users)^2 * (number of photos)^2
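The "process results" step can be sketched in Python as follows. This is only an illustration: the row format (one `(photo_id, user_id)` pair per tag), the function name `group_tags_by_friend`, and the sample IDs are all assumptions, since the slides do not show the actual Facebook response.

```python
from collections import defaultdict

def group_tags_by_friend(tag_rows):
    """Group (photo_id, user_id) tag rows into {user_id: set of photo_ids}."""
    photos_by_friend = defaultdict(set)
    for photo_id, user_id in tag_rows:
        photos_by_friend[user_id].add(photo_id)
    return dict(photos_by_friend)

# One picture with 5 tags arrives as 5 separate rows (hypothetical IDs).
rows = [("p1", user) for user in ("ann", "bob", "cat", "dan", "eve")]
rows.append(("p2", "ann"))  # a second photo with a single tag

friends = group_tags_by_friend(rows)
```

Keeping a set of photos per friend means that comparing two friends becomes a set intersection rather than a nested loop over both photo lists.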

Improved Distance Equation

[Figure: distance versus the percentage of tagged photos where users appear together]
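The equation itself did not survive in this transcript, so the sketch below is not the authors' formula. It assumes one plausible form, distance = 1 - (photos with both users / photos with either user), purely to show how a co-occurrence percentage can be turned into a distance matrix. The function names and sample data are hypothetical, and the scale (0 to 1) will differ from the thresholds quoted later in the slides.

```python
def cooccurrence_distance(photos_a, photos_b):
    """Assumed distance: 1 - (shared photos / photos with either user)."""
    union = photos_a | photos_b
    if not union:
        return 1.0  # neither user is tagged anywhere: treat as maximally distant
    return 1.0 - len(photos_a & photos_b) / len(union)

def distance_matrix(photos_by_friend):
    """Return (sorted names, symmetric matrix of pairwise distances)."""
    names = sorted(photos_by_friend)
    matrix = [[0.0] * len(names) for _ in names]
    for i, a in enumerate(names):
        for j in range(i + 1, len(names)):
            b = names[j]
            d = cooccurrence_distance(photos_by_friend[a], photos_by_friend[b])
            matrix[i][j] = matrix[j][i] = d  # distance is symmetric
    return names, matrix

# Hypothetical data: ann and bob always appear together, cat never does.
photos = {"ann": {"p1", "p2"}, "bob": {"p1", "p2"}, "cat": {"p3"}}
names, matrix = distance_matrix(photos)
```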

Clustering Algorithm
• Hierarchical clustering
• Average-linkage clusters
• Generalized to work on any objects with a distance function
• Clustering stops when the closest two clusters are > threshold distance apart
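A minimal Python sketch of average-linkage hierarchical clustering with a threshold stop is shown here. It is a straightforward, unoptimized version written for illustration, not the authors' code. The names `average_linkage` and `cluster` are assumptions, and 2-D points with Euclidean distance stand in for friends and the photo-based distance.

```python
def average_linkage(cluster_a, cluster_b, dist):
    """Mean distance over all pairs drawn from the two clusters."""
    total = sum(dist(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def cluster(items, dist, threshold):
    """Merge the closest clusters until they are > threshold apart."""
    clusters = [[item] for item in items]  # start with singletons
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # closest two clusters are too far apart: stop
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters  # two-dimensional list: one inner list per clique

def euclidean(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
groups = cluster(points, euclidean, threshold=6)
```

Because `cluster` only calls `dist`, the same function works on any objects for which a distance function is supplied.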

Point-Based Test Driver

Validation – Sample Data Set

Validation – Sample Data Set
• How we measured correctness
• Thresholds 3-10 gave us the correct number of cliques; however, 5 was placed incorrectly
• Error rate of 10% because 1 of 10 users was misplaced
• Chose the midpoint value of 6 for our threshold
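The slides give the 10% figure but not the exact scoring procedure. One way such an error rate could be computed is sketched below: every user who does not carry the majority expected label of the clique they were placed in counts as misplaced. The function name `error_rate`, the labels "A" and "B", and the 10-user layout are hypothetical.

```python
from collections import Counter

def error_rate(expected_label, cliques):
    """Fraction of users outside the majority expected label of their clique."""
    misplaced = 0
    total = 0
    for clique in cliques:
        if not clique:
            continue  # ignore empty cliques
        labels = Counter(expected_label[user] for user in clique)
        majority_count = labels.most_common(1)[0][1]
        misplaced += len(clique) - majority_count
        total += len(clique)
    return misplaced / total

# Hypothetical example: users 1-5 belong to clique A, users 6-10 to clique B,
# but the clustering put one clique-A user in with the clique-B users.
expected = {user: "A" for user in range(1, 6)}
expected.update({user: "B" for user in range(6, 11)})
produced = [[1, 2, 3, 4], [5, 6, 7, 8, 9, 10]]

rate = error_rate(expected, produced)  # 1 misplaced user out of 10
```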

Validation – Real Data Set

• We chose to use Thomas Dvornik's account
– Moderate amount of data
– His friends could be separated into well-defined cliques
• Threshold on real data
• Threshold gave highest accuracy at 3 and second highest at 6

Validation – Improvements
• After improvements
– Again, based on our accuracy measurement

Improvements/Future Work

• Caching
– The number of queries and computation can get very large
– Store the distance matrix for 24 hours
• Accuracy
– Use all aspects of Facebook
  • Some activity is not even considered
– Using weights for different data sources
  • Not all activity is equally important
– Analysis of produced cliques
  • Survey to see if cliques are accurate
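The caching idea (store the distance matrix for 24 hours) could look roughly like the Python sketch below. It is an in-memory illustration only; a deployed app would more likely use a database or file store. The class name `DistanceMatrixCache`, its methods, and the injectable `clock` parameter are assumptions made for this example.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as proposed in the slides

class DistanceMatrixCache:
    """Keeps one distance matrix per user and recomputes it after the TTL."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # user_id -> (stored_at, matrix)

    def get_or_compute(self, user_id, compute):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip queries and computation
        matrix = compute(user_id)
        self._entries[user_id] = (now, matrix)
        return matrix

# Usage with a fake clock and a stand-in for the expensive computation.
fake_now = [0.0]
calls = []

def expensive(user_id):
    calls.append(user_id)
    return [[0.0]]

cache = DistanceMatrixCache(clock=lambda: fake_now[0])
cache.get_or_compute("u1", expensive)  # computed
cache.get_or_compute("u1", expensive)  # served from the cache
fake_now[0] = CACHE_TTL_SECONDS + 1
cache.get_or_compute("u1", expensive)  # expired, computed again
```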

Demo

http://apps.facebook.com/mine_cliques/

Questions?