Enkh-Amgalan Baatarjav Jedsada Chartree Thiraphat Meesumrarn University of North Texas.

Group Recommendation System for Facebook

Enkh-Amgalan Baatarjav, Jedsada Chartree, Thiraphat Meesumrarn

University of North Texas

Overview

- Evolution of Communication
- Online Social Networking (OSN)
- Architecture
  - Profile feature
  - Profile analysis
  - Similarity inference
  - Clustering coefficient
  - Decision tree
- Conclusion

Evolution of Communication

- Traditional media of communication: mail, telephone, fax, e-mail, etc.
- Key to successful communication: sharing a common value

Online Social Networking

- User-driven content
- Overwhelming number of groups
- Finding suitable groups
- Sharing a common value
- Improving online social networks

Architecture

- Profile feature extraction
- Classification engine
  - Clustering
  - Building decision tree
- Group recommendation

Profile Feature

Group profile defined by profile features of users:

- Time Zone
- Age
- Gender
- Relationship Status
- Political View
- Activities
- Interest
- Music
- TV shows
- Movies
- Books
- Affiliations
- Note counts
- Wall counts
- Number of Friends

Profile Analysis

Group  Subtype                Size  Description
G1     Friends                12    Friends group for someone going abroad
G2     Politics               169   Campaign for running for student body
G3     Languages              10    Spanish learners
G4     Beliefs & causes       46    Campaign for homecoming king and queen
G5     Beauty                 12    Wearing the same pants every day
G6     Beliefs & causes       41    Friends group
G7     Food & Drink           57    Lovers of an Asian food restaurant
G8     Religion/Spirituality  42    Learning about God
G9     Age                    22    Friends group
G10    Activities             40    People who play clarinets
G11    Sexuality              319   Against gay marriage
G12    Beliefs & causes       86    Friends group
G13    Sexuality              36    People who think fishnet is a fetish
G14    Activities             179   People who dislike early morning classes
G15    Politics               195   Group for Democrats
G16    Hobbies & Crafts       33    People who enjoy Half-Life (PC game)
G17    Politics               281   Not a Bush fan

[Figure: percentage of members in each group (G1 to G17) by age range: Hidden, 15-19, 20-24, 25-29, 30-36]

[Figure: percentage of members in each group (G1 to G17) by gender: Male, Female]

[Figure: percentage of members in each group (G1 to G17) by political view: Hidden, VL, Li, M, C, VC, A, Ln]

Similarity Inference

- Hierarchical clustering
- Normalizing data to [0, 1]
- Computing a distance matrix to calculate similarity among all pairs of members (a)
- Finding the average distance between all pairs in two given clusters s and r (b)

(a) $d_{rs} = \sqrt{\sum_{i=1}^{N} (x_{ri} - x_{si})^2}$

(b) $d(r, s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} \mathrm{dist}(x_{ri}, x_{sj})$
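The three steps on this slide can be sketched in plain Python. This is a minimal illustration, assuming min-max scaling for the [0, 1] normalization and Euclidean distance for (a); the function names and the sample members are hypothetical and do not come from the slides.

```python
import math

def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def euclidean(x_r, x_s):
    """Distance (a) between two members over their features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_r, x_s)))

def average_linkage(cluster_r, cluster_s):
    """Distance (b): mean distance over all cross-cluster member pairs."""
    total = sum(euclidean(x, y) for x in cluster_r for y in cluster_s)
    return total / (len(cluster_r) * len(cluster_s))

# Hypothetical members described by (age, number of friends).
members = [[18, 100], [20, 300], [30, 500], [34, 450]]
normalized = min_max_normalize(members)
d = average_linkage(normalized[:2], normalized[2:])
```

A hierarchical clustering routine would repeatedly merge the two clusters with the smallest `average_linkage` value; that loop is left out here to keep the sketch short.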

Clustering Coefficient

- $R_i$ is the normalized Euclidean distance of member i from the center
- $N_k$ is the normalized number of members within distance k from the center

$C = \frac{N_{R_i}}{R_i}$

$R_i = \frac{r_i}{\max_j r_j}$

$N_k = \frac{n_k}{M}$

[Figure: clustering coefficient C plotted against $R_i$, with $R_x$ and $C_{max}$ marked]
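A small sketch of how the clustering coefficient could be computed for one group. Several choices here are assumptions rather than details from the slides: the center is taken as the feature-wise mean, M as the group size, n_k as the count of members whose normalized distance is at most k, and R_x as the distance at which C peaks. The function names are hypothetical.

```python
import math

def clustering_coefficients(members):
    """Return (R_i, C) pairs sorted by R_i for one group.

    Assumes the members are not all identical, so max_j r_j > 0.
    """
    m = len(members)
    center = [sum(col) / m for col in zip(*members)]  # assumed center: mean
    r = [math.dist(x, center) for x in members]
    r_max = max(r)
    big_r = sorted(r_i / r_max for r_i in r)          # R_i = r_i / max_j r_j
    result = []
    for R_i in big_r:
        if R_i == 0:
            continue  # C is undefined for a member exactly at the center
        n_k = sum(1 for R_j in big_r if R_j <= R_i)
        result.append((R_i, (n_k / m) / R_i))         # C = N_{R_i} / R_i
    return result

def noise_threshold(members):
    """R_x: the normalized distance at which C reaches its maximum."""
    return max(clustering_coefficients(members), key=lambda rc: rc[1])[0]

# Hypothetical one-feature group with a single distant member.
group = [[1.0], [1.1], [0.9], [1.0], [3.0]]
r_x = noise_threshold(group)
```

Under this reading, members with R_i greater than `r_x` (here the member at 3.0) would be candidates for removal as noise.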

Decision Tree

- Decision tree algorithm, based on binary recursive partitioning
- Splitting rules: Gini, Twoing, Deviance
- Tree optimization: cross-validation (computationally intensive)
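To make one step of binary recursive partitioning concrete, the sketch below scores candidate thresholds on a single feature with the Gini rule. It is an illustration only: the function names and sample data are hypothetical, and the Twoing and Deviance rules are not shown.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini.

    Returns (threshold, weighted_impurity); rows with value <= threshold
    go left. Candidates are midpoints between consecutive distinct values.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, gini(labels))  # no split unless one lowers the impurity
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical ages and group labels.
ages = [18, 19, 21, 30, 33, 35]
groups = ["G1", "G1", "G1", "G2", "G2", "G2"]
split = best_binary_split(ages, groups)
```

A full tree applies the same search to every feature, keeps the best split, and recurses on the two halves. Each split tests one feature against a threshold, so the resulting decision boundaries run parallel to the coordinate axes.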

After Data Cleaning

- Fair representation of group profile
- Groups must have at least 10 members
- Reduction
  - Users: from 1,580 to 1,023
  - Groups: from 17 to 7

Group  Size
1      274
2      226
3      159
4      151
5      133
6      67
7      13

Result 1

- Data set: training 75%, testing 25%
- Accuracy calculation: 25-fold test
- Accuracy: 27%
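The evaluation protocol can be sketched as repeated random splits. This assumes that "25 fold test" means 25 independent 75%/25% splits whose accuracies are averaged, which the slides do not spell out. The model below is a majority-class placeholder, not the decision tree from the talk, and all names are hypothetical; the group sizes are the ones listed after data cleaning.

```python
import random
from collections import Counter

GROUP_SIZES = {1: 274, 2: 226, 3: 159, 4: 151, 5: 133, 6: 67, 7: 13}

def majority_predictor(train_labels):
    """Placeholder model: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda: most_common

def repeated_holdout(labels, runs=25, train_fraction=0.75, seed=0):
    """Mean test accuracy over `runs` random train/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train, test = shuffled[:cut], shuffled[cut:]
        predict = majority_predictor(train)
        correct = sum(1 for y in test if predict() == y)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# One label per user, repeated according to the group sizes.
labels = [g for g, size in GROUP_SIZES.items() for _ in range(size)]
mean_accuracy = repeated_holdout(labels)
```

With these group sizes, a placeholder that always predicts the largest group lands near 274/1,023, or about 0.27, which gives a reference point for reading the reported accuracy.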

Statistical Analysis: Mean

Statistical Analysis: STD

Adjustment in Feature Selection

- Feature score calculation
  - Using group profile: FSGP
  - Using group closeness: FSGC
  - Combination of FSGP and FSGC: FSPC

$FSGP_f = \mathrm{STD}(GP_f^g)$
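One possible reading of the FSGP score is sketched below. It assumes that the group profile of a feature is the mean of that feature over the group's members, and that the score is the population standard deviation of those means across groups; the slides do not state either detail. FSGC and FSPC are not shown because their formulas are not given. The function name and data are hypothetical.

```python
from statistics import mean, pstdev

def fsgp(groups):
    """Score each feature by the STD, across groups, of its group mean.

    `groups` maps a group id to a list of member feature vectors.
    """
    profiles = [[mean(col) for col in zip(*members)]
                for members in groups.values()]
    return [pstdev(col) for col in zip(*profiles)]

# Hypothetical normalized data: feature 0 differs between groups,
# feature 1 does not.
groups = {
    "G1": [[0.1, 0.5], [0.2, 0.5]],
    "G2": [[0.8, 0.5], [0.9, 0.5]],
}
scores = fsgp(groups)
```

Under this assumption a feature whose mean barely changes from group to group scores near zero and would be a candidate for removal.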

FSGP vs Accuracy

FSGC vs Accuracy

FSPC vs Accuracy

Result 2

Feature Score Calculation  Accuracy (%)
Group–Profile Feature      24.47
STD of means               25.04
Mean of STDs               21.75

Conclusion

- Improving QoS of online social networking
- Architecture
  - Hierarchical clustering
  - Threshold value to reduce noise
  - Decision tree
- Result: causes of poor performance
  - Decision tree: decision boundaries parallel to the coordinate axes
  - Data overlapping
  - More work needed on data cleaning
- Feature reduction: from 12 to 2