Collating Social Network Profiles


Objective

Input: <Company Name>

System

Output: Social Network Profiles <Twitter Profile, Facebook Profile, G+ Profile, …>

Record Linkage + Identity

Agenda

Introduction: Objective, Contrast to Existing Work

Work Done: Baseline System, Individual Network Approach, Machine Learning Experiments

Next Steps, Q&A


Baseline System


Ground Truth

Two networks: Facebook and Twitter

Top seventy 2013 Fortune 500 companies


Baseline Algorithm

1. Take the company name.

2. Search the Facebook and Twitter APIs for that name.

3. Return the first result from each network.
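
The three steps above can be sketched as follows. This is a minimal illustration, not the system's actual code: the search functions are hypothetical stand-ins for whatever Facebook and Twitter search calls were used, and `baseline_match` and `make_stub_search` are names invented here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interface: a search function takes a query string and
# returns profile identifiers in the order the network ranks them.
SearchFn = Callable[[str], List[str]]


def baseline_match(company: str, search_fns: Dict[str, SearchFn]) -> Dict[str, Optional[str]]:
    """Return the first search result per network, or None if a search is empty."""
    matches: Dict[str, Optional[str]] = {}
    for network, search in search_fns.items():
        results = search(company)
        matches[network] = results[0] if results else None
    return matches


def make_stub_search(canned: Dict[str, List[str]]) -> SearchFn:
    """Build a fake search function backed by canned results (for illustration only)."""
    def search(query: str) -> List[str]:
        return canned.get(query, [])
    return search


# Stubbed searches standing in for the real network APIs.
facebook_search = make_stub_search({"Example Corp": ["examplecorp", "example.fans"]})
twitter_search = make_stub_search({"Example Corp": ["@ExampleCorp"]})

result = baseline_match(
    "Example Corp", {"facebook": facebook_search, "twitter": twitter_search}
)
print(result)
```

Passing the search functions in as arguments keeps the matching logic independent of any particular API client.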


Baseline Performance

Correct matches out of 70 companies (bar chart):

Facebook: 34

Twitter: 52

Both: 30


Individual Network Approach


New Approach

Score profiles based on:

Edit Distance: Company Name – Username

Edit Distance: Company Name – Display Name

Relative Popularity
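
One way to combine these signals is sketched below. The slides do not give the scoring formulas, so everything here is an assumption made for illustration: the normalization of edit distance by the longer string, the use of follower count relative to the most popular candidate as "relative popularity", the equal weighting, and the `Profile` fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Profile:
    username: str
    display_name: str
    followers: int  # assumed popularity signal; the slides do not name one


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def name_similarity(company: str, name: str) -> float:
    """Assumed normalization: 1.0 for identical names, 0.0 for nothing in common."""
    a = company.lower().replace(" ", "")
    b = name.lower().replace(" ", "")
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def score(company: str, profile: Profile, max_followers: int, w_name: float = 0.5) -> float:
    """Weighted sum of username similarity and relative popularity (weights assumed)."""
    popularity = profile.followers / max_followers if max_followers else 0.0
    return w_name * name_similarity(company, profile.username) + (1 - w_name) * popularity


def best_profile(company: str, candidates: List[Profile]) -> Optional[Profile]:
    """Return the highest-scoring candidate, or None when there are no candidates."""
    if not candidates:
        return None
    max_followers = max(p.followers for p in candidates)
    return max(candidates, key=lambda p: score(company, p, max_followers))


candidates = [
    Profile("examplecorp_fans", "Example Corp Fans", 1200),
    Profile("examplecorp", "Example Corp", 90000),
]
best = best_profile("Example Corp", candidates)
print(best.username)
```

Swapping `profile.username` for `profile.display_name` in `score` gives the display-name variant of the same idea.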


Display Name

Username


Scoring

Edit Distance Score:

Popularity Score:


Best Performing Combination

Correct matches out of 70 companies (bar chart):

Method                                 Facebook   Twitter   Both
Baseline                               34         52        30
Username Edit Distance + Popularity    40         50        34
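
To put the chart values on a common scale, the counts can be converted to match rates. This sketch assumes the denominator is the 70 companies in the Fortune 500 ground truth, and that the chart's two series read, in Facebook/Twitter/Both order, as 34/52/30 for the baseline and 40/50/34 for username edit distance plus popularity. The function name `compare` is invented for this example.

```python
TOTAL_COMPANIES = 70  # assumed denominator: size of the ground-truth set

baseline = {"Facebook": 34, "Twitter": 52, "Both": 30}
scored = {"Facebook": 40, "Twitter": 50, "Both": 34}


def compare(before, after, total):
    """Map each network to (rate before, rate after, change in correct matches)."""
    return {
        network: (before[network] / total, after[network] / total, after[network] - before[network])
        for network in before
    }


for network, (old_rate, new_rate, delta) in compare(baseline, scored, TOTAL_COMPANIES).items():
    print(f"{network}: {old_rate:.1%} -> {new_rate:.1%} ({delta:+d} matches)")
```

Under that reading, the scored approach gains six matches on Facebook and four on the combined measure, and loses two on Twitter.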


Machine Learning Experiments


Freebase Ground Truth

1,422 with a social media presence

917 with Facebook, 687 with Twitter

598 with both

553 with valid profiles


Training Set

553 Correct

553 Incorrect

1,106 Total
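
A balanced set of this shape could be assembled as in the sketch below. The slides do not say how the incorrect examples were produced; pairing each company with a randomly chosen other company's profile is an assumption made here for illustration, and `build_training_set` is a hypothetical helper.

```python
import random
from typing import Dict, List, Tuple


def build_training_set(true_profiles: Dict[str, str], seed: int = 0) -> List[Tuple[str, str, int]]:
    """Return (company, profile, label) rows: one correct and one incorrect per company.

    Requires at least two companies so that a mismatched profile exists.
    """
    rng = random.Random(seed)
    companies = list(true_profiles)
    rows: List[Tuple[str, str, int]] = []
    for company in companies:
        # Positive example: the company's verified profile.
        rows.append((company, true_profiles[company], 1))
        # Assumed negative sampling: borrow a different company's profile.
        other = rng.choice([c for c in companies if c != company])
        rows.append((company, true_profiles[other], 0))
    return rows


# Toy ground truth; the real set would hold the 553 validated companies.
toy_truth = {"Alpha Inc": "alphainc", "Beta LLC": "betallc", "Gamma Co": "gammaco"}
training_rows = build_training_set(toy_truth)
print(len(training_rows), sum(label for _, _, label in training_rows))
```

Randomly mismatched pairs tend to be easy to tell apart, which is one reason a training set built this way might later need harder examples.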


Cross Validation Results

Classifier                 Test | Train    Train | Test
Linear Regression          0.734           0.707
Gaussian Naïve Bayes       0.972           0.956
Multinomial Naïve Bayes    0.511           0.506
Bernoulli Naïve Bayes      0.720           0.701
Decision Tree              0.954           0.935
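
For readers unfamiliar with the procedure, here is a self-contained sketch of k-fold cross-validation around a small Gaussian Naive Bayes classifier. It is not the setup behind the table: the fold count, the two features, and the synthetic data are all assumptions, and the classifier is written out by hand rather than taken from a library.

```python
import math
import random


def fit_gaussian_nb(rows):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in sorted({y for _, y in rows}):
        columns = list(zip(*[x for x, y in rows if y == label]))
        count = len(columns[0])
        means = [sum(col) / count for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / count + 1e-9  # small floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (count / len(rows), means, variances)
    return model


def predict(model, x):
    """Pick the class with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
            for value, mu, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


def accuracy(model, rows):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


def k_fold_scores(rows, k=5, seed=0):
    """Return (mean training accuracy, mean held-out accuracy) over k folds."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    train_scores, test_scores = [], []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit_gaussian_nb(train)
        train_scores.append(accuracy(model, train))
        test_scores.append(accuracy(model, held_out))
    return sum(train_scores) / k, sum(test_scores) / k


# Synthetic stand-in features (e.g. name similarity, relative popularity).
rng = random.Random(1)
rows = [([rng.gauss(0.9, 0.1), rng.gauss(0.8, 0.1)], 1) for _ in range(100)]
rows += [([rng.gauss(0.3, 0.1), rng.gauss(0.2, 0.1)], 0) for _ in range(100)]

train_acc, test_acc = k_fold_scores(rows)
print(f"training accuracy: {train_acc:.3f}, held-out accuracy: {test_acc:.3f}")
```

Reporting both numbers makes overfitting visible: a large gap between training and held-out accuracy suggests the model has memorized the training folds.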

Next Steps

Improve training set: provide harder examples

Incorporate more profile data

Build system around classifiers
