Supervised-Learning Link Recommendation in the DBLP co-authoring network

Introduction Link Prediction and Metrics Results Conclusions

Gabriel P Gimenes, Hugo Gualdron, Thiago R Raddo, Jose F Rodrigues Jr

Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo

Av. Trabalhador São-carlense, 400 - Centro, São Carlos, SP, Brasil

Click for paper: http://www.icmc.usp.br/pessoas/junio/PublishedPapers/Gimenes_et_al_IEEE-PerCom-SCI2014.pdf

This work has financial support from FAPESP (2013/10026-7, 2011/13724-1)

Summary

1 Introduction

2 Link Prediction and Metrics

3 Results

4 Conclusions

Context

Advances in the WWW led to improved mechanisms for users to interact

Data became abundant in several scenarios

social networks, co-authoring networks, recommender systems, communication networks

Need for tools that can assist in the decision-making process

Most of the networks produced in our daily lives are dynamic - Link Recommendation

Objectives

Analysis of the Link Recommendation task on a co-authoring network - DBLP

Comparison between the most used algorithms in supervised learning using performance metrics (AUC, F-measure, Precision and Recall)

Including the use of meta-classifiers such as Bagging and Random Forest

Detailed study of the parameters involved in the technique - Core(k) and the intervals

Link Prediction and Metrics

Problem Definition

It is possible to model a co-authoring network as a graph: nodes represent individuals and edges indicate a collaboration between them

The idea is to predict/recommend new edges using only past and present information about the network, by means of supervised learning techniques
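
The graph model above can be sketched as follows. This is a minimal illustration, assuming each publication is available as a list of author names; the `build_coauthor_graph` helper and the toy `papers` data are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Return {author: set of co-authors} for a list of author lists."""
    graph = defaultdict(set)
    for authors in papers:
        # Every pair of authors on the same paper shares an undirected edge.
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Hypothetical toy data, not taken from DBLP.
papers = [["Ana", "Bruno", "Carla"], ["Carla", "Davi"]]
graph = build_coauthor_graph(papers)
print(sorted(graph["Carla"]))  # ['Ana', 'Bruno', 'Davi']
```

Storing neighbours as sets keeps the later neighbourhood intersections and unions cheap.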

Applications exist in different domains such as:

Forecasting suspicious behavior on social networks (terrorism, for example)

Identifying interactions that would need intense experimentation in biology

Suggesting new collaborations/interactions to individuals on co-authoring networks

Given a snapshot of a network at time t, we are interested in the edges that most likely should/could exist at a later time t', where t < t'.

Train a supervised classifier on topological features extracted from the network in order to analyze its dynamics
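
One way to picture the two snapshots is to slice dated publication records into a training interval and a later test interval. The sketch below assumes records of the form `(year, authors)`; the `edges_in_interval` helper and the sample records are hypothetical.

```python
from itertools import combinations


def edges_in_interval(records, first_year, last_year):
    """Collect undirected co-authoring edges from papers in [first_year, last_year]."""
    edges = set()
    for year, authors in records:
        if first_year <= year <= last_year:
            for a, b in combinations(sorted(set(authors)), 2):
                edges.add((a, b))  # sorted order gives one key per pair
    return edges


# Hypothetical (year, authors) records, not taken from DBLP.
records = [
    (1996, ["Ana", "Bruno"]),
    (2001, ["Bruno", "Carla"]),
    (2006, ["Ana", "Carla"]),
    (2007, ["Ana", "Bruno"]),
]
train_edges = edges_in_interval(records, 1995, 2005)  # snapshot at time t
test_edges = edges_in_interval(records, 2006, 2007)   # later period t'
new_edges = test_edges - train_edges                  # edges that appear only later
print(sorted(new_edges))  # [('Ana', 'Carla')]
```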

Core

Core(k) is the subset of nodes of interest

Nodes that have at least k edges in the training and test intervals are considered to be in Core(k); the other nodes are not used
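
A minimal sketch of the Core(k) filter follows. It assumes that "at least k edges" is checked separately in each interval and that an edge is a distinct co-author pair; both are readings of the slide, not confirmed details, and the `core` helper is hypothetical.

```python
from collections import Counter


def core(train_edges, test_edges, k):
    """Nodes with at least k edges in the training interval and at least k in the test interval."""

    def degrees(edges):
        counts = Counter()
        for a, b in edges:
            counts[a] += 1
            counts[b] += 1
        return counts

    train_deg, test_deg = degrees(train_edges), degrees(test_edges)
    nodes = set(train_deg) | set(test_deg)
    # Counter returns 0 for nodes that are absent from an interval.
    return {n for n in nodes if train_deg[n] >= k and test_deg[n] >= k}


# Hypothetical toy edges, not taken from DBLP.
train_edges = {("Ana", "Bruno"), ("Ana", "Carla"), ("Bruno", "Carla"), ("Carla", "Davi")}
test_edges = {("Ana", "Davi"), ("Ana", "Bruno"), ("Bruno", "Carla")}
print(sorted(core(train_edges, test_edges, 2)))  # ['Ana', 'Bruno']
```

With k = 0 every node that appears in either interval is kept, so no filtering takes place.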

Topological Features

Metric                         Equation
Common Neighbours              $CN(x, y) = |\Gamma(x) \cap \Gamma(y)|$
Jaccard Coefficient            $JC(x, y) = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$
Preferential Attachment        $PA(x, y) = |\Gamma(x)| \cdot |\Gamma(y)|$
Adamic-Adar Coefficient        $AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|}$
Geodesic Distance              shortest path between x and y
Resource Allocation Index      $RA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{|\Gamma(z)|}$
Local Paths                    $LP(x, y) = |\mathrm{paths}^{(2)}_{x,y}| + e \cdot |\mathrm{paths}^{(3)}_{x,y}|$
Node Clustering Coefficient    $ANCC(x, y) = cc(x) + cc(y)$
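
The neighbourhood-based metrics in the table can be computed directly from an adjacency map, where `g[x]` plays the role of Γ(x). The sketch below is an illustration under assumptions: the function names and the toy graph are hypothetical, the graph has no self-loops, and Geodesic Distance and Local Paths are left out for brevity.

```python
import math


def common_neighbours(g, x, y):
    return len(g[x] & g[y])


def jaccard(g, x, y):
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0


def preferential_attachment(g, x, y):
    return len(g[x]) * len(g[y])


def adamic_adar(g, x, y):
    # A common neighbour z of two distinct nodes has degree >= 2, so log|Γ(z)| > 0.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y])


def resource_allocation(g, x, y):
    return sum(1.0 / len(g[z]) for z in g[x] & g[y])


def clustering(g, v):
    """Fraction of pairs of neighbours of v that are themselves connected."""
    nbrs = list(g[v])
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for i, u in enumerate(nbrs) for w in nbrs[i + 1:] if w in g[u])
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))


def summed_clustering(g, x, y):
    return clustering(g, x) + clustering(g, y)


def feature_vector(g, x, y):
    return [
        common_neighbours(g, x, y),
        jaccard(g, x, y),
        preferential_attachment(g, x, y),
        adamic_adar(g, x, y),
        resource_allocation(g, x, y),
        summed_clustering(g, x, y),
    ]


# Hypothetical toy graph: a and d are not linked but share neighbours b and c.
g = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"}, "d": {"b", "c"}}
print(feature_vector(g, "a", "d"))
```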

Results

Experiments

Classification instances are node pairs, labeled positive or negative depending on the existence of an edge between them in the test interval

The metrics presented are used as the feature vector of each instance
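
The labeling step can be sketched as below. It assumes that candidate instances are the node pairs not yet linked in the training interval, which the slides do not state explicitly; the `label_pairs` helper and the toy data are hypothetical, and edges are stored as alphabetically sorted tuples.

```python
from itertools import combinations


def label_pairs(nodes, train_edges, test_edges):
    """Return [(pair, label)]; label is 1 if the pair is linked in the test interval."""
    instances = []
    for pair in combinations(sorted(nodes), 2):
        if pair in train_edges:
            continue  # assumption: only pairs not yet linked are candidates
        instances.append((pair, 1 if pair in test_edges else 0))
    return instances


# Hypothetical toy data, not taken from DBLP.
nodes = {"Ana", "Bruno", "Carla"}
train_edges = {("Ana", "Bruno")}
test_edges = {("Ana", "Carla")}
print(label_pairs(nodes, train_edges, test_edges))
# [(('Ana', 'Carla'), 1), (('Bruno', 'Carla'), 0)]
```

Each labeled pair would then be paired with its topological feature values to form one training row.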

Classifier       Details
J48              Decision Tree
Naive Bayes      Probabilistic
MLP              Neural Network
Random Forest    Set of Decision Trees
Bagging          Set of Decision Trees
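
To illustrate what "set of decision trees" means for Bagging, here is a small sketch of bootstrap aggregating with majority voting. It is not the setup used in the experiments: one-level decision stumps stand in for full decision trees, and the feature rows, the function names and the `n_estimators` and `seed` parameters are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(rows, labels):
    """Pick the (feature, threshold, polarity) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            for polarity in (1, 0):
                # The stump predicts `polarity` above the threshold, the other class otherwise.
                errors = sum(
                    (polarity if r[f] > threshold else 1 - polarity) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, polarity)
    return best[1:]


def stump_predict(stump, row):
    f, threshold, polarity = stump
    return polarity if row[f] > threshold else 1 - polarity


def train_bagging(rows, labels, n_estimators=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        models.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, row):
    votes = Counter(stump_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]  # majority vote


# Hypothetical rows of [common neighbours, Jaccard coefficient] with link labels.
rows = [[0, 0.0], [0, 0.1], [1, 0.1], [3, 0.5], [4, 0.6], [5, 0.9]]
labels = [0, 0, 0, 1, 1, 1]
models = train_bagging(rows, labels)
print(bagging_predict(models, [6, 0.8]), bagging_predict(models, [0, 0.0]))
```

Because each model sees a different bootstrap sample, a single unlucky sample has little influence on the final vote, which is the intuition behind the ensemble's tolerance to bad instances.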

Results

Settings considered:

[1995 − 2005], [2006 − 2007]

[1990 − 1999], [2000 − 2004]

[1995 − 1999], [2000 − 2004]

Using k as 0, 3, 5 and 7 in each case.

Results - Intervals G[1995, 2005], G[2006, 2007]

k   Classifier   Precision   Recall   F-measure   AUC
0   J48          0.723       0.706    0.7         0.764
0   NB           0.741       0.585    0.505       0.626
0   MLP          0.562       0.555    0.541       0.593
0   RF           0.877       0.868    0.867       0.939
0   Bagging      0.809       0.8      0.798       0.887
1   J48          0.787       0.759    0.753       0.817
1   NB           0.777       0.598    0.52        0.648
1   MLP          0.628       0.618    0.61        0.639
1   RF           0.914       0.903    0.902       0.977
1   Bagging      0.84        0.83     0.829       0.913
3   J48          0.852       0.845    0.844       0.87
3   NB           0.773       0.585    0.499       0.704
3   MLP          0.715       0.714    0.713       0.735
3   RF           0.917       0.913    0.912       0.974
3   Bagging      0.846       0.841    0.841       0.925
5   J48          0.827       0.771    0.761       0.79
5   NB           0.778       0.601    0.526       0.727
5   MLP          0.695       0.679    0.672       0.74
5   RF           0.897       0.888    0.887       0.972
5   Bagging      0.844       0.83     0.828       0.913
7   J48          0.861       0.839    0.836       0.867
7   NB           0.786       0.626    0.566       0.741
7   MLP          0.725       0.719    0.717       0.785
7   RF           0.914       0.908    0.907       0.971
7   Bagging      0.883       0.866    0.865       0.94
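
For readers unfamiliar with the four columns, the sketch below shows how each metric can be computed for a binary link/no-link task. The slides do not state how the reported values were averaged, so this is only an illustration of the definitions; the function names and the toy labels and scores are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores for four node pairs.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(precision_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
print(auc(y_true, scores))                  # 0.75
```

AUC depends only on how the scores rank the pairs, while Precision, Recall and F-measure depend on the decision threshold, which is one reason the two kinds of columns can order classifiers differently.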

Results

Bagging and Random Forest outperform the other classifiers, most clearly in AUC (Random Forest reaches 0.939 to 0.977 for G[1995, 2005], G[2006, 2007])

We believe that the meta-classifiers are better suited for Link Prediction due to their robustness in dealing with redundant metrics and bad instances

The meta-classifiers also cope better with overfitting

The parameter k and the time interval affect the quality of the recommendation

Conclusions

We analyzed the Link Recommendation problem in the supervised learning context

Compared algorithms using evaluation metrics such as AUC, F-Measure, Precision and Recall

Each experiment was set on a different interval and we ran it with different values of k

The dataset was sensitive to long periods of time - strong dynamism of the academic community

In our experiments the neighbourhood cut (core) was also important to further improve the results

Thanks!

Questions?

Click for paper: http://www.icmc.usp.br/pessoas/junio/PublishedPapers/Gimenes_et_al_IEEE-PerCom-SCI2014.pdf