Supervised-Learning Link Recommendation in the DBLP co-authoring network
Introduction Link Prediction and Metrics Results Conclusions
Supervised-Learning Link Recommendation in the DBLP co-authoring network
Gabriel P Gimenes, Hugo Gualdron, Thiago R Raddo, Jose F Rodrigues Jr
Instituto de Ciencias Matematicas e de Computacao, Universidade de Sao Paulo
Av. Trabalhador Sao-carlense, 400 - Centro, Sao Carlos, SP, Brasil
Click for paper: http://www.icmc.usp.br/pessoas/junio/PublishedPapers/Gimenes_et_al_IEEE-PerCom-SCI2014.pdf
This work has financial support from FAPESP (2013/10026-7, 2011/13724-1)
Summary
1 Introduction
2 Link Prediction and Metrics
3 Results
4 Conclusions
Context
Advances in the WWW led to improved mechanisms for users to interact
Data became abundant in several scenarios:
social networks, co-authoring networks, recommender systems, communication networks
Need for tools that can assist in the decision-making process
Most of the networks produced in our daily lives are dynamic, which motivates Link Recommendation
Objectives
Analysis of the Link Recommendation task on a co-authoring network: DBLP
Comparison of the most widely used supervised-learning algorithms using performance metrics (AUC, F-measure, Precision and Recall)
Including the use of meta-classifiers such as Bagging and Random Forest
Detailed study of the parameters involved in the technique: Core(k) and the time intervals
Link Prediction and Metrics
Problem Definition
A co-authoring network can be modeled as a graph: nodes represent individuals and edges indicate a collaboration between them
The idea is to predict/recommend new edges with supervised learning techniques, using only past and present information about the network
Applications exist in different domains, such as:
Forecasting suspicious behavior on social networks (terrorism, for example)
Identifying interactions that would need intense experimentation in biology
Suggesting new collaborations/interactions to individuals on co-authoring networks
Given a snapshot of a network at time t, we are interested in the edges that most likely should/could exist at time t', where t < t'.
We train a supervised classifier on topological features extracted from the network so that it can analyze the network's dynamics
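The snapshot idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record format (author, co-author, year), the function name `build_graph` and the toy data are hypothetical and do not come from the slides or from DBLP.

```python
from collections import defaultdict


def build_graph(records, first_year, last_year):
    """Return an adjacency map (author -> set of co-authors) for one interval.

    `records` is assumed to be an iterable of (author, co_author, year)
    tuples; this format is an assumption made for the example.
    """
    adj = defaultdict(set)
    for a, b, year in records:
        # Keep only collaborations inside the closed interval.
        if first_year <= year <= last_year and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


# Hypothetical toy records, not real DBLP data.
records = [
    ("ana", "bob", 1996),
    ("bob", "caio", 2001),
    ("ana", "caio", 2006),
    ("caio", "dana", 2007),
    ("ana", "dana", 2010),  # outside both intervals, so it is ignored
]

train = build_graph(records, 1995, 2005)  # snapshot at time t
test = build_graph(records, 2006, 2007)   # later interval t'
```

Features would be computed on `train` only, while `test` is used solely to decide which node pairs became linked later.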
Core
Core(k) is the subset of nodes of interest
Nodes that have at least k edges in the training and test intervals are considered to be in Core(k); the other nodes are not used
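A minimal sketch of the Core(k) filter follows. It assumes that the threshold k is applied to each interval separately and that "edges" means distinct neighbours; the slides do not spell out either detail, and the function name `core` and the toy graphs are hypothetical.

```python
def core(train_adj, test_adj, k):
    """Return the nodes with at least k distinct neighbours in both intervals.

    Both arguments map a node to the set of its neighbours. Applying the
    same threshold k to each interval is an assumption of this sketch.
    """
    nodes = set(train_adj) | set(test_adj)
    return {
        n
        for n in nodes
        if len(train_adj.get(n, ())) >= k and len(test_adj.get(n, ())) >= k
    }


# Hypothetical toy graphs for the two intervals.
train_adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
test_adj = {"a": {"b", "d"}, "b": {"a"}, "d": {"a"}}

core_1 = core(train_adj, test_adj, 1)  # "c" and "d" are missing in one interval
core_2 = core(train_adj, test_adj, 2)  # only "a" has two neighbours in both
```

With k = 0 the filter keeps every node that appears in either interval, so no node is discarded.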
Topological Features
Metric                        Equation
Common Neighbours             $CN(x, y) = |\Gamma(x) \cap \Gamma(y)|$
Jaccard Coefficient           $JC(x, y) = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$
Preferential Attachment       $PA(x, y) = |\Gamma(x)| \cdot |\Gamma(y)|$
Adamic-Adar Coefficient       $AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|}$
Geodesic Distance             length of the shortest path between x and y
Resource Allocation Index     $RA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{|\Gamma(z)|}$
Local Paths                   $LP(x, y) = |paths^{(2)}_{x,y}| + \epsilon \cdot |paths^{(3)}_{x,y}|$
Node Clustering Coefficient   $ANCC(x, y) = cc(x) + cc(y)$
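Several of the neighbourhood-based metrics in this table can be sketched directly on adjacency sets, where `adj[x]` plays the role of Γ(x). The function names and the toy graph below are hypothetical; the sketch assumes an undirected graph and two distinct nodes x and y, and it covers only a subset of the listed metrics.

```python
import math
from collections import deque


def common_neighbours(adj, x, y):
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def preferential_attachment(adj, x, y):
    return len(adj[x]) * len(adj[y])


def adamic_adar(adj, x, y):
    # A common neighbour of two distinct nodes has degree >= 2, so log > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def resource_allocation(adj, x, y):
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


def geodesic_distance(adj, x, y):
    """Breadth-first search; returns None when y is unreachable from x."""
    seen, queue = {x}, deque([(x, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == y:
            return dist
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


# Hypothetical toy graph with edges a-c, a-d, b-c, b-d, c-d.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
```

For the pair (a, b) in this toy graph, both nodes share the neighbours c and d, each of degree 3, so the Adamic-Adar score is 2 / log 3 and the Resource Allocation score is 2 / 3.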
Results
Experiments
Classification instances are node pairs, labelled positive or negative depending on whether an edge exists between them in the test interval
The metrics presented are used as the feature vector of each instance
Classifier      Details
J48             Decision Tree
Naive Bayes     Probabilistic
MLP             Neural Network
Random Forest   Set of Decision Trees
Bagging         Set of Decision Trees
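The construction of classification instances described on this slide could look like the sketch below. It assumes that candidates are unordered pairs of selected nodes that are not yet linked in the training interval; the slides do not state how candidates are chosen, and `make_instances`, the two feature helpers and the toy graphs are hypothetical.

```python
from itertools import combinations


def cn(adj, x, y):
    """Common neighbours, computed on the training graph."""
    return len(adj.get(x, set()) & adj.get(y, set()))


def pa(adj, x, y):
    """Preferential attachment, computed on the training graph."""
    return len(adj.get(x, set())) * len(adj.get(y, set()))


def make_instances(train_adj, test_adj, nodes, feature_fns):
    """Build ((x, y), feature_vector, label) triples for unlinked pairs."""
    instances = []
    for x, y in combinations(sorted(nodes), 2):
        if y in train_adj.get(x, set()):
            continue  # already linked in training; nothing to predict
        features = [fn(train_adj, x, y) for fn in feature_fns]
        label = 1 if y in test_adj.get(x, set()) else 0
        instances.append(((x, y), features, label))
    return instances


# Hypothetical toy graphs: a and b share neighbour c and link up later.
train_adj = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}, "d": set()}
test_adj = {"a": {"b"}, "b": {"a"}}

instances = make_instances(train_adj, test_adj, {"a", "b", "c", "d"}, [cn, pa])
```

The resulting feature vectors and labels are what a classifier such as a decision tree or a Random Forest would be trained on.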
Results
Settings considered (training interval, test interval):
[1995, 2005], [2006, 2007]
[1990, 1999], [2000, 2004]
[1995, 1999], [2000, 2004]
Using k = 0, 3, 5 and 7 in each case.
Results - Interval G[1995, 2005], G[2006, 2007]
k   Classifier   Precision   Recall   F-measure   AUC
0   J48          0.723       0.706    0.7         0.764
0   NB           0.741       0.585    0.505       0.626
0   MLP          0.562       0.555    0.541       0.593
0   RF           0.877       0.868    0.867       0.939
0   Bagging      0.809       0.8      0.798       0.887
1   J48          0.787       0.759    0.753       0.817
1   NB           0.777       0.598    0.52        0.648
1   MLP          0.628       0.618    0.61        0.639
1   RF           0.914       0.903    0.902       0.977
1   Bagging      0.84        0.83     0.829       0.913
3   J48          0.852       0.845    0.844       0.87
3   NB           0.773       0.585    0.499       0.704
3   MLP          0.715       0.714    0.713       0.735
3   RF           0.917       0.913    0.912       0.974
3   Bagging      0.846       0.841    0.841       0.925
5   J48          0.827       0.771    0.761       0.79
5   NB           0.778       0.601    0.526       0.727
5   MLP          0.695       0.679    0.672       0.74
5   RF           0.897       0.888    0.887       0.972
5   Bagging      0.844       0.83     0.828       0.913
7   J48          0.861       0.839    0.836       0.867
7   NB           0.786       0.626    0.566       0.741
7   MLP          0.725       0.719    0.717       0.785
7   RF           0.914       0.908    0.907       0.971
7   Bagging      0.883       0.866    0.865       0.94
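For readers unfamiliar with the four reported metrics, the sketch below shows one common way to compute them for a binary problem. It is not the evaluation code behind the table: the slides do not say how scores were averaged across classes, and the function names and toy labels here are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-measure for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def auc(y_true, scores, positive=1):
    """AUC as the probability that a positive outranks a negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of instances, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels, hard predictions and classifier scores.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.1]
```

On this toy data one of the two predicted positives is correct and one of the two true positives is found, so precision, recall and F-measure are all 0.5, while three of the four positive-negative score pairs are ordered correctly, giving an AUC of 0.75.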
Results
Bagging and Random Forest outperform every other classifier in AUC for all values of k (0.887 or higher, against at most 0.87 for J48, NB and MLP)
We believe that the meta-classifiers are better suited for Link Prediction due to their robustness in dealing with redundant metrics and bad instances
The meta-classifiers are also less prone to overfitting
The parameter k and the time intervals affect the quality of the recommendation
Conclusions
We analyzed the Link Recommendation problem in the supervised learning context
We compared algorithms using evaluation metrics such as AUC, F-measure, Precision and Recall
Each experiment was set on a different interval and run with different values of k
The dataset was sensitive to long periods of time, reflecting the strong dynamism of the academic community
In our experiments the neighbourhood cut (core) was also important to further improve the results
Thanks!
Questions?
Click for paper: http://www.icmc.usp.br/pessoas/junio/PublishedPapers/Gimenes_et_al_IEEE-PerCom-SCI2014.pdf