COSNET: Connecting Heterogeneous Social Networks with...
Transcript of COSNET: Connecting Heterogeneous Social Networks with...
1
COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency
Yutao Zhang+, Jie Tang+, Zhilin Yang+, Jian Pei#, and Philip S. Yu*
+Tsinghua University #Simon Fraser University *University of Illinois at Chicago
2
p Academic Social Network Analysis and Mining system—AMiner (http://aminer.org) p Online since 2006 p >38 million researcher profiles p >100 million publications p >241 million requests p >12.35 Terabyte data p 100K IP access from 170 countries
per month p 10% increase of visits per month
p Deep analysis, mining, and search
AMiner II (ArnetMiner)
3
Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.
Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.
Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.
DBLP: Ruud Bolle 2006
Nalini K. Ratha, Jonathan Connell, Ruud M. Bolle, Sharat Chikkerur: Cancelable Biometrics: A Case Study in Fingerprints. ICPR (4) 2006: 370-373EE50
Sharat Chikkerur, Sharath Pankanti, Alan Jea, Nalini K. Ratha, Ruud M. Bolle: Fingerprint Representation Using Localized Texture Features. ICPR (4) 2006: 521-524EE49
Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti, Ruud M. Bolle: Appearance models for occlusion handling. Image Vision Comput. 24(11): 1233-1243 (2006)EE48
2005Ruud M. Bolle, Jonathan H. Connell, Sharath Pankanti, Nalini K. Ratha, Andrew W. Senior: The Relation between the ROC Curve and the CMC. AutoID 2005: 15-20EE47
Sharat Chikkerur, Venu Govindaraju, Sharath Pankanti, Ruud M. Bolle, Nalini K. Ratha: Novel Approaches for Minutiae Verification in Fingerprint Images. WACV. 2005: 111-116EE46
...
Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.
Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.
Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.
Knowledge Acquisition from the Web (ACM TKDD, WWW’12, ISWC’06, ICDM’07, ACL’07)
Contact Information
Educational history
Academic services
Publications
1
1 2
2
Ruud Bolle
Position
Affiliation
Address
Address
PhdunivPhdmajor
Phddate
Msuniv
Msdate
Msmajor
BsunivBsdate
Bsmajor
Research Staff
IBM T.J. Watson Research CenterP.O. Box 704Yorktown Heights, NY 10598 USA
Brown University
1984
Electrical Engineering
Delft University of Technology
Analog Electronics
1977
Delft University of Technology
IBM T.J. Watson Research Center19 Skyline DriveHawthorne, NY 10532 USA
IBM T.J. Watson Research Center
Electrical Engineering
1980
Applied Mathematics
Msmajor
http://researchweb.watson.ibm.com/ecvg/people/bolle.html
Homepage
Ruud BolleName
video database indexingvideo processing
visual human-computer interactionbiometrics applications
Research_Interest
Photo
Publication 1#
Cancelable Biometrics: A Case Study in Fingerprints
ICPR 370
2006
Date
Start_page
Venue
Title
373
End_page
Publication 2#
Fingerprint Representation Using Localized Texture Features
ICPR 521
2006
Date
Start_page
Venue
Title
524
End_page
. . .
Co-authorCo-author
1
Ruud Bolle
2 Publication #3
Publication #5
coauthor
coauthor
UIUC affiliation
Professor position
2
1
4
Researcher Profile Database[1]
Extracted more than 1,000,000 researcher profiles from the Web
[1] J. Tang, L. Yao, D. Zhang, and J. Zhang. A Combination Approach to Web User Profiling. ACM Transactions on Knowledge Discovery from Data (TKDD), (vol. 5 no. 1), Article 2 (December 2010), 44 pages.
5
Is this Enough?
6
Required semantics are distributed in multiple sources
LinkedIn Videolectures
7
Identity Linking • Identifying users from multiple heterogeneous networks and integrating
semantics from the different networks together.
8
COSNET: Connecting Social Networks with Local and Global Consistency
• Input: G={G1, G2, …, Gm}, with Gk=(Vk, Ek, Rk) • Formalization: X={xi}, all possible pairwise
matchings and each corresponds to • COSNET: an energy-based model
[1] Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip Yu. COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency. KDD’15.
yi ∈{1,0}
Y * = argminE(Y ,X)
9
Local vs. Global consistency
• Given three networks,
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
10
Local vs. Global consistency • Local matching: matching users by profiles
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
局部一致性
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
Local consistency
Pairwise similarity features – Username similarity and
uniqueness – Profile content similarity – Ego network similarity – Social status
Energy function
11
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
局部一致性网络一致性
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
Local vs. Global consistency • Network matching: matching users’ ego networks
Network matching
Local consistency
Encourage “neighborhood-preserving matching”
12
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
局部一致性网络一致性
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
Local vs. Global consistency • Network matching: matching users’ ego networks
Network matching
Local consistency
𝑣11
𝑣12 𝑣22
𝑣21
True True
13
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
局部一致性网络一致性
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
Local vs. Global consistency • Network matching: matching users’ ego networks
Network matching
Local consistency
𝑣11
𝑣12 𝑣22
𝑣21
True True
𝑣11
𝑣12 𝑣22
𝑣21
False True x
14
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
局部一致性网络一致性
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
Local vs. Global consistency • Network matching: matching users’ ego networks
Network matching
Local consistency
𝑣11
𝑣12 𝑣22
𝑣21
True True
𝑣11
𝑣12 𝑣22
𝑣21
False True x
𝑣11
𝑣12 𝑣22
𝑣21
False False x x
15
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
局部一致性网络一致性
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
Local vs. Global consistency • Network matching: matching users’ ego networks
Network matching
Local consistency
𝑣11
𝑣12 𝑣22
𝑣21
True True
𝑣11
𝑣12 𝑣22
𝑣21
False True x
𝑣11
𝑣12 𝑣22
𝑣21
False False x x
16
Network Matching • Network matching: matching users’ ego networks
𝒗𝟏𝟏
𝒗𝟏𝟐 𝒗𝟏𝟑
𝒗𝟐𝟏
𝒗𝟐𝟐 𝒗𝟐𝟑
𝒗𝟏𝟏𝒗𝟐𝟏 𝒗𝟏𝟐𝒗𝟐𝟏
𝒗𝟏𝟏𝒗𝟐𝟐 𝒗𝟏𝟑𝒗𝟐𝟑 𝒗𝟏𝟐𝒗𝟐𝟐
𝒗𝟏𝟐𝒗𝟐𝟑 𝒗𝟏𝟏𝒗𝟐𝟑
𝒗𝟏𝟑𝒗𝟐𝟏
𝒗𝟏𝟑𝒗𝟐𝟐
𝑮𝟏
𝑮𝟐
𝑪
Input networks Matching graph
Energy function
17
Candidate Pruning
• Content-based method – Username similarity above a threshold
• Structure-based similarity – Starting from a seed mapping set and iteratively
propagate the m
18
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
局部一致性网络一致性
全局一致性
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
Local vs. Global consistency • Global consistency: matching users by avoiding global
inconsistency
Global inconsistency
Network matching
Local consistency
Avoid “global inconsistency”
19
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
局部一致性网络一致性
全局一致性
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
Local vs. Global consistency • Global consistency: matching users by avoiding global
inconsistency
Global inconsistency
Network matching
Local consistency
𝑣13
𝑣11
𝑣12
𝑣13
𝑣11
𝑣12
𝑣13
𝑣11
𝑣12
𝑣13
𝑣11
𝑣12
x
x x
x x
x
20
𝑣13
𝑣11
𝑣31
𝑣12
𝑣22
𝑣21
𝑣33 𝑣23 𝑣32
𝐺 2
局部一致性网络一致性
全局一致性
Username: Ortiz_BrandyNation: USAGender: female
Username: @ortizbrandyNation: USAGender: female
𝐺 1
𝐺 3
Local vs. Global consistency • Global consistency: matching users by avoiding global
inconsistency
Global inconsistency
Network matching
Local consistency
𝑣13
𝑣11
𝑣12
𝑣13
𝑣11
𝑣12
𝑣13
𝑣11
𝑣12
𝑣13
𝑣11
𝑣12
x
x x
x x
x
Inconsistent!
21
Avoid Global Inconsistency
Energy function Input networks Matching graph
22
Model Construction
Objective function by combining all the energy functions
𝑣31
𝑣12
𝑣22
𝑣21
𝑣32
𝐺 2
s
𝑣11𝑣12
𝑣11
Matching Graph Generation
Candidate Pruning
Model Construction
𝑣11𝑣22 𝑣11𝑣32
𝑣21𝑣22 𝑣21𝑣12 𝑣21𝑣32
𝑣31𝑣12 𝑣31𝑣22 𝑣31𝑣32 𝑥1 𝑥2 𝑥3
𝑥4 𝑥5
𝑦1
𝑦4 𝑦5
𝑦2 𝑦3
𝑓 𝑙 (𝑥1 ) 𝑓 𝑙 (𝑥2 ) 𝑓 𝑙 (𝑥4 ) 𝑓 𝑙 (𝑥5 )
𝑓 𝑙 (𝑥3 )
𝑓 𝑒 (𝑦1 , 𝑦2 )
𝐺 1
𝑣11𝑣12 𝑣11𝑣32
𝑣21𝑣22
𝑣31𝑣12 𝑣31𝑣32
𝑣11𝑣12 𝑣11𝑣32 𝑣21𝑣22
𝑣31𝑣12 𝑣31𝑣32
𝑓 𝑒 (𝑦2 , 𝑦3 )
𝑓 𝑒 (𝑦2 , 𝑦4 ) 𝑓 𝑒 (𝑦2 , 𝑦5 )
(a) Two input networks (b) The generated matching graph (c) Matching graph after pruning (d) The constructed model
23
Model Learning • Max-margin learning
• As the original problem is intractable, we use Lagrangian relaxation to decompose the original objective function into a set of easy-to-solve sub-problems
24
Model Learning (cont.)
• Dual decomposition
The resulting objective function is convex and non-differentiable, and can be solved by projected sub-gradient method
This provides a lower bound to the original function
25
Results
26
Connecting AMiner with …
• LinkedIn and VideoLectures
Name-match: match name only; SVM: use classifier to identify the same user; MNA: an optimization method;
SiGMa: local propagation; COSNET: our method; COSNET-: w/o global consistency.
27
Connecting Social Media Sites
• Twitter, LiveJournal, Last.fm, Flickr, MySpace
Name-match: match name only; SVM: use classifier to identify the same user; MNA: an optimization method;
SiGMa: local propagation; COSNET: our method; COSNET-: w/o global consistency.
28
Effects of Global Consistency
+5.4%
Academia Collection SNS Collection
+9.5%
COSNET-: w/o global consistency.
29
Application in AMiner
- Video contents
- Personal profiles- Business connections- Skills and expertise
- Patents data
30
Thanks!
http://aminer.org
Data & source code http://aminer.org/cosnet