Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos...
-
Upload
carmella-jordan -
Category
Documents
-
view
219 -
download
0
Transcript of Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos...
![Page 1: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/1.jpg)
Tools for large graph miningWWW 2008 tutorial
Part 4: Case studies
Jure Leskovec and Christos Faloutsos Machine Learning Department
Joint work with: Lada Adamic, Deepay Chakrabarti, Natalie Glance, Carlos Guestrin, Bernardo Huberman, Jon Kleinberg, Andreas Krause, Mary McGlohon,
Ajit Singh, and Jeanne VanBriesen.
![Page 2: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/2.jpg)
Tutorial outline
Part 1: Structure and models for networks What are properties of large graphs? How do we model them?
Part 2: Dynamics of networks Diffusion and cascading behavior How do viruses and information propagate?
Part 3: Case studies 240 million MSN instant messenger network Graph projections: how does the web look like
Leskovec&Faloutsos, WWW 2008 Part 4-2
![Page 3: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/3.jpg)
Part 3: Outline
Case studies– Co-clustering– Microsoft Instant Messenger Communication
network• How does the world communicate
– Web projections• How to do learning from contextual subgraphs
– Finding fraudsters on eBay– Center piece subgraphs
• How to find best path between the query nodes
Leskovec&Faloutsos, WWW 2008 Part 4-3
![Page 4: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/4.jpg)
Leskovec&Faloutsos, WWW 2008 2-4
Co-clustering
Given data matrix and the number of row and column groups k and l
Simultaneously Cluster rows of p(X, Y) into k disjoint groups Cluster columns of p(X, Y) into l disjoint groups
![Page 5: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/5.jpg)
Leskovec&Faloutsos, WWW 2008 2-5
Co-clustering
Let X and Y be discrete random variables X and Y take values in {1, 2, …, m} and {1, 2, …, n} p(X, Y) denotes the joint probability distribution—if not
known, it is often estimated based on co-occurrence data Application areas: text mining, market-basket analysis,
analysis of browsing behavior, etc. Key Obstacles in Clustering Contingency Tables
High Dimensionality, Sparsity, Noise Need for robust and scalable algorithms
Reference:1. Dhillon et al. Information-Theoretic Co-clustering, KDD’03
![Page 6: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/6.jpg)
Leskovec&Faloutsos, WWW 2008 2-6
04.04.004.04.04.
04.04.04.004.04.
05.05.05.000
05.05.05.000
00005.05.05.
00005.05.05.
036.036.028.028.036.036.
036.036.028.028036.036.
054.054.042.000
054.054.042.000
000042.054.054.
000042.054.054.
5.00
5.00
05.0
05.0
005.
005.
2.2.
3.0
03. 36.36.28.000
00028.36.36.
m
m
n
nl
k
k
l
eg, terms x documents
![Page 7: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/7.jpg)
Leskovec&Faloutsos, WWW 2008 2-7
04.04.004.04.04.
04.04.04.004.04.
05.05.05.000
05.05.05.000
00005.05.05.
00005.05.05.
036.036.028.028.036.036.
036.036.028.028036.036.
054.054.042.000
054.054.042.000
000042.054.054.
000042.054.054.
5.00
5.00
05.0
05.0
005.
005.
2.2.
3.0
03. 36.36.28.000
00028.36.36.
term xterm-group
doc xdoc group
term group xdoc. group
med. terms
cs terms
common terms
med. doccs doc
![Page 8: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/8.jpg)
Leskovec&Faloutsos, WWW 2008 2-8
Co-clustering
Observations uses KL divergence, instead of L2 the middle matrix is not diagonal
we’ll see that again in the Tucker tensor decomposition
![Page 9: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/9.jpg)
Leskovec&Faloutsos, WWW 2008 2-9
Problem with Information Theoretic Co-clustering
Number of row and column groups must be specified
Desiderata:
Simultaneously discover row and column groups
Fully Automatic: No “magic numbers”
Scalable to large graphs
![Page 10: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/10.jpg)
Leskovec&Faloutsos, WWW 2008 2-10
Cross-association
Desiderata:
Simultaneously discover row and column groups
Fully Automatic: No “magic numbers”
Scalable to large matrices
Reference:1. Chakrabarti et al. Fully Automatic Cross-Associations, KDD’04
![Page 11: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/11.jpg)
Leskovec&Faloutsos, WWW 2008 2-11
What makes a cross-association “good”?
versus
Column groups Column groups
Row
gro
ups
Row
gro
ups
Why is this better?
![Page 12: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/12.jpg)
Leskovec&Faloutsos, WWW 2008 2-12
What makes a cross-association “good”?
versus
Column groups Column groups
Row
gro
ups
Row
gro
ups
Why is this better?
simpler; easier to describeeasier to compress!
![Page 13: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/13.jpg)
Leskovec&Faloutsos, WWW 2008 2-13
What makes a cross-association “good”?
Problem definition: given an encoding scheme• decide on the # of col. and row groups k and l• and reorder rows and columns,• to achieve best compression
![Page 14: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/14.jpg)
Leskovec&Faloutsos, WWW 2008 2-14
Main Idea
sizei * H(xi) +Cost of describing cross-associations
Code Cost Description Cost
Σi Total Encoding Cost =
Good Compression
Better Clustering
Minimize the total cost (# bits)
for lossless compression
details
![Page 15: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/15.jpg)
Leskovec&Faloutsos, WWW 2008 2-15
Algorithmk = 5 row
groups
k=1, l=2
k=2, l=2
k=2, l=3
k=3, l=3
k=3, l=4
k=4, l=4
k=4, l=5
l = 5 col groups
![Page 16: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/16.jpg)
Leskovec&Faloutsos, WWW 2008 2-16
Algorithm
Code for cross-associations (matlab):www.cs.cmu.edu
/~deepay/mywww/software/CrossAssociations-01-27-2005.tgz
Variations and extensions: ‘Autopart’ [Chakrabarti, PKDD’04] www.cs.cmu.edu/~deepay
![Page 17: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/17.jpg)
Microsoft Instant Messenger Communication Network
How does the whole world communicate?
Leskovec and Horvitz: Worldwide Buzz: Planetary-Scale Views on an Instant-Messaging Network, WWW 2008
![Page 18: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/18.jpg)
The Largest Social Network
Leskovec&Faloutsos, WWW 2008 Part 4-18
![Page 19: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/19.jpg)
Instant Messaging
Leskovec&Faloutsos, WWW 2008
• Contact (buddy) list• Messaging window
Part 4-19
![Page 20: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/20.jpg)
IM – Phenomena at planetary scale
Observe social phenomena at planetary scale: How does communication change with user
demographics (distance, age, sex)? How does geography affect communication? What is the structure of the communication
network?
Leskovec&Faloutsos, WWW 2008 Part 4-20
![Page 21: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/21.jpg)
Communication data
The record of communication Presence data
user status events (login, status change) Communication data
who talks to whom Demographics data
user age, sex, location
Leskovec&Faloutsos, WWW 2008 Part 4-21
![Page 22: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/22.jpg)
Data description: Presence
Events: Login, Logout Is this first ever login Add/Remove/Block buddy Add unregistered buddy (invite new user) Change of status (busy, away, BRB, Idle,…)
For each event: User Id Time
Leskovec&Faloutsos, WWW 2008 Part 4-22
![Page 23: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/23.jpg)
Data description: Communication
For every conversation (session) we have a list of users who participated in the conversation
There can be multiple people per conversation For each conversation and each user:
User Id Time Joined Time Left Number of Messages Sent Number of Messages Received
Leskovec&Faloutsos, WWW 2008 Part 4-23
![Page 24: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/24.jpg)
Data description: Demographics
For every user (self reported): Age Gender Location (Country, ZIP) Language IP address (we can do reverse geo IP lookup)
Leskovec&Faloutsos, WWW 2008 Part 4-24
![Page 25: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/25.jpg)
Data collection
Log size: 150Gb/day Just copying over the network takes 8 to 10h Parsing and processing takes another 4 to 6h After parsing and compressing ~ 45 Gb/day Collected data for 30 days of June 2006:
Total: 1.3Tb of compressed data
Leskovec&Faloutsos, WWW 2008 Part 4-25
![Page 26: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/26.jpg)
Data statistics
Activity over June 2006 (30 days) 245 million users logged in 180 million users engaged in conversations 17,5 million new accounts activated More than 30 billion conversations
Leskovec&Faloutsos, WWW 2008 Part 4-26
![Page 27: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/27.jpg)
Data statistics per day
Activity on June 1 2006 1 billion conversations 93 million users login 65 million different users talk (exchange
messages) 1.5 million invitations for new accounts sent
Leskovec&Faloutsos, WWW 2008 Part 4-27
![Page 28: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/28.jpg)
User characteristics: age
Leskovec&Faloutsos, WWW 2008 Part 4-28
![Page 29: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/29.jpg)
Age piramid: MSN vs. the world
Leskovec&Faloutsos, WWW 2008 Part 4-29
![Page 30: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/30.jpg)
Conversation: Who talks to whom? Cross gender edges:
300 male-male and 235 female-female edges 640 million female-male edges
Leskovec&Faloutsos, WWW 2008 Part 4-30
![Page 31: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/31.jpg)
Number of people per conversation
Max number of people simultaneously talking is 20, but conversation can have more people
Leskovec&Faloutsos, WWW 2008 Part 4-31
![Page 32: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/32.jpg)
Conversation duration
Most conversations are short
Leskovec&Faloutsos, WWW 2008 Part 4-32
![Page 33: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/33.jpg)
Conversations: number of messages
Sessions between fewer people run out of steam
Leskovec&Faloutsos, WWW 2008 Part 4-33
![Page 34: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/34.jpg)
Time between conversations Individuals are highly
diverse What is probability to
login into the system after t minutes?
Power-law with exponent 1.5
Task queuing model [Barabasi ’05]
Leskovec&Faloutsos, WWW 2008 Part 4-34
![Page 35: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/35.jpg)
Age: Number of conversationsU
ser s
elf r
epor
ted
age
High
LowLeskovec&Faloutsos, WWW 2008 Part 4-35
![Page 36: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/36.jpg)
Age: Total conversation durationU
ser s
elf r
epor
ted
age
High
LowLeskovec&Faloutsos, WWW 2008 Part 4-36
![Page 37: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/37.jpg)
Age: Messages per conversationU
ser s
elf r
epor
ted
age
High
LowLeskovec&Faloutsos, WWW 2008 Part 4-37
![Page 38: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/38.jpg)
Age: Messages per unit timeU
ser s
elf r
epor
ted
age
High
LowLeskovec&Faloutsos, WWW 2008 Part 4-38
![Page 39: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/39.jpg)
Who talks to whom: Number of conversations
Leskovec&Faloutsos, WWW 2008 Part 4-39
![Page 40: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/40.jpg)
Who talks to whom: Conversation duration
Leskovec&Faloutsos, WWW 2008 Part 4-40
![Page 41: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/41.jpg)
Geography and communication
Count the number of users logging in from particular location on the earth
Leskovec&Faloutsos, WWW 2008 Part 4-41
![Page 42: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/42.jpg)
How is Europe talking Logins from Europe
Leskovec&Faloutsos, WWW 2008 Part 4-42
![Page 43: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/43.jpg)
Users per geo location
Blue circles have more than 1 million
logins.
Blue circles have more than 1 million
logins.
Leskovec&Faloutsos, WWW 2008 Part 4-43
![Page 44: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/44.jpg)
Users per capita
Fraction of population using MSN:•Iceland: 35%•Spain: 28%•Netherlands, Canada, Sweden, Norway: 26%•France, UK: 18%•USA, Brazil: 8%
Fraction of population using MSN:•Iceland: 35%•Spain: 28%•Netherlands, Canada, Sweden, Norway: 26%•France, UK: 18%•USA, Brazil: 8%
Leskovec&Faloutsos, WWW 2008 Part 4-44
![Page 45: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/45.jpg)
Communication heat map
For each conversation between geo points (A,B) we increase the intensity on the line between A and B
Leskovec&Faloutsos, WWW 2008 Part 4-45
![Page 46: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/46.jpg)
Correlation: Probability:
Age
vs. A
ge
Leskovec&Faloutsos, WWW 2008 Part 4-46
![Page 47: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/47.jpg)
IM Communication Network
Buddy graph: 240 million people (people that login in June ’06) 9.1 billion edges (friendship links)
Communication graph: There is an edge if the users exchanged at least
one message in June 2006 180 million people 1.3 billion edges 30 billion conversations
Leskovec&Faloutsos, WWW 2008 Part 4-47
![Page 48: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/48.jpg)
Buddy network: Number of buddies
Buddy graph: 240 million nodes, 9.1 billion edges (~40 buddies per user)
Leskovec&Faloutsos, WWW 2008 Part 4-48
![Page 49: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/49.jpg)
Communication Network: Degree
Number of people a users talks to in a month
Leskovec&Faloutsos, WWW 2008 Part 4-49
![Page 50: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/50.jpg)
Communication Network: Small-world
6 degrees of separation [Milgram ’60s] Average distance 5.5 90% of nodes can be reached in < 8 hops
Hops Nodes1 10
2 78
3 396
4 8648
5 3299252
6 28395849
7 79059497
8 52995778
9 10321008
10 1955007
11 518410
12 149945
13 44616
14 13740
15 4476
16 1542
17 536
18 167
19 71
20 29
21 16
22 10
23 3
24 2
25 3Leskovec&Faloutsos, WWW 2008 Part 4-50
![Page 51: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/51.jpg)
Communication network: Clustering
How many triangles are closed?
Clustering normally decays as k-1
Communication network is highly clustered: k-0.37
High clustering Low clustering
Leskovec&Faloutsos, WWW 2008 Part 4-51
![Page 52: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/52.jpg)
Communication Network Connectivity
Leskovec&Faloutsos, WWW 2008 Part 4-52
![Page 53: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/53.jpg)
k-Cores decomposition
What is the structure of the core of the network?
Leskovec&Faloutsos, WWW 2008 Part 4-53
![Page 54: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/54.jpg)
k-Cores: core of the network
People with k<20 are the periphery Core is composed of 79 people, each having 68
edges among themLeskovec&Faloutsos, WWW 2008 Part 4-54
![Page 55: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/55.jpg)
Node deletion: Nodes vs. Edges
Leskovec&Faloutsos, WWW 2008 Part 4-55
![Page 56: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/56.jpg)
Node deletion: Connectivity
Leskovec&Faloutsos, WWW 2008 Part 4-56
![Page 57: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/57.jpg)
Web ProjectionsLearning from contextual
graphs of the web
How to predict user intention from the web graph?
![Page 58: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/58.jpg)
Motivation Information retrieval traditionally considered
documents as independent Web retrieval incorporates global hyperlink
relationships to enhance ranking (e.g., PageRank, HITS) Operates on the entire graph Uses just one feature (principal eigenvector) of the
graph Our work on Web projections focuses on
contextual subsets of the web graph; in-between the independent and global consideration of the documents
a rich set of graph theoretic properties
Leskovec&Faloutsos, WWW 2008 Part 4-58
![Page 59: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/59.jpg)
Web projections
Web projections: How they work? Project a set of web pages of interest onto the web
graph This creates a subgraph of the web called projection
graph Use the graph-theoretic properties of the subgraph for
tasks of interest Query projections
Query results give the context (set of web pages) Use characteristics of the resulting graphs for
predictions about search quality and user behavior
Leskovec&Faloutsos, WWW 2008 Part 4-59
![Page 60: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/60.jpg)
Q
Query Results Projection on the web graph
Query projection graph
Generate graphical features
Query connection graph
• -- -- ----• --- --- ----• ------ ---• ----- --- --• ------ -----• ------ -----
Construct case library
Predictions
Query projections
Leskovec&Faloutsos, WWW 2008 Part 3-60
![Page 61: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/61.jpg)
Questions we explore
How do query search results project onto the underlying web graph?
Can we predict the quality of search results from the projection on the web graph?
Can we predict users’ behaviors with issuing and reformulating queries?
Leskovec&Faloutsos, WWW 2008 Part 4-61
![Page 62: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/62.jpg)
Is this a good set of search results?
Leskovec&Faloutsos, WWW 2008 Part 4-62
![Page 63: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/63.jpg)
Will the user reformulate the query?
Leskovec&Faloutsos, WWW 2008 Part 4-63
![Page 64: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/64.jpg)
Resources and concepts Web as a graph
URL graph: Nodes are web pages, edges are hyper-links March 2006 Graph: 22 million nodes, 355 million edges
Domain graph: Nodes are domains (cmu.edu, bbc.co.uk). Directed edge (u,v)
if there exists a webpage at domain u pointing to v February 2006 Graph: 40 million nodes, 720 million edges
Contextual subgraphs for queries Projection graph Connection graph
Compute graph-theoretic features
Leskovec&Faloutsos, WWW 2008 Part 4-64
![Page 65: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/65.jpg)
“Projection” graph
Example query: Subaru
Project top 20 results by the search engine
Number in the node denotes the search engine rank
Color indicates relevancy as assigned by human:– Perfect– Excellent– Good– Fair– Poor– Irrelevant
Leskovec&Faloutsos, WWW 2008 Part 3-65
![Page 66: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/66.jpg)
“Connection” graph Projection graph is
generally disconnected Find connector nodes Connector nodes are
existing nodes that are not part of the original result set
Ideally, we would like to introduce fewest possible nodes to make projection graph connected
Connector nodesProjection
nodes
Leskovec&Faloutsos, WWW 2008 Part 3-66
![Page 67: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/67.jpg)
Finding connector nodes Find connector nodes is a Steiner tree problem which is NP
hard Our heuristic:
Connect 2nd largest connected component via shortest path to the largest
This makes a new largest component Repeat until the graph is connected
Largest component
2nd largest component
2nd largest component
Leskovec&Faloutsos, WWW 2008 Part 4-67
![Page 68: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/68.jpg)
Extracting graph features The idea
Find features that describe the structure of the graph
Then use the features for machine learning
Want features that describe Connectivity of the graph Centrality of projection and
connector nodes Clustering and density of the core
of the graph
vs.
Leskovec&Faloutsos, WWW 2008 Part 4-68
![Page 69: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/69.jpg)
Examples of graph features Projection graph
Number of nodes/edges Number of connected components Size and density of the largest
connected component Number of triads in the graph
Connection graph Number of connector nodes Maximal connector node degree Mean path length between
projection/connector nodes Triads on connector nodes
We consider 55 features total
vs.
Leskovec&Faloutsos, WWW 2008 Part 4-69
![Page 70: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/70.jpg)
Q
Query Results Projection on the web graph
Query projection graph
Generate graphical features
Query connection graph
• -- -- ----• --- --- ----• ------ ---• ----- --- --• ------ -----• ------ -----
Construct case library
Predictions
Experimental setup
Leskovec&Faloutsos, WWW 2008 Part 3-70
![Page 71: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/71.jpg)
Constructing case library for machine learning
Given a task of interest Generate contextual subgraph and extract
features Each graph is labeled by target outcome Learn statistical model that relates the
features with the outcome Make prediction on unseen graphs
Leskovec&Faloutsos, WWW 2008 Part 4-71
![Page 72: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/72.jpg)
Experiments overview Given a set of search results generate projection
and connection graphs and their features
Predict quality of a search result set Discriminate top20 vs. top40to60 results Predict rating of highest rated document in the set
Predict user behavior Predict queries with high vs. low reformulation
probability Predict query transition (generalization vs. specialization) Predict direction of the transition
Leskovec&Faloutsos, WWW 2008 Part 4-72
![Page 73: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/73.jpg)
Experimental details Features
55 graphical features Note we use only graph features, no content
Learning We use probabilistic decision trees (“DNet”)
Report classification accuracy using 10-fold cross validation
Compare against 2 baselines Marginals: Predict most common class RankNet: use 350 traditional features (document, anchor
text, and basic hyperlink features)
Leskovec&Faloutsos, WWW 2008 Part 4-73
![Page 74: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/74.jpg)
Search results quality
Dataset: 30,000 queries Top 20 results for each
Each result is labeled by a human judge using a 6-point scale from "Perfect" to "Bad"
Task: Predict the highest rating in the set of results
6-class problem 2-class problem: “Good” (top 3 ratings) vs. “Poor”
(bottom 3 ratings)
Leskovec&Faloutsos, WWW 2008 Part 4-74
![Page 75: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/75.jpg)
Search quality: the task
Predict the rating of the top result in the set
Predict “Good” Predict “Poor”
Leskovec&Faloutsos, WWW 2008 Part 4-75
![Page 76: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/76.jpg)
Search quality: results Predict top human rating in
the set– Binary classification: Good vs.
Poor 10-fold cross validation
classification accuracy
Observations:– Web Projections outperform
both baseline methods– Just projection graph already
performs quite well– Projections on the URL graph
perform better
AttributesURL
GraphDomain Graph
Marginals 0.55 0.55
RankNet 0.63 0.60
Projection 0.80 0.64
Connection 0.79 0.66
Projection + Connection
0.82 0.69
All 0.83 0.71
Leskovec&Faloutsos, WWW 2008 Part 3-76
![Page 77: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/77.jpg)
Search quality: the model The learned model shows
graph properties of good result sets
Good result sets have:– Search result nodes are hub
nodes in the graph (have large degrees)
– Small connector node degrees
– Big connected component– Few isolated nodes in
projection graph– Few connector nodes
Leskovec&Faloutsos, WWW 2008 Part 3-77
![Page 78: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/78.jpg)
Predict user behavior
Dataset Query logs for 6 weeks 35 million unique queries, 80 million total query
reformulations We only take queries that occur at least 10 times This gives us 50,000 queries and 120,000 query
reformulations Task
Predict whether the query is going to be reformulated
Leskovec&Faloutsos, WWW 2008 Part 4-78
![Page 79: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/79.jpg)
Query reformulation: the task Given a query and corresponding projection and
connection graphs Predict whether query is likely to be reformulated
Query not likely to be reformulated Query likely to be reformulated
Leskovec&Faloutsos, WWW 2008 Part 4-79
![Page 80: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/80.jpg)
Query reformulation: results Observations:
– Gradual improvement as using more features
– Using Connection graph features helps
– URL graph gives better performance
We can also predict type of reformulation (specialization vs. generalization) with 0.80 accuracy
AttributesURL
GraphDomain Graph
Marginals 0.54 0.54
Projection 0.59 0.58
Connection 0.63 0.59
Projection + Connection
0.63 0.60
All 0.71 0.67
Leskovec&Faloutsos, WWW 2008 Part 3-80
![Page 81: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/81.jpg)
Query reformulation: the model
Queries likely to be reformulated have:– Search result nodes have
low degree– Connector nodes are
hubs– Many connector nodes– Results came from many
different domains– Results are sparsely knit
Leskovec&Faloutsos, WWW 2008 Part 3-81
![Page 82: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/82.jpg)
Query transitions
Predict if and how will user transform the query
Q: Strawberry shortcake
Q: Strawberry shortcake pictures
transition
Leskovec&Faloutsos, WWW 2008 Part 4-82
![Page 83: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/83.jpg)
Query transition
With 75% accuracy we can say whether a query is likely to be reformulated: Def: Likely reformulated p(reformulated) > 0.6
With 87% accuracy we can predict whether observed transition is specialization or generalization
With 76% we can predict whether the user will specialize or generalize
Leskovec&Faloutsos, WWW 2008 Part 4-83
![Page 84: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/84.jpg)
Conclusion We introduced Web projections
A general approach of using context-sensitive sets of web pages to focus attention on relevant subset of the web graph
And then using rich graph-theoretic features of the subgraph as input to statistical models to learn predictive models
We demonstrated Web projections using search result graphs for Predicting result set quality Predicting user behavior when reformulating queries
Leskovec&Faloutsos, WWW 2008 Part 4-84
![Page 85: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/85.jpg)
Fraud detection on e-bay
How to find fraudsters on e-bay?
![Page 86: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/86.jpg)
E-bay Fraud detection
Polo Chau & Shashank Pandit, CMU
•“non-delivery” fraud:seller takes $$ and disappears
Leskovec&Faloutsos, WWW 2008 Part 3-86
![Page 87: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/87.jpg)
Potential Buyer CPotential Buyer B
Potential Buyer A
$$$
Seller
$$$
Buyer
A TransactionWhat if something goes BAD?
Non-delivery fraud
Online Auctions: How They Work
Leskovec&Faloutsos, WWW 2008 Part 3-87
![Page 88: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/88.jpg)
Modeling Fraudulent Behavior (contd.)
How would fraudsters behave in this graph? interact closely with other fraudsters fool reputation-based systems
Wow! This should lead to nice and easily detectable cliques of fraudsters …
Not quite experiments with a real eBay dataset showed they
rarely form cliques
092453 0112149Reputation
Leskovec&Faloutsos, WWW 2008 Part 4-88
![Page 89: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/89.jpg)
Modeling Fraudulent Behavior
So how do fraudsters operate?
= fraudster
= accomplice= honest
Bipart
ite Co
re
Leskovec&Faloutsos, WWW 2008 Part 4-89
![Page 90: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/90.jpg)
Modeling Fraudulent Behavior The 3 roles
Honest people like you and me
Fraudsters those who actually commit fraud
Accomplices erstwhile behave like honest users accumulate feedback via low-cost transactions secretly boost reputation of fraudsters (e.g.,
occasionally trading expensive items)
Leskovec&Faloutsos, WWW 2008 Part 4-90
![Page 91: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/91.jpg)
A
C
B
E
D
Belief Propagation
Leskovec&Faloutsos, WWW 200891 of 40
![Page 92: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/92.jpg)
Center piece subgraphs
What is the best explanatory path between the nodes in a graph?
Hanghang Tong, KDD [email protected]
![Page 93: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/93.jpg)
Center-Piece Subgraph(Ceps) Given Q query nodes Find Center-piece ( )
App.– Social Networks– Law Inforcement, …
Idea:– Proximity -> random walk
with restarts
A C
B
A C
B
A C
B
b
Leskovec&Faloutsos, WWW 2008 93
![Page 94: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/94.jpg)
Case Study: AND query
R. Agrawal Jiawei Han
V. Vapnik M. Jordan
Leskovec&Faloutsos, WWW 2008 Part 4-94
![Page 95: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/95.jpg)
Case Study: AND query
R. Agrawal Jiawei Han
V. Vapnik M. Jordan
H.V. Jagadish
Laks V.S. Lakshmanan
Heikki Mannila
Christos Faloutsos
Padhraic Smyth
Corinna Cortes
15 1013
1 1
6
1 1
4 Daryl Pregibon
10
2
11
3
16
Leskovec&Faloutsos, WWW 2008 Part 4-95
![Page 96: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/96.jpg)
R. Agrawal Jiawei Han
V. Vapnik M. Jordan
H.V. Jagadish
Laks V.S. Lakshmanan
Umeshwar Dayal
Bernhard Scholkopf
Peter L. Bartlett
Alex J. Smola
1510
13
3 3
5 2 2
327
42_SoftAnd query
ML/Statistics
databases
Leskovec&Faloutsos, WWW 2008 Part 4-96
![Page 97: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/97.jpg)
Details
Main idea: use random walk with restarts, to measure ‘proximity’ p(i,j) of node j to node i
Leskovec&Faloutsos, WWW 2008 Part 4-97
![Page 98: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/98.jpg)
Example
1
2
3
4
5
6
789
11
10 13
12•Starting from 1
•Randomly to neighbor
•Some p to return to 1
Prob (RW will finally stay at j)
Leskovec&Faloutsos, WWW 2008 Part 4-98
![Page 99: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/99.jpg)
Individual Score Calculation
Q1 Q2 Q3
Node 1Node 2Node 3Node 4Node 5Node 6Node 7Node 8Node 9Node 10Node 11Node 12Node 13
0.5767 0.0088 0.0088 0.1235 0.0076 0.0076 0.0283 0.0283 0.0283 0.0076 0.1235 0.0076 0.0088 0.5767 0.0088 0.0076 0.0076 0.1235 0.0088 0.0088 0.5767 0.0333 0.0024 0.1260 0.1260 0.0024 0.0333 0.1260 0.0333 0.0024 0.0333 0.1260 0.0024 0.0024 0.1260 0.0333 0.0024 0.0333 0.12601
10
11
9 8
12
13
4
3
62
0.5767
0.1260
0.1235
0.1260
0.0283
0.0333
0.0024
0.0088
0.0076
0.00760.00240.0333
0.0088
7
5
Leskovec&Faloutsos, WWW 2008 99
![Page 100: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/100.jpg)
Individual Score Calculation
Q1 Q2 Q3
Node 1Node 2Node 3Node 4Node 5Node 6Node 7Node 8Node 9Node 10Node 11Node 12Node 13
0.5767 0.0088 0.0088 0.1235 0.0076 0.0076 0.0283 0.0283 0.0283 0.0076 0.1235 0.0076 0.0088 0.5767 0.0088 0.0076 0.0076 0.1235 0.0088 0.0088 0.5767 0.0333 0.0024 0.1260 0.1260 0.0024 0.0333 0.1260 0.0333 0.0024 0.0333 0.1260 0.0024 0.0024 0.1260 0.0333 0.0024 0.0333 0.1260
Individual Score matrix
1
10
11
9 8
12
13
4
3
62
0.5767
0.1260
0.1235
0.1260
0.0283
0.0333
0.0024
0.0088
0.0076
0.00760.00240.0333
0.0088
7
5
Leskovec&Faloutsos, WWW 2008 100
![Page 101: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/101.jpg)
AND: Combining Scores
Q: How to combine scores?
A: Multiply …= prob. 3 random
particles coincide on node j
Leskovec&Faloutsos, WWW 2008 101
![Page 102: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/102.jpg)
K_SoftAnd: Combining Scores
Generalization – SoftAND:We want nodes close to k
of Q (k<Q) query nodes.
Q: How to do that?
details
Leskovec&Faloutsos, WWW 2008 102
![Page 103: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/103.jpg)
K_SoftAnd: Combining Scores
Generalization – softAND:We want nodes close to k
of Q (k<Q) query nodes.
Q: How to do that? A: Prob(at least k-out-of-
Q will meet each other at j)
details
Leskovec&Faloutsos, WWW 2008 103
![Page 104: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/104.jpg)
AND query vs. K_SoftAnd query
And Query 2_SoftAnd Query
x 1e-4
1 7
5
10
11
9 8
12
13
4
3
62
0.4505
0.1010
0.0710
0.1010
0.2267
0.1010
0.1010
0.4505
0.0710
0.07100.10100.1010
0.4505
1 7
5
10
11
9 8
12
13
4
3
62
0.0103
0.0046
0.0019
0.0046
0.0024
0.0046
0.0046
0.0103
0.0019
0.00190.00460.0046
0.0103
Leskovec&Faloutsos, WWW 2008 104
![Page 105: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/105.jpg)
1 7
5
10
11
9 8
12
13
4
3
62
0.0103
0.1617
0.1387
0.1617
0.0849
0.1617
0.1617
0.0103
0.1387
0.13870.16170.1617
0.0103
1_SoftAnd query = OR query
Leskovec&Faloutsos, WWW 2008 Part 4-105
![Page 106: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/106.jpg)
Challenges in Ceps
Q1: How to measure the importance?– A: RWR
Q2: How to do it efficiently?
Leskovec&Faloutsos, WWW 2008 106
![Page 107: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/107.jpg)
Graph Partition: Efficiency Issue Straightforward way
– solve a linear system: – time: linear to # of edges
Observation– Skewed dist.– communities
How to exploit them?– Graph partition
Leskovec&Faloutsos, WWW 2008 107
![Page 108: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/108.jpg)
Even better:
We can correct for the deleted edges (Tong+, ICDM’06, best paper award)
Leskovec&Faloutsos, WWW 2008 Part 4-108
![Page 109: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/109.jpg)
Experimental Setup
Dataset– DBLP/authorship– Author-Paper– 315k nodes– 1.8M edges
Leskovec&Faloutsos, WWW 2008 Part 3-109
![Page 110: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/110.jpg)
Query Time vs. Pre-Computation Time
Log Query Time
Log Pre-computation Time
•Quality: 90%+ •On-line:
•Up to 150x speedup•Pre-computation:
•Two orders saving
Leskovec&Faloutsos, WWW 2008 Part 4-110
![Page 111: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/111.jpg)
Query Time vs. Storage
Log Query Time
Log Storage
•Quality: 90%+ •On-line:
•Up to 150x speedup•Pre-storage:
•Three orders saving
Leskovec&Faloutsos, WWW 2008 Part 4-111
![Page 112: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/112.jpg)
Conclusions Q1:How to measure the importance? A1: RWR+K_SoftAnd Q2: How to find connection subgraph? A2:”Extract” Alg.) Q3:How to do it efficiently? A3:Graph Partition and Sherman-Morrison
– ~90% quality– 6:1 speedup; 150x speedup (ICDM’06, b.p.
award)
A C
B
Leskovec&Faloutsos, WWW 2008 Part 3-112
![Page 113: Tools for large graph mining WWW 2008 tutorial Part 4: Case studies Jure Leskovec and Christos Faloutsos Machine Learning Department Joint work with: Lada.](https://reader035.fdocuments.us/reader035/viewer/2022062323/5697c02b1a28abf838cd84a2/html5/thumbnails/113.jpg)
References Leskovec and Horvitz: Worldwide Buzz: Planetary-Scale
Views on an Instant-Messaging Network, 2007 Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan Fast
Random Walk with Restart and Its Applications ICDM 2006.
Hanghang Tong, Christos Faloutsos Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006
Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos: NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks, WWW 2007.
Leskovec&Faloutsos, WWW 2008 Part 4-113