CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
CMU SCS
Big (graph) data analytics
Christos Faloutsos
CMU
CMU SCS
CMU SCS IC '14 C. Faloutsos 2
CONGRATULATIONS!
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly detection
• Conclusions
Q+A
• Are you recruiting? How many?
  – 1 or 2
• How many do you have?
  – 6 (+5 postdocs)
• How frequently do you meet them?
  – 1/week
• What is your advising style?
  – results
• How do you feel about summer internships?
  – Yes/Maybe (FB, MSR, IBM, ++)
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly detection
• Conclusions
Motivation
• Data mining: ~ find patterns (rules, outliers)
• What do real graphs look like? Anomalies?
• Time series / Monitoring
Measles @ PA, NY, …
Graphs - why should we care?
Internet Map [lumeta.com]
Food Web [Martinez ’91]
~1B users
$10-$100B revenue
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly detection
• Conclusions
NELL & concepts (=groups)
• Predicates (subject, verb, object) in knowledge base
  – “Barack Obama is the president of U.S.”
  – “Eric Clapton plays guitar”
• NELL (Never Ending Language Learner) data
  – Size: 26M x 26M x 48M
  – Nonzeros = 144M
Tom Mitchell (CMU/CS-MLD)
Vagelis Papalexakis (CMU-CS)
Answer: tensor factorization
• Recall: (SVD) matrix factorization: finds blocks
[Figure: N users x M products matrix ~ sum of three blocks: ‘meat-eaters’ x ‘steaks’ + ‘vegetarians’ x ‘plants’ + ‘kids’ x ‘cookies’]
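The "SVD finds blocks" idea can be sketched on a toy matrix. This is a minimal illustration under stated assumptions, not material from the talk: the matrix `A`, its block sizes and values, and the 0.1 threshold are all synthetic choices made here.

```python
import numpy as np

# Synthetic user-by-product matrix with three planted blocks (assumed data).
A = np.zeros((9, 6))
A[0:3, 0:2] = 1.0   # 'meat-eaters' x 'steaks'
A[3:6, 2:4] = 2.0   # 'vegetarians' x 'plants'
A[6:9, 4:6] = 3.0   # 'kids' x 'cookies'

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the three strongest components; each one is a rank-1 "block".
rank = 3
A_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Users with a large entry in the k-th left singular vector form the k-th group.
groups = [np.flatnonzero(np.abs(U[:, k]) > 0.1).tolist() for k in range(rank)]
print(groups)
```

Because the planted blocks do not overlap and have different strengths, each singular vector lines up with exactly one block; on real data the groups would be softer and a threshold would need tuning.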
Answer: tensor factorization
• PARAFAC decomposition
[Figure: subject x object x verb tensor = sum of three components: politicians + artists + athletes]
Answer: tensor factorization
• PARAFAC decomposition
• Results for who-calls-whom-when
  – 4M x 15 days
[Figure: caller x callee x time tensor = sum of three components, each labeled ??]
Concept Discovery
• Concept Discovery in Knowledge Base
  – NP1: Internet, file, data
  – NP2: Protocol, software, suite
Neuro-semantics
• Brain Scan Data*
  – 9 persons
  – 60 nouns
• Questions
  – 218 questions
  – ‘is it alive?’, ‘can you eat it?’
• Patterns?
[Figure: brain-scan data arranged by persons, nouns (airplane, dog, …), questions, and voxels]
*Mitchell et al., Predicting human brain activity associated with the meanings of nouns, Science, 2008. Data @ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
Neuro-semantics
• Small items -> Premotor cortex
Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014
Scalability
• Google: > 450,000 processors in clusters of ~2000 processors each [Barroso+, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003]
• Yahoo: 5Pb of data [Fayyad, KDD’07]
• Google-NY, Aug’14: ‘graph with 1T edges, 300B nodes’
• Problem: machine failures, on a daily basis
• How to parallelize data mining tasks, then?
• A: map/reduce – hadoop (open-source clone) http://hadoop.apache.org/
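To make the map/reduce idea concrete, here is a minimal single-process sketch that computes node degrees from an edge list. It only mimics the map, shuffle and reduce phases in plain Python; it does not use the Hadoop API, and the helper names (`map_edge`, `reduce_counts`, `map_reduce`) and the toy edge list are assumptions for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_edge(edge):
    """Map phase: emit (node, 1) for both endpoints of an edge."""
    src, dst = edge
    yield (src, 1)
    yield (dst, 1)

def reduce_counts(node, ones):
    """Reduce phase: sum the counts collected for one node."""
    return node, sum(ones)

def map_reduce(records, mapper, reducer):
    """Run mapper on every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)          # stands in for the shuffle step
    return dict(reducer(k, vs) for k, vs in groups.items())

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = map_reduce(edges, map_edge, reduce_counts)
print(degrees)
```

Because each mapper call looks at one record and each reducer call looks at one key, the work can be split across machines, and a failed task can simply be rerun on another machine.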
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly/fraud detection
• Conclusions
App-store fraud
Opinion Fraud Detection in Online Reviews using Network Effects
Leman Akoglu, Rishi Chandy, Christos Faloutsos
ICWSM’13
(NSF grant, with Alex Beutel)
Problem
• Given
  – user-product review network
  – review sign (+/-)
• Classify
  – objects into type-specific classes:
    users: ‘honest’ / ‘fraudster’
    products: ‘good’ / ‘bad’
    reviews: ‘genuine’ / ‘fake’
• No side data! (e.g., timestamp, review text)
Formulation: BP
[Figure: User and Product columns; pairs shown: honest / bad, honest / good; edge signs –, +; panels: Before, After]
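The BP (belief propagation) formulation can be sketched on a signed user-product graph as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the compatibility values in `PSI`, the value of `EPS`, the function name `loopy_bp` and the two-review toy graph are all chosen here for the example.

```python
import numpy as np

# States: users 0 = honest, 1 = fraudster; products 0 = good, 1 = bad.
EPS = 0.1  # assumed value, for illustration only
# PSI[sign][user_state, product_state]: assumed edge compatibilities.
PSI = {
    "+": np.array([[1 - EPS, EPS], [2 * EPS, 1 - 2 * EPS]]),
    "-": np.array([[EPS, 1 - EPS], [1 - 2 * EPS, 2 * EPS]]),
}

def loopy_bp(edges, user_prior, product_prior, n_iter=20):
    """edges: list of unique (user, product, sign). Returns two belief dicts."""
    to_product = {e: np.full(2, 0.5) for e in edges}   # user -> product messages
    to_user = {e: np.full(2, 0.5) for e in edges}      # product -> user messages
    for _ in range(n_iter):
        new_to_product, new_to_user = {}, {}
        for e in edges:
            u, p, sign = e
            # User side: prior times messages from the user's other reviews.
            incoming = np.array(user_prior[u], dtype=float)
            for f in edges:
                if f[0] == u and f != e:
                    incoming = incoming * to_user[f]
            msg = incoming @ PSI[sign]                 # sum over user states
            new_to_product[e] = msg / msg.sum()
            # Product side: prior times messages from the product's other reviews.
            incoming = np.array(product_prior[p], dtype=float)
            for f in edges:
                if f[1] == p and f != e:
                    incoming = incoming * to_product[f]
            msg = PSI[sign] @ incoming                 # sum over product states
            new_to_user[e] = msg / msg.sum()
        to_product, to_user = new_to_product, new_to_user

    def beliefs(priors, position, messages):
        out = {}
        for node, prior in priors.items():
            b = np.array(prior, dtype=float)
            for f in edges:
                if f[position] == node:
                    b = b * messages[f]
            out[node] = b / b.sum()
        return out

    return beliefs(user_prior, 0, to_user), beliefs(product_prior, 1, to_product)

# Toy graph: a trusted user likes p0, an unknown user dislikes it.
edges = [("u0", "p0", "+"), ("u1", "p0", "-")]
user_prior = {"u0": [0.99, 0.01], "u1": [0.5, 0.5]}
product_prior = {"p0": [0.5, 0.5]}
users, products = loopy_bp(edges, user_prior, product_prior)
print(products["p0"], users["u1"])
```

In this toy graph the trusted positive review pulls p0 toward ‘good’, which in turn makes the user who gave the negative review look more like a ‘fraudster’. This is the network effect: labels spread along signed edges without any review text or timestamps.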
Top scorers
[Figure: top-scoring Users and Products; + positive (4-5) rating, o negative (1-2) rating]
‘Fraud-bot’ member reviews
Same developer! Duplicated text! Same day activity!
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly/fraud detection
• Time series, monitoring / forecasting
• Conclusions
‘Tycho’ – epidemics analysis
Prof. Yasuko Matsubara
• 50 states x 46 diseases
• Flu? Measles? August? No periodicity?
• Data: https://www.tycho.pitt.edu/resources.php from U. Pitt (epidemiology dept.)
Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.
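The "periodicity?" question can be illustrated with a simple spectral check. This is a minimal sketch on synthetic weekly counts, not the FUNNEL method and not Tycho data; the function name `dominant_period`, the 52-week cycle and the noise level are assumptions for illustration.

```python
import numpy as np

def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant frequency."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                       # drop the constant offset
    power = np.abs(np.fft.rfft(x)) ** 2    # power at each frequency bin
    k = int(np.argmax(power[1:])) + 1      # skip bin 0 (the mean)
    return len(x) / k

# Synthetic weekly case counts over 10 years with a yearly cycle (assumed data).
weeks = np.arange(520)
rng = np.random.default_rng(0)
counts = 100 + 50 * np.sin(2 * np.pi * weeks / 52) + rng.normal(0, 5, size=weeks.size)

period = dominant_period(counts)
print(period)
```

A disease with a clear yearly season would show a strong peak near 52 weeks; a series with no periodicity would show no bin that stands out from the rest.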
Open research questions
• Patterns/anomalies for time-evolving graphs (Call graph, 3M people x 6mo)
• Spot fraudsters in soc-net (e.g., Twitter ‘$10 -> 1000 followers’)
• How is the human brain wired?
Contact info
• www.cs.cmu.edu/~christos
• GHC 8019
• Ph#: x8.1457
• www.cs.cmu.edu/~christos/TALKS/14-09-ic/
• FYI: Course: 15-826, Tu-Th 3:00-4:20
• and, again WELCOME!