Indrajit Bhattacharya Research Scientist IBM Research, Bangalore
Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*
Indrajit Bhattacharya
Research Scientist
IBM Research, Bangalore
*Collaboration w/ Himabindu Lakkaraju & Chiranjib Bhattacharyya
Workshop on Social Computing
IIT Kharagpur, Oct 5-6 2012
Social Media Analysis: Motivation
Microblogs: Twitter, Facebook, MySpace
Understanding and analyzing topics & trends
Influences on users
Variety of stakeholders
Business
Government
Social scientists
Social Media Analysis: Challenges
Network and Influences on Users
User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]
Dynamic nature
Topics & user personalities evolve over time
Volume of data
Existing approaches fall short
Social Media Analysis: State of the Art
Content Analysis
Ramage ICWSM 2010, Hong SOMA 2010
Variants of LDA
Inferring User Interests
Ahmed KDD 2011, Wen KDD 2010
Individual features such as user activity or network
Patterns in Temporal Evolution
Yang et al WSDM 2011
Bayesian Non-parametric Models
Choosing the number of components in a mixture model
Particularly severe problem for large data volumes such as for social media data
Bayesian solution
Infinite dimensional prior
Allows the number of mixture components to grow with data size
Existing non-parametric models cannot capture richness of social media data
Algorithms often not scalable
Talk Outline
Background: Chinese Restaurant Processes
CRP with multiple relationships: (RelCRP, MRelCRP)
Dynamic MRelCRP
Multi-threaded Online Inference Algorithm
Experimental Results
Dirichlet Process (Informal)
Dirichlet Process: Properties
Chinese Restaurant Process (CRP)
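The slide body did not survive in this transcript. As a reminder of the standard construction, the following is a minimal Python sketch of the CRP seating rule: each new customer joins an existing table with probability proportional to the number of customers already seated there, and opens a new table with probability proportional to a concentration parameter alpha. The function name `sample_crp` and its parameter names are illustrative assumptions, not taken from the talk.

```python
import random


def sample_crp(num_customers, alpha, seed=0):
    """Seat customers one at a time under a CRP with concentration alpha.

    Returns (seating, table_sizes): the table index chosen by each customer
    and the final number of customers at each table.
    """
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = customers currently at table k
    seating = []
    for _ in range(num_customers):
        # Existing table k has weight table_sizes[k]; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)   # open a new table
        else:
            table_sizes[k] += 1     # join an existing table
        seating.append(k)
    return seating, table_sizes


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        _, sizes = sample_crp(n, alpha=1.0)
        print(n, "customers ->", len(sizes), "tables")
```

In runs of this sketch the number of occupied tables grows slowly with the number of customers (on the order of alpha * log n in expectation). This is the property that lets the number of mixture components grow with data size.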
Talk Outline
Background: Chinese Restaurant Processes
CRP with multiple relationships: (RelCRP, MRelCRP)
Dynamic MRelCRP
Parallelized Online Inference Algorithm
Experimental Results
Relational Chinese Restaurant Process (RelCRP)
Influence of World-wide Factors
Influence of Personal Preferences
Influence of Friend Network
Influence of Geography
Figure labels: India, China, UK
Aggregating Influences
RelCRP is exchangeable like the CRP
Useful as a prior for infinite mixture model
RelCRP captures influence of one relation on posts
Influences act simultaneously on any user
Aggregated influence pattern is user specific
Different users affected differently by same combination of world-wide and geographic factors
Multi Relational CRP
Talk Outline
Background: Chinese Restaurant Processes
CRP with multiple relationships: (RelCRP, MRelCRP)
Dynamic MRelCRP
Multi-threaded Online Inference Algorithm
Experimental Results
Evolving Patterns in Social Media
Number of Topics
Topics die and new ones are born
User Personalities
Susceptibility to influence by world-wide, geographic and friends’ preferences
Existing Topic Distributions
Words go out of fashion, new ones enter vocabulary
Topic Characters
Popularity of a topic changes world-wide, in users' preferences, sub-networks and geographies
Dynamic MultiRelCRP
User Personality Trends
Evolving Topic Distributions
Topic Character Trends
Talk Outline
Background: Chinese Restaurant Processes
CRP with multiple relationships: (RelCRP, MRelCRP)
Dynamic MRelCRP
Multi-threaded Online Inference Algorithm
Experimental Results
Inference and Estimation Tasks
Online Algorithm
Traditional iterative framework does not scale for social media data
Sequential Monte Carlo methods [Canini AIStats ‘09] that rejuvenate some old labels are also infeasible
Online sampling [Banerjee SDM ‘07] does not revisit old labels at all; uses an initial batch phase
Adapt for non-parametric setting
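To make the single-pass idea concrete, here is a hedged Python sketch of online sampling for a plain CRP mixture of posts. It is not the authors' DMRelCRP sampler. It assumes a basic CRP prior with concentration `alpha`, a symmetric Dirichlet(`beta`) prior over a vocabulary of `vocab_size` words, and hypothetical names (`online_crp_topics`, `log_predictive`). Each post is labeled once on arrival and never revisited, and new topics can be created at any time.

```python
import math
import random
from collections import Counter


def log_predictive(post, counts, total, beta, vocab_size):
    """Log probability of the post's words under a Dirichlet-multinomial topic."""
    seen = Counter()  # words of this post already accounted for
    lp = 0.0
    for i, w in enumerate(post):
        lp += math.log((counts[w] + seen[w] + beta) /
                       (total + i + beta * vocab_size))
        seen[w] += 1
    return lp


def online_crp_topics(posts, alpha=1.0, beta=0.1, vocab_size=1000, seed=0):
    """Assign each post (a list of tokens) to a topic in a single pass."""
    rng = random.Random(seed)
    topic_words, topic_totals, topic_posts = [], [], []
    labels = []
    for post in posts:
        # Existing topic: CRP weight (posts in topic) times word likelihood.
        log_w = [math.log(topic_posts[k]) +
                 log_predictive(post, topic_words[k], topic_totals[k],
                                beta, vocab_size)
                 for k in range(len(topic_words))]
        # New topic: CRP weight alpha times likelihood under the prior alone.
        log_w.append(math.log(alpha) +
                     log_predictive(post, Counter(), 0, beta, vocab_size))
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]  # stable normalization
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topic_words):  # open a new topic
            topic_words.append(Counter())
            topic_totals.append(0)
            topic_posts.append(0)
        topic_words[k].update(post)
        topic_totals[k] += len(post)
        topic_posts[k] += 1
        labels.append(k)  # this label is final; it is never resampled
    return labels, topic_words


if __name__ == "__main__":
    demo = [["fifa", "goal"], ["fifa", "cup"], ["google", "wave"]] * 5
    labels, topics = online_crp_topics(demo, vocab_size=5)
    print(labels, len(topics))
```

Because labels are never resampled, the cost per post depends only on the current number of topics. The price is that early assignment mistakes are not corrected.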
Multi-threaded Implementation
Sequential online implementation does not scale
Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]
Our algorithm is parallel, online and non-parametric
Explicit consolidation by master thread at the end of each iteration
Only new topics consolidated
Talk Outline
Background: Chinese Restaurant Processes
CRP with multiple relationships: (RelCRP, MRelCRP)
Dynamic MRelCRP
Multi-threaded Online Inference Algorithm
Experimental Results
Datasets and Baselines
Twitter: 360 million tweets (Jun-Dec 2009)
Facebook: 300,000 posts (public profiles, 3 months)
Latent Dirichlet Allocation (LDA)
[Hong SOMA 2010]
Labeled LDA (L-LDA)
Hashtags as topics [Ramage ICWSM 2010]
Timeline
Dynamic non-parametric topic model [Ahmed UAI 2010]
1 Model Goodness
Perplexity: Ability to generalize to unseen data
Both network and dynamics are important for modeling social media data
Perplexity
Model      Twitter   Facebook
DMRelCRP   1188.29   1562.34
Timeline   1582.86   1802.9
L-LDA      1982.76   -
LDA        2932.06   3602
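The slide does not spell out how perplexity is computed. A minimal Python sketch under the standard definition (the exponential of the negative mean per-token log-likelihood on held-out data, with lower values being better) is given below. The function name `perplexity` and its input format, a flat list of natural-log token probabilities, are assumptions for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of held-out tokens."""
    if not token_log_probs:
        raise ValueError("need at least one held-out token")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


if __name__ == "__main__":
    # A model that spreads probability uniformly over 1000 words
    # has perplexity of about 1000 on any held-out text.
    uniform = [math.log(1.0 / 1000)] * 50
    print(perplexity(uniform))
```

Read this way, a perplexity of 1188.29 means the model is roughly as uncertain about each held-out token as a uniform choice among about 1188 words.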
2 Quality of Discovered Topics
Label assigned to each post indicating category
Distribution over words indicating semantics
A. Clustering posts using topic labels
B. Prediction using topic labels
Predicting post authorship & user commenting activity
C. Major event detection
2A Post Clustering using Topics
Use hashtags as gold standard (for Twitter)
16K posts with hashtags such as #NIPS2009, #ICML2009, #bollywood
DMRelCRP close to L-LDA without using hashtags
DMRelCRP produces ‘finer-grained’ clusters
Clustering accuracy (Twitter)
Model      nMI    R-Index   F1
DMRelCRP   0.93   0.88      0.86
Timeline   0.81   0.72      0.73
L-LDA      1      1         1
LDA        0.55   0.52      0.48
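The exact metric variants behind this table are not given in the transcript. As a hedged illustration, the Python sketch below computes three common choices from gold labels (for example hashtags) and predicted topic labels: normalized mutual information with arithmetic-mean normalization, the Rand index, and pairwise F1. The function names `nmi` and `pair_scores` and these particular variants are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def nmi(gold, pred):
    """Normalized mutual information, normalized by the mean of the entropies."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    n = len(gold)
    joint = Counter(zip(gold, pred))
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    mi = sum(c / n * math.log(c * n / (gold_counts[g] * pred_counts[p]))
             for (g, p), c in joint.items())
    h_gold = -sum(c / n * math.log(c / n) for c in gold_counts.values())
    h_pred = -sum(c / n * math.log(c / n) for c in pred_counts.values())
    if h_gold == 0 and h_pred == 0:
        return 1.0  # both partitions are a single cluster
    return mi / ((h_gold + h_pred) / 2)


def pair_scores(gold, pred):
    """Return (Rand index, pairwise F1) by counting agreeing pairs of posts."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two equally long label lists with >= 2 items")
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_gold = gold[i] == gold[j]
        same_pred = pred[i] == pred[j]
        if same_gold and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
        else:
            tn += 1
    rand = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return rand, f1


if __name__ == "__main__":
    gold = ["#NIPS2009", "#NIPS2009", "#bollywood", "#bollywood"]
    pred = [3, 3, 7, 7]  # topic ids; only the grouping matters
    print(nmi(gold, pred), pair_scores(gold, pred))
```

A model whose clusters coincide with the hashtags scores 1 on all three measures, which is consistent with the L-LDA row, since L-LDA uses the hashtags themselves as topics. The pair loop is quadratic in the number of posts, so for 16K posts a contingency-table formulation would be preferable in practice.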
2B Prediction Using Topics
Authorship: Given post and user, predict if author
Commenting activity: Given post and (non-author) user, predict if user comments on that post
DMRelCRP topics lead to more accurate prediction
           Authorship           Commenting
Model      Twitter   Facebook   Twitter   Facebook
DMRelCRP   0.793     0.734      0.683     0.648
Timeline   0.718     0.669      0.582     0.579
L-LDA      0.521     0.432      0.429     0.482
LDA        0.647     -          0.542     -
2C Major Event Detection
3 Analysis of Influences
3A Global Personality Trends
Michael Jackson’s death
FIFA WC
Google Wave
3B Geo-specific Personality Trends
Personality trends very similar in UK and US
Geographic influences high at different epochs
India: World-wide and geographic influences weaker
China: World-wide influence weak, geographic influence strong; stable pattern
3C Topic Character Trends
Scaling with Data Size
Java-based multi-threaded framework; 7 threads
8-core machine with 32 GB RAM
Scales largely because of multi-threading
Summary
First attempt at studying user influences in social media data
New non-parametric model that captures multiple relationships and temporal evolution
Multi-threaded online Gibbs sampling algorithm
Extensive evaluation on large real datasets
Topics lead to better clustering and prediction
Insights on user influence patterns