Magnet community identification on social networks
Transcript of Magnet community identification on social networks
MAGNET COMMUNITY
IDENTIFICATION ON
SOCIAL NETWORKS
University of Illinois at Chicago, USA
ABSTRACT
A magnet community is a community that attracts significantly more interest and attention from people than other communities on similar topics.
This work studies the magnet community identification problem.
1. We observe several properties of magnet communities.
2. We formalize these properties, combined with community feature extraction, into a graph ranking formulation.
INTRODUCTION
Magnet communities are communities that draw significantly more attention than others, even when they are all about the same topic.
Importance:
1. Helps people understand the trends in their domains.
2. Helps people make decisions when joining communities.
Our goal:
Given the communities in a domain, rank them by their attractiveness to people relative to the other communities in that domain.
The top-ranked communities are the ones people tend to adhere to.
Challenges:
1. How to extract features from the heterogeneous sources of factors that affect a community’s attractiveness.
2. How to combine all the heterogeneous information into a unified ranking model.
3. How to handle noise.
Common properties:
Attention flow
Attention quality
Persistence of people’s attention
Contributions
A new direction in social network analysis, namely magnet community identification.
A definition of magnet communities in terms of their properties.
A demonstration of the framework’s effectiveness on one particular domain of magnet community identification, namely employee magnet community identification for companies.
MAGNET COMMUNITY IDENTIFICATION FRAMEWORK
Attractiveness computation framework
$M = (m_1, m_2, \ldots, m_k)$
$M = f(F_V, F_E, M)$
$M^*$
Our objective function:
Attractiveness features
Standalone features
Attention migrating matrix as dependency features
• An attention migrating matrix: $D = (d_{ij})_{k \times k}$
• The attention vector: $A = (a_i)_{k \times 1} = D \cdot e$
• Dependency features of communities:
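As a small illustration of the attention vector $A = D \cdot e$ defined on this slide, the sketch below computes it for a made-up matrix. It assumes that $e$ is the all-ones vector of length $k$ (the transcript does not define $e$), in which case each $a_i$ is simply the sum of row $i$ of $D$. The matrix values and the function name `attention_vector` are hypothetical.

```python
# Hypothetical attention migrating matrix D for k = 3 communities.
# The values are made up for illustration only.
D = [
    [0.0, 4.0, 1.0],
    [2.0, 0.0, 3.0],
    [5.0, 1.0, 0.0],
]


def attention_vector(D):
    """Return A = D . e, assuming e is the all-ones vector,
    so that each a_i is the sum of row i of D."""
    k = len(D)
    if any(len(row) != k for row in D):
        raise ValueError("D must be a square k x k matrix")
    e = [1.0] * k
    return [sum(d_ij * e_j for d_ij, e_j in zip(row, e)) for row in D]


A = attention_vector(D)
print(A)  # [5.0, 5.0, 6.0]
```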
Concrete formula of magnet community ranking framework
At least one of the following conditions holds
EVALUATION
Data collection and feature extraction
Data collection
• Source: www.linkedin.com
• Standalone features: a company’s revenue per employee, industry, location, and age
• Information on 39,527 companies in 142 industries
Feature extraction
• Industry: count how many people flow into and out of it, using company-level departure and arrival data.
• Location: popularity.
• Founded year: the number of companies founded in each year.
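The industry feature above can be sketched as a simple tally over company-level moves. This is only an illustration: the record format (pairs of source and destination company), the company and industry names, and the choice to skip moves inside the same industry are all assumptions, not details given in the slides.

```python
from collections import Counter


def industry_flows(moves, company_industry):
    """Count arrivals into and departures out of each industry.

    moves: iterable of (from_company, to_company) pairs (hypothetical format).
    company_industry: dict mapping company name -> industry label.
    Moves within the same industry are skipped, which is an assumption.
    """
    inflow, outflow = Counter(), Counter()
    for src, dst in moves:
        src_ind = company_industry[src]
        dst_ind = company_industry[dst]
        if src_ind == dst_ind:
            continue
        outflow[src_ind] += 1
        inflow[dst_ind] += 1
    return inflow, outflow


# Made-up companies and moves, for illustration only.
company_industry = {"A": "IT", "B": "IT", "C": "Financial"}
moves = [("A", "C"), ("C", "B"), ("C", "A"), ("A", "B")]
inflow, outflow = industry_flows(moves, company_industry)
print(dict(inflow))   # {'Financial': 1, 'IT': 2}
print(dict(outflow))  # {'IT': 1, 'Financial': 2}
```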
Ranking performance
Baseline description
• PageRank
• Domains: IT and financial
• The 2011 ideal employer ranking by Universum Global
• The 2011 most admired company ranking by Fortune
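For readers unfamiliar with the PageRank baseline, here is a minimal power-iteration sketch. The slides do not say which graph the baseline is run on; applying it to a matrix of employee flows between companies is an assumption made for this example, and the flow counts are invented.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on a weighted directed graph.

    adj[i][j] is the weight of the edge i -> j. Rows with no outgoing
    weight are treated as linking uniformly to every node.
    """
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i, row in enumerate(adj):
            total = sum(row)
            for j in range(n):
                share = row[j] / total if total > 0 else 1.0 / n
                new[j] += damping * rank[i] * share
        rank = new
    return rank


# Hypothetical employee-flow counts between three companies
# (flows[i][j] = number of people moving from company i to company j).
flows = [
    [0, 8, 2],
    [1, 0, 1],
    [3, 6, 0],
]
scores = pagerank(flows)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)  # company indices, most highly ranked first
```

In this toy graph most of the flow ends at company 1, so it comes out on top.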
Case studies: IT
Case studies: Financial
Overall correctness measures
Parameter sensitivity
Two parameters: α and μ
CONCLUSION
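Proposes magnet community identification as a new research direction.
Gives a very thorough account of the problem definition and examples, the significance of the research, the challenges, the goals, and the shortcomings of traditional approaches.
The algorithm is clearly reasoned: it identifies three properties and quantifies them as an objective function.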
JOINT TOPIC MODELING
FOR EVENT
SUMMARIZATION
ACROSS
NEWS AND SOCIAL
MEDIA STREAMS
Qatar Computing Research Institute
Qatar Foundation Doha, Qatar
ABSTRACT
A novel unsupervised approach, based on topic modeling, that summarizes trending subjects by jointly discovering representative and complementary information from news and tweets.
1. A topic modeling formalism that combines a two-dimensional topic-aspect model with a cross-collection approach from the multi-document summarization literature.
2. Co-ranking of news sentences and tweets on both sides.
INTRODUCTION
News: well-crafted, fact-oriented long stories written by professionals about the most recent past events.
Tweets: personalized, more opinionated, free-style short messages posted by ordinary people in real time.
Contributions
A novel problem: generating complementary summaries.
A principled measure for assessing the extent of sentence-level complementarity.
A topic modeling approach called the cross-collection topic-aspect model (ccTAM), which combines ccLDA and the topic-aspect mixture model to precisely estimate the proposed complementarity measure.
A gold-standard dataset of complementary summaries.
PROBLEM DEFINITION
LEARNING COMPLEMENTARY
RELATION
commonality and difference
general model and media-specific model.
Measuring Commonality and Difference
Cross-collection Topic-Aspect Model (ccTAM)
Inference
Infer the general topic-word distribution $\phi_{z}$
Infer the collection-specific topic-word distribution $\phi_{cz}$
GENERATE COMPLEMENTARY
SUMMARIES
$G = (N \cup T, E)$
$N = \{n_1, n_2, \cdots, n_{m_n}\}$, $T = \{t_1, t_2, \cdots, t_{n_t}\}$
$E = \{(p(n_i \mid t_j), p(t_j \mid n_i)) \mid i = 1, \cdots, m_n;\ j = 1, \cdots, n_t\}$ is the set of directed edges between the two sets of nodes, whose values are node-to-node jumping probabilities.
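The bipartite graph above can be sketched as a pair of probability tables. The transcript does not preserve how the jumping probabilities are computed, so this example assumes a nonnegative score matrix between news sentences and tweets and normalizes it by row to get $p(t_j \mid n_i)$ and by column to get $p(n_i \mid t_j)$. The score values and the function name `jumping_probabilities` are hypothetical.

```python
def jumping_probabilities(score):
    """Turn a nonnegative score matrix into two transition tables.

    score[i][j] is a hypothetical affinity between news sentence n_i and
    tweet t_j. Returns (p_t_given_n, p_n_given_t), where
    p_t_given_n[i][j] = p(t_j | n_i) and p_n_given_t[i][j] = p(n_i | t_j).
    """
    m = len(score)
    n = len(score[0])
    row_sums = [sum(row) for row in score]
    col_sums = [sum(score[i][j] for i in range(m)) for j in range(n)]
    p_t_given_n = [
        [score[i][j] / row_sums[i] if row_sums[i] > 0 else 0.0 for j in range(n)]
        for i in range(m)
    ]
    p_n_given_t = [
        [score[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(m)
    ]
    return p_t_given_n, p_n_given_t


# Made-up scores for 3 news sentences (rows) and 2 tweets (columns).
score = [
    [1.0, 3.0],
    [2.0, 2.0],
    [1.0, 0.0],
]
p_tn, p_nt = jumping_probabilities(score)

# Each edge carries the pair (p(n_i | t_j), p(t_j | n_i)), as in the set E.
edges = {
    (i, j): (p_nt[i][j], p_tn[i][j])
    for i in range(len(score))
    for j in range(len(score[0]))
}
print(edges[(0, 1)])  # (0.6, 0.75)
```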
Jumping Probability
Sentences/Tweets Co-ranking
Summary Generation
Summary-level complementarity
Sentence-level complementarity
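The transcript does not give the co-ranking update rule. One common way to co-rank two node sets on a bipartite graph is a mutually reinforcing random walk, where sentence scores are propagated from tweet scores and vice versa until they stabilize. The sketch below shows that generic scheme under this assumption; the probability tables are made-up values and the function name `co_rank` is hypothetical.

```python
def co_rank(p_n_given_t, p_t_given_n, iters=50):
    """Alternating random-walk updates on a bipartite graph.

    p_n_given_t[i][j] = p(n_i | t_j), each column sums to 1.
    p_t_given_n[i][j] = p(t_j | n_i), each row sums to 1.
    Returns (news_scores, tweet_scores), each summing to 1.
    """
    m = len(p_n_given_t)
    n = len(p_n_given_t[0])
    r_news = [1.0 / m] * m
    r_tweet = [1.0 / n] * n
    for _ in range(iters):
        # A sentence is important if important tweets jump to it.
        r_news = [
            sum(p_n_given_t[i][j] * r_tweet[j] for j in range(n))
            for i in range(m)
        ]
        # A tweet is important if important sentences jump to it.
        r_tweet = [
            sum(p_t_given_n[i][j] * r_news[i] for i in range(m))
            for j in range(n)
        ]
    return r_news, r_tweet


# Hypothetical jumping probabilities for 3 news sentences and 2 tweets.
p_n_given_t = [[0.25, 0.6], [0.5, 0.4], [0.25, 0.0]]
p_t_given_n = [[0.25, 0.75], [0.5, 0.5], [1.0, 0.0]]

r_news, r_tweet = co_rank(p_n_given_t, p_t_given_n)
print([round(x, 3) for x in r_news])   # [0.444, 0.444, 0.111]
print([round(x, 3) for x in r_tweet])  # [0.444, 0.556]
```

A summary could then be assembled from the top-scoring sentences and tweets, although the selection criteria on this slide (summary-level and sentence-level complementarity) are not modeled here.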
EXPERIMENTS AND
RESULTS
Data Collection
Gold-standard summaries
News summaries: English Wikipedia and Wikinews
Tweet summaries
Baseline methods
BL-0: LexRank
BL-1: KL-divergence (KLD)
BL-2: Cosine and language modeling (LM)
BL-3: LexRank + Complementarity (LexComp)
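Two of the baselines are named after standard text-comparison measures. The sketch below shows only those two measures, KL divergence between smoothed unigram distributions and cosine similarity between term-frequency vectors. How the baselines actually use them to select sentences and tweets is not stated in the slides, so nothing here should be read as the baselines' implementation. The example texts, the add-one smoothing, and the function names are assumptions.

```python
import math
from collections import Counter


def unigram(text, vocab, smoothing=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example texts.
news = "the storm hit the coast on monday"
tweet = "storm hit my town power is out"

vocab = sorted(set(news.split()) | set(tweet.split()))
p_news = unigram(news, vocab)
p_tweet = unigram(tweet, vocab)

print(kl_divergence(p_news, p_tweet))
print(cosine(Counter(news.split()), Counter(tweet.split())))
```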
Results and Discussions
Example of output summaries
CONCLUSIONS
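Proposes the idea of using tweets to complement news summarization.
Introduces the concept of complementarity and presents a method for measuring the complementarity of tweets.
The algorithm combines many existing models, such as the topic-aspect model, the cross-collection topic model, and the random walk model.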