Social networks from the perspective of Physics
János Kertész 1,2; Jukka-Pekka Onnela 2; Jari Saramäki 2; Jörkki Hyvönen 2; Kimmo Kaski 2; Jussi Kumpula 2; David Lazer 3; Gábor Szabó 3,4; Albert-László Barabási 3,4
1 Budapest University of Technology and Economics, Hungary
2 Helsinki University of Technology, Finland
3 Harvard University
4 University of Notre Dame, USA
Outline
0. Introduction
1. Constructing the social network
2. Basic statistics
3. Granovetter’s hypothesis
4. Thresholding (percolation)
5. Spreading
6. Modeling
7. Conclusions
Introduction
Complex systems: More input is needed than mere interactions
Forget about interactions
Networks: Scaffold of complexity
Useful to concentrate on the carrying NW structure (nodes and links): holistic approach with very general statements
Spectacular recent development: abundance of data due to IT + new concepts
Introduction
WEIGHTED NW-S
Step toward reductionism: Interactions have different strengths → weights on links
Weights: Fluxes (traffic or chemical reactions), correlation-based networks, etc.
(Often no negative weights, $w_{ij} \ge 0$.)
How to characterize weighted NW-s? E.g. STRENGTH of node i: $s_i = \sum_j w_{ij}$
Intensity, coherence of subgraphs; clustering, motifs etc. (see: Onnela et al., PRE 71, 065103(R) (2005))
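As a small illustration of the node strength defined above (the sum of the weights of all links attached to a node), here is a minimal sketch. The link-list input format and the function name `node_strengths` are assumptions made for this example, not part of the original slides.

```python
from collections import defaultdict


def node_strengths(weighted_links):
    """Return s_i = sum_j w_ij for an undirected weighted link list.

    weighted_links: iterable of (i, j, w_ij) tuples (hypothetical input format).
    """
    strength = defaultdict(float)
    for i, j, w in weighted_links:
        if w < 0:
            raise ValueError("this sketch assumes non-negative weights")
        # An undirected link contributes its weight to both end nodes.
        strength[i] += w
        strength[j] += w
    return dict(strength)


links = [("a", "b", 2.0), ("a", "c", 3.0), ("b", "c", 0.5)]
print(node_strengths(links))
```

Node "a" takes part in links of weight 2.0 and 3.0, so its strength is 5.0, while its degree is only 2.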
Introduction
SOCIAL NW-S: Much has been taken from Sociology: betweenness, clustering, assortativity…
Main method: Questionnaires (10 - 10 000)
Weighted social nw-s: Strength of social relationships varies over a wide range
"I know him/her"
"We are on first name basis"
"We are friends"
"We are good friends"
"We are very good friends"
…
Scale? Subjectivity? How to measure?
Introduction
Advantage of questionnaires: Ask whatever you are interested in. It enables complex studies, multi-factor analyses.
Disadvantage: Difficulty in quantification and subjectivity
E.g., AddHealth: Quantification of tie strength by number of joint activities
Mutuality test fails very often (M. Gonzales et al., Physica A 379, 307-316 (2007))
Alternative approach: Use communication databases (email, phone etc.)
Constructing the Network
• Use a network constructed from mobile phone calls as a proxy for a social network
• In the network:
  Nodes ↔ individuals
  Links ↔ voice calls
• Link weights:
  • Number of calls
  • Total call duration (time & money)
• Over 7 million private mobile phone subscriptions
• Focus: voice calls within the home operator
• Data aggregated from a period of 18 weeks
• Require reciprocity (X→Y AND Y→X) for a link
• Customers are anonymous (hash codes)
• Data from a European mobile operator
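The construction steps listed above (aggregate the calls, keep a link only if calls go in both directions, weight it by number of calls or total duration) can be sketched as follows. The record format `(caller, callee, duration)` and all names are assumptions for illustration; the actual operator data format is not given in the slides.

```python
from collections import defaultdict


def build_reciprocal_network(call_records):
    """Aggregate directed calls into undirected, reciprocated weighted links.

    call_records: iterable of (caller, callee, duration) tuples, where the
    caller and callee are anonymised IDs (hypothetical format).
    Returns {(x, y): {"calls": n, "duration": d}} with x < y.
    """
    # (caller, callee) -> [number of calls, total duration]
    directed = defaultdict(lambda: [0, 0.0])
    for caller, callee, duration in call_records:
        if caller == callee:
            continue  # ignore self-calls in this sketch
        entry = directed[(caller, callee)]
        entry[0] += 1
        entry[1] += duration

    network = {}
    for (x, y), (n_xy, d_xy) in directed.items():
        # Reciprocity: keep the pair only if X called Y AND Y called X.
        if x < y and (y, x) in directed:
            n_yx, d_yx = directed[(y, x)]
            network[(x, y)] = {"calls": n_xy + n_yx, "duration": d_xy + d_yx}
    return network


records = [("X", "Y", 15 * 60), ("Y", "X", 5 * 60), ("X", "Z", 60)]
print(build_reciprocal_network(records))
```

In this toy input the pair X-Y is reciprocated and becomes one link with two calls and 20 minutes of total duration, while the one-way call from X to Z is dropped.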
Constructing the Network
[Figure: directed calls of 15 min and 5 min between X and Y are merged into one undirected link X-Y with weight 20 min]
Basic Statistics: Visualisation
Largest connected component dominates
3.9M / 4.6M nodes
6.5M / 7.0M links
Use it for analysis!
Basic Statistics: Distributions
[Figure panels: vertex degree distribution; link weight distribution. Annotation: fat tail]
Dunbar number (monkeysphere): max ~150 connections
Granovetter’s Weak Ties Hypothesis
• Granovetter* suggests analysis of social networks as a tool for linking micro and macro levels of sociological theory
• Considers the macro level implications of tie (micro level) strengths:
“The strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie.”
• Formulates a hypothesis: The relative overlap of two individuals’ friendship networks varies directly with the strength of their tie to one another
• Explores the impact of the hypothesis on, e.g. diffusion of information, stressing the cohesive power of weak ties
* M. Granovetter, The Strength of Weak Ties, The American Journal of Sociology 78, 1360-1380, 1973.
Granovetter’s Weak Ties Hypothesis
• Hypothesis based on theoretical work and some direct evidence
• Present network is suitable for testing the hypothesis:
(i) Call durations → time commitment → tie strength
(ii) Call durations → monetary commitment → tie strength
(iii) Largest weighted social network so far
(Problem: Other factors, such as emotional intensity or reciprocal services?)
• What is the coupling between network topology and link weights?
• Consider two connected nodes. We would like to characterize their relative neighborhood overlap, i.e. proportion of common friends
• This leads naturally to link neighborhood overlap
Overlap
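• Definition: relative neighborhood overlap (topological)
$$O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}}$$
where $n_{ij}$ is the number of triangles around edge $(v_i, v_j)$, i.e. the number of common neighbours of $v_i$ and $v_j$, and $k_i$, $k_j$ are their degrees
• Illustration of the concept: [figure]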
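A minimal sketch of the overlap of a link: the number of common neighbours of its two end nodes divided by the number of all other nodes adjacent to either of them, (k_i - 1) + (k_j - 1) - n_ij. The adjacency-set representation and the convention of returning 0 for an isolated pair are assumptions of this example.

```python
def neighbourhood_overlap(adjacency, i, j):
    """O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij) for a linked pair (i, j).

    adjacency: dict mapping each node to the set of its neighbours.
    """
    if j not in adjacency[i]:
        raise ValueError("i and j must be connected by a link")
    n_ij = len(adjacency[i] & adjacency[j])  # common neighbours = triangles on the link
    k_i, k_j = len(adjacency[i]), len(adjacency[j])
    denominator = (k_i - 1) + (k_j - 1) - n_ij
    # Assumption: an isolated pair (k_i = k_j = 1) is given overlap 0.
    return n_ij / denominator if denominator > 0 else 0.0


# A triangle a-b-c with an extra node d attached to a.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(neighbourhood_overlap(adjacency, "b", "c"))  # all other neighbours are shared
print(neighbourhood_overlap(adjacency, "a", "d"))  # no shared neighbours
```

The link b-c lies inside the triangle and has overlap 1, whereas a-d shares no neighbours and has overlap 0.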
Empirical Verification
• Let $\langle O \rangle_w$ denote $O_{ij}$ averaged over a bin of w-values
• Use cumulative link weight distribution (the fraction of links with weights less than w’):
$$P_{\mathrm{cum}}(w') = \sum_{w < w'} P(w)$$
• Relative neighbourhood overlap increases as a function of link weight → verifies Granovetter’s hypothesis (~95%) (Exception: top 5% of weights)
Blue curve: empirical network
Red curve: weight randomised network
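The comparison between the empirical and the weight-randomised curve could be reproduced along the following lines. This is only a sketch under stated assumptions: links are binned by weight rank into equal-population bins (a rank-based stand-in for the cumulative weight distribution, which treats tied weights arbitrarily), and the reference is obtained by shuffling the weights over the links. Function names and the bin count are hypothetical.

```python
import random


def mean_overlap_by_weight_rank(weights, overlaps, n_bins=10):
    """Average overlap in equal-population bins of links sorted by weight.

    weights[e] and overlaps[e] describe the same link e. Bin b roughly covers
    the cumulative fraction b/n_bins ... (b+1)/n_bins of the links.
    """
    order = sorted(range(len(weights)), key=lambda e: weights[e])
    n = len(order)
    means = []
    for b in range(n_bins):
        chunk = order[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            means.append(sum(overlaps[e] for e in chunk) / len(chunk))
        else:
            means.append(float("nan"))  # more bins than links
    return means


def shuffle_weights(weights, seed=0):
    """Weight-randomised reference: same topology, weights permuted over links."""
    shuffled = list(weights)
    random.Random(seed).shuffle(shuffled)
    return shuffled


weights = [1, 2, 3, 4]
overlaps = [0.0, 0.25, 0.5, 1.0]
print(mean_overlap_by_weight_rank(weights, overlaps, n_bins=2))
print(mean_overlap_by_weight_rank(shuffle_weights(weights), overlaps, n_bins=2))
```

If overlap grows with weight, the first call returns an increasing sequence; after shuffling, any systematic trend should disappear on a large network.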
Local Implications
• Implication for strong links?
Neighbourhood overlap is high
People form strongly connected communities
• Implication for weak links?
Neighbourhood overlap is low
Communities are connected by weak links
A Piece of the Network
[Figure labels: community; weak links; strong links]
Overlap
$$O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}}$$
Global optimization for transport would put high weights on links with high betweenness centrality b (number of shortest paths passing through the link)
In contrast, $\langle O \rangle_b$ decreases with b
High Weight Links?
• Weak links: Strength of both adjacent nodes (min & max) considerably higher than link weight
• Strong links: Strength of both adjacent nodes (min & max) about as high as the link weight
• Indication: High weight relationships clearly dominate on-air time of both, others negligible
• Time ratio spent communicating with one other person converges to 1 at roughly $w \approx 10^4$
• Consequence: Less time to interact with others
• Explaining onset of decreasing trend for $\langle O \rangle_w$
[Figure curves: $\min(s_i, s_j)/w_{ij}$; $\max(s_i, s_j)/w_{ij}$]
Thresholding Analysis: Introduction
• Children’s approach: Break to learn!
• We do this systematically using thresholding analysis:
  • Order the links by weight
  • Delete the links, one by one, based on their order
• Control parameter f is the fraction of removed links
• We can continuously interpolate, in either direction, between the initial connected network (f=0) and the set of isolated nodes (f=1)
• We use two different thresholding schemes:
  (i) Increasing thresholding (remove low $w_{ij}$ / $O_{ij}$ links first)
  (ii) Descending thresholding (remove high $w_{ij}$ / $O_{ij}$ links first)
• Question: How does the network respond to link removal?
• How similar is the response to $w_{ij}$ and $O_{ij}$ driven thresholding?
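The thresholding procedure can be sketched as follows: sort the links by weight (or overlap), drop a fraction f from the chosen end of the ordering, and measure the fraction of nodes left in the largest connected component. The union-find helper, the function names and the toy network are assumptions of this example, not the authors' implementation.

```python
from collections import Counter


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def component_sizes(nodes, links):
    """Sizes of connected components, largest first."""
    parent = {v: v for v in nodes}
    for i, j in links:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
    return sorted(Counter(find(parent, v) for v in nodes).values(), reverse=True)


def threshold(weighted_links, f, remove_low_first=True):
    """Links that survive after removing a fraction f, ordered by weight."""
    ordered = sorted(weighted_links, key=lambda link: link[2],
                     reverse=not remove_low_first)
    n_removed = int(round(f * len(ordered)))
    return [(i, j) for i, j, _ in ordered[n_removed:]]


def r_lcc(nodes, weighted_links, f, remove_low_first=True):
    """Fraction of nodes in the largest connected component after thresholding."""
    remaining = threshold(weighted_links, f, remove_low_first)
    return component_sizes(nodes, remaining)[0] / len(nodes)


# Two strongly linked triangles joined by one weak bridge (3, 4).
nodes = list(range(1, 7))
links = [(1, 2, 10), (2, 3, 10), (1, 3, 10),
         (4, 5, 10), (5, 6, 10), (4, 6, 10), (3, 4, 1)]
print(r_lcc(nodes, links, f=1 / 7, remove_low_first=True))
print(r_lcc(nodes, links, f=1 / 7, remove_low_first=False))
```

In this toy network, removing the single weakest link splits the network in half, while removing one strong link leaves it connected. Passing overlap values in place of weights gives the overlap-driven variant.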
Thresholding
Initial connected network (f=0) All links are intact, i.e. the network is in its initial stage
Thresholding
Increasing weight thresholded network (f=0.8) 80% of the weakest links removed, strongest 20% remain
Thresholding
Initial connected network (f=0) All links are intact, i.e. the network is in its initial stage
Thresholding
Decreasing weight thresholded network (f=0.8) 80% of the strongest links removed, weakest 20% remain
Thresholding
We will study, as a function of the control parameter f, the following:
1. Order parameter (size of the largest component)
2. “Susceptibility” (average size of other components)
3. Average path lengths (in LCC)
4. Average clustering coefficient in the LCC
Thresholding: Size of Largest Component
• $R_{LCC}$ is the fraction of nodes in the largest connected component (LCC)
• LCC is able to sustain its integrity for moderate values of f
• Least affected by removal of high $O_{ij}$ links (in tight communities)
• Most affected by removal of low $O_{ij}$ links (between communities)
• Difference between removal of low and high $w_{ij}$ links is small, but LCC breaks earlier if weak links are removed (Granovetter)
• Very few links are required for global connectivity
[Figure curves: remove low first; remove high first]
Thresholding: Size of Other Components
• Collapse for different values of f, but what is its nature?
• “Susceptibility” (average cluster size excl. LCC):
$$S = \sum_s n_s s^2$$
where $n_s$ is the number of clusters with s nodes
• Percolation theory: $S \to \infty$ as $f \to f_c$
• Finite signature of divergence: $f_c \approx 0.60$ (increasing O), $f_c \approx 0.82$ (increasing w)
• Demarcation between weak and strong links given by $f_c \approx 0.82$
• Qualitatively different role for weak and strong links
[Figure curves: remove low first; remove high first]
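A minimal sketch of the "susceptibility": given the sizes of all connected components at some value of f, drop the largest one and sum n_s s^2 over the rest, where n_s is the number of components with s nodes. The optional normalisation argument is an assumption added here (a division by the number of nodes is a common convention); the slide shows the sum only.

```python
from collections import Counter


def susceptibility(component_sizes, normalise_by=None):
    """S = sum_s n_s * s**2 over all components except the largest one.

    component_sizes: list with one entry per connected component.
    normalise_by: optional divisor, e.g. the number of nodes (assumption).
    """
    sizes = sorted(component_sizes, reverse=True)[1:]  # exclude the LCC
    n_s = Counter(sizes)  # n_s[s] = number of components with s nodes
    total = sum(count * s ** 2 for s, count in n_s.items())
    return total / normalise_by if normalise_by else total


sizes = [10, 3, 3, 1]  # one large component, two of size 3, one isolated node
print(susceptibility(sizes))
```

Evaluated on a grid of f values, a peak of this quantity marks the finite-size signature of the percolation threshold.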
Diffusion of information
• Knowledge of information diffusion is based on unweighted networks
• Use the present network to study diffusion on a weighted network: Does the local relationship between topology and tie strength have an effect?
• Spreading simulation: infect one node with new information; transmission probability $p_{ij} = x w_{ij}$
(1) Empirical: $p_{ij} \propto w_{ij}$
(2) Reference: $p_{ij} \propto \langle w \rangle$
• Spreading significantly faster on the reference (average weight) network
• Information gets trapped in communities in the real network
[Figure curves: Reference; Empirical]
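A minimal sketch of such a spreading simulation, assuming a simple susceptible-infected process in discrete time: in each step every informed node passes the information to each uninformed neighbour j with probability x * w_ij (empirical case) or x * <w> (reference case), capped at 1. The update scheme, the cap and all names are assumptions of this example.

```python
import random


def spread(adjacency, weights, seed_node, x, use_average_weight=False,
           max_steps=10_000, rng=None):
    """Return the number of informed nodes after each time step.

    adjacency: node -> set of neighbours.
    weights: frozenset({i, j}) -> w_ij for every link.
    use_average_weight: if True, every link uses <w> (reference network).
    """
    rng = rng or random.Random(0)
    mean_w = sum(weights.values()) / len(weights)
    infected = {seed_node}
    history = [1]
    for _ in range(max_steps):
        newly = set()
        for i in infected:
            for j in adjacency[i]:
                if j in infected or j in newly:
                    continue
                w = mean_w if use_average_weight else weights[frozenset((i, j))]
                if rng.random() < min(1.0, x * w):
                    newly.add(j)
        infected |= newly
        history.append(len(infected))
        if len(infected) == len(adjacency):
            break  # everybody is informed
    return history


# Path 0 - 1 - 2 with unit weights; x = 1 makes every transmission certain.
adjacency = {0: {1}, 1: {0, 2}, 2: {1}}
weights = {frozenset((0, 1)): 1.0, frozenset((1, 2)): 1.0}
print(spread(adjacency, weights, seed_node=0, x=1.0))
```

Running the same function with `use_average_weight=True` on the same topology gives the reference curve to compare against.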
Diffusion of information
• Where do individuals get their information? Majority of infections through
(1) Empirical: ties of intermediate strength
(2) Reference: (would be) weak ties
• Both weak and strong ties have a diminishing role as information sources: The weakness of weak and strong ties
• Best search results: Reach out of your own community
[Figure panels: Reference; Empirical]
Diffusion of information
- Start spreading 100 times (large red node)
- Information flows differently due to the local organizational principle
(1) Empirical: information flows along a strong tie backbone
(2) Reference: information mainly flows along the shortest paths
[Figure label: Reference]
Modeling
What is all this good for?
• Understanding structure and mechanisms of the society
• Improving spreading of news and opinions
(Developing marketing strategies and other tools of mass manipulation)
MODELING needed
Modeling
Needed: Weighted network model, which reflects the observations with possibly limited input
Links created by random encounters on acquaintance basis
Weights generated by one-to-one activities (phone calls)
Take into account the different time scales:
Encounter (call) frequency
Lifetime of relationships and lifetime of nodes (treated together)
Modeling
Microscopic rules in the model
i meets j with prob. proportional to $w_{ij}$, who meets k with prob. proportional to $w_{jk}$. If k is a common friend, $w_{ij}$, $w_{jk}$ and $w_{ki}$ are increased by δ (a). If k is not connected to i, $w_{ik} = w_0$ (= 1) is created with probability $p_\Delta$ (b).
With prob. $p_r$ new links with weight $w_0$ are created (c).
With prob. $p_d$ a node with all its links is deleted and a new one is born with no links.
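The microscopic rules can be turned into a rough simulation sketch. Several details are not fixed by the slide and are assumptions here: nodes are updated sequentially, the two traversed links are reinforced in both cases (a) and (b), a node without links always attempts a global connection, and the random partner is drawn uniformly. All function and parameter names are hypothetical.

```python
import random


def weighted_choice(rng, candidates):
    """Pick a key of `candidates` with probability proportional to its value."""
    keys = list(candidates)
    return rng.choices(keys, weights=[candidates[k] for k in keys], k=1)[0]


def step(w, delta, p_delta, p_r, p_d, w0=1.0, rng=random):
    """One sweep over all nodes; w[i][j] == w[j][i] is the weight of link i-j."""
    nodes = list(w)
    for i in nodes:
        # Local search: i -> j -> k, each hop weighted by link weight.
        if w[i]:
            j = weighted_choice(rng, w[i])
            others = {k: wjk for k, wjk in w[j].items() if k != i}
            if others:
                k = weighted_choice(rng, others)
                w[i][j] += delta
                w[j][i] += delta
                w[j][k] += delta
                w[k][j] += delta
                if k in w[i]:                    # (a) reinforce the triangle
                    w[i][k] += delta
                    w[k][i] += delta
                elif rng.random() < p_delta:     # (b) close the triangle
                    w[i][k] = w[k][i] = w0
        # (c) Global search: unweighted random link.
        if not w[i] or rng.random() < p_r:
            k = rng.choice(nodes)
            if k != i and k not in w[i]:
                w[i][k] = w[k][i] = w0
        # Node deletion: the node is reborn without links.
        if rng.random() < p_d:
            for k in list(w[i]):
                del w[k][i]
            w[i].clear()


rng = random.Random(1)
network = {i: {} for i in range(30)}
for _ in range(100):
    step(network, delta=0.5, p_delta=0.05, p_r=5e-4, p_d=1e-3, rng=rng)
n_links = sum(len(nbrs) for nbrs in network.values()) // 2
print(n_links)
```

The toy run uses only 30 nodes and an arbitrary value of the triangle-closing probability; reproducing the structures discussed in the slides would require much larger systems and tuned parameters.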
Summary of the model
• Weighted local search for new acquaintances
• Reinforcement of existing (popular) links
• Unweighted global search for new acquaintances
• Node removal, exponential link & weight lifetimes with mean $\langle \tau_w \rangle = (2 p_d)^{-1}$
Model parameters
δ: Free weight reinforcement parameter
$p_d = 10^{-3}$: Sets the time scale of the model, $\langle \tau_N \rangle = 1/p_d$ (average node lifetime of 1000 time steps)
$p_r = 5 \times 10^{-4}$: Global connections; results not sensitive to it (one random link per node during 1000 time steps)
$p_\Delta$: Adjusted in relation to δ to keep $\langle k \rangle$ constant (structure changes are due only to link re-organisations)
Social network model
Tie strength: weak → intermediate → strong tie
Samples of $N = 10^5$ network for variable weight-increase δ
δ = 0: No communities
δ = 0.1: Communities start nucleating
δ = 0.5: Communities forming
δ = 1: Communities with dense & strong internal and sparse & weak external connections (cf. phone network)
Communities by inspection
• Average number of links constant: $\langle L \rangle = N \langle k \rangle / 2$ ($\langle k \rangle \approx 10$) => All changes in structure due to re-organisation of links
• Increasing δ traps search in communities, further enhancing trapping effect => Clear communities form
• Triangles accumulate weight and act as nuclei for communities to emerge
[Figure panels: δ = 0; δ = 0.1; δ = 0.5; δ = 1]
Communities by k-clique method
• k-clique algorithm as definition for communities*
• Focus on 4-cliques (smallest non-trivial cliques)
• Relative largest community size $R_{k=4} \in [0, 1]$
• Average community size $\langle n_s \rangle$ (excl. largest)
• Observe clique percolation through the system for small δ
• Increasing δ leads to condensation of communities
* G. Palla et al., “Uncovering the overlapping community structure...”, Nature 435, 814 (2005)
Global consequences
[Figure panels: Model network (ascending link removal; descending link removal); Phone network (ascending & descending). Horizontal axis: fraction of links f, from 0 to 1]
Phase transition for ascending tie removal (weaker first)
Modeling
The model fulfills essential criteria of social nw-s:
• Broad (but not scale-free) degree distribution
• Assortative mixing (popular people attract each other)
• High clustering: many triangles (by construction)
• Community structure with strong links inside and weak ones between them
Discussion and Conclusion
• Weak ties maintain network’s structural integrity; strong ties maintain local communities; intermediate ties mostly responsible for first-time infections
• How can one efficiently search for information in a social network? "Go out of your community!"
• Social networks seem better suited to local processing than global transmission of information
• Are there simple rules or mechanisms that lead to observed properties?
• Efficient modeling possible
Publications:
J.-P. Onnela et al., PNAS 104, 7332-7336 (2007)
J.-P. Onnela et al., New J. Phys. 9, 179 (2007)
J.M. Kumpula et al., PRL (to be published)
www.phy.bme.hu/~kertesz/