Social Network Analysis &
Network Optimization
Dimitrios Katsaros, Ph.D.
Koblenz, February 18th, 2008
@ Dept. of Computer & Communication Engineering, University of Thessaly
@ Dept. of Informatics, Aristotle University
Outline of the talk
• A summary of my research
• Latest results: “Social Network Analysis for Network Optimization”
  • Web (2nd round review @ IEEE Transactions on Knowledge & Data Engineering)
    • PRIMITIVE: Community Identification
    • PROTOCOL: Content Outsourcing
    • GOAL: Latency Reduction
  • Wireless Multimedia Sensor Nets (2nd round review @ ACM Mobile Networks & Applications)
    • PRIMITIVE: “Important” Sensor Nodes Identification
    • PROTOCOL: Cooperative Caching
    • GOAL: Latency Reduction
• Collective Intelligence: Latest step of cyberspace
Research areas: Ultimately ???
[Diagram of research areas; labels recovered from the figure:]
• Overlay Nets
• Mobile/Pervasive Computing
• Sensors
• Ad Hoc
• Information Retrieval
• Web
• Location Tracking
• Caching & Air-Indexing
• Peer-to-Peer Networks
• Content Distribution Networks
• Caching & Prefetching & Replication
• Semistructured Data & Web views
• Web Ranking & Search Engines
• Social Network Analysis
• Cooperative Caching & Distributed Indexing & Flash storage & Content-Based MIR
• Sensor Node Clustering & Coverage/Connectivity
• Broadcasting & Data Dissemination
• Webcasting
• Central labels: “INTELLIGENCE”, “Pervasive Web”
Social Network Analysis
• A social network is a social structure that describes social relations (Wikipedia)
• The history of social network analysis is older than everybody who is here
  • More than 100 years (Cooley 1909, Durkheim 1893)
  • Focusing on small groups
  • Information technology has given it a new life
[book: Stanley Wasserman & Katherine Faust]
1. Mathematical Representation
2. Structural & Locational Properties
3. Roles & Positions
4. Dyadic & Triadic Methods
Social Network Analysis
[Stanley Wasserman & Katherine Faust]
1. Mathematical Representation
2. Structural & Locational Properties
   1. Centrality
      1. Betweenness Centrality
3. Roles & Positions
4. Dyadic & Triadic Methods
Betweenness Centrality
• Let $\sigma_{uw} = \sigma_{wu}$ denote the number of shortest paths from $u \in V$ to $w \in V$ (by definition, $\sigma_{uu} = 0$)
• Let $\sigma_{uw}(v)$ denote the number of shortest paths from u to w on which some vertex $v \in V$ lies
• The Betweenness Centrality index NI(v) of a vertex v is defined as:
  $$NI(v) = \sum_{u \neq v \neq w \in V} \frac{\sigma_{uw}(v)}{\sigma_{uw}}$$
• Large values for the NI index of a node v indicate that this node can reach others on relatively short paths, or that v lies on a considerable fraction of the shortest paths connecting others
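The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected, unweighted graph given as an adjacency dictionary; it uses the well-known Brandes accumulation scheme (one BFS per source), and the function name `betweenness` is chosen here for illustration only. The sum runs over ordered pairs (u, w), so each unordered pair contributes twice.

```python
from collections import deque

def betweenness(adj):
    """NI(v) for every node of an undirected, unweighted graph.

    adj: dict mapping each node to an iterable of its neighbours.
    Returns a dict node -> NI value, summed over ordered pairs (u, w).
    """
    ni = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        # sigma[s] = 1 is bookkeeping only; pairs with u == w never contribute.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:            # w seen for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                ni[w] += delta[w]
    return ni

# Path a - b - c: only b lies between other nodes.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path))
```

Each BFS costs O(n + m), so the whole computation is O(nm) on a connected graph. Halve the values if unordered pairs are preferred.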
The NI index in sample graphs
In parentheses, the NI index of the respective node; e.g., 7(156): the node with ID 7 has NI equal to 156.
Nodes with large NI:
• Articulation nodes (in bridges), e.g., 3, 4, 7, 16, 18
• Nodes with large fanout, e.g., 14, 8, U
Therefore: geodesic nodes
Betweenness Centrality in …
• [WEB] Performing graph clustering and recognizing communities in Web site graphs
• [WIRELESS MULTIMEDIA SENSOR NETWORKS] Recognizing (in a distributed fashion) important sensor nodes, the mediators, that coordinate cooperative caching decisions
CiBC Method
• Target: [condition on $d_{in}(C)$, $d_{out}(C)$, and $s$; formula not recoverable] is true
• CiBC method:
  • Building cliques and clusters around representative (pole) nodes (with low CB)
• Earlier methods have
  • defined “hard communities”: for each node, deg(inCom) > deg(outCom)
  • exploited “edge betweenness” to perform hierarchical agglomerative clustering
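The “hard community” condition used by earlier methods can be checked directly. The sketch below is an illustration only: it assumes an undirected graph stored as an adjacency dictionary, and the function name `is_hard_community` is hypothetical. It tests whether every member has strictly more neighbours inside the candidate community than outside it.

```python
def is_hard_community(adj, community):
    """True if every node in `community` has deg(inCom) > deg(outCom).

    adj: dict mapping each node to an iterable of its neighbours.
    community: iterable of nodes forming the candidate community.
    """
    members = set(community)
    for v in members:
        inside = sum(1 for w in adj[v] if w in members)
        outside = sum(1 for w in adj[v] if w not in members)
        if inside <= outside:
            return False
    return True

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2 - 3.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
     3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(is_hard_community(g, [0, 1, 2]))  # each triangle satisfies the condition
print(is_hard_community(g, [0, 1]))     # node 0 has one neighbour in, one out
```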
CiBC Method
ID   NI index
10   20.68
2    19.61
6    11.38
1    10.28
7    2.06
0    1.73
9    0.99
8    0.99
4    0.75
5    0.00
11   0.00
[Figure: example graph with nodes 0–11]
Phase 1: NI computation, O(nm)
Phase 2: Initialization of cliques, O(n)
CiBC Method
     A   B   C   D
A    3   3   0   0
B    3   3   1   1
C    0   1   3   4
D    0   1   4   3
[Figure: example graph with nodes 0–11 grouped into cliques A, B, C, D]
Phase 3: Clique Merging & Creation of Communities
Complexity: O(l^2), where l is the number of cliques
CiBC Method
     A   B   C
A    3   3   0
B    3   3   2
C    0   2   10
[Figure: example graph with nodes 0–11 grouped into A, B, C]
Phase 3: Clique Merging & Creation of Communities
CiBC Method
     A   C
A    9   2
C    2   10
[Figure: example graph with nodes 0–11 grouped into A, C]
Phase 3: Clique Merging & Creation of Communities
Phase 4: Check constraints
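The matrices in the Phase 3 frames can be reproduced with a small merge routine. This sketch rests on assumptions inferred from the example, not on a stated rule: diagonal entries are read as edge counts inside a group, off-diagonal entries as edge counts between groups, and each step merges the pair with the largest off-diagonal entry. The function name `merge_step` is hypothetical, and neither the stopping criterion nor the Phase 4 constraints are modelled.

```python
def merge_step(labels, m):
    """Merge the two groups joined by the largest off-diagonal entry.

    labels: list of group names; m: symmetric matrix as a list of lists.
    Returns (new_labels, new_matrix); the merged group keeps the first name.
    """
    n = len(labels)
    # Pair (i, j), i < j, with the most connecting edges (assumed merge rule).
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: m[p[0]][p[1]])
    keep = [k for k in range(n) if k != j]
    new = [[m[r][c] for c in keep] for r in keep]
    pos = keep.index(i)
    for idx, k in enumerate(keep):
        if k == i:
            # Internal edges of both groups plus the edges that joined them.
            new[pos][pos] = m[i][i] + m[j][j] + m[i][j]
        else:
            # Edges from the merged group to every other group add up.
            new[pos][idx] = new[idx][pos] = m[i][k] + m[j][k]
    return [labels[k] for k in keep], new

labels = ["A", "B", "C", "D"]
m = [[3, 3, 0, 0],
     [3, 3, 1, 1],
     [0, 1, 3, 4],
     [0, 1, 4, 3]]
labels, m = merge_step(labels, m)   # merges C and D
print(labels, m)
labels, m = merge_step(labels, m)   # merges A and B
print(labels, m)
```

Under these assumptions the two steps yield the 3×3 and 2×2 matrices shown in the frames above. Scanning all pairs of l groups for each merge is consistent with the O(l^2) cost quoted for this phase.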
The NICoCa protocol
• Each node is aware of its 2-hop neighborhood
• Uses NI to characterize some neighbors as mediators
• A node can be either a mediator or an ordinary node
• Each sensor node stores
  • the dataID and the actual multimedia datum
  • the data size and the TTL interval
  • for each cached item, the timestamps of the K most recent accesses
  • for each cached item, a label: either O (i.e., own) or H (i.e., hosted)
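The per-item state listed above maps naturally onto a small record type. The following is a sketch under assumptions: the field names, the types, and the value K = 3 are illustrative choices, not taken from the protocol specification.

```python
from collections import deque
from dataclasses import dataclass, field

K = 3  # assumed number of access timestamps to remember

@dataclass
class CacheEntry:
    data_id: str      # dataID of the item
    datum: bytes      # the actual multimedia datum
    size: int         # data size (unit assumed: bytes)
    ttl: float        # TTL interval (unit assumed: seconds)
    kind: str = "O"   # "O" = own, "H" = hosted
    # Timestamps of the K most recent accesses; older ones fall off.
    accesses: deque = field(default_factory=lambda: deque(maxlen=K))

    def touch(self, now: float) -> None:
        """Record an access at time `now`."""
        self.accesses.append(now)

entry = CacheEntry("img-1", b"\x00", size=1, ttl=60.0)
for t in (1.0, 2.0, 3.0, 4.0):
    entry.touch(t)
print(list(entry.accesses))  # only the K most recent timestamps remain
```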
The cache discovery protocol (1/2)
A sensor node issues a request for a multimedia item
• It searches its local cache; if the item is found (local cache hit), the K most recent access timestamps are updated
• Otherwise (local cache miss), the request is broadcast and received by the mediators
• These check whether the 2-hop neighbors of the requesting node cache the datum (proximity hit)
• If none of them responds (proximity cache miss), the request is directed to the Data Center
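The lookup order on this slide can be sketched as a single function. This is a simplified illustration, assuming caches are modelled as plain sets of item IDs and that the mediators' knowledge of the 2-hop neighbourhood is available as a dictionary; message exchange, timestamps, and routing are left out, and the name `discover` is hypothetical.

```python
def discover(item, local_cache, two_hop_caches):
    """Resolve a request following the order: local, proximity, Data Center.

    local_cache: set of item IDs cached at the requesting node.
    two_hop_caches: dict neighbour ID -> set of item IDs it caches,
                    as known to the mediators (assumed representation).
    Returns (outcome, holders).
    """
    if item in local_cache:
        return ("local cache hit", [])
    # Local miss: mediators check the requester's 2-hop neighbours.
    holders = sorted(n for n, cache in two_hop_caches.items() if item in cache)
    if holders:
        return ("proximity hit", holders)
    # Proximity miss: the request goes to the Data Center.
    return ("data center", [])

neighbours = {"n1": {"a"}, "n2": {"a", "b"}, "n3": set()}
print(discover("x", {"x"}, neighbours))
print(discover("a", set(), neighbours))
print(discover("z", set(), neighbours))
```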
The cache discovery protocol (2/2)
When a mediator receives a request, it searches its cache
• If it deduces that the request can be satisfied by a neighboring node (remote cache hit), it forwards the request to the neighboring node with the largest residual energy
• If the request cannot be satisfied by this mediator node, it does not forward the request recursively to its own mediators, since this will be done by the routing protocol, e.g., AODV
• If none of the nodes can help, the requested datum is served by the Data Center (global hit)
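The mediator's forwarding choice can be illustrated as follows. The sketch assumes the mediator knows, for each neighbour, the set of cached item IDs and a residual-energy value; both dictionaries and the function name `pick_holder` are hypothetical stand-ins for whatever state the protocol actually maintains.

```python
def pick_holder(item, neighbour_caches, residual_energy):
    """Choose the neighbour to which a mediator forwards a request.

    neighbour_caches: dict neighbour ID -> set of cached item IDs.
    residual_energy: dict neighbour ID -> remaining energy (any unit).
    Returns the chosen neighbour, or None if no neighbour holds the item.
    """
    candidates = [n for n, cache in neighbour_caches.items() if item in cache]
    if not candidates:
        # The mediator cannot help; it does not recurse to other mediators.
        return None
    # Remote cache hit: prefer the holder with the largest residual energy.
    return max(candidates, key=lambda n: residual_energy[n])

caches = {"n1": {"a"}, "n2": {"a"}, "n3": {"b"}}
energy = {"n1": 0.4, "n2": 0.9, "n3": 0.7}
print(pick_holder("a", caches, energy))
print(pick_holder("c", caches, energy))
```

Selecting by residual energy spreads the serving load away from nodes that are close to depletion.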
Cache vs. hits (MB files & uniform access) in a dense WMSN (d = 7)
HYBRID appears in:
L. Yin and G. Cao, “Supporting cooperative caching in ad hoc networks”, IEEE Transactions on Mobile Computing, 5(1):77–89, 2006
Evolution of cyberspace …
• Collective Intelligence Net: Semantic Web + Pervasive Computing
• Semantic Web: WWW + Broadband + WiFi + grid computing + Unicode + XML + RDF + Ontologies
• WWW: Internet + Multimedia + URL + HTTP + HTML
• Internet: Servers + Telecom Networks + PCs + TCP-IP + e-mail + FTP
• PC: Computers + Micro-chips + Application Software + WYSIWYG Interfaces
• Computer: Transistors + Formal Logic + Digital Coding + Programming Languages
Why Collective Intelligence?
• Users/devices generate data at an unprecedented rate
  • Blogs
  • Tags
  • Sensor measurements
  • Web pages
  • Rankings by search engines
• They could be treated as “opinions” or “votes”
• Under some conditions: group IQ > individual IQ
• [So far] Opinion/Vote fusion:
  • PageRank (i.e., collective linking preferences)
  • Metasearching (ranked list merging)
  • Collaborative filtering (what is interesting from what other people say, what people like you say)
  • …
Collective Intelligence: Some challenges
• Statistical analysis of social networks
• Identification of influential opinions and/or producers
• Discover social context to provide personalization
• Opinion spam
• Bias filtering
Collective Intelligence: Some challenges
• Finding high-quality content
• Opinion mining
• Dealing with controversies
• Metadata from data analysis
• Storage of metadata
• …
MOST IMPORTANTLY
• In Centralized and/or Distributed settings
References
Our work
• D. Katsaros, G. Pallis, K. Stamos, A. Sidiropoulos, A. Vakali, Y. Manolopoulos. “CDNs Content Outsourcing via Generalized Communities”. IEEE Transactions on Knowledge and Data Engineering (under second round review), December 2007.
• N. Dimokas, D. Katsaros, Y. Manolopoulos. “Cooperative Caching in Wireless Multimedia Sensor Networks”. ACM Mobile Networks and Applications (under second round review), February 2008.
Competing methods
• [CPM community identification method] G. Palla, I. Derenyi, I. Farkas, T. Vicsek. “Uncovering the overlapping community structure of complex networks in nature and society”. Nature, 435(7043):814–818, 2005.
• [Hybrid cooperative caching method] L. Yin and G. Cao. “Supporting cooperative caching in ad hoc networks”. IEEE Transactions on Mobile Computing, 5(1):77–89, 2006.