DDAY2014 - Edgesense: Social network analysis per tutti
Edgesense: Social network analysis for everyone
Luca Mearelli - @lmea
Hi, I’m Luca
Collective Intelligence
Emergence
larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties
Online collaboration
it works!
Online communities
• Exhibit emergence
• Strong design properties
• Hackable
The Blueprint
• Map the community social network
• Measure the structural properties
• Visualize the structure & the metrics
• Tweak the interaction
Edgesense
Edgesense Architecture
JSON source → Python → JSON files → HTML5 / JavaScript
Edgesense Source Data
• users.json
• nodes.json
• comments.json
users.json
nodes.json
comments.json
Edgesense Backend
• Python
• NetworkX
Edgesense Parsing Pipeline
• Parse source JSON files
• Build network from interactions
• Extract metrics
• Export network + metrics to JSON files
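The four steps above can be sketched end to end in a few lines. This is a minimal, dependency-free illustration rather than the Edgesense implementation: the input documents are invented, only two of the three source files are modelled, and `run_pipeline` is a hypothetical name. The field names (`author_id`, `recipient_id`, `created_ts`, `length`) follow the ones used in the talk's own code.

```python
import json

# Hypothetical miniature inputs. The field names follow the ones used in the
# talk's own code; the values (and the users Bob and Carol) are invented.
USERS_JSON = """[
  {"id": "1", "name": "Alice"},
  {"id": "2", "name": "Bob"},
  {"id": "3", "name": "Carol"}
]"""
COMMENTS_JSON = """[
  {"author_id": "2", "recipient_id": "1", "created_ts": 1315491000, "length": 4},
  {"author_id": "1", "recipient_id": "2", "created_ts": 1315492000, "length": 9},
  {"author_id": "3", "recipient_id": null, "created_ts": 1315493000, "length": 2}
]"""


def run_pipeline(users_json, comments_json):
    # 1. Parse the source JSON
    users = json.loads(users_json)
    comments = json.loads(comments_json)

    # 2. Build the network: one edge per comment that has both an author
    #    and a recipient
    edges = [
        {"source": c["author_id"], "target": c["recipient_id"],
         "ts": c["created_ts"], "effort": c["length"]}
        for c in comments
        if c.get("author_id") and c.get("recipient_id")
    ]

    # 3. Extract a few very simple metrics
    active = {e["source"] for e in edges} | {e["target"] for e in edges}
    metrics = {
        "nodes_count": len(users),
        "edges_count": len(edges),
        "active_count": len(active),
    }

    # 4. Export network + metrics as one JSON document
    return json.dumps({"nodes": users, "edges": edges, "metrics": metrics},
                      sort_keys=True)


exported = json.loads(run_pipeline(USERS_JSON, COMMENTS_JSON))
print(exported["metrics"])
```

Carol's comment has no recipient, so it produces no edge and she is not counted as active.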
Network construction
• Persons are nodes
• Comments make links
• Edges are aggregated
• Metadata is added
def extract_edges(nodes_map, comments_map):
    # build the list of edges
    edges_list = []
    # a comment is 'valid' if it has a recipient and an author
    valid_comments = [e for e in comments_map.values()
                      if e.get('recipient_id', None) and e.get('author_id', None)]
    logging.info("%(v)i valid comments on %(t)i total" % {'v':len(valid_comments), 't':len(comments_map.values())})

    # build the whole network to use for metrics
    for comment in valid_comments:
        link = {
            'id': "{0}_{1}_{2}".format(comment['author_id'],comment['recipient_id'],comment['created_ts']),
            'source': comment['author_id'],
            'target': comment['recipient_id'],
            'ts': comment['created_ts'],
            'effort': comment['length'],
            'team': comment['team']
        }
        if nodes_map.has_key(comment['author_id']):
            nodes_map[comment['author_id']]['active'] = True
        else:
            logging.info("error: node %(n)s was linked but not found in the nodes_map" % {'n':comment['author_id']})

        if nodes_map.has_key(comment['recipient_id']):
            nodes_map[comment['recipient_id']]['active'] = True
        else:
            logging.info("error: node %(n)s was linked but not found in the nodes_map" % {'n':comment['recipient_id']})
        edges_list.append(link)

    return sorted(edges_list, key=eu.sort_by('ts'))
def build_network(network):
    MDG=nx.MultiDiGraph()
    for node in network['nodes']:
        MDG.add_node(node['id'], node)
    for edge in network['edges']:
        MDG.add_edge(edge['source'], edge['target'], attr_dict=edge)
    set_isolated(network['nodes'], MDG)
    return MDG
def extract_dpsg(mdg, ts, team=True):
    dg=nx.DiGraph()
    # add all the nodes present at the time ts
    for node in mdg.nodes_iter():
        if mdg.node[node]['created_ts'] <= ts and (team or not mdg.node[node]['team']):
            dg.add_node(node, mdg.node[node])

    for node in mdg.nodes_iter():
        for neighbour in mdg[node].keys():
            count = sum(1 for e in mdg[node][neighbour].values() if e['ts'] <= ts and (team or not e['team']))
            effort = sum(e['effort'] for e in mdg[node][neighbour].values() if e['ts'] <= ts and (team or not e['team']))
            team_edge = sum(1 for e in mdg[node][neighbour].values() if e['ts'] <= ts and e['team'])>0
            if count > 0 and (team or not team_edge):
                dg.add_edge(node, neighbour, {'source': node, 'target': neighbour, 'effort': effort, 'count': count, 'team': team_edge})

    return dg
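The code shown in the talk uses Python 2 and NetworkX 1.x idioms (`has_key`, `nodes_iter`). To make the aggregation step easier to follow on its own, here is a small dependency-free sketch of the same idea: parallel comment edges between the same pair of users, created up to a timestamp `ts`, are collapsed into one edge carrying a `count` and a summed `effort`. The function name `aggregate_edges` and the sample data are hypothetical, and the team filter is left out for brevity.

```python
from collections import defaultdict


def aggregate_edges(edges, ts):
    """Collapse parallel (source, target) edges created up to `ts`."""
    totals = defaultdict(lambda: {"count": 0, "effort": 0})
    for e in edges:
        if e["ts"] <= ts:
            agg = totals[(e["source"], e["target"])]
            agg["count"] += 1            # number of comments on this link
            agg["effort"] += e["effort"]  # summed comment length
    return dict(totals)


# Invented sample: user 2 comments on user 1 three times, user 1 replies once.
edges = [
    {"source": "2", "target": "1", "ts": 10, "effort": 4},
    {"source": "2", "target": "1", "ts": 20, "effort": 6},
    {"source": "1", "target": "2", "ts": 30, "effort": 5},
    {"source": "2", "target": "1", "ts": 40, "effort": 1},
]

snapshot = aggregate_edges(edges, ts=30)
print(snapshot)
```

The comment at `ts=40` falls after the snapshot time, so the 2 → 1 link is reported with a count of 2 and an effort of 10.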
Metrics
• Content metrics
  • Number of users (active/inactive)
  • Number of connections
  • Number of community contributions
• Network metrics
  • Degree
  • Distance
  • Centrality
  • Modularity
Network Metrics: Degree
• Number of inbound / outbound edges incident on a node
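As a quick illustration of the definition, in-degree and out-degree can be counted directly from an edge list. The edge list below is invented; Edgesense itself obtains these values from NetworkX (`in_degree()` and `out_degree()` in the talk's code).

```python
from collections import Counter

# Invented directed edges (source, target): "2" and "3" comment on "1",
# and "1" replies to "2".
edges = [("2", "1"), ("3", "1"), ("1", "2")]

out_degree = Counter(source for source, _ in edges)  # edges leaving a node
in_degree = Counter(target for _, target in edges)   # edges arriving at a node

for node in sorted({n for edge in edges for n in edge}):
    print(node, "in:", in_degree[node], "out:", out_degree[node])
```

A `Counter` returns 0 for nodes that never appear on one side, so user "3", who received no comments, has an in-degree of 0.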
Network Metrics: Distance
• The average number of hops needed to go from a randomly chosen node to another.
• A lower distance implies that information spreads more easily across the network.
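A minimal sketch of the distance metric, assuming an unweighted directed graph stored as an adjacency dict and averaging only over ordered pairs that are actually reachable (one possible convention, not necessarily the one Edgesense uses). The helper names and the two toy graphs are hypothetical.

```python
from collections import deque


def hops_from(graph, start):
    """Breadth-first search: hop count from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_distance(graph):
    """Mean hop count over all reachable ordered pairs of distinct nodes."""
    lengths = [d for s in graph
               for t, d in hops_from(graph, s).items() if t != s]
    return sum(lengths) / len(lengths) if lengths else 0.0


chain = {"a": ["b"], "b": ["c"], "c": []}          # a -> b -> c
shortcut = {"a": ["b", "c"], "b": ["c"], "c": []}  # same, plus a -> c

print(average_distance(chain), average_distance(shortcut))
```

Adding the single shortcut edge lowers the average from 4/3 hops to 1 hop, which is the sense in which a lower distance lets information spread more easily.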
Network Metrics: Centrality
• Refers to indicators which identify the most important vertices within a graph
• Betweenness centrality: for a given node, the number of shortest paths between all pairs of other vertices that pass through that node.
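To make the betweenness definition concrete, here is a small sketch that applies it literally to an unweighted directed graph. When several shortest paths tie between a pair of nodes, each one contributes a fraction, so scores need not be whole numbers. The helper names and the toy graph are hypothetical, and the result is left unnormalized; Edgesense calls `nx.betweenness_centrality`, which normalizes by default and is far more efficient than this cubic-time version.

```python
from collections import deque


def bfs_counts(graph, source):
    """Return (dist, sigma): hop distance and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                sigma[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                # every shortest path to `node` extends to `nxt`
                sigma[nxt] += sigma[node]
    return dist, sigma


def betweenness(graph):
    """Unnormalized betweenness: share of s->t shortest paths through v."""
    info = {s: bfs_counts(graph, s) for s in graph}
    score = {v: 0.0 for v in graph}
    for s in graph:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in graph:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s->t path iff the distances add up
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Invented "diamond": two equally short routes from a to c.
diamond = {"a": ["b", "d"], "b": ["c"], "d": ["c"], "c": []}
scores = betweenness(diamond)
print(scores)
```

Nodes `b` and `d` each carry one of the two shortest paths from `a` to `c`, so each scores 0.5, while the endpoints score 0.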
Network Metrics: Modularity
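• Compares the density of links inside subcommunities in the observed network with what a random network with the same degree distribution would give, in practice reported on a 0-1 scale: values near 0 indicate no community structure, higher values indicate well-separated subcommunities.
• Subcommunities are defined so that their members are more connected to each other than to the rest of the network.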
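The arithmetic behind modularity can be sketched for a given partition of an undirected, unweighted graph, using the standard formula Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of the degrees of its nodes. This sketch only scores a partition that is handed to it; finding a good partition (the dashboard markup refers to a `louvain_modularity` metric) is a separate step not shown here. The function name and the example graph are hypothetical.

```python
def modularity(edges, communities):
    """Modularity Q of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    community_of = {node: i
                    for i, group in enumerate(communities)
                    for node in group}
    internal = [0] * len(communities)    # edges with both ends in community i
    degree_sum = [0] * len(communities)  # total degree of nodes in community i
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2
               for l, d in zip(internal, degree_sum))


# Invented example: two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

q_split = modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}])
q_single = modularity(edges, [{"a", "b", "c", "d", "e", "f"}])
print(q_split, q_single)
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.357), while lumping all nodes into one community gives Q = 0.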
Network Metrics
def extract_network_metrics(mdg, ts, team=True):
    met = {}
    dsg = extract_dpsg(mdg, ts, team)
    if team :
        pre = 'full:'
    else:
        pre = 'user:'

    # avoid trying to compute metrics for
    # the case of empty networks
    if dsg.number_of_nodes()==0:
        return met

    met[pre+'nodes_count'] = dsg.number_of_nodes()
    met[pre+'edges_count'] = dsg.number_of_edges()
    met[pre+'density'] = nx.density(dsg)
    met[pre+'betweenness'] = nx.betweenness_centrality(dsg)
    met[pre+'avg_betweenness'] = float(sum(met[pre+'betweenness'].values()))/float(len(met[pre+'betweenness'].values()))
    met[pre+'betweenness_count'] = nx.betweenness_centrality(dsg, weight='count')
    met[pre+'avg_betweenness_count'] = float(sum(met[pre+'betweenness_count'].values()))/float(len(met[pre+'betweenness_count'].values()))
    met[pre+'betweenness_effort'] = nx.betweenness_centrality(dsg, weight='effort')
    met[pre+'avg_betweenness_effort'] = float(sum(met[pre+'betweenness_effort'].values()))/float(len(met[pre+'betweenness_effort'].values()))
    met[pre+'in_degree'] = dsg.in_degree()
    met[pre+'avg_in_degree'] = float(sum(met[pre+'in_degree'].values()))/float(len(met[pre+'in_degree'].values()))
    met[pre+'out_degree'] = dsg.out_degree()
    met[pre+'avg_out_degree'] = float(sum(met[pre+'out_degree'].values()))/float(len(met[pre+'out_degree'].values()))
    met[pre+'degree'] = dsg.degree()
    met[pre+'avg_degree'] = float(sum(met[pre+'degree'].values()))/float(len(met[pre+'degree'].values()))
    met[pre+'degree_count'] = dsg.degree(weight='count')
    met[pre+'avg_degree_count'] = float(sum(met[pre+'degree_count'].values()))/float(len(met[pre+'degree_count'].values()))
    met[pre+'degree_effort'] = dsg.degree(weight='effort')
    met[pre+'avg_degree_effort'] = float(sum(met[pre+'degree_effort'].values()))/float(len(met[pre+'degree_effort'].values()))
Exported Format
{
  "edges": [
    {
      "effort": 4,
      "id": "2_1_1315491000",
      "source": "2",
      "target": "1",
      "team": false,
      "ts": 1315491000
    },
    ...
  ],
  "meta": {
    "generated": 1415788633
  },
  "metrics": [
    {
      "ts": 1315491000,
      ...
    }
  ],
  "nodes": [
    {
      "active": true,
      "created_on": "2011-09-08",
      "created_ts": 1315483000,
      "id": "1",
      "isolated": false,
      "name": "Alice",
      "team": true,
      "team_on": "2011-09-08",
      "team_ts": 1315483000
    },
    {...}
  ]
}
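As a rough illustration of how a consumer might read this exported document, the sketch below loads a cut-down version and resolves edge endpoints to user names. The second node ("Bob") and the helper `describe_edges` are invented for the example; only the key names come from the format shown in the talk.

```python
import json

# Cut-down document using the key names of the exported format.
# Node "2" (Bob) is a hypothetical addition so the edge has both endpoints.
EXPORTED = """{
  "edges": [
    {"effort": 4, "id": "2_1_1315491000", "source": "2", "target": "1",
     "team": false, "ts": 1315491000}
  ],
  "meta": {"generated": 1415788633},
  "metrics": [{"ts": 1315491000}],
  "nodes": [
    {"id": "1", "name": "Alice", "team": true, "active": true},
    {"id": "2", "name": "Bob", "team": false, "active": true}
  ]
}"""


def describe_edges(doc):
    """Render each edge as 'source name -> target name (effort N)'."""
    names = {node["id"]: node["name"] for node in doc["nodes"]}
    return ["{0} -> {1} (effort {2})".format(names[e["source"]],
                                             names[e["target"]],
                                             e["effort"])
            for e in doc["edges"]]


lines = describe_edges(json.loads(EXPORTED))
print(lines)
```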
Edgesense Frontend
• Single page application
• D3.js
• Sigma.js
Demo!
Dashboard: Network
• Uses sigma.js
• ForceAtlas layout *
• Contextual information
Dashboard: Metrics
• Sidebar, Bottom widgets
• Declaratively select metrics to display
<div class="small-box bg-maroon big-metric metric helped"
     data-metric-name="louvain_modularity"
     data-metric-round="3"
     data-help="modularity">
  <div class="inner">
    <h3 class="value"> </h3>
    <p> Modularity </p>
  </div>
  <div class="minichart"> </div>
</div>
Dashboard: Filters
Extras
• Twitter parser
• GEXF export
Drupal!
• Module to embed Edgesense
• Configurator for the backend processing
• Configurator for the dashboard
Thank you!
P.S. Edgesense is open source:
github.com/Wikitalia/edgesense
Photo credits
https://www.flickr.com/photos/swedish_heritage_board/14141937687/
https://www.flickr.com/photos/nationaalarchief/5453358304/
https://www.flickr.com/photos/ul_digital_library/10922274335/
https://www.flickr.com/photos/texasstatearchives/9077251415/
https://www.flickr.com/photos/nasacommons/9465040235/
https://www.flickr.com/photos/nasacommons/9467807836