DDAY2014 - Edgesense: Social network analysis per tutti


Speaker: Luca Mearelli
Area: Building, Development

Every conversation has a structure. The network formed by the people who interact in the conversations of an online community can therefore be analyzed with the tools of network science to understand its characteristics.

A video with a demo of the software that the speaker showed during the presentation is available: https://www.youtube.com/watch?v=HqDRcSSo6bY

Transcript of DDAY2014 - Edgesense: Social network analysis per tutti

Page 1: DDAY2014 - Edgesense: Social network analysis per tutti

Edgesense: Social network analysis per tutti

Luca Mearelli - @lmea

Page 2: DDAY2014 - Edgesense: Social network analysis per tutti

Hi, I’m Luca

Page 3: DDAY2014 - Edgesense: Social network analysis per tutti

Collective Intelligence

Page 4: DDAY2014 - Edgesense: Social network analysis per tutti

Emergence

larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties

Page 5: DDAY2014 - Edgesense: Social network analysis per tutti

Online collaboration

it works!

Page 6: DDAY2014 - Edgesense: Social network analysis per tutti

Online communities

• Exhibit emergence

• Strong design properties

• Hackable

Page 7: DDAY2014 - Edgesense: Social network analysis per tutti

The Blueprint

• Map the community social network

• Measure the structural properties

• Visualize the structure & the metrics

• Tweak the interaction

Page 8: DDAY2014 - Edgesense: Social network analysis per tutti

Edgesense

Page 9: DDAY2014 - Edgesense: Social network analysis per tutti

Edgesense Architecture

HTML5, JavaScript

JSON files

Python

JSON source

Page 10: DDAY2014 - Edgesense: Social network analysis per tutti

Edgesense Source Data

• users.json

• nodes.json

• comments.json

Page 11: DDAY2014 - Edgesense: Social network analysis per tutti

users.json

Page 12: DDAY2014 - Edgesense: Social network analysis per tutti

nodes.json

Page 13: DDAY2014 - Edgesense: Social network analysis per tutti

comments.json

Page 14: DDAY2014 - Edgesense: Social network analysis per tutti

Edgesense Backend

• Python

• NetworkX

Page 15: DDAY2014 - Edgesense: Social network analysis per tutti

Edgesense Parsing Pipeline

• Parse source JSON files

• Build network from interactions

• Extract metrics

• Export network + metrics to JSON files
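The four steps above can be sketched in miniature with the standard library alone. This is an illustrative outline, not the Edgesense code: the function name `run_pipeline`, the `id`/`name` fields of the users, and passing the JSON as strings are assumptions made to keep the example self-contained. The comment field names follow the ones used in the code slides.

```python
import json

def run_pipeline(users_json, comments_json):
    """Hypothetical miniature pipeline: parse, build, measure, export."""
    # 1. Parse the source JSON (strings here, files in a real run)
    users = json.loads(users_json)
    comments = json.loads(comments_json)

    # 2. Build the network: persons are nodes, comments make links
    nodes = [{'id': u['id'], 'name': u['name']} for u in users]
    edges = [
        {'source': c['author_id'], 'target': c['recipient_id'], 'ts': c['created_ts']}
        for c in comments
        if c.get('author_id') and c.get('recipient_id')  # skip comments without both ends
    ]

    # 3. Extract a couple of very simple metrics
    metrics = {'nodes_count': len(nodes), 'edges_count': len(edges)}

    # 4. Export network + metrics as a single JSON document
    return json.dumps({'nodes': nodes, 'edges': edges, 'metrics': [metrics]}, sort_keys=True)

users_json = '[{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]'
comments_json = (
    '[{"author_id": "2", "recipient_id": "1", "created_ts": 1315491000},'
    ' {"author_id": "2", "recipient_id": null, "created_ts": 1315491500}]'
)
exported = json.loads(run_pipeline(users_json, comments_json))
print(exported['metrics'])
```

The second comment has no recipient, so it is dropped when the links are built and only one edge is exported.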

Page 16: DDAY2014 - Edgesense: Social network analysis per tutti

Network construction

• Persons are nodes

Page 17: DDAY2014 - Edgesense: Social network analysis per tutti

Network construction

• Comments make links

Page 18: DDAY2014 - Edgesense: Social network analysis per tutti

Network construction

• Edges are aggregated

• Metadata is added
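As a rough illustration of edge aggregation, the sketch below collapses repeated comments between the same pair of persons into one weighted edge. The helper name `aggregate_edges` is hypothetical. The `count`, `effort` and `team` attributes mirror the ones that appear in the code slides, but the logic here is a simplified plain-Python version, not the NetworkX-based one used by Edgesense.

```python
from collections import defaultdict

def aggregate_edges(links):
    """Collapse repeated (source, target) links into one weighted edge each."""
    totals = defaultdict(lambda: {'count': 0, 'effort': 0, 'team': False})
    for link in links:
        agg = totals[(link['source'], link['target'])]
        agg['count'] += 1                      # number of comments on this edge
        agg['effort'] += link['effort']        # summed comment length
        agg['team'] = agg['team'] or link['team']  # team edge if any comment is a team one
    return {pair: dict(agg) for pair, agg in totals.items()}

links = [
    {'source': '2', 'target': '1', 'effort': 4, 'team': False},
    {'source': '2', 'target': '1', 'effort': 10, 'team': True},
    {'source': '1', 'target': '2', 'effort': 3, 'team': False},
]
aggregated = aggregate_edges(links)
print(aggregated[('2', '1')])
```

Direction is preserved: the link from '1' to '2' stays separate from the two links from '2' to '1'.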

Page 19: DDAY2014 - Edgesense: Social network analysis per tutti

Network construction

def extract_edges(nodes_map, comments_map):
    # build the list of edges
    edges_list = []
    # a comment is 'valid' if it has a recipient and an author
    valid_comments = [e for e in comments_map.values() if e.get('recipient_id', None) and e.get('author_id', None)]
    logging.info("%(v)i valid comments on %(t)i total" % {'v':len(valid_comments), 't':len(comments_map.values())})

    # build the whole network to use for metrics
    for comment in valid_comments:
        link = {
            'id': "{0}_{1}_{2}".format(comment['author_id'],comment['recipient_id'],comment['created_ts']),
            'source': comment['author_id'],
            'target': comment['recipient_id'],
            'ts': comment['created_ts'],
            'effort': comment['length'],
            'team': comment['team']
        }
        if nodes_map.has_key(comment['author_id']):
            nodes_map[comment['author_id']]['active'] = True
        else:
            logging.info("error: node %(n)s was linked but not found in the nodes_map" % {'n':comment['author_id']})

        if nodes_map.has_key(comment['recipient_id']):
            nodes_map[comment['recipient_id']]['active'] = True
        else:
            logging.info("error: node %(n)s was linked but not found in the nodes_map" % {'n':comment['recipient_id']})
        edges_list.append(link)

    return sorted(edges_list, key=eu.sort_by('ts'))

Page 20: DDAY2014 - Edgesense: Social network analysis per tutti

Network construction

def build_network(network):
    MDG = nx.MultiDiGraph()

    for node in network['nodes']:
        MDG.add_node(node['id'], node)

    for edge in network['edges']:
        MDG.add_edge(edge['source'], edge['target'], attr_dict=edge)

    set_isolated(network['nodes'], MDG)
    return MDG

Page 21: DDAY2014 - Edgesense: Social network analysis per tutti

Network construction

def extract_dpsg(mdg, ts, team=True):
    dg = nx.DiGraph()
    # add all the nodes present at the time ts
    for node in mdg.nodes_iter():
        if mdg.node[node]['created_ts'] <= ts and (team or not mdg.node[node]['team']):
            dg.add_node(node, mdg.node[node])

    for node in mdg.nodes_iter():
        for neighbour in mdg[node].keys():
            count = sum(1 for e in mdg[node][neighbour].values() if e['ts'] <= ts and (team or not e['team']))
            effort = sum(e['effort'] for e in mdg[node][neighbour].values() if e['ts'] <= ts and (team or not e['team']))
            team_edge = sum(1 for e in mdg[node][neighbour].values() if e['ts'] <= ts and e['team'])>0
            if count > 0 and (team or not team_edge):
                dg.add_edge(node, neighbour, {'source': node, 'target': neighbour, 'effort': effort, 'count': count, 'team': team_edge})

    return dg

Page 22: DDAY2014 - Edgesense: Social network analysis per tutti

• Content metrics

• Network metrics

Page 23: DDAY2014 - Edgesense: Social network analysis per tutti

• Number of users (active/inactive)

• Number of connections

• Number of community contributions
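A minimal sketch of how such counts could be derived from the node and edge lists follows. It assumes, for illustration only, that every edge corresponds to one comment and that a "connection" is a distinct (source, target) pair; the slides do not spell out the exact definitions, and `content_metrics` is a hypothetical helper name.

```python
def content_metrics(nodes, edges):
    """Count users, connections and contributions under the stated assumptions."""
    active = sum(1 for n in nodes if n.get('active'))
    return {
        'active_users': active,
        'inactive_users': len(nodes) - active,
        # assumption: a connection is a distinct directed pair of persons
        'connections': len({(e['source'], e['target']) for e in edges}),
        # assumption: each edge is one comment, i.e. one contribution
        'contributions': len(edges),
    }

nodes = [
    {'id': '1', 'active': True},
    {'id': '2', 'active': True},
    {'id': '3', 'active': False},
]
edges = [
    {'source': '2', 'target': '1'},
    {'source': '2', 'target': '1'},
    {'source': '1', 'target': '2'},
]
counts = content_metrics(nodes, edges)
print(counts)
```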

Page 24: DDAY2014 - Edgesense: Social network analysis per tutti

• Degree

• Distance

• Centrality

• Modularity

Page 25: DDAY2014 - Edgesense: Social network analysis per tutti

Network Metrics: Degree

• Number of inbound / outbound edges incident to a node
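To make the definition concrete, here is a small standard-library sketch that counts inbound and outbound edges per node. Edgesense itself relies on NetworkX for this; the `degrees` helper and the sample names are made up for the example.

```python
from collections import Counter

def degrees(edges):
    """Return (in_degree, out_degree) counters for a list of directed edges."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edges:
        out_deg[source] += 1  # edge leaves the source
        in_deg[target] += 1   # edge arrives at the target
    return in_deg, out_deg

edges = [('alice', 'bob'), ('carol', 'bob'), ('bob', 'alice')]
in_deg, out_deg = degrees(edges)
print(in_deg['bob'], out_deg['bob'])
```

A `Counter` returns 0 for nodes that never appear on one side, so a person who only writes comments simply has an in-degree of 0.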

Page 26: DDAY2014 - Edgesense: Social network analysis per tutti

Network Metrics: Distance

• The average number of hops needed to go from a randomly chosen node to another.

• A lower distance implies that information spreads more easily across the network.
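The average distance can be illustrated with a breadth-first search from every node. This sketch averages only over pairs that are actually connected, which is one possible convention and an assumption here; the function name `average_distance` is hypothetical.

```python
from collections import deque

def average_distance(adjacency):
    """Mean number of hops over all ordered pairs of nodes that are connected."""
    total, pairs = 0, 0
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in dist:  # first visit gives the shortest hop count
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable nodes other than the start
    return float(total) / pairs if pairs else 0.0

# An undirected chain a - b - c, written with both directions
chain = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
avg = average_distance(chain)
print(avg)
```

In the chain, four ordered pairs are one hop apart and two are two hops apart, so the average is 8 / 6.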

Page 27: DDAY2014 - Edgesense: Social network analysis per tutti

Network Metrics: Centrality

• Refers to indicators which identify the most important vertices within a graph

• Betweenness centrality: for a given node, the number of shortest paths between all pairs of vertices that pass through that node.
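For readers who want to see the computation, below is a compact sketch of Brandes' algorithm for an unweighted directed graph, using only the standard library. It returns unnormalised scores and weights each pair by the fraction of its shortest paths that cross the node; NetworkX, which Edgesense uses, normalises by default, so the numbers are not directly comparable. The function name and the sample graph are assumptions for the example.

```python
from collections import deque

def betweenness(adjacency):
    """Unnormalised betweenness; every node must appear as a key of adjacency."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # BFS from s: count shortest paths (sigma) and record predecessors
        sigma = {v: 0 for v in adjacency}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adjacency}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += float(sigma[v]) / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# 'hub' relays everything from a to b and c
graph = {'a': ['hub'], 'hub': ['b', 'c'], 'b': [], 'c': []}
scores = betweenness(graph)
print(scores['hub'])
```

Only the paths a to b and a to c pass through an intermediate node, and both go through 'hub', which therefore scores 2 while every other node scores 0.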

Page 28: DDAY2014 - Edgesense: Social network analysis per tutti

Network Metrics: Modularity

• The difference between the observed network and a random one with the same degree distribution, on a 0-1 scale.

• Subcommunities are defined such that their members are more connected to each other than to the rest of the network
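As an illustration, the sketch below computes Newman's modularity for a partition that is already given, on an undirected and unweighted graph. It does not search for the best partition; the dashboard markup refers to a `louvain_modularity` metric, which suggests a Louvain-style optimisation is used for that, but that step is not reproduced here. The function name and the example graph are assumptions.

```python
def modularity(edges, communities):
    """Modularity Q = sum over communities of L_c/m - (d_c/(2m))^2."""
    m = float(len(edges))
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)    # L_c: edges with both ends inside the community
    degree_sum = [0] * len(communities)  # d_c: sum of the degrees of its members
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )

# Two triangles joined by a single bridge edge (3, 4)
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
q_split = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
q_single = modularity(edges, [{1, 2, 3, 4, 5, 6}])
print(q_split, q_single)
```

Splitting the graph into its two triangles gives Q = 5/14, while putting everyone in a single community gives Q = 0, which matches the intuition that the split captures real structure.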

Page 29: DDAY2014 - Edgesense: Social network analysis per tutti

Network Metrics

def extract_network_metrics(mdg, ts, team=True):
    met = {}
    dsg = extract_dpsg(mdg, ts, team)
    if team :
        pre = 'full:'
    else:
        pre = 'user:'

    # avoid trying to compute metrics for
    # the case of empty networks
    if dsg.number_of_nodes()==0:
        return met

    met[pre+'nodes_count'] = dsg.number_of_nodes()
    met[pre+'edges_count'] = dsg.number_of_edges()
    met[pre+'density'] = nx.density(dsg)
    met[pre+'betweenness'] = nx.betweenness_centrality(dsg)
    met[pre+'avg_betweenness'] = float(sum(met[pre+'betweenness'].values()))/float(len(met[pre+'betweenness'].values()))
    met[pre+'betweenness_count'] = nx.betweenness_centrality(dsg, weight='count')
    met[pre+'avg_betweenness_count'] = float(sum(met[pre+'betweenness_count'].values()))/float(len(met[pre+'betweenness_count'].values()))
    met[pre+'betweenness_effort'] = nx.betweenness_centrality(dsg, weight='effort')
    met[pre+'avg_betweenness_effort'] = float(sum(met[pre+'betweenness_effort'].values()))/float(len(met[pre+'betweenness_effort'].values()))
    met[pre+'in_degree'] = dsg.in_degree()
    met[pre+'avg_in_degree'] = float(sum(met[pre+'in_degree'].values()))/float(len(met[pre+'in_degree'].values()))
    met[pre+'out_degree'] = dsg.out_degree()
    met[pre+'avg_out_degree'] = float(sum(met[pre+'out_degree'].values()))/float(len(met[pre+'out_degree'].values()))
    met[pre+'degree'] = dsg.degree()
    met[pre+'avg_degree'] = float(sum(met[pre+'degree'].values()))/float(len(met[pre+'degree'].values()))
    met[pre+'degree_count'] = dsg.degree(weight='count')
    met[pre+'avg_degree_count'] = float(sum(met[pre+'degree_count'].values()))/float(len(met[pre+'degree_count'].values()))
    met[pre+'degree_effort'] = dsg.degree(weight='effort')
    met[pre+'avg_degree_effort'] = float(sum(met[pre+'degree_effort'].values()))/float(len(met[pre+'degree_effort'].values()))

Page 30: DDAY2014 - Edgesense: Social network analysis per tutti

Exported Format

{
  "edges": [
    {
      "effort": 4,
      "id": "2_1_1315491000",
      "source": "2",
      "target": "1",
      "team": false,
      "ts": 1315491000
    },
    ...
  ],
  "meta": {
    "generated": 1415788633
  },
  "metrics": [
    {
      "ts": 1315491000,
      ...
    }
  ],
  "nodes": [
    {
      "active": true,
      "created_on": "2011-09-08",
      "created_ts": 1315483000,
      "id": "1",
      "isolated": false,
      "name": "Alice",
      "team": true,
      "team_on": "2011-09-08",
      "team_ts": 1315483000
    },
    {...}
  ]
}
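A consumer of this file could load it and summarise it along the following lines. The sample document is a cut-down version of the format above: the elided parts are replaced with invented values (a second node "Bob" and a `full:nodes_count` entry), and `summarise` is a hypothetical helper, so treat this as a sketch of reading the structure rather than of anything in the Edgesense dashboard.

```python
import json

# Cut-down sample in the exported shape; values marked in the lead-in are invented
exported_text = '''
{
  "edges": [
    {"effort": 4, "id": "2_1_1315491000", "source": "2", "target": "1",
     "team": false, "ts": 1315491000}
  ],
  "meta": {"generated": 1415788633},
  "metrics": [{"ts": 1315491000, "full:nodes_count": 2}],
  "nodes": [
    {"id": "1", "name": "Alice", "active": true, "team": true},
    {"id": "2", "name": "Bob", "active": true, "team": false}
  ]
}
'''

def summarise(network):
    """Return a few headline numbers from an exported network document."""
    nodes = network['nodes']
    metrics = network['metrics']
    latest = max(metrics, key=lambda m: m['ts']) if metrics else {}
    return {
        'nodes': len(nodes),
        'team_members': sum(1 for n in nodes if n['team']),
        'total_effort': sum(e['effort'] for e in network['edges']),
        'latest_metrics_ts': latest.get('ts'),
    }

summary = summarise(json.loads(exported_text))
print(summary)
```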

Page 31: DDAY2014 - Edgesense: Social network analysis per tutti

Edgesense Frontend

• Single page application

• D3.js

• Sigma.js

Page 32: DDAY2014 - Edgesense: Social network analysis per tutti

Demo!

Page 33: DDAY2014 - Edgesense: Social network analysis per tutti

Dashboard: Network

• Uses sigma.js

• ForceAtlas layout *

• Contextual information

Page 34: DDAY2014 - Edgesense: Social network analysis per tutti

Dashboard: Metrics

• Sidebar, Bottom widgets

• Declaratively select metrics to display

<div class="small-box bg-maroon big-metric metric helped"
     data-metric-name="louvain_modularity"
     data-metric-round="3"
     data-help="modularity">
  <div class="inner">
    <h3 class="value"> </h3>
    <p> Modularity </p>
  </div>
  <div class="minichart"> </div>
</div>

Page 35: DDAY2014 - Edgesense: Social network analysis per tutti

Dashboard: Filters

Page 36: DDAY2014 - Edgesense: Social network analysis per tutti

Extras

• Twitter parser

• GEXF export

Page 37: DDAY2014 - Edgesense: Social network analysis per tutti

Drupal!

• Module to embed Edgesense

• Configurator for the backend processing

• Configurator for the dashboard

Page 38: DDAY2014 - Edgesense: Social network analysis per tutti

Thank you!

P.S. Edgesense is open source:

github.com/Wikitalia/edgesense