Visualizing Relationships: Journalistic Problems in a Digital Age


A presentation from Marcos Vanetta, Technical Lead and web developer at 3Pillar Global, and Mariano Blejman of Spanish-language newspaper Pagina 12 that was given at the 2012 Mozilla Festival in London, England. http://goo.gl/jQzTj


Visualizing Relationships: Journalistic problems in a digital age

Summary

1. Introduction
2. The problem we are solving
3. Involved issues
4. Problems we found
5. The Challenge

Who are we?

Mariano Blejman is technology editor and youth editor at the Argentine newspaper Página/12, and co-founder of Hacks/Hackers Buenos Aires. @blejman

Marcos Vanetta is a biomedical engineer, software developer at 3Pillar Global, and hacker at Hacks/Hackers Buenos Aires. @malev

Hacks/Hackers Buenos Aires

The problem

● 1976: A dictatorship started in Argentina.
● 30,000 people were kidnapped and disappeared.
● 1985: The first trials happened in Argentina.

They judged the bad guys, but then we had to stop.

● 2003: The justice system started judging the bad guys again.

● 2012: A large number of judicial documents.

No one can read all of them

Involved issues

● Semantic Analytics
● Ontology
● Data Mining
● Social Network Analysis
● Visualizations

Who was dealing with documents?

DocumentCloud, Overview, OpenCalais, NLTK, GATE

First approach

● Read all the documents
● Software solution based on regular expressions
● Ruby, Padrino and a MySQL database

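require 'docsplit'
require 'tmpdir'

# Extract the plain text of a document with Docsplit (OCR disabled).
def self.extract_plain_text(path)
  # File name without its last extension, e.g. "case.v2.pdf" -> "case.v2"
  basename = File.basename(path, '.*')
  tmp_dir = Dir.tmpdir
  Docsplit.extract_text(path, :output => tmp_dir, :ocr => false)
  text = File.read(File.join(tmp_dir, "#{basename}.txt"))
  self.clean_text(text)
end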
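The regular-expression approach mentioned on this slide could look roughly like the sketch below, which pulls Spanish-language dates such as "24 de marzo de 1976" out of extracted text. The names `SPANISH_MONTHS`, `DATE_PATTERN` and `extract_dates` are hypothetical and are not taken from the original project; real judicial documents would need many more patterns.

```ruby
require 'date'

# Hypothetical lookup table: Spanish month names mapped to month numbers.
SPANISH_MONTHS = %w[enero febrero marzo abril mayo junio julio agosto
                    septiembre octubre noviembre diciembre]
                 .each_with_index.map { |name, i| [name, i + 1] }.to_h.freeze

# Matches dates written as "<day> de <month> de <year>".
DATE_PATTERN = /\b(\d{1,2}) de (#{SPANISH_MONTHS.keys.join('|')}) de (\d{4})\b/i

# Returns a Date for every well-formed date found in the text.
# Impossible dates (e.g. "31 de febrero") are skipped.
def extract_dates(text)
  text.scan(DATE_PATTERN).map do |day, month, year|
    parts = [year.to_i, SPANISH_MONTHS.fetch(month.downcase), day.to_i]
    Date.new(*parts) if Date.valid_date?(*parts)
  end.compact
end
```

A call such as `extract_dates(text)` on the output of `extract_plain_text` would return an array of `Date` objects in the order they appear in the document.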

The problems we found

● Converting text from PDF files
● Extracting entities from documents
● Parsing dates and addresses
● Co-reference resolution for names
● How to store relations
● Contextual information about documents
● Confidence in data on a crowdsourcing platform
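Two of the problems listed above, storing relations and resolving different spellings of the same name, can be illustrated with a minimal in-memory sketch. The `Relation` record, its fields, and the `name_key` and `group_by_person` helpers are assumptions made for illustration; the original project used a MySQL database, and its schema is not shown in the slides. The name matching here is deliberately naive: it ignores order, case and punctuation only, and would not handle initials, nicknames or misspellings.

```ruby
# Hypothetical record: one relation between two entities, kept together
# with the document it came from and a confidence score.
Relation = Struct.new(:subject, :predicate, :object, :document, :confidence)

# Naive co-reference key: the same set of name tokens yields the same key,
# regardless of order, letter case, or punctuation.
def name_key(name)
  name.downcase.gsub(/[^[:alpha:]\s]/, '').split.sort.join(' ')
end

# Groups relations whose subjects appear to be the same person.
def group_by_person(relations)
  relations.group_by { |relation| name_key(relation.subject) }
end
```

With made-up example data, relations whose subjects are written "Perez, Juan" and "Juan PEREZ" would end up under the single key "juan perez".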

Visualizing relationships over time
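One simple way to prepare relationships for a time-based visualization is to bucket them by year, so that each year can be drawn as one frame of a network graph or one row of a timeline. The sketch below assumes a hypothetical `Event` record with `source`, `target` and `date` fields; none of these names come from the original project.

```ruby
require 'date'

# Hypothetical record: two entities linked on a given date.
Event = Struct.new(:source, :target, :date)

# Returns a hash of year => unique [source, target] pairs, ordered by year.
def edges_by_year(events)
  events.group_by { |event| event.date.year }
        .sort
        .map { |year, year_events| [year, year_events.map { |e| [e.source, e.target] }.uniq] }
        .to_h
end
```

The resulting hash can be serialized (for example to JSON) and handed to whatever charting library draws the timeline or graph.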

What do we have now?

Prototype for a single (and local) use case: mapa76

Platform for different use cases: analice.me

The visualizations that we imagined

Visualizations that we found

The #mozfest challenge

Find a big journalistic issue that involves:
● Lots of documents with unstructured data
● Lots of data to find inside
● What relationships do you want to find?

The #mozfest challenge

Propose at least one new visualization to find relationships (could be maps, timelines, network graphs, treemaps, bars and anything you can imagine).

We want a poster!
We want post-its!
We want you (to work for us)