Mapping Commodity Trading

Post on 10-Jun-2015

Historical Trading Data, by Team Ash at the Big Data InfoVis Summer School. Members: Joe Wandy, Asma Malik, Michael Mauderer, Sadiq Sani, Benjamin Bach

Transcript of Mapping Commodity Trading

Mapping Commodity Trading in the 19th Century

Benjamin Bach, INRIA, Paris

Asma Malik, University of Strathclyde, Glasgow

Michael Mauderer, University of St Andrews

Sadiq Sani, Robert Gordon University, Aberdeen

Joe Wandy, University of Glasgow

Outline

● Project Overview
● Data
● Technology
● Demo
● Future Work

Overview

19th Century

Commodities, Diseases, Locations, Disasters

Process

Tasks

● Retrieve documents mentioning
  ○ Commodities
  ○ Locations
  ○ Time range

● Relations between retrieved terms
  ○ Spatial relations
  ○ Temporal relations
  ○ Co-occurrence relations
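The retrieval task above (documents mentioning a commodity and a location within a time range) can be sketched as one parameterised SQL query. This is a minimal sketch under assumptions: the tables `document`, `commodity_mention` and `location_mention`, their columns, and the toy rows are hypothetical, since the slides do not describe the schema, and Python's built-in `sqlite3` stands in for the team's PostgreSQL server.

```python
import sqlite3

# Hypothetical schema; the real PostgreSQL schema is not shown in the slides.
SCHEMA = """
CREATE TABLE document (id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE commodity_mention (doc_id INTEGER, commodity TEXT);
CREATE TABLE location_mention (doc_id INTEGER, location TEXT);
"""

# Documents that mention both terms and fall inside the requested year range.
QUERY = """
SELECT DISTINCT d.id
FROM document d
JOIN commodity_mention c ON c.doc_id = d.id
JOIN location_mention l ON l.doc_id = d.id
WHERE c.commodity = ? AND l.location = ? AND d.year BETWEEN ? AND ?
ORDER BY d.id
"""


def find_documents(conn, commodity, location, start_year, end_year):
    """Return ids of documents mentioning both terms within [start_year, end_year]."""
    rows = conn.execute(QUERY, (commodity, location, start_year, end_year))
    return [doc_id for (doc_id,) in rows]


# Made-up toy data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO document VALUES (?, ?)", [(1, 1850), (2, 1870), (3, 1720)])
conn.executemany("INSERT INTO commodity_mention VALUES (?, ?)",
                 [(1, "cinchona"), (2, "cinchona"), (3, "cinchona")])
conn.executemany("INSERT INTO location_mention VALUES (?, ?)",
                 [(1, "India"), (2, "Peru"), (3, "India")])
print(find_documents(conn, "cinchona", "India", 1800, 1900))  # [1]
```

Passing the values as `?` parameters rather than formatting them into the SQL string keeps user-supplied terms from being interpreted as SQL.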

Users: Historians

Data

● Commodities: 1067
● Time: 1600 - 1952 (352 years)
● Documents: 18 580
● Location occurrences: 91 650 469
● Commodity occurrences: 29 020 013

The Data

● PostgreSQL Database in Edinburgh
  ○ Not accessible

● PostgreSQL Database in St Andrews
  ○ Low performance

● PostgreSQL Database Backup
  ○ 2.5 GB of compressed binary data
  ○ Cannot be imported into Amazon RDS

Solution 1

● Create a more compatible SQL export to import into Amazon RDS

  ○ 24 GB raw text file containing SQL statements
  ○ Still incompatible
  ○ Hard to correct errors in a timely manner

Solution 2

● Create EC2 instance running a PostgreSQL database

  ○ Powerful enough
  ○ Enough storage
  ○ Accessible

Big Data Problems

● Simple things take a long time
● Errors and new problems are only discovered incrementally

The Pipeline

● D3 for client-side presentation
● Java + SQL for server-side processing

Diagram: Client -> Web Service -> Database. The client requests commodities and a date range; the web service returns the matching data.
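The web-service step of the pipeline (the client sends commodities and a date range, the server answers with data for D3) can be sketched as a request handler. This is a minimal sketch, not the team's Java service: the URL parameter names `commodity`, `from` and `to`, and the `lookup` callback standing in for the SQL layer, are hypothetical. The default years come from the data range on the Data slide.

```python
import json
from urllib.parse import urlparse, parse_qs


def parse_request(url):
    """Extract commodities and a year range from a hypothetical query URL."""
    params = parse_qs(urlparse(url).query)
    commodities = params.get("commodity", [])  # repeated parameter -> list
    start = int(params.get("from", ["1600"])[0])  # default: start of the data
    end = int(params.get("to", ["1952"])[0])      # default: end of the data
    if start > end:
        raise ValueError("start year must not be after end year")
    return commodities, start, end


def handle(url, lookup):
    """Answer a client request with JSON; `lookup` stands in for the SQL layer."""
    commodities, start, end = parse_request(url)
    result = {c: lookup(c, start, end) for c in commodities}
    return json.dumps(result, sort_keys=True)


print(handle("/query?commodity=tea&from=1800&to=1900", lambda c, s, e: []))  # {"tea": []}
```

Returning JSON keeps the client side simple, since D3 can load it directly.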

Initial Sketches

Visualization

- Space and time -> Finding related terms + documents

- Find related documents
- What are the documents talking about?

- Implicit knowledge:
  - Co-occurrences of terms in documents
  - For every commodity:
    1) Get the top 10 documents
    2) Limit related terms to 6
    3) Sum up co-occurrences
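The three co-occurrence steps above can be sketched as follows. This is one reading of the slide under stated assumptions: documents are ranked by how often they mention the commodity, the limit of 6 related terms is applied per document, and "summing up co-occurrences" adds one per shared document. The input shape (a mapping from a document id to a `Counter` of term mentions) and the toy data are hypothetical.

```python
from collections import Counter

TOP_DOCS = 10   # step 1: documents kept per commodity
TOP_TERMS = 6   # step 2: related terms kept per document (assumed per-document limit)


def cooccurrence_row(commodity, docs):
    """docs: hypothetical mapping doc_id -> Counter of term mention counts."""
    # Step 1: rank documents by how often they mention the commodity (assumed ranking).
    ranked = sorted(
        (d for d in docs if docs[d][commodity] > 0),
        key=lambda d: (-docs[d][commodity], d),
    )[:TOP_DOCS]
    totals = Counter()
    for d in ranked:
        others = Counter({t: n for t, n in docs[d].items() if t != commodity})
        # Step 2: keep only the most frequent related terms in this document.
        for term, _ in others.most_common(TOP_TERMS):
            # Step 3: sum co-occurrences, counting one per shared document.
            totals[term] += 1
    return totals


docs = {
    "d1": Counter({"tea": 5, "India": 3, "cholera": 1}),
    "d2": Counter({"tea": 2, "India": 1}),
    "d3": Counter({"coffee": 4, "Brazil": 2}),
}
print(dict(cooccurrence_row("tea", docs)))  # {'India': 2, 'cholera': 1}
```

Each returned row could fill one row of the commodity-by-term matrix shown in the visualization.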

Demo

Future work

- Query by location
- Time diagrams for term frequency over time
- Encode more information in matrix cells (number of documents, collection, ...)
- Show and browse documents

- Handle big data: diseases, disasters, ...
- Co-occurrences?
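For the "time diagrams for term frequency over time" item, the counts could be binned before being handed to D3. A minimal sketch, assuming occurrences arrive as hypothetical (term, year) pairs; binning by decade is an illustrative choice, not something the slides specify.

```python
from collections import Counter


def frequency_by_decade(occurrences):
    """occurrences: iterable of (term, year) pairs -> {term: Counter(decade -> count)}."""
    series = {}
    for term, year in occurrences:
        decade = year - year % 10  # e.g. 1859 -> 1850
        series.setdefault(term, Counter())[decade] += 1
    return series


counts = frequency_by_decade([("tea", 1851), ("tea", 1859), ("tea", 1862), ("coffee", 1700)])
print(counts["tea"][1850])  # 2
```

Coarser bins (decades rather than single years) keep the series readable across a 352-year span.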

Thank you for listening!