Mapping Commodity Trading

Mapping Commodity Trading in the 19th Century. Benjamin Bach, INRIA, Paris; Asma Malik, University of Strathclyde, Glasgow; Michael Mauderer, University of St Andrews; Sadiq Sani, Robert Gordon University, Aberdeen; Joe Wandy, University of Glasgow

Description: Historical Trading Data, by Team Ash at the Big Data InfoVis Summer School. Members: Joe Wandy, Asma Malik, Michael Mauderer, Sadiq Sani, Benjamin Bach.

Transcript of Mapping Commodity Trading

Page 1: Mapping Commodity Trading

Mapping Commodity Trading in the 19th Century

Benjamin Bach, INRIA, Paris

Asma Malik, University of Strathclyde, Glasgow

Michael Mauderer, University of St Andrews

Sadiq Sani, Robert Gordon University, Aberdeen

Joe Wandy, University of Glasgow

Page 2: Mapping Commodity Trading

Outline

● Project Overview
● Data
● Technology
● Demo
● Future Work

Page 3: Mapping Commodity Trading

Overview

19th Century

● Commodities
● Locations
● Diseases
● Disasters

Page 4: Mapping Commodity Trading

Process

Page 5: Mapping Commodity Trading

Tasks

● Retrieve documents mentioning
  ○ Commodities
  ○ Locations
  ○ Time range

● Relations between retrieved terms
  ○ Spatial relations
  ○ Temporal relations
  ○ Co-occurrence relations

Users: Historians
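The retrieval task above can be sketched as a parameterized SQL query. This is a minimal illustration only: the table names (`document`, `mention`), the column names, and the sample rows are all invented for the example, and SQLite stands in for the project's PostgreSQL database, whose actual schema is not shown in the slides.

```python
import sqlite3

# Hypothetical schema and made-up sample rows; the real schema is not in the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE mention (doc_id INTEGER, kind TEXT, term TEXT);
""")
conn.executemany("INSERT INTO document VALUES (?, ?, ?)", [
    (1, "Report A", 1850),
    (2, "Report B", 1875),
    (3, "Report C", 1910),
])
conn.executemany("INSERT INTO mention VALUES (?, ?, ?)", [
    (1, "commodity", "cotton"), (1, "location", "Liverpool"),
    (2, "commodity", "cotton"), (2, "location", "Bombay"),
    (3, "commodity", "cotton"), (3, "location", "Liverpool"),
])


def find_documents(conn, commodity, location, year_from, year_to):
    """Return (id, title, year) for documents that mention both the
    commodity and the location and fall inside the year range."""
    sql = """
        SELECT DISTINCT d.id, d.title, d.year
        FROM document d
        JOIN mention c ON c.doc_id = d.id AND c.kind = 'commodity' AND c.term = ?
        JOIN mention l ON l.doc_id = d.id AND l.kind = 'location' AND l.term = ?
        WHERE d.year BETWEEN ? AND ?
        ORDER BY d.year
    """
    return conn.execute(sql, (commodity, location, year_from, year_to)).fetchall()


results = find_documents(conn, "cotton", "Liverpool", 1800, 1899)
print(results)
```

Joining the mention table twice keeps the commodity filter and the location filter independent, so either one could be dropped or repeated without restructuring the query.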

Page 6: Mapping Commodity Trading

Data

● Commodities: 1067
● Time: 1600 - 1952 (352 years)
● Documents: 18 580
● Location occurrences: 91 650 469
● Commodity occurrences: 29 020 013
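To get a feel for the scale, the figures on this slide can be turned into per-document averages. The calculation below uses only the numbers listed above; the variable names are chosen for the example.

```python
# Figures as listed on the Data slide.
documents = 18_580
location_occurrences = 91_650_469
commodity_occurrences = 29_020_013

# Average number of occurrences per document.
locations_per_doc = location_occurrences / documents
commodities_per_doc = commodity_occurrences / documents

print(f"{locations_per_doc:.0f} location occurrences per document")
print(f"{commodities_per_doc:.0f} commodity occurrences per document")
```

On these numbers an average document carries roughly 4,933 location occurrences and 1,562 commodity occurrences, which suggests why even simple queries over the occurrence tables are slow.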

Page 7: Mapping Commodity Trading

The Data

● PostgreSQL Database in Edinburgh
  ○ Not accessible

● PostgreSQL Database in St Andrews
  ○ Low Performance

● PostgreSQL Database Backup
  ○ 2.5GB compressed binary data
  ○ Cannot be imported into Amazon RDS

Page 8: Mapping Commodity Trading

Solution 1

● Create a more compatible SQL export to import into Amazon RDS
  ○ 24GB raw text file containing SQL statements
  ○ Still incompatible
  ○ Hard to correct errors in a timely manner

Page 9: Mapping Commodity Trading

Solution 2

● Create EC2 instance running a PostgreSQL database
  ○ Powerful enough
  ○ Enough storage
  ○ Accessible

Page 10: Mapping Commodity Trading

Big Data Problems

● Simple things take a long time
● Incremental finding of errors/new problems

Page 11: Mapping Commodity Trading

The Pipeline

● D3 for client-side presentation
● Java+SQL for server-side processing

[Diagram: Database, Web Service, Client. Arrow labels: "Commodities, date range" and "data"]
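The exchange between client and web service can be sketched as follows: the client encodes the selected commodities and a date range, and the service answers with data. This is not the team's Java service. The endpoint URL, the parameter names (`commodity`, `from`, `to`), the JSON layout, and the `fake_lookup` stand-in for the database query are all assumptions made for illustration.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = "http://example.org/api/cooccurrences"  # hypothetical endpoint


def build_request_url(commodities, year_from, year_to):
    """Client side: encode the selected commodities and the date range."""
    params = [("commodity", c) for c in commodities]
    params += [("from", year_from), ("to", year_to)]
    return BASE_URL + "?" + urlencode(params)


def handle_request(url, lookup):
    """Server side: parse the query string, run the lookup, return JSON text."""
    query = parse_qs(urlparse(url).query)
    commodities = query["commodity"]
    year_from = int(query["from"][0])
    year_to = int(query["to"][0])
    rows = lookup(commodities, year_from, year_to)
    return json.dumps({"commodities": commodities,
                       "from": year_from, "to": year_to, "rows": rows})


def fake_lookup(commodities, year_from, year_to):
    """Stand-in for the database query; returns made-up placeholder rows."""
    return [{"commodity": c, "documents": 0} for c in commodities]


url = build_request_url(["cotton", "tea"], 1800, 1899)
response = json.loads(handle_request(url, fake_lookup))
print(url)
print(response)
```

Passing the lookup in as a function keeps the request handling separate from the database access, so the same handler can be exercised without a running database.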

Page 12: Mapping Commodity Trading

Initial Sketches

Page 13: Mapping Commodity Trading
Page 14: Mapping Commodity Trading

Visualization

- Space and time -> Finding related terms + documents
- Find related documents
- What are documents talking about
- Implicit knowledge:
  - Co-occurrences of terms in documents

For every commodity:
1) Get top 10 documents
2) Limit related terms to 6
3) Sum up co-occurrences
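One possible reading of the three steps above is sketched below. The slides do not say how documents are ranked or how a co-occurrence is counted, so the sketch makes two assumptions: the "top" documents are those that mention the commodity most often, and a term's co-occurrence score is its total number of occurrences across those documents. The function name, the input layout, and the sample data are invented for the example.

```python
from collections import Counter


def related_terms(commodity, doc_term_counts, top_docs=10, top_terms=6):
    """doc_term_counts maps a document id to a Counter of term occurrences.

    Returns up to top_terms (term, score) pairs, strongest first.
    """
    # Rank documents by how often they mention the commodity; keep the top ones.
    mentioning = [(counts[commodity], doc_id)
                  for doc_id, counts in doc_term_counts.items()
                  if counts[commodity] > 0]
    mentioning.sort(key=lambda pair: (-pair[0], pair[1]))
    best_docs = [doc_id for _, doc_id in mentioning[:top_docs]]

    # Sum the occurrences of every other term across those documents.
    totals = Counter()
    for doc_id in best_docs:
        for term, n in doc_term_counts[doc_id].items():
            if term != commodity:
                totals[term] += n

    # Keep only the strongest related terms.
    return totals.most_common(top_terms)


# Made-up sample data.
docs = {
    "d1": Counter({"cotton": 5, "Liverpool": 3, "tea": 1}),
    "d2": Counter({"cotton": 2, "Liverpool": 1, "Bombay": 5}),
    "d3": Counter({"tea": 7, "Canton": 2}),
}
print(related_terms("cotton", docs))
```

Capping both the number of documents and the number of related terms bounds the work per commodity, which matters when the occurrence tables hold tens of millions of rows.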

Page 15: Mapping Commodity Trading

Demo

Page 16: Mapping Commodity Trading

Future work

- Query by Location
- Time diagrams for term frequency over time
- Encode information in matrix cells (#doc, collection, ...)
- Show and browse documents

- Handle big data: diseases, disasters, ...
- Co-occurrences?

Page 17: Mapping Commodity Trading

Thank you for listening!