IT tools for statistics, visualization, open data



Seminar "Opening Financial Data in Turkey: transparency, accessibility and citizen involvement" "IT tools for statistics, visualization, open data" Carlo Vaccari Ankara, April 19 2012


Page 1: IT tools for statistics, visualization, open data

1 Twinning Project “Improving data quality in public accounts”

EU TWINNING PROJECT TR 08 IB FI 02 “Improving Data Quality in Public Accounts”

AB EŞLEŞTİRME PROJESİ “Kamu Hesaplarında Veri Kalitesinin Artırılması”

IT tools for statistics, visualization, open data

Carlo Vaccari (ISTAT / Formez)

Page 2: IT tools for statistics, visualization, open data

Data Warehouse

Business Intelligence (BI) is used to analyze data. BI processing operates on a Data Warehouse.

A Data Warehouse is a collection of data that supports decision making and has the following characteristics:

• oriented to the subject of interest

• integrated and consistent

• representative of the temporal evolution

• non-volatile
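
The last two characteristics can be illustrated with a minimal sketch, assuming a hypothetical `balance_history` table in an in-memory SQLite database. The table, column names and figures are illustrative and do not come from the project. Each load appends one period's snapshot, nothing is updated or deleted, and the evolution over time can be read back.

```python
import sqlite3


def load_snapshot(conn, period, source_rows):
    """Append one period's balances; earlier rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO balance_history (period, entity_code, amount) VALUES (?, ?, ?)",
        [(period, code, amount) for code, amount in source_rows],
    )


conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: one row per entity and period (time-variant).
conn.execute(
    "CREATE TABLE balance_history ("
    " period TEXT NOT NULL,"
    " entity_code TEXT NOT NULL,"
    " amount REAL NOT NULL,"
    " PRIMARY KEY (period, entity_code))"
)

# Illustrative figures only.
load_snapshot(conn, "2011", [("A01", 100.0), ("B02", 250.0)])
load_snapshot(conn, "2012", [("A01", 120.0), ("B02", 240.0)])

# Temporal evolution of one subject of interest.
history = conn.execute(
    "SELECT period, amount FROM balance_history"
    " WHERE entity_code = ? ORDER BY period",
    ("A01",),
).fetchall()
print(history)
```

The primary key on (period, entity_code) makes a second load of the same period fail instead of silently overwriting history.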

Page 3: IT tools for statistics, visualization, open data

Data Warehouse

[Figure: from operational data to data warehouse. Current transactional procedures produce operational data, which feeds the Data Warehouse. Data Warehouse tools: dashboards, advanced reporting, data mining, OLAP tools.]

Page 4: IT tools for statistics, visualization, open data

Dashboard

Dashboard: data visualization tool that displays the current status of metrics and key performance indicators (KPIs) for an enterprise.

Dashboards consolidate and arrange numbers, metrics and sometimes performance scorecards on a single screen.

Various kinds of dashboards:

“Business Dashboard”: business-related dashboard

“Executive Dashboard”: dashboard meant to be used by CEOs, managers, etc.

“Operational Dashboard”: dashboard that monitors day-to-day activity

Dashboards are designed to help us monitor what’s going on at a glance
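
As a rough sketch of the idea, the snippet below compares each KPI with a target and prints one status line per indicator. The KPI names, values and targets are invented for illustration, and a real dashboard would draw charts rather than text.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    value: float
    target: float
    higher_is_better: bool = True

    def status(self) -> str:
        """Return 'OK' when the KPI meets its target, otherwise 'ALERT'."""
        if self.higher_is_better:
            ok = self.value >= self.target
        else:
            ok = self.value <= self.target
        return "OK" if ok else "ALERT"


def render_dashboard(kpis):
    """Arrange all KPIs on a single text 'screen', one line per indicator."""
    return "\n".join(
        f"{k.name:<32}{k.value:>8.1f}{k.target:>8.1f}  {k.status()}" for k in kpis
    )


# Hypothetical indicators, not taken from the project.
kpis = [
    Kpi("Reports submitted on time (%)", 92.5, 95.0),
    Kpi("Records with anomalies (%)", 1.8, 2.0, higher_is_better=False),
]
dashboard = render_dashboard(kpis)
print(dashboard)
```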

Page 5: IT tools for statistics, visualization, open data

Dashboard

Page 6: IT tools for statistics, visualization, open data

Dashboard

Page 7: IT tools for statistics, visualization, open data

OLAP

OnLine Analytical Processing (OLAP): decision support software that allows the user to quickly analyze information that has been summarized into multidimensional views and hierarchies

OLAP tools are used to perform trend analysis on financial information

- Multidimensional data

- Many operators

- Complex, not predefined analysis

Data:

- not operational

- current and historical
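
Two typical OLAP operators, roll-up and slice, can be sketched in plain Python over a tiny fact list. The dimensions (year, region, entity type) and all amounts are hypothetical; an OLAP engine would run the same operations over pre-summarized cubes.

```python
from collections import defaultdict

# Hypothetical facts: (year, region, entity_type, amount)
DIMENSIONS = ("year", "region", "entity_type")
FACTS = [
    (2010, "North", "Municipality", 120.0),
    (2010, "North", "Province", 80.0),
    (2010, "South", "Municipality", 60.0),
    (2011, "North", "Municipality", 130.0),
    (2011, "South", "Municipality", 70.0),
    (2011, "South", "Province", 40.0),
]


def roll_up(facts, by):
    """Sum the measure (last field) grouped by the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


def slice_facts(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [row for row in facts if row[position] == value]


by_year = roll_up(FACTS, by=("year",))
north_by_year = roll_up(slice_facts(FACTS, "region", "North"), by=("year",))
print(by_year)
print(north_by_year)
```

Because the grouping dimensions are passed as an argument, the analysis does not have to be predefined: the same function answers "by year", "by region and year", and so on.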

Page 8: IT tools for statistics, visualization, open data

An OLAP implementation with BO (Italian case)

[Figure: diagrams from the Italian implementation; only their labels survive. Phases: Analysis, Development. Components: EDW, ETL, Data Mart, BO Universe, BO Report, Business Rules.]

DFM Fact outline and DFM Functionality outline (Italian attribute names, as in the model)

Prospetto Cassa: Anagrafica Anomalia, Codice Anomalia, Anagrafica Voce Cassa, Codice Voce Cassa, Categoria, Dettaglio Anomalia, Progressivo Anomalia, Ente, Codice Ente, Fascia di Dimensione, Codice Fascia, Periodo Prospetto Cassa, Popolazione, Anno rilevazione Istat, Sezione, Tipo Anomalia, Codice Tipo Anomalia, Tipo Ente, Codice Tipo Ente, Tipo Prospetto Cassa, Tipo Voce Cassa, Codice Tipo Voce Cassa, Titolo, Voce Cassa, Voce Dettaglio

Prospetto PSI: Anagrafica Voce PSI, Codice Voce Patto, Ente, Codice Ente, Fascia di Dimensione, Codice Fascia, Periodo Prospetto PSI, Periodo PSI, Popolazione, Anno rilevazione Istat, Tipo Applicazione, Codice Tipo Applicazione, Tipo Ente, Codice Tipo Ente, Tipo Modello, Codice Tipo Modello, Tipo Voce Prospetto, Codice Tipo Voce, Voce Istat Voce Patto, Voce Prospetto PSI

E/R outline

DMA_DC00_TEMPO (ID_TEMPO: SMALLINT, ANNO: SMALLINT, MESE: TINYINT, GIORNO: TINYINT, DATA: smalldatetime, ORA: TINYINT, FESTIVO: bit)

DMA_DC01_DATA_OSSERVAZIONE (ID_DATA_OSS: smalldatetime, DATA_OSSERVAZIONE: smalldatetime)

DMA_DC02_CLIENTI (ID_CLIENTE: SMALLINT, TIPO_CLIENTE: varchar(15), CONSENSO_INFORM: varchar(2), NOMINATIVO_DA_RICHIAMARE: varchar(200), NOMINATIVO_INTERLOCUTORE: varchar(200), E_MAIL: varchar(50), PARTITA_IVA: varchar(16), FORMA_GIURIDICA: varchar(200), COGNOME_RAGIONE_SOCIALE: varchar(200), NOME: varchar(100), TITOLO: varchar(50), SESSO: varchar(10), UTENZA_CLI_INPUT: varchar(15), UTENZA_ALTERNATIVA1_CLI_OUTPUT: varchar(15), TIPO_1_CLI_OUTPUT: varchar(20), UTENZA_ALTERNATIVA2_CLI_OUTPUT: varchar(15), CENTRALE_TELEFONICA: varchar(100), CODICE_IDBRE: varchar(15), LONGDISTANCE: varchar(2), COPERTURA_WS: varchar(2), CENTRALE_ADSL_SA: varchar(2), TIPO_APPARATO: varchar(50), INDIRIZZO_DEL_CLI_INPUT: varchar(200), COMUNE: varchar(100), PROVINCIA: varchar(100), REGIONE: varchar(50), CAP: varchar(5), NUM_LINEE: TINYINT, FAX: varchar(15), COD_CLIENTE: varchar(20), COD_REGIONE: varchar(2), COD_PROVINCIA: varchar(3), COD_COMUNE: varchar(3))

DMA_DC03_ESITO_CONTATTO (ID_ESITO: SMALLINT, UTILITA: varchar(10), ESITO: varchar(20), MOTIVO_ESITO: varchar(80), COD_ESITO_DEF: varchar(2), ESITO_DEFINITIVO: varchar(20))

DMA_DC04_ESITO_CCALL (ID_ESITO_CCALL: SMALLINT, ESITO_COURTESY_CALL: varchar(20), MOTIVO_RIFIUTO: varchar(80), COD_ESITO_CCAL: varchar(10))

DMA_DC05_MOT_RIFIUTO_NOTE (ID_MOT_RIFIUTO_NOTE: int, MOTIVO_RIFIUTO_NOTE: varchar(2000))

DMA_DC06_MOT_NONADES_ALTRO (ID_MOT_NON_ADES_ALTR: TINYINT, MOTIVO_NON_ADESIONE_ALTR: varchar(50))

DMA_DC07_FASCIA_ETA (ID_FASCIA_ETA: TINYINT, FASCIA_ETA: varchar(10), ESTREMO_INF: TINYINT, ESTREMO_SUP: TINYINT)

DMA_DC08_OPERATORI_TELESELLING (ID_OPERATORE: int, OPERATORE: varchar(20), PARTNER_COMMERCIALE: varchar(255))

DMA_DC09_CONTRATTI (ID_CONTRATTO: SMALLINT, TIPO_CONTRATTO: varchar(30), CONTRATTO: varchar(200))

DMA_DC10_MOT_NONADES_NOTE (ID_MOT_NONADES_NOTE: int, MOT_NONADES_NOTE: varchar(2000))

DMA_DC12_CAMPAGNE (ID_CAMPAGNA: SMALLINT, CAMPAGNA: varchar(200), DATA_ASSEGN_CAMPAGNA: smalldatetime, DB_PROVENIENZA: varchar(100))

DMA_DC13_LISTE (ID_LISTA: SMALLINT, DENOMINAZIONE_LISTA: varchar(200), NOME_FORNITORE: varchar(200), DATA_FORNITURA: smalldatetime, CRITERI_SELEZIONE: varchar(2000), COD_LISTA: varchar(200))

DMA_DC14_TERRITORIO (COD_PROVINCIA: varchar(3), COD_COMUNE: varchar(3), COMUNE: varchar(100), PROVINCIA: varchar(100), REGIONE: varchar(100), RIPARTIZIONE: varchar(20), CODICE_REGIONE: varchar(2), CODICE_RIPART: varchar(1))

DMA_DC15_GESTORI (ID_GESTORE: SMALLINT, GESTORE: varchar(100))

DMA_DC16_PROVINCE (COD_PROVINCIA: varchar(3), PROVINCIA: varchar(100), COD_REGIONE: varchar(2))

DMA_DC18_REGIONI (COD_REGIONE: varchar(2), REGIONE: varchar(50), RIPARTIZIONE: varchar(30), COD_RIPARTIZ: varchar(1))

DMA_FC01_CONTATTI (ID_CAMPAGNA: SMALLINT, ID_LISTA: SMALLINT, ID_CLIENTE: SMALLINT, NUM_CONTATTI_DEFINITIVI: SMALLINT, NUM_CONTATTI_NON_DEFINITIVI: SMALLINT, NUM__PRODOTTI_VENDUTI: SMALLINT, NUM_SERVIZI_VENDUTI: SMALLINT, ID_TEMPO_COURTESY_CALL: SMALLINT, ID_TEMPO_CHIUSURA: SMALLINT, ID_TEMPO_SCADENZA_GEST: SMALLINT, ID_CONTRATTO: SMALLINT, ID_ESITO: SMALLINT, ID_DATA_OSS: smalldatetime, ID_GESTORE: SMALLINT, ID_FASCIA_ETA: TINYINT, DATA_HHMM_CONTATTO: smalldatetime, ID_MOT_NON_ADES_ALTR: TINYINT, ID_MOT_NONADES_NOTE: int, ID_ESITO_CCALL: SMALLINT, COD_PROVINCIA: varchar(3), COD_COMUNE: varchar(3), COD_REGIONE: varchar(2), ID_OPERATORE_VENDITA: int, ID_OPERATORE_CCALL: int, ID_MOT_RIFIUTO_NOTE: int)

Other table names in the figure: D_DMA_DAEN_ANAGRAFICA_ENTE, D_DMA_DBTE_BANCA_TESORIERA, D_DMA_DCGS_CODICI_GEST_SIOPE, D_DMA_DPRZ_PROV_REG_ZONA, D_DMA_DTPE_TIPO_ENTE, D_DMA_FMUE_MOV_USCITE_ENTRATE, D_DMA_SLOG_LOG_DI_CARICAMENTO, D_DMA_SSTS_STATUS, D_DMA_DCEE_CL_ECONOMICA_EN, D_DMA_DDCT_DATA_CONT_SIOPE, D_DMA_DSTE_SOTTOTIPO_ENTE, D_DMA_DCES_CL_ECONOMICA_SP, D_DMA_DAVC_ANAGRAFICA_VOCE_CAS
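
The E/R outline appears to follow a star-schema layout: the fact table DMA_FC01_CONTATTI carries keys into the DMA_DC… dimension tables. The sketch below shows how such a layout is queried, assuming a cut-down copy of three of those tables in SQLite. Column types are simplified and every row is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Cut-down copies of two dimensions and the fact table (types simplified).
    CREATE TABLE DMA_DC12_CAMPAGNE (ID_CAMPAGNA INTEGER PRIMARY KEY, CAMPAGNA TEXT);
    CREATE TABLE DMA_DC18_REGIONI (COD_REGIONE TEXT PRIMARY KEY, REGIONE TEXT);
    CREATE TABLE DMA_FC01_CONTATTI (
        ID_CAMPAGNA INTEGER REFERENCES DMA_DC12_CAMPAGNE(ID_CAMPAGNA),
        COD_REGIONE TEXT REFERENCES DMA_DC18_REGIONI(COD_REGIONE),
        NUM_SERVIZI_VENDUTI INTEGER
    );
    -- Invented sample rows.
    INSERT INTO DMA_DC12_CAMPAGNE VALUES (1, 'Spring'), (2, 'Autumn');
    INSERT INTO DMA_DC18_REGIONI VALUES ('08', 'Emilia-Romagna'), ('12', 'Lazio');
    INSERT INTO DMA_FC01_CONTATTI VALUES
        (1, '08', 3), (1, '12', 1), (2, '12', 4), (1, '08', 2);
    """
)

# Join the fact table to its dimensions and aggregate the measure.
rows = conn.execute(
    """
    SELECT c.CAMPAGNA, r.REGIONE, SUM(f.NUM_SERVIZI_VENDUTI) AS SERVIZI
    FROM DMA_FC01_CONTATTI AS f
    JOIN DMA_DC12_CAMPAGNE AS c ON c.ID_CAMPAGNA = f.ID_CAMPAGNA
    JOIN DMA_DC18_REGIONI AS r ON r.COD_REGIONE = f.COD_REGIONE
    GROUP BY c.CAMPAGNA, r.REGIONE
    ORDER BY c.CAMPAGNA, r.REGIONE
    """
).fetchall()
for row in rows:
    print(row)
```

A reporting layer such as a BO Universe would typically generate joins of this shape so that users pick dimensions and measures without writing SQL.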

Page 9: IT tools for statistics, visualization, open data

Tools

Examples of tools for data management and Business Intelligence (open source applications)

Google Refine: http://code.google.com/p/google-refine/

Open source Business Intelligence tools:

- http://www.pentaho.com/

- http://www.jaspersoft.com/

- http://www.palo.net/

Free software, with a fee for support

Page 10: IT tools for statistics, visualization, open data

Visualization techniques

Visualization techniques (cartography, advanced visualization tools):

- Mindmaps

- Displaying news

- Displaying data

- Displaying connections

- Displaying websites

- Articles & resources

- Tools and services

Tableau: http://www.tableausoftware.com/public/community

CACS: http://www.cacs.org/post/index

Page 11: IT tools for statistics, visualization, open data

Visualization techniques

Infographics: http://pinterest.com/rtkrum/cool-infographics-gallery/

Visualization tools: http://datavisualization.ch/tools/

IBM Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/

Even on social networks: http://tweettopicexplorer.neoformix.com/#n=internazionale

New visualizations: http://hint.fm/wind/

Page 12: IT tools for statistics, visualization, open data

Visualization techniques

Google Public Data Explorer (http://www.google.com/publicdata/home) is a simple way to start presenting data with advanced visualization techniques: a tool with which any organization can show its data on the Web so that users can find, explore, and share it.

Two steps for using GPDE:

1 - MoF can start testing the tool by uploading datasets for visualization and exploration by privileged users

2 - in a second phase, MoF can agree with Google on the formal inclusion of its data in the Dataset Directory: http://www.google.com/publicdata/directory

Many organizations have chosen this way of publishing (often in addition to their own website), among them the World Bank, IMF, OECD and Eurostat.

Page 13: IT tools for statistics, visualization, open data

Visualization techniques

VIDI

The VIDI suite is a set of modules for Drupal (an open source CMS) designed to enable the creation of visual data displays. Using VIDI tools you can display changes in data values over time, relate data in various ways to geographical maps, or display static datasets through different types of charts.

Two ways to use it:

1. Use VIDI on the Dataviz website, http://www.dataviz.org: load your data, choose among the available visualizations and store your visualization

2. Download the VIDI modules and install them in your Drupal website, then import your datasets and prepare data displays

See http://www.patchworknation.org/

Page 14: IT tools for statistics, visualization, open data

Visualization techniques

Future: HTML5, the new standard for the Web

See some demos: http://www.apple.com/html5/

Some effects from http://slides.html5rocks.com

Page 15: IT tools for statistics, visualization, open data

Open Data

Open Data:

- data freely available to everyone

- elementary (raw) data

- to use and republish

- without restrictions from copyrights or patents

Best practice: World Bank http://data.worldbank.org/

Data: by country, by topic or by indicator (1000+)

All indicators are available as table, map and graph, and downloadable as xls and xml

The WB website also offers modules to access WB data directly from the Stata and “R” statistical tools

Page 16: IT tools for statistics, visualization, open data

Open Data

Tim Berners-Lee: Linked Data associated with gold stars, like the ones you got in school.

1 - make your stuff available on the web (whatever format)

2 - make it available as structured data (e.g. excel instead of image scan)

3 - non-proprietary format (e.g. csv not xls)

4 - use URLs to identify things, so that people can point at your stuff

5 - link your data to other people’s data to provide context
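
The five steps are cumulative, which a small sketch can make explicit. The `Dataset` fields and the two format sets below are assumptions made for illustration; the slide only gives excel, csv and image scans as examples.

```python
from dataclasses import dataclass

# Assumed format sets, for illustration only.
OPEN_FORMATS = {"csv", "xml", "json", "rdf"}
STRUCTURED_FORMATS = OPEN_FORMATS | {"xls"}


@dataclass
class Dataset:
    on_web: bool
    file_format: str
    uses_urls: bool = False
    links_to_other_data: bool = False


def star_rating(ds: Dataset) -> int:
    """Return 0-5 stars; each star requires all the previous ones."""
    if not ds.on_web:
        return 0
    fmt = ds.file_format.lower()
    if fmt not in STRUCTURED_FORMATS:
        return 1  # on the web, whatever format
    if fmt not in OPEN_FORMATS:
        return 2  # structured but proprietary
    if not ds.uses_urls:
        return 3  # structured, non-proprietary
    if not ds.links_to_other_data:
        return 4  # things identified by URLs
    return 5  # linked to other people's data


print(star_rating(Dataset(on_web=True, file_format="xls")))
print(star_rating(Dataset(on_web=True, file_format="csv", uses_urls=True)))
```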

Page 17: IT tools for statistics, visualization, open data

Linked Data

Page 18: IT tools for statistics, visualization, open data

Open Data tools: CKAN

CKAN stands for Comprehensive Knowledge Archive Network

Developed by the Open Knowledge Foundation (OKFN)

Open source package that makes data accessible by providing tools to streamline publishing, sharing, finding and using data.

CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available

Used by many central (dk, no, uk) and local governments

Features:

- Publish & Find Datasets (import, keywords, versioning)

- Store & Manage Data (raw data, metadata, statistics, geo-)

- Engage with Users & Others (community management)

- Customize & Extend (APIs, extensions, open source)
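
As an illustration of the API feature, the sketch below builds a dataset search request and reads the titles from a response. It assumes a CKAN instance exposing the Action API (version 3) with its `package_search` action; the host `ckan.example.org` and the sample response are placeholders, and no network call is made here.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build the URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [pkg.get("title") or pkg["name"] for pkg in payload["result"]["results"]]


# Placeholder host: replace with a real CKAN portal.
url = package_search_url("https://ckan.example.org/", "public accounts", rows=5)

# Hand-written sample in the shape of a package_search response.
sample = json.dumps(
    {
        "success": True,
        "result": {
            "count": 1,
            "results": [{"name": "budget-2011", "title": "Budget 2011"}],
        },
    }
)
titles = dataset_titles(sample)
print(url)
print(titles)
```

To query a live portal, the URL can be fetched with `urllib.request.urlopen(url)` and the decoded body passed to `dataset_titles`.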

Page 19: IT tools for statistics, visualization, open data

Open Data tools: data.gov / Drupal

Drupal is an open source CMS (Content Management System) often used in Open Data projects

The Data.gov code was released as OSS (a modified Drupal version) and is also used for India → Open Government Platform

Drupal OpenData working group: http://groups.drupal.org/opendata-working-group

Data Journalism: http://www.guardian.co.uk/world/datablog/2010/feb/01/united-nations-population-world-data