Post on 19-May-2015
1 Twinning Project “Improving data quality in public accounts”
EU TWINNING PROJECT TR 08 IB FI 02 “Improving Data Quality in Public Accounts”
AB EŞLEŞTİRME PROJESİ “Kamu Hesaplarında Veri Kalitesinin Artırılması”
IT tools for statistics, visualization, open data
Carlo Vaccari (ISTAT / Formez)
Data warehouse
Business Intelligence (BI) is used to analyze data; BI processing operates on the Data Warehouse.
A Data Warehouse is a collection of data that supports decision making and has the following characteristics:
• subject-oriented (organized around the subjects of interest)
• integrated and consistent
• time-variant (it represents how the data evolve over time)
• non-volatile
Data Warehouse
[Diagram: from operational data to data warehouse. Current transactional procedures produce operational data, which feed the Data Warehouse. Data Warehouse tools: dashboards, advanced reporting, data mining, OLAP tools.]
Dashboard
Dashboard: a data visualization tool that displays the current status of metrics and key performance indicators (KPIs) for an enterprise.
Dashboards consolidate and arrange numbers, metrics and sometimes performance scorecards on a single screen.
Various kinds of dashboards:
“Business Dashboard” – business-related dashboard
“Executive Dashboard” – dashboard meant to be used by CEOs, managers, etc.
“Operational Dashboard” – dashboard that monitors day-to-day activity
Dashboards are designed to help us monitor what is going on at a glance.
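As a rough illustration of "current status of metrics and KPIs", the sketch below classifies each KPI against a target and prints one line per indicator. The KPI names, values, targets and the 90% warning threshold are all hypothetical, and the sketch assumes that higher values are better.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    value: float   # current measurement
    target: float  # value the organization wants to reach


def status(kpi: Kpi, warn_ratio: float = 0.9) -> str:
    """Classify a KPI for which higher values are better."""
    if kpi.value >= kpi.target:
        return "on track"
    if kpi.value >= warn_ratio * kpi.target:
        return "at risk"
    return "off track"


def render(kpis: list[Kpi]) -> str:
    """Build a plain-text 'single screen' view: one line per KPI."""
    return "\n".join(
        f"{k.name:<30}{k.value:>7.1f} / {k.target:<7.1f}{status(k)}" for k in kpis
    )


# Hypothetical indicators, for illustration only.
kpis = [
    Kpi("Reports filed on time (%)", 97.0, 95.0),
    Kpi("Records passing checks (%)", 88.0, 95.0),
    Kpi("Entities reporting (%)", 70.0, 95.0),
]
print(render(kpis))
```

A real dashboard would read these values from the data warehouse and draw them graphically; only the status logic is shown here.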
OLAP
OnLine Analytical Processing (OLAP): decision support software that allows the user to quickly analyze information that has been summarized into multidimensional views and hierarchies.
OLAP tools are used to perform trend analysis on financial information.
- Multidimensional data
- Many operators
- Complex analyses that are not predefined
Data:
- not operational
- current and historical
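Two typical OLAP operators, roll-up and slice, can be sketched on a tiny in-memory fact table. This is only a minimal illustration: the dimensions (year, region, entity type), the amounts and the function names are hypothetical, and a real OLAP tool would run such operations on the data warehouse.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, entity_type, amount)
DIMENSIONS = ("year", "region", "entity_type")
FACTS = [
    (2010, "North", "Municipality", 120.0),
    (2010, "South", "Municipality", 80.0),
    (2010, "North", "Province", 40.0),
    (2011, "North", "Municipality", 150.0),
    (2011, "South", "Province", 60.0),
]


def roll_up(facts, by):
    """Sum the amount measure over the dimensions named in `by`."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


def slice_facts(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


# Trend over time: total amount per year.
by_year = roll_up(FACTS, by=("year",))
# A view that was not predefined: the North region only, by year and entity type.
north = roll_up(slice_facts(FACTS, "region", "North"), by=("year", "entity_type"))
print(by_year)
print(north)
```

The same two functions answer many different questions simply by changing the `by` and slice arguments, which is the sense in which the analysis is not predefined.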
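DFM Fact outline
Development
Analysis
DFM Functionality outline
[Diagram labels, translated from Italian; original labels in parentheses]
Cash statement (Prospetto Cassa) model:
- Anomaly registry (Anagrafica Anomalia); Anomaly code (Codice Anomalia)
- Cash item registry (Anagrafica Voce Cassa); Cash item code (Codice Voce Cassa)
- Category (Categoria)
- Anomaly detail (Dettaglio Anomalia); Anomaly sequence number (Progressivo Anomalia)
- Entity (Ente); Entity code (Codice Ente)
- Size band (Fascia di Dimensione); Band code (Codice Fascia)
- Cash statement period (Periodo Prospetto Cassa)
- Population (Popolazione); Istat survey year (Anno rilevazione Istat)
- Cash statement (Prospetto Cassa)
- Section (Sezione)
- Anomaly type (Tipo Anomalia); Anomaly type code (Codice Tipo Anomalia)
- Entity type (Tipo Ente); Entity type code (Codice Tipo Ente)
- Cash statement type (Tipo Prospetto Cassa)
- Cash item type (Tipo Voce Cassa); Cash item type code (Codice Tipo Voce Cassa)
- Title (Titolo)
- Cash item (Voce Cassa)
- Detail item (Voce Dettaglio)
PSI statement (Prospetto PSI) model:
- PSI item registry (Anagrafica Voce PSI); Pact item code (Codice Voce Patto)
- Entity (Ente); Entity code (Codice Ente)
- Size band (Fascia di Dimensione); Band code (Codice Fascia)
- PSI statement period (Periodo Prospetto PSI); PSI period (Periodo PSI)
- Population (Popolazione); Istat survey year (Anno rilevazione Istat)
- PSI statement (Prospetto PSI)
- Application type (Tipo Applicazione); Application type code (Codice Tipo Applicazione)
- Entity type (Tipo Ente); Entity type code (Codice Tipo Ente)
- Model type (Tipo Modello); Model type code (Codice Tipo Modello)
- Statement item type (Tipo Voce Prospetto); Item type code (Codice Tipo Voce)
- Istat item / Pact item (Voce Istat Voce Patto)
- PSI statement item (Voce Prospetto PSI)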
E/R outline
DMA_DC00_TEMPO
ID_TEMPO: SMALLINT
ANNO: SMALLINT
MESE: TINYINT
GIORNO: TINYINT
DATA: smalldatetime
ORA: TINYINT
FESTIVO: bit
DMA_DC01_DATA_OSSERVAZIONE
ID_DATA_OSS: smalldatetime
DATA_OSSERVAZIONE: smalldatetime
DMA_DC02_CLIENTI
ID_CLIENTE: SMALLINT
TIPO_CLIENTE: varchar(15)
CONSENSO_INFORM: varchar(2)
NOMINATIVO_DA_RICHIAMARE: varchar(200)
NOMINATIVO_INTERLOCUTORE: varchar(200)
E_MAIL: varchar(50)
PARTITA_IVA: varchar(16)
FORMA_GIURIDICA: varchar(200)
COGNOME_RAGIONE_SOCIALE: varchar(200)
NOME: varchar(100)
TITOLO: varchar(50)
SESSO: varchar(10)
UTENZA_CLI_INPUT: varchar(15)
UTENZA_ALTERNATIVA1_CLI_OUTPUT: varchar(15)
TIPO_1_CLI_OUTPUT: varchar(20)
UTENZA_ALTERNATIVA2_CLI_OUTPUT: varchar(15)
CENTRALE_TELEFONICA: varchar(100)
CODICE_IDBRE: varchar(15)
LONGDISTANCE: varchar(2)
COPERTURA_WS: varchar(2)
CENTRALE_ADSL_SA: varchar(2)
TIPO_APPARATO: varchar(50)
INDIRIZZO_DEL_CLI_INPUT: varchar(200)
COMUNE: varchar(100)
PROVINCIA: varchar(100)
REGIONE: varchar(50)
CAP: varchar(5)
NUM_LINEE: TINYINT
FAX: varchar(15)
COD_CLIENTE: varchar(20)
COD_REGIONE: varchar(2)
COD_PROVINCIA: varchar(3)
COD_COMUNE: varchar(3)
DMA_DC03_ESITO_CONTATTO
ID_ESITO: SMALLINT
UTILITA: varchar(10)
ESITO: varchar(20)
MOTIVO_ESITO: varchar(80)
COD_ESITO_DEF: varchar(2)
ESITO_DEFINITIVO: varchar(20)
DMA_DC07_FASCIA_ETA
ID_FASCIA_ETA: TINYINT
FASCIA_ETA: varchar(10)
ESTREMO_INF: TINYINT
ESTREMO_SUP: TINYINT
DMA_DC08_OPERATORI_TELESELLING
ID_OPERATORE: int
OPERATORE: varchar(20)
PARTNER_COMMERCIALE: varchar(255)
DMA_DC09_CONTRATTI
ID_CONTRATTO: SMALLINT
TIPO_CONTRATTO: varchar(30)
CONTRATTO: varchar(200)
DMA_DC12_CAMPAGNE
ID_CAMPAGNA: SMALLINT
CAMPAGNA: varchar(200)
DATA_ASSEGN_CAMPAGNA: smalldatetime
DB_PROVENIENZA: varchar(100)
DMA_DC13_LISTE
ID_LISTA: SMALLINT
DENOMINAZIONE_LISTA: varchar(200)
NOME_FORNITORE: varchar(200)
DATA_FORNITURA: smalldatetime
CRITERI_SELEZIONE: varchar(2000)
COD_LISTA: varchar(200)
DMA_DC14_TERRITORIO
COD_PROVINCIA: varchar(3)
COD_COMUNE: varchar(3)
COMUNE: varchar(100)
PROVINCIA: varchar(100)
REGIONE: varchar(100)
RIPARTIZIONE: varchar(20)
CODICE_REGIONE: varchar(2)
CODICE_RIPART: varchar(1)
DMA_DC15_GESTORI
ID_GESTORE: SMALLINT
GESTORE: varchar(100)
DMA_FC01_CONTATTI
ID_CAMPAGNA: SMALLINT
ID_LISTA: SMALLINT
ID_CLIENTE: SMALLINT
NUM_CONTATTI_DEFINITIVI: SMALLINT
NUM_CONTATTI_NON_DEFINITIVI: SMALLINT
NUM__PRODOTTI_VENDUTI: SMALLINT
NUM_SERVIZI_VENDUTI: SMALLINT
ID_TEMPO_COURTESY_CALL: SMALLINT
ID_TEMPO_CHIUSURA: SMALLINT
ID_TEMPO_SCADENZA_GEST: SMALLINT
ID_CONTRATTO: SMALLINT
ID_ESITO: SMALLINT
ID_DATA_OSS: smalldatetime
ID_GESTORE: SMALLINT
ID_FASCIA_ETA: TINYINT
DATA_HHMM_CONTATTO: smalldatetime
ID_MOT_NON_ADES_ALTR: TINYINT
ID_MOT_NONADES_NOTE: int
ID_ESITO_CCALL: SMALLINT
COD_PROVINCIA: varchar(3)
COD_COMUNE: varchar(3)
COD_REGIONE: varchar(2)
ID_OPERATORE_VENDITA: int
ID_OPERATORE_CCALL: int
ID_MOT_RIFIUTO_NOTE: int
DMA_DC06_MOT_NONADES_ALTRO
ID_MOT_NON_ADES_ALTR: TINYINT
MOTIVO_NON_ADESIONE_ALTR: varchar(50)
DMA_DC10_MOT_NONADES_NOTE
ID_MOT_NONADES_NOTE: int
MOT_NONADES_NOTE: varchar(2000)
DMA_DC04_ESITO_CCALL
ID_ESITO_CCALL: SMALLINT
ESITO_COURTESY_CALL: varchar(20)
MOTIVO_RIFIUTO: varchar(80)
COD_ESITO_CCAL: varchar(10)
DMA_DC16_PROVINCE
COD_PROVINCIA: varchar(3)
PROVINCIA: varchar(100)
COD_REGIONE: varchar(2)
DMA_DC18_REGIONI
COD_REGIONE: varchar(2)
REGIONE: varchar(50)
RIPARTIZIONE: varchar(30)
COD_RIPARTIZ: varchar(1)
DMA_DC05_MOT_RIFIUTO_NOTE
ID_MOT_RIFIUTO_NOTE: int
MOTIVO_RIFIUTO_NOTE: varchar(2000)
D_DMA_DAEN_ANAGRAFICA_ENTE
D_DMA_DBTE_BANCA_TESORIERA
D_DMA_DCGS_CODICI_GEST_SIOPE
D_DMA_DPRZ_PROV_REG_ZONA
D_DMA_DTPE_TIPO_ENTE
D_DMA_FMUE_MOV_USCITE_ENTRATE
D_DMA_SLOG_LOG_DI_CARICAMENTO
D_DMA_SSTS_STATUS
D_DMA_DCEE_CL_ECONOMICA_EN
D_DMA_DDCT_DATA_CONT_SIOPE
D_DMA_DSTE_SOTTOTIPO_ENTE
D_DMA_DCES_CL_ECONOMICA_SP
D_DMA_DAVC_ANAGRAFICA_VOCE_CAS
An OLAP implementation with BO (Italian case)
[Diagram components: EDW, ETL (two steps), Data Mart, BO Universe, BO Report, Business Rules]
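The E/R outline above has the shape of a star schema: a fact table (DMA_FC01_CONTATTI) whose ID_ columns point to dimension tables such as DMA_DC12_CAMPAGNE. The sketch below rebuilds a small subset of those two tables in an in-memory SQLite database and runs a typical aggregate query. The table and column names come from the outline; the choice of SQLite, the primary and foreign key declarations, and the sample rows are assumptions made for illustration.

```python
import sqlite3


def build_star_schema(conn):
    """Create one dimension table and a reduced fact table from the outline."""
    conn.executescript(
        """
        CREATE TABLE DMA_DC12_CAMPAGNE (
            ID_CAMPAGNA SMALLINT PRIMARY KEY,   -- key is an assumption
            CAMPAGNA    VARCHAR(200)
        );
        CREATE TABLE DMA_FC01_CONTATTI (
            ID_CAMPAGNA             SMALLINT REFERENCES DMA_DC12_CAMPAGNE (ID_CAMPAGNA),
            ID_CLIENTE              SMALLINT,
            NUM_CONTATTI_DEFINITIVI SMALLINT,
            NUM_SERVIZI_VENDUTI     SMALLINT
        );
        """
    )


def services_sold_by_campaign(conn):
    """Join the fact table to the campaign dimension and sum a measure."""
    rows = conn.execute(
        """
        SELECT d.CAMPAGNA, SUM(f.NUM_SERVIZI_VENDUTI)
        FROM DMA_FC01_CONTATTI AS f
        JOIN DMA_DC12_CAMPAGNE AS d ON d.ID_CAMPAGNA = f.ID_CAMPAGNA
        GROUP BY d.CAMPAGNA
        ORDER BY d.CAMPAGNA
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO DMA_DC12_CAMPAGNE VALUES (?, ?)",
    [(1, "Spring"), (2, "Autumn")],
)
conn.executemany(
    "INSERT INTO DMA_FC01_CONTATTI VALUES (?, ?, ?, ?)",
    [(1, 10, 1, 2), (1, 11, 1, 0), (2, 12, 1, 3)],
)
result = services_sold_by_campaign(conn)
print(result)
```

Reporting layers such as a BO Universe expose this kind of join and aggregation to end users without requiring them to write SQL.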
Tools
Examples of tools for data management and Business Intelligence (open source applications):
Google Refine: http://code.google.com/p/google-refine/
Open source Business Intelligence tools:
- http://www.pentaho.com/
- http://www.jaspersoft.com/
- http://www.palo.net/
Free software, with a fee for support
Visualization techniques
Visualization techniques (cartography, advanced visualization tools):
- Mindmaps
- Displaying news
- Displaying data
- Displaying connections
- Displaying websites
- Articles & resources
- Tools and services
Tableau: http://www.tableausoftware.com/public/community
CACS: http://www.cacs.org/post/index
Visualization techniques
Infographics: http://pinterest.com/rtkrum/cool-infographics-gallery/
Visualization tools: http://datavisualization.ch/tools/
IBM Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/
Also for social networks: http://tweettopicexplorer.neoformix.com/#n=internazionale
New visualizations: http://hint.fm/wind/
Visualization techniques
Google Public Data Explorer (GPDE): a simple way to start presenting data with advanced visualization techniques is Google Public Data Explorer (http://www.google.com/publicdata/home), a tool with which any organization can show its data on the Web so that users can find, explore, and share it.
Two steps are available for using GPDE:
1 - MoF can start testing the tool by uploading datasets for visualization and exploration by privileged users
2 - in a second phase, MoF can agree with Google on a formal insertion of its data in the Dataset Directory: http://www.google.com/publicdata/directory
Many organizations have chosen this way of publishing (often in addition to their own website), among them the World Bank, IMF, OECD and Eurostat.
Visualization techniques
VIDI
The VIDI suite is a set of modules for Drupal (an open source CMS) designed to enable the creation of visual data displays. Using VIDI tools you can display changes in data values over time, relate data in various ways to geographical maps, or display static datasets through different types of charts.
Two ways to use it:
1. Use VIDI on the Dataviz website (http://www.dataviz.org): load your data, choose among the available visualizations and store your visualization
2. Download the VIDI modules and install them in your Drupal website, then import your datasets and prepare data displays; see http://www.patchworknation.org/
Visualization techniques
Future: HTML5, the new standard for the Web
See some demos: http://www.apple.com/html5/
Some effects from http://slides.html5rocks.com
Open Data
Open Data:
- data freely available to everyone
- elementary (raw) data
- free to use and republish
- without restrictions from copyrights or patents
Best practice: World Bank http://data.worldbank.org/
Data: by country, by topic or by indicator (1000+)
All indicators are available as tables, maps and graphs, and are downloadable as xls and xml
The WB website also offers modules to access WB data directly from the Stata and “R” statistical tools
Open Data
Tim Berners-Lee: Linked Data is rated with gold stars, like the ones you got in school.
1 - make your stuff available on the web (whatever format)
2 - make it available as structured data (e.g. Excel instead of an image scan)
3 - use a non-proprietary format (e.g. csv, not xls)
4 - use URLs to identify things, so that people can point at your stuff
5 - link your data to other people’s data to provide context
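The five steps are cumulative, so a dataset's rating is the highest step it satisfies without skipping any earlier one. The sketch below encodes that rule. The function name, its parameters and the two format lists are hypothetical simplifications chosen for illustration, not part of any official rating tool.

```python
# Assumption: illustrative, non-exhaustive format lists.
OPEN_FORMATS = {"csv", "json", "xml", "rdf"}
STRUCTURED_FORMATS = OPEN_FORMATS | {"xls", "xlsx"}


def star_rating(on_web, file_format, uses_urls=False, links_to_other_data=False):
    """Return 0-5 stars; each star requires all the previous ones."""
    fmt = file_format.lower()
    if not on_web:
        return 0
    if fmt not in STRUCTURED_FORMATS:
        return 1  # on the web, but e.g. a scanned image or PDF
    if fmt not in OPEN_FORMATS:
        return 2  # structured, but in a proprietary format
    if not uses_urls:
        return 3  # open format, but things are not identified by URLs
    if not links_to_other_data:
        return 4  # URLs are used, but no links to other datasets
    return 5


examples = {
    "scanned report": star_rating(True, "pdf"),
    "spreadsheet": star_rating(True, "xls"),
    "csv export": star_rating(True, "csv"),
    "linked dataset": star_rating(True, "rdf", uses_urls=True, links_to_other_data=True),
}
print(examples)
```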
Linked Data
Open Data tools: CKAN
CKAN stands for Comprehensive Knowledge Archive Network.
Developed by OKFN (Open Knowledge Foundation Network).
Open source package that makes data accessible by providing tools to streamline publishing, sharing, finding and using data.
CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.
Used by many central (dk, no, uk) and local governments.
Features:
- Publish & Find Datasets (import, keywords, versioning)
- Store & Manage Data (raw data, metadata, statistics, geo-)
- Engage with Users & Others (community management)
- Customize & Extend (APIs, extensions, open source)
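As an illustration of the "Find Datasets" and "APIs" features, the sketch below builds a dataset search URL for CKAN's Action API (the `package_search` action) and extracts dataset titles from a response. The host `ckan.example.org` and the sample response are hypothetical, no network call is made, and the exact fields returned should be checked against the documentation of the CKAN version in use.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=5):
    """Build the URL of a dataset search on a CKAN site."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles (or names, if a title is empty) from a response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [d.get("title") or d["name"] for d in payload["result"]["results"]]


url = package_search_url("https://ckan.example.org/", "public accounts")

# Hypothetical response, trimmed to the fields used above.
SAMPLE_RESPONSE = json.dumps(
    {
        "success": True,
        "result": {
            "count": 2,
            "results": [
                {"name": "budget-2011", "title": "Budget 2011"},
                {"name": "cash-statements", "title": ""},
            ],
        },
    }
)
titles = dataset_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

In practice the URL would be fetched with an HTTP client and the response body passed to `dataset_titles`.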
Open Data tools: data.gov / Drupal
Drupal is an open source CMS (Content Management System) often used in Open Data projects.
The Data.gov code was released as open source software (a modified Drupal version) and is also used for India: the Open Government Platform.
Drupal OpenData working group: http://groups.drupal.org/opendata-working-group
Data Journalism: http://www.guardian.co.uk/world/datablog/2010/feb/01/united-nations-population-world-data