Dealing with Open Data in Istat

Dealing with Open Data at ISTAT Giovanni A. Barbieri Statistics Italy (Istat)

Transcript of Dealing with Open Data in Istat

Page 1: Dealing with Open Data in Istat

Dealing with Open Data at ISTAT

Giovanni A. Barbieri, Statistics Italy (Istat)

Page 2: Dealing with Open Data in Istat

The Open Data Movement

Our proposal is simple: […] the federal government […] is to provide data that is easy for others to reuse, rather than to help citizens use the data in one particular way or another

Open infrastructures that enable citizens to make their own uses of the data

Reverse the current policy, which is to regard government websites themselves as the primary vehicle for the distribution of public data, and open infrastructures for sharing the data as a laudable but secondary objective [Robinson, Yu, Zeller and Felten 2008]


Page 3: Dealing with Open Data in Istat

Crowdsourcing Government Transparency

Government information that is nominally publicly available is in fact difficult to access either because it is not online or, if it is online, because it is not available in useful and flexible formats [Brito 2008]

“Structured data”: an associated structured XML file would allow a user to sort the data by ascending or descending date, alphabetically by headline or author, by number of words, and in many other ways
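As a rough illustration of why structured XML matters, the sketch below parses a small feed and sorts it by date, headline and word count. The feed, its element names (`release`, `date`, `headline`, `author`, `words`) and the helper functions are all hypothetical; they are not taken from any real government feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured feed; element names are illustrative assumptions.
XML_FEED = """
<releases>
  <release><date>2013-10-02</date><headline>Census results</headline><author>Rossi</author><words>850</words></release>
  <release><date>2013-09-15</date><headline>Air quality report</headline><author>Bianchi</author><words>1200</words></release>
  <release><date>2013-11-20</date><headline>Budget update</headline><author>Verdi</author><words>430</words></release>
</releases>
"""


def load_releases(xml_text):
    """Parse the feed into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "date": item.findtext("date"),
            "headline": item.findtext("headline"),
            "author": item.findtext("author"),
            "words": int(item.findtext("words")),
        }
        for item in root.findall("release")
    ]


def sort_releases(releases, key, descending=False):
    """Return a new list sorted by one field of the records."""
    return sorted(releases, key=lambda r: r[key], reverse=descending)


releases = load_releases(XML_FEED)
by_date_desc = sort_releases(releases, "date", descending=True)  # ISO dates sort as text
by_headline = sort_releases(releases, "headline")
by_words = sort_releases(releases, "words")

for r in by_date_desc:
    print(r["date"], r["headline"], r["author"], r["words"])
```

The same records can be re-sorted on any field without touching the source file, which is the flexibility a scanned document or a fixed HTML table does not offer.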


Page 4: Dealing with Open Data in Istat

Open Data Ecosystem

“Open data are adding a new dimension to big data analytics and giving rise to novel, data-driven innovations.” (McKinsey Global Institute Report, Oct. 2013)

From Citizens IBM BLOG

Wide Range of Open Data Consumers

Citizens who want to learn about the characteristics of the places where they are, e.g. with mobile apps showing location-specific features

Journalists who need access to data for up-to-date and well-informed communication

Educators whose teaching is helped by access to data on different application domains


Page 5: Dealing with Open Data in Istat

Official Statistics meets Open Data

Official Statistics can more easily reach such a wide range of users if conveyed through open data

Recent technological advances in the open data community enable new advanced dissemination channels for Official Statistics


Reinforcing trust

Getting closer to users

Reaching new users

Giving information back

Improving metadata

Page 6: Dealing with Open Data in Istat

Linked Open Data

Semantic Web Technological Standards

OWL

Knowledge Representation

Linked Open Data - LOD


Page 7: Dealing with Open Data in Istat

Why is Linked Data an Opportunity?

Linked Data as a semantically rich paradigm for data representation

Rich enough for the strict requirements of Official Statistics

Formal and well-defined data structures, i.e. ontologies

Linked Data as an international standard (W3C)

Tools availability and independence

Beyond statistical users

RDF: Resource Description Framework (W3C) (subject-predicate-object)
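To make the subject-predicate-object model concrete, here is a minimal sketch that holds a few triples in plain Python and prints them in N-Triples syntax. The `http://example.org/` namespace, the property names and the helper functions are hypothetical and are not the vocabulary used by the Istat portal.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # a URI or a literal, already in N-Triples form


EX = "http://example.org/"  # hypothetical namespace, for illustration only


def uri(local_name):
    """Wrap a local name in the example namespace as an N-Triples URI."""
    return f"<{EX}{local_name}>"


def literal(value):
    """Render a plain string literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


triples = [
    Triple(uri("municipality/Roma"), uri("hasName"), literal("Roma")),
    Triple(uri("municipality/Roma"), uri("locatedIn"), uri("region/Lazio")),
    Triple(uri("region/Lazio"), uri("hasName"), literal("Lazio")),
]


def to_ntriples(items):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{t.subject} {t.predicate} {t.obj} ." for t in items)


def objects_of(items, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in items if t.subject == subject and t.predicate == predicate]


print(to_ntriples(triples))
print(objects_of(triples, uri("municipality/Roma"), uri("locatedIn")))
```

Because the object of one triple can be the subject of another, the statements form a graph that can be followed from resource to resource.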


Page 8: Dealing with Open Data in Istat

Istat’s Linked Open Data Portal - 1

Istat LOD Portal: http://datiopen.istat.it

English Version: http://datiopen.istat.it/index.php?language=eng


Page 9: Dealing with Open Data in Istat

Istat’s Linked Open Data Portal - 2

Platform for
• Selecting
• Navigating
• Searching
• Querying
• Visualizing
Open Data

The platform allows
• Direct access to data via Web Services
• M2M solutions (e.g. GIS-LOD)
• Data conversion
• Export to productivity tools
• Visualization by means of external tools
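As one possible reading of "data conversion" and "export to productivity tools", the sketch below turns a result set in the standard SPARQL 1.1 JSON results format into CSV that a spreadsheet can open. The sample response and the function name are hypothetical; how the portal itself performs conversions is not described in the slides.

```python
import csv
import io
import json

# Hypothetical response in the W3C SPARQL 1.1 Query Results JSON format.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["municipality", "population"]},
    "results": {"bindings": [
        {"municipality": {"type": "literal", "value": "Alpha"},
         "population": {"type": "literal", "value": "1200"}},
        # A variable may be unbound in a row; it then has no entry at all.
        {"municipality": {"type": "literal", "value": "Beta"}},
    ]},
})


def sparql_json_to_csv(json_text):
    """Convert SPARQL JSON results to CSV text, one row per binding."""
    doc = json.loads(json_text)
    columns = doc["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for binding in doc["results"]["bindings"]:
        # Unbound variables become empty cells.
        writer.writerow([binding.get(c, {}).get("value", "") for c in columns])
    return buffer.getvalue()


csv_text = sparql_json_to_csv(SAMPLE_RESPONSE)
print(csv_text)
```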


Page 10: Dealing with Open Data in Istat

Istat’s Linked Open Data Portal - 3

Steps to a «perfect» data portal

STEP 1 Give each class of users (human or not) the most appropriate way to use the data

STEP 2 Make data available in an open format, whatever the level of openness

STEP 3 Enrich the data with a semantic layer, regardless of the release on public web sites


Page 11: Dealing with Open Data in Istat

Istat’s Linked Open Data Portal - 4

[Diagram: access modes arranged by freedom of access (Guided Access to Free Access) and type of interaction (Human basic, Human advanced, Machine to Machine). Modes shown: Navigation, Guided queries, Download, Web Service, REST query on SPARQL EndPoint, Query via SPARQL EndPoint]


Page 12: Dealing with Open Data in Istat

Interaction Modes

[Diagram: human interaction modes arranged by freedom of access (Guided Access to Free Access) and user technical skills (Basic, Intermediate, Advanced). Modes shown: Navigation, Download, Guided Queries, Predefined Queries (set of simple and customizable queries), Free Queries (SPARQL queries), Free Query via SPARQL EndPoint]
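A free query against a SPARQL endpoint is, at the protocol level, an HTTP request carrying the query text. The sketch below builds such a request following the SPARQL 1.1 Protocol (GET with a `query` parameter). The endpoint address `http://example.org/sparql` is a placeholder and an assumption: the slides do not give the address of the portal's endpoint, so it would have to be replaced with the real one.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint (assumption): replace with the portal's actual SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# A generic query that works on any RDF dataset: list ten arbitrary triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_sparql_get_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def run_query(endpoint, query, timeout=30):
    """Send the query and return the parsed JSON results (needs network access)."""
    request = Request(
        build_sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = build_sparql_get_url(ENDPOINT, QUERY)
print(url)
# results = run_query(ENDPOINT, QUERY)  # uncomment once ENDPOINT points to a real endpoint
```

Keeping URL construction separate from the network call makes the request easy to inspect before it is sent, which helps when a query is being written by hand.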


Page 13: Dealing with Open Data in Istat

Use Case 1: Spatial Querying

An app that displays on a map some population indicators for the census sections nearest to given GPS coordinates

LOD, when accompanied by spatial information, allows data to be accessed through spatial queries
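The core of this use case is a "nearest sections to a point" lookup. The sketch below shows the idea with an in-memory list and the haversine great-circle distance; it is a stand-in for a spatial query and not the portal's implementation. The section identifiers, coordinates and population figures are made up for illustration.

```python
import math

# Hypothetical census sections: centroid coordinates and one indicator.
SECTIONS = [
    {"id": "A", "lat": 41.9028, "lon": 12.4964, "population": 310},
    {"id": "B", "lat": 41.9100, "lon": 12.5000, "population": 275},
    {"id": "C", "lat": 45.4642, "lon": 9.1900, "population": 420},
]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_sections(lat, lon, sections, k=2):
    """Return the k sections whose centroids are closest to (lat, lon)."""
    return sorted(
        sections,
        key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]),
    )[:k]


nearest = nearest_sections(41.9030, 12.4960, SECTIONS, k=2)
for s in nearest:
    print(s["id"], s["population"])
```

A linear scan is enough for a handful of sections; a real service over all census sections would rely on a spatial index or on the spatial functions of the triple store.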


Page 14: Dealing with Open Data in Istat

Use Case 2: Federated Querying - 1

Federated query on Istat and ISPRA, i.e. the query accesses Istat and ISPRA portals

With LOD, it is very easy to compare data coming from different sources (linked for example at territorial level)

Query on one Portal

Results dynamically retrieved from both portals


ISPRA - The Italian National Institute for Environmental Protection and Research

Istat

Page 15: Dealing with Open Data in Istat

Use Case 2: Federated Querying - 2

ISTAT data: Census Buildings

ISPRA data: Data on land use / soil consumption

Example Query: Municipality-level analysis of land use / soil consumption and number of buildings by period of construction

Dynamically generated!
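In SPARQL 1.1, a federated query is written with the `SERVICE` keyword, which delegates part of the graph pattern to a second endpoint. The sketch below only assembles the query text. The vocabulary (`ex:numberOfBuildings`, `ex:landUse`), the remote endpoint address and the function name are hypothetical; the actual Istat and ISPRA ontologies and endpoints are not given in the slides.

```python
def build_federated_query(remote_endpoint):
    """Build a SPARQL 1.1 federated query joining local and remote data.

    The local pattern is evaluated by the endpoint that receives the query;
    the SERVICE block is forwarded to `remote_endpoint`. Both patterns share
    the ?municipality variable, which acts as the join key.
    """
    return f"""
PREFIX ex: <http://example.org/vocab#>
SELECT ?municipality ?buildings ?landUse
WHERE {{
  ?municipality ex:numberOfBuildings ?buildings .
  SERVICE <{remote_endpoint}> {{
    ?municipality ex:landUse ?landUse .
  }}
}}
""".strip()


# Placeholder address (assumption) standing in for the second portal's endpoint.
query = build_federated_query("http://example.org/ispra/sparql")
print(query)
```

The join works only if both sources identify municipalities with the same URIs, which is the territorial linking the use case relies on.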


Page 16: Dealing with Open Data in Istat

Use Case 3: Istat as Open Data Provider in SPOD

Social discussion (on the left) about a graphic representation of Census Data (on the right)

Dynamically generated!


SPOD: Social Platform for Open Data

Page 17: Dealing with Open Data in Istat

Conclusions

A dissemination strategy based on open data does put the users of Official Statistics at the centre:
• Reaching them through different channels, e.g. apps and social media
• Making it easier for them to retrieve data, e.g. federated queries that make the distribution of data across different portals transparent
• Providing richer services to them, e.g. spatial querying and dynamic visualizations


Open data and in particular Linked Open Data have a leading role in data innovation for Official Statistics

Page 18: Dealing with Open Data in Istat

One More Thing

• Macroscale vs microscale modeling
– Pseudo-Einstein (as simple as possible but not simpler)
– Von Neumann (agent-based modeling)
• Technological constraint enabling technology
• It widens the space of what is feasible:
– In production: our experience with SBS.Frame
– In analysis and research…
• A paradigm shift?
– Statistical mechanics vs agent-based modeling
– Just because you can doesn’t mean you should
• Back to open data
– From dissemination to release (“data liberation” at StatCan) to the development of information
– The regulators need to introduce new rules in line with the new scenarios (Don't think of an elephant!: know your values and frame the debate)


Page 19: Dealing with Open Data in Istat

Acknowledgments and Thanks

Thanks to all my colleagues in Istat contributing to the LOD Portal

Special thanks to Monica Scannapieco and Stefano De Francisci

Questions and clarifications: contact me at [email protected]
