What is opendata

What is Open Data? DATAVIZ: VISUAL REPRESENTATION OF COMPLEX PHENOMENA. Data visualization & computational design @ Better Nouveau Workshop, 14/12/2011. Lorenzo Benussi, TOP-IX Consortium

Page 1: What is opendata

What is Open Data?

DATAVIZ: VISUAL REPRESENTATION OF COMPLEX PHENOMENA

data visualization & computational design

@ Better Nouveau Workshop, 14/12/2011

Lorenzo Benussi, TOP-IX Consortium

1

Page 2: What is opendata

About me

Research & Business Development, TOP-IX Consortium

Fellow, Department of Economics, University of Turin

Fellow, NEXA Centre, Polytechnic of Turin

2

Page 3: What is opendata

agenda

1. Background

2. Definitions

I. Open Knowledge Definition

II. Open Data Licenses

III. Pricing models

IV. Formats

3. Examples

3

Page 4: What is opendata

Did you take the bus today?

4

Page 5: What is opendata

Background

Ref: National Geographic http://ngm.nationalgeographic.com/big-idea/14/augmented-reality

5

Page 6: What is opendata

BIG DATA stylized facts 1

• $600 to buy a disk drive that can store all the world's music.

• 5 billion mobile phones in use in 2010.

• 30 billion pieces of content shared on Facebook every month.

• 40% projected growth in global data generated per year vs. 5% growth in global IT spending.

• 235 terabytes of data collected by the US Library of Congress by April 2011.

• 15 out of 17 sectors in the United States have more data stored per company than the US Library of Congress.

McKinsey: Big data: The next frontier for innovation, competition, and productivity (May 2011)

6

Page 7: What is opendata

BIG DATA stylized facts 2

• $300 billion potential annual value to US health care - more than double the total annual health care spending in Spain.

• €250 billion potential annual value to Europe's public sector administration - more than the GDP of Greece.

• $600 billion potential annual consumer surplus from using personal location data globally.

• 60% potential increase in retailers' operating margins possible with big data.

• 140,000-190,000 more deep analytical talent positions and 1.5 million more data-savvy managers needed to take full advantage of big data in the USA.

McKinsey: Big data: The next frontier for innovation, competition, and productivity (May 2011)

7

Page 8: What is opendata

WEB(squared)

1. Redefining Collective Intelligence: New Sensory Input

2. Cooperating Data Subsystems

3. How the Web Learns: Explicit vs. Implicit Meaning

4. Web Meets World: The "Information Shadow" and the Internet of Things

5. The Rise of Real Time: A Collective Mind

Ref: Tim O’Reilly and John Battelle (2009), Web Squared: Web 2.0 Five Years On. http://www.web2summit.com/web2009/public/schedule/detail/10194

8

Page 9: What is opendata

Digital technology could enable an extraordinary range of ordinary people to become part of a creative process. (The future of ideas, Lawrence Lessig)

9

Page 10: What is opendata

When I say that innovation is being democratized, I mean that users of products and services—both firms and individual consumers—are increasingly able to innovate for themselves. (Democratizing Innovation, Eric von Hippel)

10

Page 11: What is opendata

The value of metrics

• Data

• Information

• Knowledge

• Value

Hal Varian, Google’s Chief Economist

11

Page 12: What is opendata

12

Page 13: What is opendata

DATA as a SERVICE

Data are not locked inside applications; they are consumed on demand as a service.

RESTful APIs make it possible to access data as a web resource (through a URI).

13
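As a rough illustration of data consumed as a service, the sketch below names a dataset by URI and reads it over HTTP. The base URL `https://data.example.org/api`, the `datasets/<id>` path layout and the `bus-stops` dataset are placeholders invented for the example, not a real service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def resource_uri(base_url, dataset_id, **filters):
    """Build the URI that identifies a dataset (hypothetical path layout)."""
    uri = f"{base_url.rstrip('/')}/datasets/{dataset_id}"
    if filters:
        # Sort the filters so the same request always maps to the same URI.
        uri += "?" + urlencode(sorted(filters.items()))
    return uri


def fetch_records(uri, opener=urlopen):
    """GET the resource and decode its JSON body into Python objects."""
    with opener(uri) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    uri = resource_uri("https://data.example.org/api", "bus-stops", city="Turin")
    print(uri)  # calling fetch_records(uri) would need a real service behind it
```

The `opener` parameter exists only so the HTTP call can be swapped for a stub when no service is available.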

Page 14: What is opendata

Business Models

A. Data owner: paid to publish / revenue share.

B. Data user: pays for data delivery/transformation/analysis services.

New Generation Marketplace

3. Works with open and not-open data.

4. Provides data on-the-fly through APIs (even custom ones).

5. Sometimes a community of data curators is involved to maintain and expand the data through crowd-sourcing (e.g. Factual).

6. Provides tools (web based) to explore the data.

14

Page 15: What is opendata

What does open data mean?

Open Data is a model to extract value from public sector information by using the data to build new tools and to create innovative services.

15

Page 16: What is opendata

• The Public Sector produces and manages a huge amount of data; opening PSI in the EU is estimated to generate economic growth of €140 billion per year (aggregate)

• Public Data are the raw material to create new products and services

PSI (public sector information) mines

16

COURTESY/RON WHEELER. The 8,000-foot deep Homestake Gold Mine in South Dakota is the site where scientists, including UC Berkeley researchers, plan to construct the world's deepest research center.

Page 17: What is opendata

data.gov

“Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” Transparency and Open Government, Memorandum for the Heads of Executive Departments and Agencies (2009)

“[…] As you know, transparency is at the heart of our agenda for Government. We recognise that transparency and open data can be a powerful tool to help reform public services, foster innovation and empower citizens.” David Cameron - Letter to Cabinet Ministers (2011)

17

Page 18: What is opendata

Information is the currency of democracy

Benjamin Franklin (attributed)

18

Page 19: What is opendata

"... give us the unadulterated data, we want the data, we want unadulterated data. We have to ask for raw data now." Tim Berners-Lee, advisor data.gov.uk

Raw data now!

19

Page 20: What is opendata

data.gov: leading examples

USA - data.gov

UK - data.gov.uk

Australia - data.gov.au

20

Page 21: What is opendata

Legislation in EU, Italy and Piedmont

EUROPE

Directive 2003/98/EC of 17 November 2003

DIRECTIVE 2003/98/EC OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 17 November 2003 on the re-use of public sector information

The evolution towards an information and knowledge society influences the life of every citizen in the Community, inter alia, by enabling them to gain new ways of accessing and acquiring knowledge.

ITALY

Legislative Decree No. 36 of 24 January 2006 and Law 96/2010.

PIEDMONT

Regional Government Resolution (Delibera di Giunta regionale) 36 - 1109, November 2010

21

Page 22: What is opendata

WHY : civil society

• Accountability

• Transparency

• Collaboration

• Participation

22

Page 23: What is opendata

WHY : (digital) market

• Innovation

• Cooperation

• Competition

• Digital commons

23

Page 24: What is opendata

The first example in Italy - dati.piemonte.it

24

Page 25: What is opendata

apps4italy

• All EU citizens can participate (!!) & 40K€ in cash prizes

• Building useful, innovative projects based on Italian public data (not only open data)

• Four main categories (growing):

1. Ideas

2. Apps

3. Visualization

4. Datasets

25

Ref: appsforitaly.org

Page 26: What is opendata

Open Data: definitions

26

Page 27: What is opendata

Open Knowledge Definition v.1.1 by OKF

A work is open if its manner of distribution satisfies the following conditions:

1. Access

2. Redistribution

3. Reuse

4. Absence of technological restriction

5. Attribution

6. Integrity

7. No discrimination (persons or groups)

8. No discrimination (fields of endeavor)

9. Distribution of license

10. License must not be specific to a package

11. License must not restrict the distribution of other works

27

Page 28: What is opendata

Open Definition - http://opendefinition.org/okd/

Version 1.1

Terminology

The term knowledge is taken to include:

1. Content such as music, films, books

2. Data, be it scientific, historical, geographic or otherwise

3. Government and other administrative information

Software is excluded [...]

The term work will be used to denote the item or piece of knowledge which is being transferred.

The term package may also be used to denote a collection of works. [...]

The term license refers to the legal license under which the work is made available. Where no license has been made this should be interpreted as referring to the resulting default legal conditions under which the work is available (for example copyright).

28

Page 29: What is opendata

The Definition - A work is open if its manner of distribution satisfies the following conditions:

1. ACCESS

The work shall be available as a whole and at no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The work must also be available in a convenient and modifiable form.

2. REDISTRIBUTION

The license shall not restrict any party from selling or giving away the work either on its own or as part of a package made from works from many different sources. The license shall not require a royalty or other fee for such sale or distribution.

3. REUSE

The license must allow for modifications and derivative works and must allow them to be distributed under the terms of the original work.

29

Page 30: What is opendata

4. ABSENCE OF TECHNOLOGICAL RESTRICTION

The work must be provided in such a form that there are no technological obstacles to the performance of the above activities. This can be achieved by the provision of the work in an open data format, i.e. one whose specification is publicly and freely available and which places no restrictions monetary or otherwise upon its use.

5. ATTRIBUTION

The license may require as a condition for redistribution and re-use the attribution of the contributors and creators to the work. If this condition is imposed it must not be onerous. For example if attribution is required a list of those requiring attribution should accompany the work.

6. INTEGRITY

The license may require as a condition for the work being distributed in modified form that the resulting work carry a different name or version number from the original work.

30

Page 31: What is opendata

7. NO DISCRIMINATION AGAINST PERSONS OR GROUPS

The license must not discriminate against any person or group of persons.

8. NO DISCRIMINATION AGAINST FIELDS OF ENDEAVOR

The license must not restrict anyone from making use of the work in a specific field of endeavor. For example, it may not restrict the work from being used in a business, or from being used for genetic research.

9. DISTRIBUTION OF LICENSE

The rights attached to the work must apply to all to whom it is redistributed without the need for execution of an additional license by those parties.

10. LICENSE MUST NOT BE SPECIFIC TO A PACKAGE

The rights attached to the work must not depend on the work being part of a particular package. If the work is extracted from that package and used or distributed within the terms of the work’s license, all parties to whom the work is redistributed should have the same rights as those that are granted in conjunction with the original package.

11. LICENSE MUST NOT RESTRICT THE DISTRIBUTION OF OTHER WORKS

The license must not place restrictions on other works that are distributed along with the licensed work. For example, the license must not insist that all other works distributed on the same medium are open.

31

Page 32: What is opendata

Open Data: prices

32

Page 33: What is opendata

A paradigmatic shift: information economy

• The transition from a physically-based to a knowledge-based economic environment made information a primary wealth-creating asset.

• Digital access to information seems to have changed the structure of many industries, promoting services-oriented business models based on disclosure and sharing of information and knowledge.

33

Page 34: What is opendata

A paradigmatic shift: PSI data mines

• The Public Sector holds and manages huge amounts of data and information. Fostering access to those repositories enables new business opportunities that can broaden market volumes in such sectors.

• PSI represents the raw material from which value added products and services can be designed.

34

Page 35: What is opendata

The use/value of PSI

PSI can be used and reused in many ways (non-rivalry in consumption):

1. Broad range of sectors

2. Different sets of actors

3. PSI holders

4. Private re-users

5. Regulatory bodies

6. Citizens

Several supply chain configurations:

1. Linear models (private re-users add value)

2. User generated contents

3. Information sharing between public bodies

35

Page 36: What is opendata

The price of PSI: the “free data” approach

• The peculiar cost structure of collecting, processing and delivering digital data (high fixed costs, zero marginal cost) strongly influences the pricing strategies that PSI holders can adopt.

• Pollock (2008): a price that equals marginal cost (i.e. PSI free of charge) is socially optimal provided that the elasticity of demand and positive externalities exceed a given threshold.

✓ Empirics: those conditions are likely to hold in most PSI domains.

36

Page 37: What is opendata

The price of PSI: cost recovery approach

• Although a cost recovery regime may limit potential demand and distort competition, several critical issues could trigger its adoption:

✓ Underestimation of downstream demand and network externalities.

✓ Lack of long-run commitment in subsidizing PSI collection.

✓ Short-term decision making.

✓ Moral hazard (?).

37
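A toy comparison of the two pricing regimes, assuming a linear demand curve, a fixed production cost and zero marginal cost. The demand parameters and the cost figure are invented for illustration, and the model leaves out the externalities that Pollock's condition turns on, so it shows only the direction of the effect.

```python
import math


def demand(price, a, b):
    """Linear demand: quantity requested at a given price (never negative)."""
    return max(0.0, a - b * price)


def consumer_surplus(price, a, b):
    """Area under the linear demand curve above the price paid."""
    q = demand(price, a, b)
    return q * q / (2 * b)


def cost_recovery_price(fixed_cost, a, b):
    """Lowest price whose revenue p * q(p) covers the fixed cost."""
    disc = a * a - 4 * b * fixed_cost
    if disc < 0:
        raise ValueError("no price recovers the fixed cost with this demand")
    return (a - math.sqrt(disc)) / (2 * b)


def compare_regimes(fixed_cost, a, b):
    """Net welfare = consumer surplus + revenue - fixed cost, per regime."""
    p = cost_recovery_price(fixed_cost, a, b)
    free = consumer_surplus(0.0, a, b) - fixed_cost
    recovery = consumer_surplus(p, a, b) + p * demand(p, a, b) - fixed_cost
    return {"price": p, "free_data": free, "cost_recovery": recovery}


if __name__ == "__main__":
    print(compare_regimes(fixed_cost=1600, a=100, b=1))
```

With these made-up numbers the cost recovery price is 20 and net welfare drops from 3400 to 3200; the gap is the demand that gets priced out even though serving it would cost nothing.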

Page 38: What is opendata

The price of PSI: possible scenarios

Directive 2003/98/EC is aimed at fostering PSI reuse mainly by promoting:

1. PSI availability in digital format

2. Transparency of reuse conditions and pricing

3. Non discrimination

MEPSIR (2006) Which market configurations are likely to emerge?

Scenario | Directive impact | Main condition | Example
Closed shop | Minor. Public Sector bodies continue to control the supply chain. | Information is strongly linked with the functioning of public bodies. | Cadastral information
Battlefield | Non-negligible. New entrants step into the downstream market. | Information is important while not strategic for PA. | Meteorological data
Playground | Strong. Public Sector enlarges its influence over the downstream stages. | Digitalization offers new opportunities for value extraction. | Legal information
[...] | Non-negligible. Public Sector has the only role of information holder. | Information reuse generates high demand volumes from citizens and firms. | Traffic and transport information

38

Page 39: What is opendata

The price of PSI: Externalities & Policy

All pricing strategies encompass potential risks of inefficiency for PSI holders (due to lack of incentives in reducing costs and/or improving quality).

The importance of the regulatory framework

The Central Role of Externalities

39

Page 40: What is opendata

Open Data: formats

40

Page 41: What is opendata

Linked open data and Semantic web

The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data. (by Tim Berners-Lee)

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)

4. Include links to other URIs, so that they can discover more things.

Ref: http://www.w3.org/DesignIssues/LinkedData.html

41

Page 42: What is opendata

42

Page 43: What is opendata

Linked open data: basic principles

1. Everything has a name (people, locations, etc.)

2. Every name starts with http://

3. All data are described by using RDF (Resource Description Framework, a W3C standard).

Tim Berners-Lee talk on linked data: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

43
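The three principles can be sketched with plain Python tuples standing in for RDF triples; a real project would use an RDF library and a standard serialization instead. The `http://example.org/` names and the property URIs below are made up for the example.

```python
# Each statement is a (subject, predicate, object) triple; names are HTTP URIs.
EX = "http://example.org/"  # hypothetical namespace, not a real vocabulary

TRIPLES = [
    (EX + "Turin", EX + "locatedIn", EX + "Piedmont"),
    (EX + "Piedmont", EX + "locatedIn", EX + "Italy"),
    (EX + "Turin", EX + "label", "Torino"),
]


def is_http_uri(value):
    """Principle 2: a name is something that can be looked up over HTTP."""
    return value.startswith(("http://", "https://"))


def describe(subject, triples):
    """Everything the graph says about one named thing."""
    return {(p, o) for s, p, o in triples if s == subject}


def follow(subject, predicate, triples):
    """Walk one kind of link repeatedly, collecting the things it reaches."""
    reached, current = [], subject
    while True:
        targets = [o for s, p, o in triples if s == current and p == predicate]
        if not targets or targets[0] in reached:
            return reached
        current = targets[0]
        reached.append(current)


if __name__ == "__main__":
    print(follow(EX + "Turin", EX + "locatedIn", TRIPLES))
```

Because objects are themselves names, following `locatedIn` from Turin reaches Piedmont and then Italy without either fact being stored on the Turin entry.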

Page 44: What is opendata

Data as a RDF graph

44

Page 45: What is opendata

The Vision - A global interconnected database

45

Page 46: What is opendata

The Vision - Mix data on-the-fly

46

Page 47: What is opendata

Linked data - hands on

DBpedia provides the information in Wikipedia as Linked Data. Example, Turin airport: http://dbpedia.org/page/Turin_Caselle_Airport

47
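A hedged sketch of looking up that entry from code. It assumes DBpedia's usual convention that the HTML view lives under `/page/` while the thing itself is named under `/resource/`, and that the server answers content negotiation via the `Accept` header. The request is only built here, not sent, so nothing below depends on the service being reachable.

```python
from urllib.request import Request

PAGE_URL = "http://dbpedia.org/page/Turin_Caselle_Airport"


def page_to_resource(page_url):
    """Map the human-readable page URL to the URI that names the thing."""
    return page_url.replace("/page/", "/resource/", 1)


def build_lookup(resource_uri, media_type="text/turtle"):
    """Prepare a GET request asking for RDF instead of HTML."""
    return Request(resource_uri, headers={"Accept": media_type})


if __name__ == "__main__":
    request = build_lookup(page_to_resource(PAGE_URL))
    print(request.full_url, request.get_header("Accept"))
    # urllib.request.urlopen(request) would perform the actual lookup.
```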

Page 48: What is opendata

Open Data: license

48

Page 49: What is opendata

Open Data license 1 (OKF)

Open Knowledge Foundation licenses

1. Public Domain Dedication and License (PDDL) — “Public Domain for data/databases”

2. Open Data Commons Attribution License (ODC-By) — “Attribution for data/databases”

3. Open Data Commons Open Database License (ODC-ODbL) — “Attribution Share-Alike for data/databases”

Ref: http://www.opendatacommons.org/licenses/

49

Page 50: What is opendata

Open Data licenses 2 (CC and IODL)

Creative Commons Licenses (http://creativecommons.org/licenses/)

1. CC Zero

2. CC BY - Attribution

3. CC SA - Share alike

4. CC BY-SA - Attribution and Share alike

Italian open data license (http://www.formez.it/iodl/)

• IODL - Italian Open Data License (BY-SA)

50
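The practical difference between these licenses comes down to two obligations, attribution and share-alike. The table below is a simplified reading of the licenses named on these two slides, written as a lookup; it is an illustration, not legal advice, and the short keys are labels chosen for the example.

```python
# (attribution required, share-alike required), per the slides' descriptions.
LICENSES = {
    "PDDL": (False, False),
    "ODC-By": (True, False),
    "ODC-ODbL": (True, True),
    "CC Zero": (False, False),
    "CC BY": (True, False),
    "CC BY-SA": (True, True),
    "IODL": (True, True),
}


def obligations(license_key):
    """List what a re-user must do under the given license."""
    attribution, share_alike = LICENSES[license_key]
    duties = []
    if attribution:
        duties.append("credit the source")
    if share_alike:
        duties.append("release derived data under the same terms")
    return duties


if __name__ == "__main__":
    for key in LICENSES:
        print(key, obligations(key) or ["none"])
```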

Page 51: What is opendata

examples

51

Page 52: What is opendata

2 groups

I. Transparency

II. Information services

52

Page 53: What is opendata

Transparency

• Public assembly (parliament, councils)

• Public Budget and expenses

• Public procurement

53

Page 54: What is opendata

Info services

• Transportation

• Environment

• Cultural heritage

Ref: http://traintimes.org.uk/map/tube/

54

Page 55: What is opendata

food

55

Page 56: What is opendata

kids

56

Page 57: What is opendata

environment

57

Page 58: What is opendata

transportation

58

Ref: http://traintimes.org.uk/map/tube/

Page 60: What is opendata

Where to find open data

Open (and not open) data archive

http://ckan.net/

http://it.ckan.net/

Example of Italian datasets:

Dati.gov.it: http://www.dati.gov.it/

5T: http://biennaledemocrazia.it/dataset/

Dati Piemonte: http://dati.piemonte.it

ISTAT: http://dati.istat.it/

Enel: http://data.enel.com/

60
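Catalogues built on CKAN can also be searched from code. The sketch assumes a portal that exposes CKAN's Action API (version 3) with its `package_search` action; `https://catalogue.example.org` is a placeholder, and the sample response is trimmed by hand to the fields the code reads.

```python
import json
from urllib.parse import urlencode


def search_url(portal, query, rows=5):
    """URL of a dataset search on a CKAN portal (Action API v3 assumed)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Pull dataset titles out of a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("the portal reported a failed request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Hand-written sample with the same shape as a real response, for illustration.
SAMPLE = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{"name": "bus-stops", "title": "Bus stops"}]},
})

if __name__ == "__main__":
    print(search_url("https://catalogue.example.org", "transport"))
    print(dataset_titles(SAMPLE))
```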

Page 61: What is opendata

Tools and links

ONLINE DATA VISUALIZATION

Google Visualization API: http://code.google.com/intl/it-IT/apis/chart/

Tableau Public: http://www.tableausoftware.com/public

Open Heat Map: http://www.openheatmap.com/

ONLINE STORAGE + VISUALIZATION

Google Public Data Explorer: http://www.google.com/publicdata/home

IBM Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/

Google Fusion Tables: http://www.google.com/fusiontables/Home

Impure: http://www.impure.com/

CURATION & LINKING

Google Refine

Data Wrangler: http://vis.stanford.edu/wrangler/

OFFLINE TOOLS

R: http://www.r-project.org/

JavaScript library for data viz: http://thejit.org/

This one too: http://vis.stanford.edu/protovis/

Network / graph analysis / visualization: http://gephi.org/

Turing-complete dataviz language for visual artists: http://processing.org/

61

Page 62: What is opendata

wrap-up

1. Not all public data are open data

2. Public data and gov data are often “broken” (strange formats and ambiguous IP)

3. Open Data makes sense if we put it in perspective - the rise of Big Data

62

Page 63: What is opendata

everything is changing

63