Open Source Data Services for Strategic SOA utilising WSO2 Data Services Server
SUPPORT DATA OPEN Empowering the reuse of Open … · OPEN DATA SUPPORT We provide services in the...
Transcript of SUPPORT DATA OPEN Empowering the reuse of Open … · OPEN DATA SUPPORT We provide services in the...
DATASUPPORT
OPEN Empowering the reuse of Open Government Data across Europe
ePSI platform workshop 2 October 2013
Budapest, Hungary
DATASUPPORTOPEN
This presentation has been created by PwC EU Services Nikolaos Loutas [email protected]
Disclaimer
The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission.
The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof.
Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission.
All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
Presentation metadata
Slide 2
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’(Contract No. 30-CE-0530965/00-17).
DATASUPPORTOPEN
Contents
• Promises and challenges of Open Government Data.
• The Open Data Support project.
• Linked data as a means of promoting the reuse of Open Government Data.
• Linked Open Government Data supply and demand.
What is this talk about?
Slide 3
DATASUPPORTOPEN
The promises and the challenges of Open
Government Data
Slide 4
DATASUPPORTOPEN
Open Government Data has a great potential to create social and economic value
Slide 5
Developers /
Companies integrate
data into apps
(services)
Public
administrations
share data online
Citizens/businesses
benefit from the
apps (services)
Developers /
Companies
search for data
Publishing data
Reusing data
DATASUPPORTOPEN
There are more than 150 portals in Europe hosting Open Government Data
150+
Existing OGD Portal
Slide 6
DATASUPPORTOPEN
Barriers to Open Data publishing and reuse
Data publishers Data reusers
No view on which data is more likely to be reused / has a higher ROI potential.
Lack of overview of existing/available datasets.
Unclear business model for publishing Open Data.
Unclear business model for reusing Open Data.
Limited tool support. Data is often of low quality, outdated, unstructured and/or not machine-readable.
Competing licences for datasets. Lack of licensing information or incompatible licences.
Competing vocabularies for describing datasets.
Different vocabularies when searching for datasets.
Domain-specific metadata needs. Lack of (good quality) metadata.
Effort required for keeping the metadata up-to-date.
Lack of provenance information.
Slide 7
Meta
da
ta
Met
ad
ata
DATASUPPORTOPEN
Limited accessibility and lack of (cross-border/sector) awareness of open datasets
Different metadata
vocabularies
Limited accessibility and lack of awareness
Limited reuse of
open datasets
Slide 8
DATASUPPORTOPEN
No reuse = No social and economic value
Slide 9
Developers /
Companies integrate
data into apps
(services)
Public
administrations
share data online
Citizens/businesses
benefit from the
apps (services)
Developers /
Companies
search for data
DATASUPPORTOPEN
Open Data Support to the
rescue…
Slide 10
DATASUPPORTOPEN
The mission of Open Data Support is...
Slide 11
To improve the and facilitate the to datasets published on local and national Open Data portals in order to increase their within and across borders.
DATASUPPORTOPEN
How are we going to achieve our objectives?
... by providing
to metadata descriptions of open datasets via a
ODIPP Pan-European
Data portal
Slide 12
DATASUPPORTOPEN
We provide services in the area of Open Government Data
Publishing services
Training services
Consulting services
We help (potential) Open Government Data publishers to prepare, transform and publish reusable metadata descriptions of their datasets.
We train public administrations across Europe on the value of (Linked) Open Government Data and help them build capacity in effectively and efficiently publishing data.
We provide consultancy services tailored to the needs of public administrations covering a wide spectrum of topics from IT to licensing of (Linked) Open Government Data.
Slide 13
DATASUPPORTOPEN
We provide publishing services
Target audience
We help (potential) Open Government Data publishers to prepare, transform and publish reusable metadata descriptions of their datasets.
Open Data portals Gov Data publishers
What is in it for you?
Increase visibility: Make datasets searchable through different access points.
Increase reuse: Enlarge and empower community of re-users.
Improve quality: Enrich OGD with metadata, link data with other data, crowd-source data quality.
Slide 14
DATASUPPORTOPEN
We provide training services
Training services
Target audience
Policy makers and government officials
We train public administrations across Europe on the value of (Linked) Open Government Data and help them build capacity in effectively and efficiently publishing data.
Government IT strategists
Government software engineers
What will you learn?
Comprehend and formulate the added value of Linked Open Government Data for your domain.
Understand the life cycle and the publishing requirements of Linked Open Government Data.
Acquire hands-on experience on Linked Government Open Data technologies.
Slide 15
DATASUPPORTOPEN
Onsite training
Online training
Our training services
All trainings are freely
available online
through Joinup.
We provide onsite trainings
adapted to your needs.
Slide 16
https://joinup.ec.europa.eu/community/ods/document/online-training-material
DATASUPPORTOPEN
We provide consulting services
Target audience
We provide consultancy services tailored to the needs of public administrations covering a wide spectrum of topics from IT to licensing of (Linked) Open Government Data.
What is in it for you?
Open Data Portals Gov data publishers
Raise awareness of the economic and social value of open data
Slide 17
Build new knowledge, skills, technical expertise and hands-on experience on Linked Open Data technologies.
DATASUPPORTOPEN
How can Linked Data technologies promote the reuse of Open Government
Data?
Slide 18
DATASUPPORTOPEN
Defining linked data...
“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”
EC ISA Case Study: How Linked Data is transforming eGovernment
The four design principles of Linked Data (by Tim Berners Lee):
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs so that they can discover more things.
Slide 19
DATASUPPORTOPEN
Linked data vs. open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources.
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF without necessarily being publicly available under an open licence.
Slide 20
“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” - OpenDefinition.org
DATASUPPORTOPEN
Defining metadata?
“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.”
-- National Information Standards Organization
http://www.niso.org/publications/press/UnderstandingMetadata.pdf
Slide 21
Dataset description (DCAT) Dataset
DATASUPPORTOPEN
Benefits of using Linked Open Government Data
• Allows for flexible integration of datasets from different sources, without needing the data to be moved.
• Fosters the reuse of information from reference/authoritative sources.
• Caters for assigning common identifiers in the form of HTTP URIs to things (e.g. people, products, business, locations...).
• Provides context to data – richer and more expressive data.
• The use of standard Web interfaces (such as HTTP and SPARQL) can simplify the use of data for machines.
Slide 22
DATASUPPORTOPEN
Considerations for publishing Linked Open Government Data
• Linked Data is high-quality data. Considerable data cleansing and curation is required.
• Managing the data lifecycle is a challenging task. Mechanisms for handling updates and deletions in the data should be devised.
• The tools and software supporting linked data solutions are still not at production level/quality.
• A central authority should take the responsibility of publishing and maintaining persistent HTTP URIs for data resources. Existing identifiers should be reused to the extent possible, especially the ones coming from reference data sources, such as company registers.
• Data is currently available under different licences and in most cases no licence actually exists. This hampers data reuse and integration. Possible licensing options for data and description metadata should be explored. The use of open licence, e.g. a public domain licence – CC0, is recommended, particularly for the metadata.
• Alternative business model for publishing linked data should be further explored. The costs and benefits of the different alternatives need to identified, before governments can decide on the adoption of the linked data technological paradigm.
Slide 23
DATASUPPORTOPEN
Linked Open Government Data and metadata lifecycle focusing on supply and demand
Slide 24
Select Model Publish Find Integrate Re-use
Data management
Feedback
Data Publisher
Data Consumers
Supply Demand
DATASUPPORTOPEN
Linked Open Government Data & metadata supply Governments opening up their data and publishing it as linked data along with appropriate metadata descriptions.
Slide 25
DATASUPPORTOPEN
Selection of high-value data
Several dimensions can be considered in the selection process of Linked Open Government Data.
• Legal requirements: Is there a law that makes open publication mandatory or is there no specific obligation?
• Relation to public task: Is the data the direct result of the primary public task of government or is it a product of a non-essential activity?
• Current status of open publication: Is the data already openly available or does it still need to be opened up?
• Type of value: Is the data useful for social engagement or does it have commercial value?
• Audience: Is the data primarily intended for the public or is it primarily aimed at back-office integration?
Slide 26
DATASUPPORTOPEN
Selection based on legal requirements
Some data may be covered by a law or regulation that mandates its open publication, e.g.:
• Text of laws, directives, regulations etc.
• Proposals and proceedings of parliament and committees.
• Election results.
• Public budgets and expenditures.
• Invitations to tender and contract awards.
Other data may be the by-product of government activity and would be useful for citizens and business to have access to, e.g.:
• Condition of infrastructure and public spaces (roads, trees) .
• Timetables of public transport and schedules of garbage collection.
Slide 27
DATASUPPORTOPEN
Selection based on relation to public task
Some data may be the direct result of the primary public task of government, for example the functions listed in COFOG, e.g.:
• Executive, legislative organs, financial/fiscal affairs etc.).
• Public order and safety.
• Environmental protection.
• Health.
• Culture.
• Education.
Other data produced by government are non-essential (they could be – and sometimes are – provided by the private sector) e.g.:
• Mapping for navigation (cf. Google Street View)
• Weather forecasts (cf. Weather Channel)
Slide 28
DATASUPPORTOPEN
Selection based on status of openness
Some data is already published openly and electronically, e.g. (in some countries):
• Cadastral information.
• Topographic maps.
• Traffic information.
• Weather forecasts.
Other data may still be hidden from the public (maybe because it is hard to publish or includes personal data, sensitive data or is partly subject to third-party licensing).
Slide 29
DATASUPPORTOPEN
Selection based on type of value
Some data may have primarily societal value, e.g.:
• Laws and parliamentary data (e.g. voting records of representatives)
• Pre-election information (e.g. programmes of political parties)
• eDemocracy and eParticipation (e.g. public consultations)
Other data may have more commercial value, e.g.:
• Road maps, real-time traffic information
• Real-time weather information
• Business information
Slide 30
DATASUPPORTOPEN
Selection based on type of audience
Some data is aimed at society (citizens and business), e.g.:
• Legal information.
• eDemocracy, eParticipation and public consultation.
• Procurement.
Other data are focused on internal usage or for integration in the back office, e.g.:
• Various sources that are used for law enforcement.
• Service performance indicators.
• Job descriptions of civil servants.
Slide 31
DATASUPPORTOPEN
Selection based on size of audience
Some data are aimed at large audiences and mass markets, e.g.:
• Traffic information.
• Public transport.
• Election data.
Other data are essential for small groups of people and niche markets, e.g.:
• Information about facilities and financial support for people with special needs.
• Economic statistics.
• Court decisions.
Slide 32
DATASUPPORTOPEN
Selection based on the needs of the audience What data do the reusers need/want?
According to a Spanish study, the following types of information are reused the most by businesses:
Slide 33
51,1% 46,8%
29,7% 27,7%
12,8% 12,8% 12,8% 10,0%
See also: http://datos.gob.es/datos/sites/default/files/files/E
studio_infomediario/121001%20RED%20007%20
Final%20Report_2012%20Edition_vF_en.pdf
DATASUPPORTOPEN
Which datasets are mostly viewed on Data.gov.uk
Slide 34 http://data.gov.uk/data/site-usage/publisher?month= http://data.gov.uk/data/site-usage/dataset
Per publisher Per dataset
DATASUPPORTOPEN
Modelling your data & metadata is about ...
• Making your data available in a structured, comprehensible and machine-readable way.
• Reusing what already exists in terms of vocabularies and reference data.
• Reaching the right quality level by cleansing your data.
• Providing licensing information so that data consumers know what the conditions of reuse are.
• Providing a rich description (metadata).
• Using semantic technologies (RDF, HTTP URIs...) for describing your data.
Slide 35
DATASUPPORTOPEN
Model your data – reuse if possible, mint if necessary
• Reuse existing vocabularies as much as possible.
If you determine there is no reusable, authoritative source for the specific domain, create your own using:
◦ RDF Schema (RDFS): Basic RDF vocabulary to describe the classes and properties of classes.
◦ Web Ontology Language (OWL): knowledge representation language for describing ontologies.
Slide 36
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
http//www.w3.org/TR/owl-features/
http://www.w3.org/TR/rdf-schema/
DATASUPPORTOPEN
Reuse common vocabularies to model and describe your data ... (in RDF)
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
...and metadata...
To describe datasets (metadata): DCAT, DCAT Application Profile, VoID
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
Slide 37
DATASUPPORTOPEN
Find reusable vocabularies Joinup
• Online platform for searching for and sharing interoperability assets described with ADMS.
Developed by the ISA Programme of the EC.
Slide 38 http://joinup.ec.europa.eu/
Refine the search results via the faceted search filters. 2
Enter a search keyword to find interoperability assets available on different websites.
1
The search results contain a relevant description of the assets and the link from where they can be downloaded.
3
DATASUPPORTOPEN
Find reusable vocabularies Linked Open Vocabularies
Slide 39
http://lov.okfn.org/
• Provides easy access methods to the ecosystem of vocabularies .
• Makes the ways they link to each other explicit.
• Provides metrics on how they are used in the LOGD cloud.
• Developed by the Open Knowledge Foundation.
DATASUPPORTOPEN
To ensure data and metadata can be published with an appropriate level of quality and minimum errors.
This means:
• Fixing errors.
• Transforming/homogenising formats.
• Aligning inconsistencies in data and metadata.
• Removing duplicate/redundant information.
• Adding lacking information.
• Making sure the information is up-to-date.
Cleansing your data & metadata
Slide 40
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
Cleanse your data with Open Refine (Google Refine) -
https://code.google.com/p/google-refine/
DATASUPPORTOPEN
Company_name Registration date Country E-mail # Establishments
Nikè 1991-04-28 Belgium niké 7
BARCO 15 September 1986 BE [email protected] 2
Nikè België
Coca-Cola United States [email protected] 3
Cleansing data - example
Slide 41
Formatting issue
Missing information
Duplicate error
Redundant information Inconsistent
information
Company_name Registration date Country E-mail
Nikè 1991-04-28 BE niké@sport.org
BARCO 1986-09-05 BE [email protected]
Coca-Cola 1964-03-26 US [email protected]
Cleansing operations
DATASUPPORTOPEN
Model your metadata
The DCAT Application profile for data portals in Europe (DCAT-AP) is a specification based on the Data Catalogue vocabulary (DCAT) for describing public sector datasets in Europe.
DCAT-AP improves the discovery of public sector datasets across borders and sectors.
Slide 42
See also: https://joinup.ec.europa.eu/asset/dcat_application_profile
/description
DATASUPPORTOPEN
Use persistent Uniform Resource Identifiers (URI) for naming things
Slide 43
i.e. independent of the data originator
e.g. http://www.example.com/id/alice_brown
e.g. http://education.data.gov.uk/ministryofeducation/id/school/123456
e.g. http://education.data.gov.uk/doc/school?id=123456
e.g. http://education.data.gov.uk/doc/school/v01/123456
e.g. http://education.data.gov.uk/id/school1/123457
e.g. http://education.data.gov.uk/id/school1/123456
e.g. http://data.example.org/doc/foo/bar.rdf
e.g. http://data.example.org/doc/foo/bar.html
e.g. http://education.data.gov.uk/id/school/123456
e.g. http://{domain}/{type}/{concept}/{reference}
Follow the pattern
Re-use existing identifiers
Link multiple representations
Implement 303 redirects for real-world objects
Use a dedicated service
Avoid stating ownership
Avoid version numbers
Avoid using auto-increment
Avoid query strings
rulesfor persistent
http://education.data.gov.uk/doc/schools/123456.csv
Avoid file extensions
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris https://joinup.ec.europa.eu/community/semic/document/10-rules-persistent-uris
Persistent URIs sets the foundations for Linked Data.
DATASUPPORTOPEN
Licensing your data and metadata is about...
• Informing potential reusers on how the data and metadata can be (re)used and/or adapted.
• Not associating your data and metadata with licensing information, is an important barrier for reuse and thus lowers the value opening your data will create.
• Open data should be, by definition, published under an open license.
• Metadata should be published under a licence indicating it is public domain to reinforce the reuse and discoverability of your data.
Slide 44
See also: http://www.slideshare.net/OpenDataSupport/licence-your-data-metadata
DATASUPPORTOPEN
Open licences
• Creative Commons (CC) (http://creativecommons.org/licenses/)
- Attribution (BY): The creator of the work should be mentioned.
- Non Commercial (NC): The work can not be used for commercial purposes.
- No Derivatives (ND): The work cannot be adapted or merged with other work.
- Share Alike (SA): The work can be adapted but must be attributed with the same license if you make it available.
- CC Zero (CC0): The work is public domain
• Open Data Commons (http://opendatacommons.org/licenses/)
- „Open Data Commons Attribution Licence (ODC-By): compatible with CC BY
- Open Data Commons Open Database Licence (ODC-ODbL): compatible with CC BY SA)
- Public Domain Dedication Licence (PDDL): compatible with CC Zero
• The Open Government Licence (http://www.nationalarchives.gov.uk/doc/open-government-
licence/)
Slide 45
See also: http://discovery.ac.uk/files/pdf/Licensing_Open
_Data_A_Practical_Guide.pdf
DATASUPPORTOPEN
Publishing linked data is about ...
Breaking down the walls of the silos in order to create more value.
• Making your data and metadata publically and easily accessible on the Web.
• Linking your data and metadata to other data (or metadata) in order to:
Attach meaning and content to it.
Give context to it.
Enrich it.
Allow people to discover more.
Slide 46
DATASUPPORTOPEN
Provide a SPARQL Endpoint
A SPARQL Endpoint is a service that allows others to query your linked data (and/or metadata).
Slide 47
http://data.opendatasupport.eu
DATASUPPORTOPEN
Publish your metadata
Publish your metadata on a central data broker to give it more visibility and increase the reuse of your datasets.
Slide 48
See also: http://www.slideshare.net/OpenDataSupport/pr
omoting-the-re-use-of-open-data-through-odip
The Open Data Interoperability Platform (ODIP):
• ODIP is a central metadata broker developed by the European Commission to enable the cross-border European search for datasets, where data providers can publish the description metadata of their datasets.
DATASUPPORTOPEN
Data & metadata management is about...
• Managing the lifecycle of the data – creation, update and keeping up to data, and decommissioning of datasets.
• Managing the lifecycle of the metadata.
• Putting in place processes for making sure your data and metadata has an appropriate level of quality.
• Stating ownership of data(sets) and metadata.
Slide 49
See also: http://www.slideshare.net/OpenDataSupport/int
roduction-to-metadata-management
DATASUPPORTOPEN
Collecting feedback from the reusers of your data
Ask feedback to the (potential) users of data:
• Which data do they need.
• How did they use the data.
• What did they think about the quality.
• Make sure that requests and fixes reach you – crowdsource data quality!
Slide 50
data.overheid.nl
data.gov.uk
DATASUPPORTOPEN
Linked Open Government Data demand The needs of businesses, entrepreneurs, researchers and governments for linked data.
Slide 51
DATASUPPORTOPEN
The LOGD lifecycle demand side is about ...
Data reusers being able to
• find appropriate datasets;
• use the datasets for analysis, building apps and services.
Slide 52
Developers /
Companies
integrate data into
app (services)
Public
administrations
share data
online
Citizens/business
es benefits from
the app (services)
Developers /
Companies
search for data
Publishing data
Reusing data
DATASUPPORTOPEN
Where to find datasets?
Datasets are made available on various platforms spread across Europe.
“ODIP - A data Broker collects the metadata from various open data platforms and publishes it using a common metadata model. This way the datasets are searchable in a uniform way from a single point of access.”
• Local Open Data Portals, e.g.
- opendatamanchester.org.uk
- Data.gent.be
• Regional Open Data Portals e.g.
- opendata.regionpaca.fr
- Open public data of the government of Catalonia
• National Open Data Portals
- Opendata.at
- opendata.lu
• European Open Data Portals, e.g.
- open-data.europa.eu
• Open Data Brokers, e.g.
- Publicdata.eu
- ODIP
Slide 53
DATASUPPORTOPEN
Use SPARQL endpoint or faceted browser to find datasets
A user looking can execute a SPARQL query on a SPARQL endpoint to find datasets or “filter his way” through the collection of datasets using a faceted browser.
Slide 54
SPARQL endpoint Faceted browser
http://data.gov.uk/sparql
http://data.gov.uk/data/search
DATASUPPORTOPEN
Integrating datasets and building apps & services
Some tools for integrating datasets:
• Karma (http://www.isi.edu/integration/karma/)
• Talend (http://www.talend.com/products/data-integration)
Slide 55
See also: http://www.slideshare.net/OpenDataSupport/int
roduction-to-linked-data-23402165
DATASUPPORTOPEN
Example of Linked Open Government Data reuse Building a common integrated view on address data in Belgium
Slide 56
See also: https://joinup.ec.europa.eu/asset/core_
location/document/core-location-pilot-
interconnecting-belgian-address-data
http://location.testproject.eu
Thank you! ...and now YOUR questions?
DATASUPPORTOPEN
Be part of our team...
Slide 58
Find us on
Contact us
Join us on
Follow us
Open Data Support http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu Open Data Support http://goo.gl/y9ZZI
@OpenDataSupport [email protected]