The Web of Data emerging industries

The Web of Data emerging industries Michalis Vafopoulos 04/04/2013


Page 1: The Web of Data emerging industries

The Web of Data emerging industries

Michalis Vafopoulos, 04/04/2013

Page 2: The Web of Data emerging industries

Contents
① The Web of documents vs. Web of data
– Some technology
– Some economics
– ..and action
② PSGR project
③ and more…

Page 3: The Web of Data emerging industries

The Web of Documents
• Simple, big and unstructured
• Organized in silos

But humans:
• are interested in Things, not documents, and these Things might be in docs or elsewhere
• have limited capacity to extract meaning...

Page 4: The Web of Data emerging industries

The Web of Data
• Analogy: a global file system --> a global database
• Designed for: human consumption --> machines first, humans later
• Primary objects: documents --> things (or descriptions of things)
• Links between: documents --> things
• Degree of structure in objects: fairly low --> high
• Semantics of content and links: implicit --> explicit

(Tom Heath)

Page 5: The Web of Data emerging industries

The Web of Data: why?

• encourages reuse
• reduces redundancy
• maximizes its (real and potential) inter-connectedness
• enables network effects to add value to data

Page 6: The Web of Data emerging industries

The Web of Data: how?

– current state on the Web
• Relational databases
• APIs
• XML
• CSV
• XLS

Computers can’t consume data because:
• Different formats & models
• Not inter-connected

Page 7: The Web of Data emerging industries

The Web of Data: how?

– we need to create a standard way of publishing data on the Web (like HTML for docs)

This is the Resource Description Framework (RDF)

(a simple example here from Juan F. Sequeda; more next semester!)

Page 8: The Web of Data emerging industries

Resource Description Framework (RDF)

• A data model
– A way to model data
– Inspired by relational databases and logic
• RDF is a triple data model
• Labeled graph (semantic networks)
• Subject, Predicate, Object
<Isidoro> <was born in> <Chios>
<Chios> <is part of> <Greece>
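
The triple model above can be sketched in a few lines of Python. This is only an illustration with plain tuples, not an RDF library; the helper names `objects` and `birth_country` are made up for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The two statements from the slide, stored as (subject, predicate, object).
graph = {
    Triple("Isidoro", "was born in", "Chios"),
    Triple("Chios", "is part of", "Greece"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


def birth_country(graph, person):
    """Follow two edges of the labeled graph: person -> birthplace -> what it is part of."""
    result = set()
    for place in objects(graph, person, "was born in"):
        result |= objects(graph, place, "is part of")
    return result


print(birth_country(graph, "Isidoro"))
```

Because every statement has the same three-part shape, chaining two lookups is enough to answer a question that neither triple answers alone.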

Page 9: The Web of Data emerging industries

Example: Document on the Web

Page 10: The Web of Data emerging industries

Databases back up documents

Isbn | Title | Author | PublisherID | ReleasedData
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009
… | … | … | … | …

PublisherID | PublisherName
1 | O’Reilly Media
… | …

This is a THING: a book titled “Programming the Semantic Web” by Toby Segaran, …

THINGS have PROPERTIES: a Book has a Title, an Author, …

Page 11: The Web of Data emerging industries

Data representation in RDF

[Figure: the book record drawn as an RDF graph. Nodes: book, “Programming the Semantic Web”, 978-0-596-15381-6, Toby Segaran, Publisher, O’Reilly. Edge labels: title, isbn, author, publisher, name. The Isbn/Title/Author/PublisherID/ReleasedData and PublisherID/PublisherName tables from the previous slide are shown alongside.]
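
One way to read this slide: each table row becomes a thing, each column a property, and the foreign key an edge between two things. A minimal sketch of that mapping follows; the node labels `book` and `publisher1` and the lower-case column names are assumptions for illustration, not part of the original example.

```python
def row_to_triples(subject, row, skip=()):
    """Turn one table row (a dict) into (subject, column, value) triples."""
    return [(subject, column, value) for column, value in row.items() if column not in skip]


# Rows copied from the two tables; the dict keys are illustrative names.
book_row = {
    "isbn": "978-0-596-15381-6",
    "title": "Programming the Semantic Web",
    "author": "Toby Segaran",
    "publisher_id": 1,
}
publisher_row = {"publisher_id": 1, "name": "O'Reilly Media"}

# Ordinary columns become properties of the thing the row describes.
triples = row_to_triples("book", book_row, skip=("publisher_id",))
triples += row_to_triples("publisher1", publisher_row, skip=("publisher_id",))

# The foreign key becomes an edge between the two things.
triples.append(("book", "publisher", "publisher1"))

for triple in triples:
    print(triple)
```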

Page 12: The Web of Data emerging industries

Everything on the web is identified by a URI!

Page 13: The Web of Data emerging industries

link the data to other data

[Figure: the same graph with URIs as node names. Nodes: http://…/isbn978, “Programming the Semantic Web”, 978-0-596-15381-6, Toby Segaran, http://…/publisher1, O’Reilly. Edge labels: title, isbn, author, publisher, name.]

Page 14: The Web of Data emerging industries

consider the data from Revyu.com

[Figure: RDF graph of a review. Nodes: http://…/isbn978, http://…/review1, “Awesome Book”, http://…/reviewer, “Juan Sequeda”. Edge labels: hasReview, reviewer, description, name.]

Page 15: The Web of Data emerging industries

start to link data

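[Figure: the book graph and the review graph joined by a sameAs edge between their two http://…/isbn978 nodes. Book graph nodes: http://…/isbn978, “Programming the Semantic Web”, 978-0-596-15381-6, Toby Segaran, http://…/publisher1, O’Reilly; edge labels: title, isbn, author, publisher, name. Review graph nodes: http://…/isbn978, http://…/review1, “Awesome Book”, http://…/reviewer, “Juan Sequeda”; edge labels: hasReview, hasReviewer, description, name.]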

Page 16: The Web of Data emerging industries

Juan Sequeda publishes data too

[Figure: RDF graph. Nodes: http://juansequeda.com/id, “Juan Sequeda”, http://dbpedia.org/Austin. Edge labels: name, livesIn.]

Page 17: The Web of Data emerging industries

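Let’s link more data

[Figure: the review graph linked to Juan Sequeda’s own data by a sameAs edge between http://…/reviewer and http://juansequeda.com/id. Nodes: http://…/isbn978, http://…/review1, “Awesome Book”, http://…/reviewer, “Juan Sequeda”, http://juansequeda.com/id, http://dbpedia.org/Austin. Edge labels: hasReview, hasReviewer, description, name, sameAs, livesIn.]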

Page 18: The Web of Data emerging industries

Linked data = internet + http + RDF

[Figure: all three graphs connected. The book graph (http://…/isbn978, “Programming the Semantic Web”, 978-0-596-15381-6, Toby Segaran, http://…/publisher1, O’Reilly) is joined by sameAs to the review graph (http://…/isbn978, http://…/review1, “Awesome Book”, http://…/reviewer, “Juan Sequeda”), which is joined by sameAs to http://juansequeda.com/id (name “Juan Sequeda”, livesIn http://dbpedia.org/Austin). Edge labels: title, isbn, author, publisher, name, sameAs, hasReview, hasReviewer, description, livesIn.]
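
A rough sketch of what the sameAs edges buy us: once two names are declared to denote the same thing, the separate graphs can be merged into one. The code below uses plain tuples and a small union-find; the `books.example` and `reviews.example` URIs are hypothetical stand-ins, because the URIs on the slides are elided.

```python
SAME_AS = "sameAs"


def canonical_names(triples):
    """Return a function mapping each node to one representative of its sameAs group."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)
    return find


def merge(*datasets):
    """Union several triple sets, rewriting aliases to their shared name."""
    triples = [t for dataset in datasets for t in dataset]
    find = canonical_names(triples)
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}


# Hypothetical URIs standing in for the elided ones on the slides.
books = {
    ("http://books.example/isbn978", "title", "Programming the Semantic Web"),
    ("http://books.example/isbn978", "author", "Toby Segaran"),
}
reviews = {
    ("http://reviews.example/isbn978", "hasReview", "http://reviews.example/review1"),
    ("http://reviews.example/review1", "description", "Awesome Book"),
    ("http://reviews.example/isbn978", "sameAs", "http://books.example/isbn978"),
}

merged = merge(books, reviews)
for triple in sorted(merged):
    print(triple)
```

After the merge, the title and the review hang off the same node, even though they were published by two different sources.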

Page 19: The Web of Data emerging industries

Linked data = internet + http + RDF

Page 20: The Web of Data emerging industries

Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up (dereference) those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.
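
Principles 2 and 3 amount to an ordinary HTTP GET in which the client asks for data rather than a web page. A minimal sketch with the Python standard library follows; the `Accept` value and the example URI are assumptions, and real servers differ in which RDF formats they offer.

```python
import urllib.request

# Media types for two common RDF serializations; which ones a server supports varies.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri):
    """Prepare an HTTP GET that asks the server for an RDF description of `uri`."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri, timeout=10):
    """Look the URI up and return (content type, raw body). Needs network access."""
    with urllib.request.urlopen(build_lookup_request(uri), timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()


# Hypothetical URI; nothing is sent until dereference() is called.
request = build_lookup_request("http://example.org/id/thing")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```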

Page 21: The Web of Data emerging industries

Web as a database
Linked Data makes the web exploitable as ONE GIANT HUGE GLOBAL DATABASE!

Is there any query language like SQL? SPARQL…
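
To give a feel for what a SPARQL query does, here is a toy matcher for triple patterns with variables. It is a sketch of the idea only, not a SPARQL engine; the first two triples come from the slides, the third is invented so that the join has something to filter out.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; return None on failure."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:  # a constant must match exactly
            return None
    return binding


def query(graph, patterns):
    """Return all variable bindings that satisfy every pattern (a join over the graph)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


graph = [
    ("http://juansequeda.com/id", "name", "Juan Sequeda"),
    ("http://juansequeda.com/id", "livesIn", "http://dbpedia.org/Austin"),
    ("http://example.org/id/other", "name", "Someone Else"),  # invented, no livesIn
]

# Roughly: SELECT * WHERE { ?person :name ?name . ?person :livesIn ?place }
results = query(graph, [("?person", "name", "?name"), ("?person", "livesIn", "?place")])
print(results)
```

Sharing the variable `?person` between the two patterns is what turns two lookups into a join, much like a join condition in SQL.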

Page 22: The Web of Data emerging industries

May 2007

Page 23: The Web of Data emerging industries
Page 24: The Web of Data emerging industries

What is a Linked Data application/service?

A software system that makes use of data on the Web from multiple datasets and that benefits from links between the datasets

Page 25: The Web of Data emerging industries

Characteristics of Linked Data Applications

• Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data

• Discover further information by following the links between different data sources: the fourth principle enables this.

• Combine the consumed linked data with data from other sources (not necessarily Linked Data)

• Expose the combined data back to the web following the Linked Data principles

• Offer value to end-users
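
The "consume" and "discover" characteristics can be sketched as a small breadth-first crawl that looks up a URI, keeps the triples it gets back, and follows the links it finds. To stay runnable offline, the Web is replaced here by an in-memory dictionary; all URIs and the `lookup` helper are hypothetical.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> triples served when that URI is looked up.
WEB = {
    "http://a.example/book": [
        ("http://a.example/book", "hasReview", "http://b.example/review1"),
    ],
    "http://b.example/review1": [
        ("http://b.example/review1", "hasReviewer", "http://c.example/juan"),
        ("http://b.example/review1", "description", "Awesome Book"),
    ],
    "http://c.example/juan": [
        ("http://c.example/juan", "name", "Juan Sequeda"),
    ],
}


def lookup(uri):
    """Pretend dereference: return the triples published at `uri`, if any."""
    return WEB.get(uri, [])


def crawl(start, max_hops=2):
    """Collect the triples reachable from `start` by following at most `max_hops` links."""
    seen, collected = {start}, []
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        for triple in lookup(uri):
            collected.append(triple)
            target = triple[2]
            # Only objects that look like URIs can be followed further.
            if hops < max_hops and target.startswith("http://") and target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return collected


for triple in crawl("http://a.example/book"):
    print(triple)
```

The hop limit is a design choice for the sketch: on the real Web of Data the link graph is effectively unbounded, so an application has to decide how far to follow.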

Page 26: The Web of Data emerging industries

the 5 stars of open linked data

★ make your stuff available on the Web (whatever format)
★★ make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ use URLs to identify things, so that people can point at your stuff
★★★★★ link your data to other people’s data to provide context

http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
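
The scheme is cumulative: each star presupposes the ones before it. A small sketch of that reading, with made-up field names, could look like this.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Field names are illustrative; each one mirrors one line of the star scheme.
    on_web: bool        # available on the Web, in whatever format
    structured: bool    # structured data rather than a scan
    open_format: bool   # non-proprietary format
    uses_uris: bool     # things are identified by URLs
    linked: bool        # links to other people's data


def stars(dataset):
    """Count stars in order and stop at the first criterion that is not met."""
    checks = [
        dataset.on_web,
        dataset.structured,
        dataset.open_format,
        dataset.uses_uris,
        dataset.linked,
    ]
    count = 0
    for met in checks:
        if not met:
            break
        count += 1
    return count


csv_file = Dataset(on_web=True, structured=True, open_format=True, uses_uris=False, linked=False)
print(stars(csv_file))
```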

Page 27: The Web of Data emerging industries

Two magics of Web Science: the case of Linked Data

Page 28: The Web of Data emerging industries
Page 29: The Web of Data emerging industries

The (practical) question

contextualized & hands-on experience in Semantic Web & Business 3.0 on a unique, fast evolving and semantified dataset


Page 30: The Web of Data emerging industries

PSGR project: the answer

The first attempt to generate, curate, interlink and distribute daily updated public spending data in LOD formats that can be useful to both expert (i.e. scientists and professionals) and naïve users.


Page 31: The Web of Data emerging industries

The context first…


Page 32: The Web of Data emerging industries

Economy after the Web

New form of property
• Public, Private, Peer (e.g. Wikipedia)

The right to:
• Use-modify-benefit-transfer resources

• Energetic & connected consumption
• Pro-sumption

Page 33: The Web of Data emerging industries

Research question

Web economy: from potential to actual

Enable new virtuous cycles in the economy through Linked Open Data


Page 34: The Web of Data emerging industries

Outline
① EU Unification: institutions-technology
② Why Linked Open Data?
③ Economic LOD
o the story so far
o how to start
o use cases
o engineering
④ Government Budget
⑤ Tenders
⑥ Spending
⑦ Business Information
⑧ Next steps

Page 35: The Web of Data emerging industries

EU Unification: the institutions
Best in theory – poor in practice
a (complicated) market example
• monetary policy, currency, eurozone
• European Single Market
• fiscal policy FORTHCOMING

Page 36: The Web of Data emerging industries

EU Unification: the technology
Linked Data or Web of data
• “publish once, use many times”
• different consumers extract different slices of the data for different purposes
• publish in context: value & “meaning”

Page 37: The Web of Data emerging industries

EU Unification: the technology

• Linked Data (LD) + Open Data = LOD
• Economic LOD as “data currency”

Page 38: The Web of Data emerging industries

Why LOD?

• Transparency & innovation

Network effects: enabling users to
• bidirectional & massively processable interconnections among data
• re-using the existing infrastructure in the government and business spheres

Page 39: The Web of Data emerging industries

Economic LOD: the story so far

• Isolated/fragmented behind technological & institutional barriers
• General statistics: Eurostat etc.
• LOD2 case
• LOTTED (Linked Open Tenders Electronic Daily)

Page 40: The Web of Data emerging industries

Economic LOD: how to start
A general model

Page 41: The Web of Data emerging industries

Economic LOD: use cases

• Business applications on top
• Users: citizens, gov., EU, business
• track the life-cycle of every financial flow: evaluate budget allocation, tenders, spending and their efficiency
• pre-allocate resources on provisional public works
• receive & submit information in real-time

Page 42: The Web of Data emerging industries

Economic LOD: engineering


Page 43: The Web of Data emerging industries

Government Budget
• heterogeneous repositories & methods (mainly PDF)

Page 44: The Web of Data emerging industries

Tenders
• Closed data in HTML
• Public Contracts Ontology (PCO), e.g.
– pco:Contract and pco:AwardCriterion
• Common Procurement Vocabulary
• now working on linking our ontology to:
– Payments Ontology
– GoodRelations
– FOAF

Page 45: The Web of Data emerging industries

Spending
• most dynamic & open part
• increasing number of countries/cities
• raw & structured data
• leader: the Greek Clarity project
• spending decisions ex-ante to execution
• Actually every decision

Page 46: The Web of Data emerging industries

www.publicspending.gr (*****)
• based on Greek Clarity & Tax information
• semantify, interconnect, clean, visualize, SPARQL endpoint, daily update
• PSGR ontology
Links to
– WESO products classification
– UK Payments Ontology
– DBpedia and Geonames
– …more to come

Page 47: The Web of Data emerging industries

Business Information
• Registries: mainly closed
• Key standards
– Classification of Products by Activity (CPA)
– eXtensible Business Reporting Language (XBRL)

Page 48: The Web of Data emerging industries

Business Information


Page 49: The Web of Data emerging industries

Next steps

• Working on our basic ontology
• Real-life examples & apps
• Bad news: a long way to go
• Good news: we have started

Page 50: The Web of Data emerging industries

PSGR
① why Linked Open Data (LOD)
② LOD in Greece
③ issues
④ WHERE MY MONEY GOES App
⑤ local spending in EU demo
⑥ to the future

Page 51: The Web of Data emerging industries

Why public spending LOD

o more & better information
o objective and processable information for economic/political “dialogue”
• to promote competition
• to decrease cost
• to judge the efficiency of policy mixtures
• to enable participation

Page 52: The Web of Data emerging industries

LOD in Greece: current status

• in its infancy – NO Apps yet
• 2-3 stars
• Open, not Linked
• very limited public awareness

Page 53: The Web of Data emerging industries

LOD in Greece: why it is important

• quality of information during economic crisis
• transparency & efficiency in funding development

Page 54: The Web of Data emerging industries

Issues
o how can we initiate the virtuous cycle of creation?
demonstrate LOD’s added value
o how to get the most out of data?
local & global interconnections

Page 55: The Web of Data emerging industries

In a few words,

Apps, Apps, Apps…..

Page 56: The Web of Data emerging industries

WHERE MY MONEY GOES in Greece publicspending.gr

• the first LOD App in Greece
• daily updates
• open spending linked data, endpoint & visualizations

Page 57: The Web of Data emerging industries

WHERE MY MONEY GOES in Greece publicspending.gr

• Input
1. “Diavgeia” (all public spending decisions online daily)
API, average data quality, rich information
• Payer, payee (amount, VAT number, name)
• CPA 2008: Classification of Products by Activity
• CPV 2008: Common Procurement Vocabulary
• Original decision text in PDF
2. TAXIS (official Tax Information System)
VAT number validation and profile request
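
As a rough picture of what one input record carries and how a "where my money goes" total could be computed from it, here is a minimal sketch. The field names, the `Payment` type and all sample values are invented for illustration; they are not the project's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Payment:
    # Hypothetical record shape based on the fields listed on the slide.
    payer: str
    payee_vat: str
    payee_name: str
    amount: Decimal
    cpv_code: str


def total_by_payee(payments):
    """Sum the amounts per payee VAT number."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for payment in payments:
        totals[payment.payee_vat] += payment.amount
    return dict(totals)


# Invented sample records with placeholder identifiers.
payments = [
    Payment("Ministry A", "VAT-A", "Firm A", Decimal("100.00"), "cpv-placeholder-1"),
    Payment("City B", "VAT-A", "Firm A", Decimal("50.00"), "cpv-placeholder-2"),
    Payment("Ministry A", "VAT-B", "Firm B", Decimal("20.50"), "cpv-placeholder-1"),
]

print(total_by_payee(payments))
```

Keying on the VAT number rather than the name is deliberate in this sketch: names are spelled inconsistently across decisions, while a validated VAT number identifies one payee.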

Page 58: The Web of Data emerging industries

Checklist
① Ontology – enriching with core vocab.
② Basic visualizations
③ SPARQL endpoint - thedatahub
④ Interconnections
– Product classifications
– Open Corporates
– Greek LOD (e-proc, geodata, dbpedia)
– EU and US (CPV -> NAICS)
⑤ Demos & services
⑥ Public awareness - working with the media, hackathons, courses, theses

Page 59: The Web of Data emerging industries


Page 60: The Web of Data emerging industries


Page 61: The Web of Data emerging industries

Architecture


Page 62: The Web of Data emerging industries


publicspending.gr ontology

Page 63: The Web of Data emerging industries


Page 64: The Web of Data emerging industries

Network analysis
Betweenness Centrality: how often a node appears on shortest paths between nodes in the network
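
Betweenness centrality can be computed with Brandes' algorithm. The sketch below handles a small unweighted, undirected graph; the slides do not say which tool or variant produced the figures, so this is an illustration of the measure and not the project's method. The star-shaped sample graph is invented.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: [neighbours]}."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in adj}
        paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating pair dependencies.
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Invented example: one hub connected to three leaves.
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
print(betweenness(star))
```

When several shortest paths connect the same pair, each path contributes a fraction, so the score is a weighted count rather than a plain tally.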

Page 65: The Web of Data emerging industries

Node size: Betweenness Centrality. Node color: Hub score (HITS)

Page 66: The Web of Data emerging industries

Node size: Weighted In-Degree Centrality. Node color: PageRank
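
Both measures named in this caption are easy to sketch on a small directed graph. The example assumes edges run from payer to payee and are weighted by amount, which the slides do not state; node names and amounts are invented, and the PageRank here is a plain power iteration that ignores edge weights.

```python
def weighted_in_degree(edges):
    """Sum of incoming edge weights per node; edges are (source, target, weight)."""
    totals = {}
    for source, target, weight in edges:
        totals.setdefault(source, 0.0)
        totals[target] = totals.get(target, 0.0) + weight
    return totals


def pagerank(edges, damping=0.85, iterations=100):
    """Power iteration on the unweighted link structure."""
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    out = {n: [] for n in nodes}
    for s, t, _ in edges:
        out[s].append(t)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for s in nodes:
            for t in out[s]:
                new[t] += damping * rank[s] / len(out[s])
        rank = new
    return rank


# Invented payer -> payee edges with invented amounts.
edges = [
    ("ministry", "firm_a", 100.0),
    ("ministry", "firm_b", 50.0),
    ("city", "firm_a", 25.0),
]

print(weighted_in_degree(edges))
print(pagerank(edges))
```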

Page 67: The Web of Data emerging industries


Competition in telecoms

Page 68: The Web of Data emerging industries

Comments, ideas and more


Page 69: The Web of Data emerging industries

Additional material


Page 70: The Web of Data emerging industries
Page 71: The Web of Data emerging industries

History of LD
• Linked Data Design Issues by TimBL July 2006
• Linked Open Data Project WWW2007
• First LOD Cloud May 2007
• 1st Linked Data on the Web Workshop WWW2008
• 1st Triplification Challenge 2008
• How to Publish Linked Data Tutorial ISWC2008
• BBC publishes Linked Data 2008
• 2nd Linked Data on the Web Workshop WWW2009
• NY Times announcement SemTech2009 - ISWC09
• 1st Linked Data-a-thon ISWC2009
• 1st How to Consume Linked Data Tutorial ISWC2009
• Data.gov.uk publishes Linked Data 2010
• 2nd How to Consume Linked Data Tutorial WWW2010
• 1st International Workshop on Consuming Linked Data COLD2010

Page 72: The Web of Data emerging industries

More Examples
• http://data-gov.tw.rpi.edu/wiki
• http://dbrec.net/
• http://fanhu.bz/
• http://data.nytimes.com/schools/schools.html
• http://sig.ma
• http://visinav.deri.org/semtech2010/