Vanessa Lopez: Linked Data and Search
IBM Research – Ireland
© 2012 IBM Corporation
Linked Data and Search
Vanessa Lopez, Smarter Cities Technology Centre
IBM Research Ireland
Background: Why Linked Data
• Provides explicit semantics
• Extensible
• Interoperability-focused: enables automatic discovery and ingestion
• Large existing corpora
• Fundamentally incremental (like the Web)
• W3C standard representation and common format
• Government push (e.g. data.gov, data.gov.uk, Linked Government Data)
Yes, yes... Richer structured queries, but...
... limited usability for both data publishers and consumers
How can we help users query and explore Semantic Web content?
State of the art
• Semantic search over messy, heterogeneous data and mash-ups
• Exploratory and faceted systems
• Query builders and relationship finders
• Question answering over Linked Data sources
• Google Knowledge Graph
http://technologies.kmi.open.ac.uk/poweraqua
Linked Data and Search - Problem domain:
What makes City Data so special?
How can we make it more accessible?
Semantic processing of urban data – why is it different?
• How can we go from raw data to insight into the operation of a city with minimal effort?
Return-on-Investment (because data integration is expensive)
Fit-for-all (citizen engagement)
Challenges: Big city data
Volume
• Lots of relevant information
• Not linked to authoritative sources
Velocity
• Streams
• Frequent updates
Variety
• Different models and file formats
• Open domain - Unknown schema
Veracity
• Diverse sources
• Difficult to assess quality
Business case: open data as a means to an end
Business case
• Why are ambulances late?
Sources of information
• 100s of datasets from four municipal authorities in Dublin
• Most static, some dynamic
• Social Media: Twitter, LiveDrive, Eventful, Eventbrite, …
• Linked Data: DBpedia, ..
• Vocabularies: IPSV, FOAF, VOID, PROV, DCAT, WGS84
Domain of information
• Locations of Health Services
• Ambulance call outs and response times
• Tweets about traffic congestion
• Geo-located tweets about people movement
• Road network
• Event Web Services
• …
Issues
• Linked Data to enrich data and give contextual insight for publishers and consumers:
– Publish (vocabularies, annotation)
– Discovery and Search (metadata / cataloguing, full-text indexing, semantic entities)
– Link (schema alignment, linked data, social media)
– Extract interesting views
– Reason (diagnose traffic problems)
Ubiquitous aspects: Provenance, Governance, Performance, Security, Privacy
Approach – Data model
Documents + Metadata
Structure, Entities, Links, Views, Insight
Tabular Graph
  C1 a Cell
  C1 inRow r1
  C1 value “name”
  …
Entity Graph
  e1 a Entity
  e1 inRow r1
  e1 inCol c2
  …
Annotation Graph
  e1 a Entity
  e1 rdfs:label “name”
  e1 addr “X st”
  e1 lat “53.23”
  …
Mapping Graph
  e1 a Entity
  e1 sameAs e2
  …
Pay-as-you-go, Gain-as-you-go
• Structured metadata -> Queries over the metadata
• Files into a standard representation -> Queries over the data
• Partially integrate schemata -> Queries across datasets
• Integrate globally -> Queries across Web data
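As a rough illustration of the layered data model, the sketch below lifts a small table into tabular, entity and annotation triples. It is only a sketch under stated assumptions: triples are plain Python tuples rather than real RDF, the identifiers (`cell_r1c1`, `e1`), the `inCol` triple on cells, the header-row convention and the "key column becomes rdfs:label" rule are all hypothetical choices made for the example, not the system's actual implementation.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.

def tabular_graph(rows):
    """Lift a table (list of rows, each a list of cell values) into cell-level triples."""
    triples = []
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            cell = f"cell_r{r}c{c}"  # hypothetical cell identifier
            triples += [
                (cell, "a", "Cell"),
                (cell, "inRow", f"r{r}"),
                (cell, "inCol", f"c{c}"),  # assumption: cells also record their column
                (cell, "value", value),
            ]
    return triples


def entity_graph(rows, key_col):
    """Create one entity per data row, anchored at the cell in `key_col` (assumed rule)."""
    triples = []
    for r in range(2, len(rows) + 1):  # row 1 is assumed to be the header
        entity = f"e{r - 1}"
        triples += [
            (entity, "a", "Entity"),
            (entity, "inRow", f"r{r}"),
            (entity, "inCol", f"c{key_col}"),
        ]
    return triples


def annotation_graph(rows, key_col):
    """Attach header-named properties to each entity; the key column becomes rdfs:label."""
    header = rows[0]
    triples = []
    for r, row in enumerate(rows[1:], start=1):
        entity = f"e{r}"
        for c, value in enumerate(row, start=1):
            predicate = "rdfs:label" if c == key_col else header[c - 1]
            triples.append((entity, predicate, value))
    return triples


# Hypothetical one-row dataset with a header row.
table = [
    ["name", "addr", "lat"],
    ["Clinic A", "X st", "53.23"],
]
graph = (
    tabular_graph(table)
    + entity_graph(table, key_col=1)
    + annotation_graph(table, key_col=1)
)
```

Each layer only adds triples on top of the previous one, which is the pay-as-you-go idea: the tabular graph is available as soon as a file is parsed, and the entity and annotation graphs can be added later without rewriting it.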
Discovery: Publishing and Cataloguing
• METADATA
– Many data publishers and disconnected datasets
– Link metadata using domain vocabularies: IPSV
– Convert to simple RDF format
Vocabulary matching
IPSV
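One simple way to picture vocabulary matching is fuzzy string matching of free-text dataset keywords against controlled vocabulary labels. The sketch below uses Python's standard `difflib` for that; the vocabulary labels are made-up samples (not actual IPSV content), the `dcterms:subject` predicate and the 0.8 cutoff are assumptions for illustration, and the real matcher may well work differently.

```python
import difflib

# Hypothetical sample of controlled-vocabulary labels; not actual IPSV content.
VOCABULARY = ["Health services", "Road traffic", "Beaches", "Public transport"]


def normalise(text):
    """Lower-case and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(text.lower().split())


def match_keyword(keyword, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary label, or None if nothing is similar enough."""
    index = {normalise(label): label for label in vocabulary}
    hits = difflib.get_close_matches(normalise(keyword), list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None


def metadata_to_triples(dataset_id, keywords):
    """Emit one (dataset, dcterms:subject, term) triple per keyword that matches a term."""
    triples = []
    for keyword in keywords:
        term = match_keyword(keyword)
        if term is not None:
            triples.append((dataset_id, "dcterms:subject", term))
    return triples
```

Keywords that match no term are simply skipped, so datasets from different publishers end up linked only through the vocabulary terms they share.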
Search and linking
• Full-text indexing for search over metadata and content
• Entity linking and navigation (keywords, categories, publishing agencies, regions, ...)
• Open metadata and vocabularies (VOID, PROV, etc.) for data discovery and linking
• Mining descriptions (DBpedia Spotlight)
Faceted search: “beaches in Fingal”
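A faceted query such as this one combines a keyword with facet filters and reports how many hits fall under each facet value. The sketch below is an assumption-laden illustration: the catalogue records, the facet names (`agency`, `category`) and the exact-match keyword test are all hypothetical.

```python
from collections import Counter

datasets = [  # hypothetical catalogue records
    {"id": "ds1", "keywords": {"beaches", "water quality"},
     "agency": "Fingal", "category": "Environment"},
    {"id": "ds2", "keywords": {"beaches"},
     "agency": "Dun Laoghaire-Rathdown", "category": "Environment"},
    {"id": "ds3", "keywords": {"traffic"},
     "agency": "Fingal", "category": "Transport"},
]


def faceted_search(records, keyword=None, **facets):
    """Filter records by keyword and facet values, then count hits per facet value."""
    hits = [
        record for record in records
        if (keyword is None or keyword in record["keywords"])
        and all(record.get(name) == value for name, value in facets.items())
    ]
    counts = {
        name: Counter(record[name] for record in hits)
        for name in ("agency", "category")
    }
    return hits, counts


hits, counts = faceted_search(datasets, keyword="beaches", agency="Fingal")
```

The per-facet counts are what a faceted interface would show next to each filter, letting the user see how the result set would change before clicking.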
Content integration
• Incrementally lift data content (beyond search, to querying across dataset content)
– Extract entities represented in RDF (PAYGO)
– Label extraction and annotation
– Link when we have higher confidence (lat, long)
– Geo-coding and taxonomy of tweets (traffic)
Minimal entry cost
Provenance-based dataset ranking
Geocoding
Label extraction
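The "link when we have higher confidence (lat, long)" step can be sketched as follows: two entities from different datasets are proposed as `sameAs` only when their labels agree and their coordinates are close. This is a minimal sketch; the 50 m threshold, the record layout and the exact-label rule are assumptions for illustration, not the system's actual linking criteria.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def link_entities(left, right, max_dist_m=50.0):
    """Emit sameAs candidates only for pairs with equal normalised labels and nearby coordinates."""
    links = []
    for a in left:
        for b in right:
            same_label = a["label"].strip().lower() == b["label"].strip().lower()
            if same_label and haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_dist_m:
                links.append((a["id"], "sameAs", b["id"]))
    return links
```

Requiring both signals keeps the entry cost low while avoiding links between, say, two identically named facilities in different parts of the city.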
Views
• Beyond search, to guiding the user to create meaningful views:
– Guide users to annotate data, recommend related datasets and create data views on the fly
– Ranking and context-based recommendations
– Allow semantic-based analysis on multiple views
Hidden information discovery
Multiple endpoints
Cross-domain queries
Multiple interpretations
Demo
• Currently: Web services and technology demonstrator
• Next: Open RDF-based data management deployed in Dublin City (read/write). Deployment of traffic diagnoser.
• SPUD: Semantic Processing of Urban Data (2nd prize at the Semantic Web Challenge – ISWC)
• Live demo: www.dublinked.ie/sandbox/SemanticWebChall
City Fabric Team: Spyros Kotoulas, Vanessa Lopez, Raymond Lloyd, Marco Luca Sbodio, Freddy Lecue, Martin Stephenson, Elizabeth Daly, Veli Bicer, Aris Gkoulalas-Divanis, Giusy Di Lorenzo, Anika Schumann, Denis Patterson, and Pol Mac Aonghusa
Thank you!
Reference Publication:
• QuerioCity: A Linked Data Platform for Urban Information Management
V. Lopez, S. Kotoulas, M. L. Sbodio, M. Stephenson, A. Gkoulalas-Divanis, P. Mac Aonghusa. In Use track at the 11th International Semantic Web Conference (ISWC).