Semantic day 2013 linked data at globo.com

Posted on 27-Jun-2015


Linked Data at globo.com

Semantic Team: semantica@corp.globo.com
Tatiana Al-Chueyr and Rodrigo D. A. Senra ({tatiana.martins, rodrigo.senra}@corp.globo.com)

Semantic Team: Andréia Bustamante, Ícaro Medeiros, Tatiana Al-Chueyr, Rodrigo Senra

Contributors: Franklin Amorim, João Caros Mendes, Alberto Beloni, André Nicodemus

BROADCAST, MOVIES, PAY TV, INTERNET, EVENTS, MUSIC, PUBLISHING, NEW VENTURES, NEWSPAPER, RADIO NETWORK

Motivation
● Cross-link content from different web products (examples: soccer player, politician, celebrity)

Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction)

Isabella de Oliveira Nardoni, 5 years old, was killed on the night of March 29, 2008. Forensic experts concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two young children lived, in Vila Isolina Mazzei, in the north zone of São Paulo.

Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison.

The Isabella Nardoni case

Juliana Cardilli, G1 SP

Motivation: Semantic markup in web pages (RDF, FOAF, GEO, Dublin Core, SKOS)
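To make "semantic markup" concrete, here is a minimal sketch that describes the G1 article above with Dublin Core and FOAF terms, serialized as JSON-LD. The article and entity URIs under example.org are hypothetical placeholders, and the markup actually embedded in globo.com pages may use a different syntax (for instance RDFa) and different properties.

```python
import json

# Namespace IRIs of two vocabularies named on the slide.
CONTEXT = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def annotate_article(url, title, author, subject_uris):
    """Describe a news article as a JSON-LD document (illustrative sketch)."""
    return {
        "@context": CONTEXT,
        "@id": url,
        "dc:title": title,
        # The author is described inline as a FOAF person.
        "dc:creator": {"@type": "foaf:Person", "foaf:name": author},
        # Each annotation links the article to an entity resource.
        "dc:subject": [{"@id": uri} for uri in subject_uris],
    }


doc = annotate_article(
    "http://example.org/g1/caso-isabella-nardoni",   # hypothetical article URI
    "Caso Isabella Nardoni",                         # original Portuguese title
    "Juliana Cardilli",
    ["http://example.org/person/Isabella_Nardoni"],  # hypothetical entity URI
)
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

Because each annotation is an explicit link (`dc:subject`), any other page annotated with the same entity URI can be cross-linked to this one.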

Motivation: Recommend annotations to the information producer

Motivation: Suggest related content to the information consumer

Outcomes
● Flexible ways to organize content
● Easier discovery of related topics
● Explicit relations derived from annotated content
● Up-to-date topic pages with little editorial effort
● Linking content across different web products
● Seamless navigation leading to a flow state

Status Quo
Used by the main web products of Globo.com, linking, among others:
○ 18,485 organizations
○ 82,386 people
○ 9,129 places
○ 1,000,000+ annotated news articles
from August 2010 to May 2013

Legacy Architecture

[Diagram: several CMA/CDA application pairs sharing a triple store, a search engine and an ontology]

Problems
● Poor data management
○ direct, unmanaged access to the triple store
○ difficulty sharing data (distributed DBs)
○ re-syncing the triple store and the search engine index
○ scalability of the triple store
○ high entropy in distributed ontology engineering

Ontology Engineering
Product-driven (past): Base; G1 (news), GE (sports), EGO (gossip), TVG (tv)
Domain-driven (current): Upper; Person, Organization, Place, Music, Politics, Programme, Education, Sports

Possible Solution: Upper Ontology

Problems
● Semantics as a library
○ many different versions in production
○ programming-language dependent
○ steep learning curve for RDF/OWL/SPARQL

Next Step
Create an open semantic data management platform:
● Scalable
● Mobile and Web friendly
● Interconnect Globo's data with external data sources
● Automate content extraction (including named entity recognition, NER)

Brainiak: linked data RESTful API

[Diagram: in the legacy architecture, CMA/CDA pairs access the triple store, search engine and ontology directly; in the new architecture, CMA and CDA applications reach the triple store and search engine through the Brainiak API]

Under Development

Requirements
● Indirect use of SPARQL
● Programming-language independent
● Quality data management
● Finer-grained authentication and authorization
● Isolate applications from the triple store
● Improve triple store performance

task: list all sports teams

SPARQL query:

    DEFINE input:inference <http://data.globo.com/ruleset>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?uri ?label
    FROM <http://data.globo.com/sports/>
    WHERE {
      ?uri a <http://data.globo.com/sports/Team> ;
           rdfs:label ?label .
    }
    LIMIT 10 OFFSET 0

Brainiak query: GET /sports/Team

[Slide: SPARQL response vs. Brainiak response]
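The point of this comparison is that Brainiak hides SPARQL behind a plain GET. A minimal sketch of that server-side translation, assuming the graph is derived from the first path segment and the class from the full path under `http://data.globo.com/` (as in the query above); `collection_query`, `page` and `per_page` are hypothetical names for illustration, not documented Brainiak parameters.

```python
BASE = "http://data.globo.com/"
RULESET = "http://data.globo.com/ruleset"


def collection_query(path, page=1, per_page=10):
    """Translate a collection path such as /sports/Team into SPARQL.

    Assumes the path is exactly /<context>/<Class>; page and per_page are
    hypothetical pagination parameters mapped onto LIMIT/OFFSET.
    """
    context, class_name = path.strip("/").split("/")
    graph_uri = f"{BASE}{context}/"
    class_uri = f"{BASE}{context}/{class_name}"
    offset = (page - 1) * per_page
    return (
        f"DEFINE input:inference <{RULESET}>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?uri ?label\n"
        f"FROM <{graph_uri}>\n"
        f"WHERE {{ ?uri a <{class_uri}> ; rdfs:label ?label . }}\n"
        f"LIMIT {per_page} OFFSET {offset}"
    )


print(collection_query("/sports/Team"))
```

The client only ever sees GET /sports/Team; keeping the translation on the server is what makes the API independent of the client's programming language.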

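Brainiak concepts
● Instance (e.g. Brazil)
● Collection (set of instances of a given class, e.g. Country)
● Schema (the class definition)
● Context (the named graph, e.g. place)

[Diagram: the place context contains the classes State, Country and City; Brazil and Japan are instances of Country]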

Real example
GET /place (context)
GET /place/Country (collection)
GET /place/Country/_schema (schema)
GET /place/Country/Brazil (instance)

resource URL → /place/Country/Brazil
context (graph) → http://semantica.globo.com/place/
class → http://semantica.globo.com/place/Country
instance → http://semantica.globo.com/place/Country/Brazil
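This convention is mechanical enough to express as a small function. A sketch, assuming the base `http://semantica.globo.com/` from the example and ignoring special segments such as `_schema` and any query-string overrides; `resource_to_uris` is a hypothetical name.

```python
BASE = "http://semantica.globo.com/"


def resource_to_uris(resource_url):
    """Map /<context>/<Class>/<instance> onto graph, class and instance URIs."""
    segments = resource_url.strip("/").split("/")
    uris = {"context": f"{BASE}{segments[0]}/"}
    if len(segments) >= 2:
        uris["class"] = f"{BASE}{segments[0]}/{segments[1]}"
    if len(segments) >= 3:
        uris["instance"] = f"{BASE}{segments[0]}/{segments[1]}/{segments[2]}"
    return uris


print(resource_to_uris("/place/Country/Brazil"))
# {'context': 'http://semantica.globo.com/place/', 'class': 'http://semantica.globo.com/place/Country', 'instance': 'http://semantica.globo.com/place/Country/Brazil'}
```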

URI Conventions

/place/River?graph_uri=http://dbpedia.org/resource/classes%23&class_uri=dbpedia:River

Overridden: context (graph) → http://dbpedia.org/resource/classes#, class → http://dbpedia.org/ontology/River

Convention (no overrides): context (graph) → http://semantica.globo.com/place/, class → http://semantica.globo.com/place/River

Legacy URIs
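Overrides let legacy or external URIs (such as DBpedia's) be addressed through the same URL scheme. A minimal sketch of the resolution, assuming query parameters `graph_uri` and `class_uri` as on the slide and a hypothetical prefix table with only `dbpedia:`. Note that a literal `#` inside a query-string value must be percent-encoded as `%23`; otherwise it starts the URL fragment and everything after it never reaches the server.

```python
from urllib.parse import parse_qs, urlsplit

BASE = "http://semantica.globo.com/"
# Hypothetical prefix table; the slide only shows that dbpedia: expands
# to http://dbpedia.org/ontology/.
PREFIXES = {"dbpedia": "http://dbpedia.org/ontology/"}


def expand(value):
    """Expand a prefixed name such as dbpedia:River; leave full URIs as-is."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return value


def resolve(url):
    """Return (graph URI, class URI) for /<context>/<Class>[?overrides]."""
    parts = urlsplit(url)
    context, class_name = parts.path.strip("/").split("/")[:2]
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    graph = params.get("graph_uri", f"{BASE}{context}/")
    klass = params.get("class_uri", f"{BASE}{context}/{class_name}")
    return expand(graph), expand(klass)


print(resolve("/place/River"))
# ('http://semantica.globo.com/place/', 'http://semantica.globo.com/place/River')
print(resolve("/place/River?graph_uri=http://dbpedia.org/resource/classes%23"
              "&class_uri=dbpedia:River"))
# ('http://dbpedia.org/resource/classes#', 'http://dbpedia.org/ontology/River')
```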

Hypermedia
● Flexibility and programmatic adaptation
● Semantic affordances
● The client has to understand what it consumes
● "Hypermedia APIs are not fully baked yet"

Brainiak hypermedia graph

[Diagram: the root (/), context, collection, schema and instance resources connected by the link relations self, instances, item, inCollection, describedBy, create, replace and delete]
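A hypermedia client navigates by relation name rather than by building URLs itself. The sketch assumes an instance exposes `self`, `describedBy`, `inCollection`, `replace` and `delete` links; the relation names come from the diagram, but which resource carries which link, the HTTP methods attached to them and the dict layout are assumptions, not Brainiak's actual JSON format.

```python
def instance_links(context, class_name, instance_id):
    """Hypothetical link list for an instance such as /place/Country/Brazil."""
    collection = f"/{context}/{class_name}"
    instance = f"{collection}/{instance_id}"
    return [
        {"rel": "self", "href": instance, "method": "GET"},
        {"rel": "describedBy", "href": f"{collection}/_schema", "method": "GET"},
        {"rel": "inCollection", "href": collection, "method": "GET"},
        {"rel": "replace", "href": instance, "method": "PUT"},
        {"rel": "delete", "href": instance, "method": "DELETE"},
    ]


def follow(links, rel):
    """Select a link by relation name instead of hard-coding URLs in the client."""
    for link in links:
        if link["rel"] == rel:
            return link
    raise KeyError(rel)


links = instance_links("place", "Country", "Brazil")
print(follow(links, "describedBy")["href"])  # /place/Country/_schema
```

A client written this way keeps working if the server moves the schema to another URL, which is the "programmatic adaptation" benefit; the cost is that the client must understand the relation names.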

Services
● List Contexts
● List Collections
● Get a Schema
● List Prefixes
● Status of Services
● Instances
○ Create
○ Retrieve
○ Delete
○ Edit
○ List

Features
● JSON-Schema
● JSON-LD
● REST (OPTIONS, GET, PUT, POST, DELETE)
● Python + Tornado
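The services and HTTP methods listed above fit the URL conventions of the real example. A sketch of that mapping using only the standard library's `re` (not Tornado); the route table, the service names and the rule that `_schema` is tried before instance ids are assumptions for illustration, not Brainiak's actual handlers. OPTIONS, List Prefixes and Status of Services are left out because their URLs are not shown.

```python
import re

COLLECTION = r"/(?P<context>\w+)/(?P<class_name>\w+)"
INSTANCE = COLLECTION + r"/(?P<instance>\w+)"

# Hypothetical route table: (HTTP method, path pattern, service).
# Order matters: "_schema" must be tried before the generic instance pattern.
ROUTES = [
    ("GET", r"/", "list contexts"),
    ("GET", r"/(?P<context>\w+)", "list collections"),
    ("GET", COLLECTION + r"/_schema", "get schema"),
    ("GET", COLLECTION, "list instances"),
    ("POST", COLLECTION, "create instance"),
    ("GET", INSTANCE, "retrieve instance"),
    ("PUT", INSTANCE, "edit instance"),
    ("DELETE", INSTANCE, "delete instance"),
]


def dispatch(method, path):
    """Return (service, path parameters) for the first route that matches."""
    for route_method, pattern, service in ROUTES:
        if route_method != method:
            continue
        match = re.fullmatch(pattern, path)
        if match:
            return service, match.groupdict()
    return None, {}


print(dispatch("GET", "/place/Country/_schema"))
print(dispatch("DELETE", "/place/Country/Brazil"))
```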

Brainiak query: GET /sports/Team

[Slide: Brainiak response]

task: retrieve all superclasses of a class

SPARQL query:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT DISTINCT ?class
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?class
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?class a owl:Class .
    }
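`OPTION (TRANSITIVE, ...)` is a Virtuoso extension: `t_min (0)` lets the start class itself be returned and `t_distinct` reports each class once. The sketch below computes the same closure in plain Python over a toy hierarchy; the hierarchy is an assumption (it chains the classes that appear in the FILTER of the next query, with an invented diamond to show why duplicates must be removed), and the real upper ontology may be shaped differently.

```python
from collections import deque

# Toy rdfs:subClassOf hierarchy (an assumption, not the real ontology).
SUBCLASS_OF = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Object", "upper:Substance"],
    "upper:Object": ["upper:ConcreteEntity"],
    "upper:Substance": ["upper:ConcreteEntity"],
    "upper:ConcreteEntity": ["upper:Entity"],
}


def superclasses(cls, subclass_of):
    """Breadth-first transitive closure of rdfs:subClassOf.

    The start class is included (like t_min(0)) and each class is reported
    once (like t_distinct).
    """
    order, seen = [cls], {cls}
    queue = deque([cls])
    while queue:
        for parent in subclass_of.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(superclasses("place:City", SUBCLASS_OF))
# ['place:City', 'place:GeopoliticalDivision', 'place:Place', 'upper:Object', 'upper:Substance', 'upper:ConcreteEntity', 'upper:Entity']
```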

task: retrieve all properties of a group of classes

SPARQL query:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range
                    ?title ?range_graph ?range_label ?super_property
    WHERE {
      {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
      } UNION {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?blank } .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?domain_class } .
      }
      FILTER (?domain_class IN (<http://data.globo.com/place/City>,
                                <http://data.globo.com/place/GeopoliticalDivision>,
                                <http://data.globo.com/place/Place>,
                                <http://data.globo.com/upper/Object>,
                                <http://data.globo.com/upper/Substance>,
                                <http://data.globo.com/upper/ConcreteEntity>,
                                <http://data.globo.com/upper/Entity>))
      { ?predicate rdfs:range ?range . }
      UNION {
        ?predicate rdfs:range ?blank .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?range } .
      }
      FILTER (!isBlank(?range))
      ?predicate rdfs:label ?title .
      ?predicate rdf:type ?type .
      OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
      FILTER (?type IN (owl:ObjectProperty, owl:DatatypeProperty)) .
      FILTER (langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
      # the language filter sits inside OPTIONAL so properties without an
      # English (or untagged) comment are still returned
      OPTIONAL {
        ?predicate rdfs:comment ?predicate_comment
        FILTER (langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), ""))
      }
      OPTIONAL {
        GRAPH ?range_graph {
          ?range rdfs:label ?range_label .
          FILTER (langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
        }
      }
    }
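The long `FILTER (?domain_class IN (...))` here lists exactly the superclasses of City, which suggests the output of the previous query is pasted into this one, so retrieving a schema takes more than one round trip. A sketch of that glue step, assuming the superclass URIs are already available as a list; `domain_filter` and `domain_values` are hypothetical helpers, and the SPARQL 1.1 `VALUES` form is shown only as an alternative way to express the same restriction.

```python
def domain_filter(class_uris):
    """Render the FILTER that restricts ?domain_class to the given classes."""
    values = ", ".join(f"<{uri}>" for uri in class_uris)
    return f"FILTER (?domain_class IN ({values}))"


def domain_values(class_uris):
    """Alternative SPARQL 1.1 VALUES block binding ?domain_class to the classes."""
    values = " ".join(f"<{uri}>" for uri in class_uris)
    return f"VALUES ?domain_class {{ {values} }}"


uris = ["http://data.globo.com/place/City", "http://data.globo.com/place/Place"]
print(domain_filter(uris))
print(domain_values(uris))
```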

task: retrieve the cardinalities of all properties of a certain class

SPARQL query:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?s
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?s owl:onProperty ?predicate .
      OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
      OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
      OPTIONAL {
        { ?s owl:onClass ?range }
        UNION { ?s owl:onDataRange ?range }
        UNION { ?s owl:allValuesFrom ?range }
        OPTIONAL { ?range owl:oneOf ?enumeration } .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?enumerated_value } .
        OPTIONAL { ?enumerated_value rdfs:label ?enumerated_value_label . } .
      }
    }

Brainiak query (the same schema information in a single request): GET /place/City/_schema
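The `_schema` resource is served as JSON-Schema (one of the listed features), so cardinality rows like the ones above have to be turned into schema keywords somewhere. A minimal sketch of one plausible mapping, assuming the rows arrive as `(predicate, min, max, range)` tuples with cardinalities already converted to integers (None when absent): min >= 1 becomes `required`, max = 1 a single value, anything else an array with `minItems`/`maxItems`. The XSD-to-JSON type table and the predicate `partOf` are hypothetical, not Brainiak's actual output.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumed mapping from XSD datatypes to JSON-Schema types.
XSD_TO_JSON = {
    XSD + "string": "string",
    XSD + "integer": "integer",
    XSD + "float": "number",
    XSD + "boolean": "boolean",
}


def property_schema(range_uri, min_card, max_card):
    """JSON-Schema fragment for one property, given its range and cardinality."""
    if range_uri in XSD_TO_JSON:
        item = {"type": XSD_TO_JSON[range_uri]}
    else:
        # Object properties point to other resources, so values are URIs.
        item = {"type": "string", "format": "uri"}
    if max_card == 1:
        return item  # single-valued property
    schema = {"type": "array", "items": item}
    if min_card:
        schema["minItems"] = min_card
    if max_card is not None:
        schema["maxItems"] = max_card
    return schema


def class_schema(title, rows):
    """Build an object schema from (predicate, min, max, range) rows."""
    properties, required = {}, []
    for predicate, min_card, max_card, range_uri in rows:
        properties[predicate] = property_schema(range_uri, min_card, max_card)
        if (min_card or 0) >= 1:
            required.append(predicate)
    return {"title": title, "type": "object",
            "properties": properties, "required": required}


LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
PART_OF = "http://data.globo.com/place/partOf"  # hypothetical predicate
rows = [
    (LABEL, 1, 1, XSD + "string"),
    (PART_OF, 0, None, "http://data.globo.com/place/State"),
]
schema = class_schema("City", rows)
print(schema)
```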

Next steps
● SEO (automatic schema.org markup)
● Improved annotator (DBpedia Spotlight)
● Richer content relationships (inference)
● Links to open data (e.g. DBpedia, dados.gov.br)

Stay tuned: @brainiak_api

Brainiak will soon be released as an open source project!


Thank you for your attention!