Rio info 2013 - Linked Data at Globo.com
![Page 1: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/1.jpg)
Linked Data at globo.com
Tatiana Al-Chueyr, @tati_alchueyr
18 September 2013, Rio Info Symposium (Simpósio Rio Info)
![Page 2: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/2.jpg)
Broadcast, movies, pay TV, internet, events, music, publishing, new ventures, newspaper, radio network
![Page 3: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/3.jpg)
Semantic Team: Andréia Bustamante, Ícaro Medeiros, Tatiana Al-Chueyr, Rodrigo Senra
![Page 4: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/4.jpg)
Contributors: Franklin Amorim, Diogo Kiss
![Page 5: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/5.jpg)
Motivation: not only words
São Paulo
![Page 6: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/6.jpg)
São Paulo?
![Page 7: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/7.jpg)
São Paulo state
![Page 8: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/8.jpg)
São Paulo city
![Page 9: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/9.jpg)
São Paulo saint
![Page 10: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/10.jpg)
São Paulo soccer team
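To make the ambiguity concrete, here is a minimal sketch of context-based disambiguation: each candidate entity for the label "São Paulo" carries a set of context keywords, and the candidate sharing the most words with the surrounding text wins. The example.org URIs, the keyword sets and the function name `disambiguate` are hypothetical illustrations, not Globo.com's data or method.

```python
from __future__ import annotations

import re

# Hypothetical candidate entities for the surface form "São Paulo" (illustrative URIs and keywords).
CANDIDATES = {
    "http://example.org/place/sao_paulo_state": {"state", "governor", "interior"},
    "http://example.org/place/sao_paulo_city": {"city", "mayor", "avenue", "neighborhood"},
    "http://example.org/person/saint_paul": {"saint", "apostle", "church"},
    "http://example.org/organization/sao_paulo_fc": {"soccer", "team", "goal", "match"},
}


def disambiguate(context: str, candidates: dict[str, set[str]]) -> str | None:
    """Return the candidate URI whose keywords overlap most with the context, or None if none overlap."""
    words = set(re.findall(r"\w+", context.lower()))
    best_uri, best_score = None, 0
    for uri, keywords in candidates.items():
        score = len(words & keywords)
        if score > best_score:  # on ties, the first candidate found is kept
            best_uri, best_score = uri, score
    return best_uri


print(disambiguate("São Paulo's soccer team scored a late goal.", CANDIDATES))
```

A real annotator would need richer signals than keyword overlap; the point is only that one label maps to several URIs and the context selects one of them.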
![Page 11: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/11.jpg)
Motivation: multiple words for the same thing
Female, f, F, female, woman, ...
![Page 12: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/12.jpg)
http://data.globo.com/female
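Reconciling these variants can be sketched as a lookup from normalized labels to a single URI. The variants and http://data.globo.com/female come from the slides; the normalization rule (trimming and case-folding) and the name `reconcile` are assumptions for illustration.

```python
from __future__ import annotations

FEMALE = "http://data.globo.com/female"

# Normalized surface forms -> canonical URI; case-folding lets "F" and "f" share one entry.
SURFACE_FORMS = {
    "female": FEMALE,
    "f": FEMALE,
    "woman": FEMALE,
}


def reconcile(label: str) -> str | None:
    """Trim and case-fold a label, then return its canonical URI (None if unknown)."""
    return SURFACE_FORMS.get(label.strip().casefold())


for variant in ["Female", "f", "F", "female", "woman"]:
    print(variant, "->", reconcile(variant))
```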
![Page 13: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/13.jpg)
Motivation: cross-link content from different web products
Soccer player
![Page 14: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/14.jpg)
Politician
![Page 15: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/15.jpg)
Celebrity
![Page 16: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/16.jpg)
Motivation: semantic markup in web pages
Vocabularies annotated on the page: RDF, FOAF, GEO, Dublin Core, SKOS
Isabella Nardoni was killed on March 29, 2008,
in the North Zone of São Paulo (Photo: Reproduction)
Isabella de Oliveira Nardoni, 5, was killed on the night of March 29, 2008. Forensic experts concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two young children lived, in Vila Isolina Mazzei, in the north zone of São Paulo.
Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison.
The Isabella Nardoni case
Juliana Cardilli, G1 SP
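The kind of semantic markup on this slide can be sketched by serializing a few statements about the article as N-Triples, using the standard RDF, Dublin Core terms, FOAF and SKOS namespace IRIs. The example.org IRIs for the article, author and place are hypothetical, and the choice of properties (`dcterms:title`, `dcterms:creator`, `dcterms:spatial`, `foaf:name`, `skos:prefLabel`) is an illustration, not the markup G1 actually published. GEO is left out because the source gives no coordinates.

```python
# Standard vocabulary namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical IRIs for the article and the entities it refers to.
ARTICLE = "http://example.org/g1/caso-isabella-nardoni"
AUTHOR = "http://example.org/person/juliana_cardilli"
PLACE = "http://example.org/place/sao_paulo_city"


def iri(value):
    return f"<{value}>"


def literal(text, lang=None):
    # Escape backslashes and double quotes (sufficient for these single-line strings).
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_ntriples(triples):
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


TRIPLES = [
    (iri(ARTICLE), iri(DCTERMS + "title"), literal("Caso Isabella Nardoni", "pt")),
    (iri(ARTICLE), iri(DCTERMS + "creator"), iri(AUTHOR)),
    (iri(ARTICLE), iri(DCTERMS + "spatial"), iri(PLACE)),
    (iri(AUTHOR), iri(RDF + "type"), iri(FOAF + "Person")),
    (iri(AUTHOR), iri(FOAF + "name"), literal("Juliana Cardilli")),
    (iri(PLACE), iri(SKOS + "prefLabel"), literal("São Paulo", "pt")),
]

print(to_ntriples(TRIPLES))
```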
![Page 17: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/17.jpg)
Motivation: recommend annotations to the information producer
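A minimal sketch of recommending annotations: scan the producer's text for labels of known entities and suggest their URIs in order of appearance. The Santos Dumont URI appears later in this deck; the other URIs, the label dictionary and `suggest_annotations` are hypothetical, and exact label matching is a simplification (the deck itself lists improving the annotator, DBpedia Spotlight, as a next step).

```python
from __future__ import annotations

import re

# Hypothetical label dictionary (lower-cased labels -> entity URIs).
LABELS = {
    "santos dumont": "http://data.globo.com/person/Person/santos_dumont",
    "14-bis": "http://example.org/object/14_bis",
    "paris": "http://example.org/place/paris",
}


def suggest_annotations(text: str, labels: dict[str, str]) -> list[str]:
    """Suggest URIs for labels found in the text, ordered by first occurrence, without duplicates."""
    lowered = text.lower()
    hits = []
    for label, uri in labels.items():
        # Require non-word characters (or text edges) around the label, so "paris" won't match "comparison".
        match = re.search(r"(?<!\w)" + re.escape(label) + r"(?!\w)", lowered)
        if match:
            hits.append((match.start(), uri))
    return list(dict.fromkeys(uri for _, uri in sorted(hits)))


print(suggest_annotations("Santos Dumont flew the 14-bis in Paris.", LABELS))
```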
![Page 18: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/18.jpg)
Motivation: suggest related content to the information consumer
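Suggesting related content can be sketched by comparing the sets of entities each article is annotated with. Jaccard similarity is one simple choice, assumed here rather than taken from the deck; the article identifiers and `ex:` entity names are hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared entities divided by all entities of the two articles (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(article: str, annotations: dict[str, set[str]], top: int = 3) -> list[str]:
    """Rank other articles by annotation overlap, dropping those with nothing in common."""
    target = annotations[article]
    scored = [
        (jaccard(target, entities), other)
        for other, entities in annotations.items()
        if other != article
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by id for stability
    return [other for _, other in scored[:top]]


# Hypothetical annotations: article id -> entities it is annotated with.
ANNOTATIONS = {
    "news-1": {"ex:SaoPauloFC", "ex:Morumbi"},
    "news-2": {"ex:SaoPauloFC", "ex:Morumbi", "ex:Libertadores"},
    "news-3": {"ex:SaoPauloFC"},
    "news-4": {"ex:SaoPauloCity"},
}

print(related("news-1", ANNOTATIONS))
```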
![Page 19: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/19.jpg)
![Page 20: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/20.jpg)
![Page 21: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/21.jpg)
Changes: replacement of words by entities
http://data.globo.com/person/Person/santos_dumont
![Page 22: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/22.jpg)
Changes: replacement of labels by qualified relationships
![Page 23: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/23.jpg)
Changes: organize data in graphs instead of tables
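Moving from tables to graphs can be sketched as turning each row into a resource and each column into a predicate, with foreign keys becoming links to other resources. This is loosely in the spirit of the W3C Direct Mapping of relational data to RDF, but simplified; the base IRI, the `player`/`team` tables and `rows_to_triples` are hypothetical.

```python
from __future__ import annotations

BASE = "http://example.org/"  # hypothetical base IRI


def rows_to_triples(table: str, rows: list[dict], foreign_keys: dict[str, str]) -> list[tuple]:
    """Map rows to (subject, predicate, object) triples; foreign-key columns become links."""
    # Objects are plain Python values here; a real serializer would mark which are IRIs and which are literals.
    triples = []
    for row in rows:
        subject = f"{BASE}{table}/{row['id']}"
        for column, value in row.items():
            if column == "id":
                continue  # the primary key is already encoded in the subject IRI
            if column in foreign_keys:
                target = foreign_keys[column]
                # Link to the referenced row's IRI instead of storing a bare number.
                triples.append((subject, f"{BASE}{table}#{target}", f"{BASE}{target}/{value}"))
            else:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


# A hypothetical "player" row whose team_id column references the "team" table.
players = [{"id": 7, "name": "Example Player", "team_id": 1}]
for triple in rows_to_triples("player", players, {"team_id": "team"}):
    print(triple)
```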
![Page 24: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/24.jpg)
Outcomes
● Replacing words with entities improved the following for multiple layers of information:
○ Finding
○ Linking
○ Reconciling
○ Organizing
![Page 25: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/25.jpg)
● Flexible ways to organize content
● Easier discovery of related issues
● Explicit relations derived from annotated content
● Up-to-date topic pages with little editorial effort
● Linking content across different web products
● Seamless navigation leading to flow state
![Page 26: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/26.jpg)
Status quo
Used by the main web products of Globo.com, from August 2010 to May 2013:
○ 18,485 organizations
○ 83,000 people
○ 9,129 places
○ 1,000,000+ annotated news articles
Altogether, more than 2,500,000 entities!
![Page 27: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/27.jpg)
Linked data problems
![Page 28: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/28.jpg)
Legacy Architecture
Components: CDA, CMA, triple store, search engine, ontology
![Page 29: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/29.jpg)
Legacy Architecture: several CMA/CDA pairs alongside the triple store, search engine and ontology
![Page 30: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/30.jpg)
Problems
● Poor data management
○ direct, unmanaged access to the triple store
○ difficulty sharing data (distributed DBs)
○ re-syncing the triple store and the search engine index
○ triple store scalability
○ high entropy in distributed ontology engineering
![Page 31: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/31.jpg)
![Page 32: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/32.jpg)
Ontology Engineering
● Product-driven (past): Base ontology plus product ontologies G1 (news), GE (sports), EGO (gossip), TVG (TV)
● Domain-driven (current): Upper ontology plus domain ontologies Person, Organization, Place, Music, Politics, Programme, Education, Sports
![Page 33: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/33.jpg)
Possible solution: Upper Ontology
![Page 34: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/34.jpg)
Problems
● Semantics as a library
○ many different versions in production
○ programming-language dependent
○ steep learning curve for RDF/OWL/SPARQL
![Page 35: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/35.jpg)
Solution
Create an open semantic data management platform that is:
● Scalable
● Mobile and Web friendly
● Able to interconnect Globo's data with external data sources
● Able to automate content extraction (including named entity recognition, NER)
![Page 36: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/36.jpg)
Brainiak: linked data RESTful API
![Page 37: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/37.jpg)
![Page 38: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/38.jpg)
New architecture (under development): the CMA and several CDAs reach the triple store and search engine through the Brainiak API
![Page 39: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/39.jpg)
Requirements
● Indirect use of SPARQL
● Programming-language independent
● High-quality data management
● Finer-grained authentication and authorization
● Isolate applications from the triple store
● Improve triple store performance
![Page 40: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/40.jpg)
Task: list all sports teams
SPARQL query:

    DEFINE input:inference <http://data.globo.com/ruleset>
    SELECT ?uri ?label
    FROM <http://data.globo.com/sports/>
    WHERE {
      ?uri a <http://data.globo.com/sports/Team>;
           rdfs:label ?label .
    }
    LIMIT 10 OFFSET 0

![Page 41: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/41.jpg)
Brainiak query: GET /sports/Team
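One way an API like Brainiak could hide SPARQL is to derive the listing query from the URL path. The sketch below assumes a rule inferred from the two slides (path `/<context>/<Class>` maps to graph `http://data.globo.com/<context>/` and class `http://data.globo.com/<context>/<Class>`); the function name, the `page`/`per_page` parameters and the rule itself are hypothetical, not Brainiak's documented behavior. As in the slide, the `rdfs:` prefix is assumed to be predeclared by the triple store.

```python
from __future__ import annotations

BASE = "http://data.globo.com/"


def list_instances_query(path: str, page: int = 1, per_page: int = 10) -> str:
    """Build a SPARQL listing query for a '/<context>/<Class>' path (hypothetical mapping rule)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    context, class_name = path.strip("/").split("/")  # ValueError unless exactly two segments
    graph = f"{BASE}{context}/"
    offset = (page - 1) * per_page
    return (
        f"DEFINE input:inference <{BASE}ruleset>\n"
        "SELECT ?uri ?label\n"
        f"FROM <{graph}>\n"
        "WHERE {\n"
        f"  ?uri a <{graph}{class_name}> ;\n"
        "       rdfs:label ?label .\n"
        "}\n"
        f"LIMIT {per_page} OFFSET {offset}"
    )


print(list_instances_query("/sports/Team"))
```

The client only ever sees the REST path; pagination becomes LIMIT/OFFSET arithmetic on the server side.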
![Page 42: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/42.jpg)
SPARQL response
![Page 43: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/43.jpg)
Brainiak response
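Both responses appear only as images. As a sketch of the kind of simplification involved, the code below flattens a result in the W3C SPARQL 1.1 Query Results JSON format (`head.vars`, `results.bindings`, each value an object with `type` and `value`) into a compact list. The output keys `items`, `@id`, `title` and `item_count`, and the sample team IRI, are assumptions, not Brainiak's actual response format.

```python
import json


def simplify(sparql_json: str) -> dict:
    """Flatten ?uri/?label bindings from a SPARQL JSON result into a compact dict."""
    results = json.loads(sparql_json)
    items = [
        {"@id": b["uri"]["value"], "title": b["label"]["value"]}
        for b in results["results"]["bindings"]
        if "uri" in b and "label" in b  # unbound variables are simply absent from a binding
    ]
    return {"items": items, "item_count": len(items)}


# A hand-written result in the SPARQL 1.1 JSON results format (hypothetical team IRI).
RAW = json.dumps({
    "head": {"vars": ["uri", "label"]},
    "results": {"bindings": [
        {"uri": {"type": "uri", "value": "http://example.org/sports/Team/flamengo"},
         "label": {"type": "literal", "value": "Flamengo", "xml:lang": "pt"}},
    ]},
})

print(simplify(RAW))
```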
![Page 44: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/44.jpg)
Task: retrieve all superclasses of a class
SPARQL query:

    SELECT DISTINCT ?class
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?class
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?class a owl:Class .
    }

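`OPTION (TRANSITIVE ...)` is a Virtuoso extension; in standard SPARQL 1.1 the property path `rdfs:subClassOf*` expresses roughly the same zero-or-more closure, where zero steps includes the class itself (as `t_min (0)` does). The closure can be sketched in Python as a breadth-first walk. The parent/child edges below are assumed, loosely following the class list in the next query's FILTER, and `place:`/`upper:` are shorthand for the data.globo.com namespaces.

```python
from __future__ import annotations

from collections import deque

# Hypothetical direct rdfs:subClassOf edges (child -> parents); illustrative only.
SUBCLASS_OF = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Object"],
    "upper:Object": ["upper:Substance"],
    "upper:Substance": ["upper:ConcreteEntity"],
    "upper:ConcreteEntity": ["upper:Entity"],
}


def superclasses(cls: str, edges: dict[str, list[str]]) -> list[str]:
    """Breadth-first closure over subClassOf, including the class itself (zero steps)."""
    seen = [cls]
    queue = deque([cls])
    while queue:
        for parent in edges.get(queue.popleft(), []):
            if parent not in seen:  # keeps results distinct and guards against cycles
                seen.append(parent)
                queue.append(parent)
    return seen


print(superclasses("place:City", SUBCLASS_OF))
```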
![Page 45: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/45.jpg)
Task: retrieve all properties of a group of classes
SPARQL query:

    SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range ?title ?range_graph ?range_label ?super_property
    WHERE {
      {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
      } UNION {
        graph ?predicate_graph {?predicate rdfs:domain ?blank} .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?domain_class } .
      }
      FILTER (?domain_class IN (<http://data.globo.com/place/City>, <http://data.globo.com/place/GeopoliticalDivision>, <http://data.globo.com/place/Place>, <http://data.globo.com/upper/Object>, <http://data.globo.com/upper/Substance>, <http://data.globo.com/upper/ConcreteEntity>, <http://data.globo.com/upper/Entity>))
      {?predicate rdfs:range ?range .}
      UNION {
        ?predicate rdfs:range ?blank .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?range } .
      }
      FILTER (!isBlank(?range))
      ?predicate rdfs:label ?title .
      ?predicate rdf:type ?type .
      OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
      FILTER (?type in (owl:ObjectProperty, owl:DatatypeProperty)) .
      FILTER(langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
      OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
      FILTER(langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
      OPTIONAL {
        GRAPH ?range_graph {
          ?range rdfs:label ?range_label .
          FILTER(langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
        }
      }
    }

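The core of this query is matching each property's `rdfs:domain` against the class list, either directly or through an `owl:unionOf` list. A sketch of only that part over in-memory data follows; labels, comments, languages and union ranges are left out. The property names and ranges are hypothetical, and a Python list stands in for an `owl:unionOf` domain.

```python
from __future__ import annotations

# Hypothetical property declarations: a domain is either one class or a list (standing in for owl:unionOf).
PROPERTIES = {
    "place:population": {"domain": "place:City", "range": "xsd:integer"},
    "place:partOf": {"domain": ["place:City", "place:GeopoliticalDivision"], "range": "place:Place"},
    "upper:name": {"domain": "upper:Entity", "range": "xsd:string"},
    "person:birthDate": {"domain": "person:Person", "range": "xsd:date"},
}


def properties_for(classes: list[str], properties: dict[str, dict]) -> dict[str, str]:
    """Map each property whose domain (or any member of a union domain) is in `classes` to its range."""
    wanted = set(classes)
    result = {}
    for name, decl in properties.items():
        domain = decl["domain"]
        members = domain if isinstance(domain, list) else [domain]
        if wanted.intersection(members):
            result[name] = decl["range"]
    return result


print(properties_for(["place:City", "place:Place", "upper:Entity"], PROPERTIES))
```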
![Page 46: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/46.jpg)
Task: retrieve the cardinalities of all properties of a certain class
SPARQL query:

    SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?s
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?s owl:onProperty ?predicate .
      OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
      OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
      OPTIONAL {
        { ?s owl:onClass ?range }
        UNION { ?s owl:onDataRange ?range }
        UNION { ?s owl:allValuesFrom ?range }
        OPTIONAL { ?range owl:oneOf ?enumeration } .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?enumerated_value } .
        OPTIONAL { ?enumerated_value rdfs:label ?enumerated_value_label . } .
      }
    }

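The query reads OWL qualified cardinality restrictions (`owl:onProperty`, `owl:minQualifiedCardinality`, `owl:maxQualifiedCardinality`) that the class inherits. Since an instance must satisfy every inherited restriction, restrictions on the same property and range combine to the tightest bounds; the sketch below assumes exactly that case. The restriction records and `cardinalities` are hypothetical.

```python
from __future__ import annotations

# Hypothetical restrictions inherited by a class (same qualifying range assumed); None = bound not stated.
RESTRICTIONS = [
    {"property": "place:partOf", "min": 1, "max": None},
    {"property": "place:partOf", "min": None, "max": 1},
    {"property": "upper:name", "min": 1, "max": None},
]


def cardinalities(restrictions: list[dict]) -> dict[str, dict]:
    """Combine restrictions per property, keeping the tightest min and max bounds."""
    combined: dict[str, dict] = {}
    for r in restrictions:
        entry = combined.setdefault(r["property"], {"min": 0, "max": None})
        if r["min"] is not None:
            entry["min"] = max(entry["min"], r["min"])
        if r["max"] is not None:
            entry["max"] = r["max"] if entry["max"] is None else min(entry["max"], r["max"])
    for entry in combined.values():
        entry["required"] = entry["min"] > 0  # a positive minimum makes the property mandatory
    return combined


print(cardinalities(RESTRICTIONS))
```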
![Page 47: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/47.jpg)
Brainiak query: GET /place/City/_schema
![Page 48: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/48.jpg)
Next steps
● Enrich Globo.com search
● SEO (automatic schema.org markup)
● Improve the annotator (DBpedia Spotlight)
● Richer content relationships (inference)
● Link to open data (e.g. DBpedia, dados.gov.br)
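Automatic schema.org markup could, for example, be derived from an article's entity annotations. The sketch below emits schema.org `NewsArticle` JSON-LD with each annotated entity as an `about` item; the function name and article data are hypothetical, and only the Santos Dumont URI comes from the slides.

```python
import json


def news_article_jsonld(headline: str, published: str, author: str, entities: list) -> str:
    """Build schema.org NewsArticle JSON-LD; each annotated entity becomes an `about` item."""
    doc = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": published,
        "author": {"@type": "Person", "name": author},
        "about": [
            {"@type": e["type"], "@id": e["uri"], "name": e["name"]} for e in entities
        ],
    }
    return json.dumps(doc, ensure_ascii=False, indent=2)


print(news_article_jsonld(
    "Example headline",  # hypothetical article data
    "2013-09-18",
    "Example Author",
    [{"type": "Person", "uri": "http://data.globo.com/person/Person/santos_dumont", "name": "Santos Dumont"}],
))
```

The resulting JSON would be embedded in the page inside a `<script type="application/ld+json">` element.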
![Page 49: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/49.jpg)
Stay tuned
@brainiak_api
... will soon be released as an open source project!
![Page 50: Rio info 2013 - Linked Data at Globo.com](https://reader033.fdocuments.us/reader033/viewer/2022051609/547bb07c5906b572798b4667/html5/thumbnails/50.jpg)
Slides
http://www.slideshare.net/
@semantic_team @alchueyr