Knowledge graphs in search engines
Transcript of Knowledge graphs in search engines
E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Knowledge Graphs in search engines like Google
Emanuele Della Valle
DEIB - Politecnico di Milano
http://emanueledellavalle.org @manudellavalle
Share, Remix, Reuse — Legally
This work is licensed under the Creative Commons Attribution 3.0 Unported License. You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions
Attribution — You must attribute the work by inserting “by E. Della Valle – http://emanueledellavalle.org - @manudellavalle” at the end of each reused slide
To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/
Me
• Assistant Professor at DEIB, Politecnico di Milano
• Expert in semantic technologies and stream computing
• Brander of stream reasoning: an approach to master the velocity and variety dimensions of Big Data
  • https://scholar.google.com/scholar?hl=en&q="stream+reasoning"
• 17 years of experience in research and innovation projects
• Startupper: http://www.fluxedo.com
@manudellavalle
http://emanueledellavalle.org
http://streamreasoning.org
http://fluxedo.com
Agenda
• The interoperability problem
• The standardization dilemma
• One standard does not fit all
• Embrace change with semantic technologies
• Demo time for Google Knowledge Graph
• How this became possible
Definitions of Interoperability
• Interoperability
  • The ability of information and communication technology (ICT) systems to exchange data and to enable sharing of information and knowledge
• Functional interoperability
  • Information has to be transmitted reliably between heterogeneous applications
• Semantic interoperability
  • Transmission must occur without loss of meaning, and thus without loss of computability
  • E.g., semantic interoperability in healthcare information systems
    • It is the ability to share information without loss of computable meaning, across multiple applications concerned with clinical (primary use) and related administrative, financial, and research domains (secondary uses).
Once upon a time …
…, in a happy organization, users were happy with the application the IT department had prepared for them, but …
… the organization was not alone. Another organization developed a complementary application …
… so, one day, the two organizations decided to integrate the two applications.
Having much to gain, the happy organization decided to invest in a bilateral solution.
[Figure: an application and a complementary application on opposite sides of the organizational boundaries, first separated by a "?" and then connected by an adapter]
… this went on for a while, but …
… the more bilateral integrations, the sadder the organizations became.
[Figure: many applications integrated pairwise; each one carries a mood marker from the legend]
Legend
! OK
!! Good
!!! Very Good
!?! Very Good …
?!? Have I done the right thing?
??? Does it make sense?
?#@ Why am I doing it!!!
… So, they standardized and …
[Figure: all the applications connected through a single standard]
… and they lived happily ever after!
Well, not really :-( Actually …
[Figure: the applications still waiting ("???") for the standard, next to a poster reading "KEEP CALM AND WAIT FOR 1 / 10 / 100 YEARS"]
Why? The Standardization dilemma!
• Comprehensive: handles all use cases
• Good: high quality
• Timely: completed quickly
Pick two!
There are a variety of them
Standards are like plumbs
Over 100 in the Healthcare domain!
AIR ALT AOD AOT BI CCC CCPSS CCS CDT CHV COSTAR CPM CPT CPTSP CSP CST DDB DMDICD10 DMDUMD DSM3R DSM4 DXP FMA
HCDT HCPCS HCPT HL7V2.5 HL7V3.0 HLREL ICD10 ICD10AE ICD10AM ICD10AMAE ICD10CM ICD10DUT ICD10PCS ICD9CM ICF ICF-CY ICPC ICPC2EDUT ICPC2EENG ICPC2ICD10DUT ICPC2ICD10ENG ICPC2P ICPCBAQ ICPCDAN ICPCDUT ICPCFIN ICPCFRE ICPCGER ICPCHEB
ICPCHUN ICPCITA ICPCNOR ICPCPOR ICPCSPA ICPCSWE JABL KCD5 LCH LNC_AD8 LNC_MDS30 MCM MEDLINEPLUS MSHCZE MSHDUT
MSHFIN MSHFRE MSHGER MSHITA MSHJPN MSHLAV MSHNOR MSHPOL MSHPOR MSHRUS MSHSCR MSHSPA MSHSWE MTH MTHCH
MTHHH MTHICD9 MTHICPC2EAE MTHICPC2ICD10AE MTHMST MTHMSTFRE MTHMSTITA NAN NCISEER NIC NOC OMS PCDS PDQ PNDS PPAC PSY QMR RAM RCD RCDAE RCDSA RCDSY SNM SNMI
SOP SPN SRC TKMT ULT UMD USPMG UWDA WHO WHOFRE WHOGER WHOPOR WHOSPA
[source: dbooth.org/2014/yosemite/yosemite-project-slides.pdf]
And they keep changing :-(
[Credits: Rafael Richards]
Why?
[source http://xkcd.com/927/ ]
… sometimes the variety is required
One standard does not fit all
Different use cases need different data, granularity and representations
[source: dbooth.org/2014/yosemite/yosemite-project-slides.pdf]
… thus translation is needed
Counting on translation between standards is even convenient while working on increasing the comprehensiveness of a standard over time
Translation is unavoidable!
[Chart: comprehensiveness (0% to 100%) over time, with two areas labelled Translation and Standard]
But be aware of the cost of ad hoc translation!
The lack of interoperability …
… in healthcare costs $30,000 million per year in the USA
[source: http://www.calgaryscientific.com/blog/bid/284224/Interoperability-Could-Reduce-U-S-Healthcare-Costs-by-Thirty-Billion]
So What?!?
“It is not necessarily the strongest of the
species that survives nor the most intelligent,
but the one that is most responsive to change.”
--- Charles Darwin, “The Origin of Species”
Semantic Technologies embrace change
Proposing a simple data model: RDF
• Every statement is a triple: subject, property, object
• E.g., Amoxicillin treats bacterial disease
Flexible enough to represent: Tables, Trees, Graphs
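The claim that triples can represent tables, trees and graphs can be sketched in a few lines of Python. This is only an illustration with plain tuples, not an RDF library; the helper names, the "form" and "dose_mg" columns and the "hasSubclass" edge are assumptions made up for the example.

```python
# Minimal sketch: every statement is a (subject, property, object) tuple.

def table_row_to_triples(row_id, row):
    """A table row becomes one triple per column: (row id, column, value)."""
    return {(row_id, column, value) for column, value in row.items()}

def tree_to_triples(root, subtree, edge="hasPart"):
    """A tree given as nested dicts becomes one triple per parent-child edge."""
    triples = set()
    for child, grandchildren in subtree.items():
        triples.add((root, edge, child))
        triples |= tree_to_triples(child, grandchildren, edge)
    return triples

# A graph is already a set of triples; this one comes from the slide.
graph = {("Amoxicillin", "treats", "bacterial disease")}

# Hypothetical table row and hypothetical classification tree.
graph |= table_row_to_triples("Amoxicillin", {"form": "capsule", "dose_mg": 500})
graph |= tree_to_triples(
    "Antibiotic", {"Penicillin": {"Amoxicillin": {}}}, edge="hasSubclass"
)

for triple in sorted(graph, key=str):
    print(triple)
```

Because all three shapes end up in the same set, data coming from different kinds of sources can be merged with a plain set union.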
Semantic Technologies embrace change
Providing a powerful query language: SPARQL
E.g., what does Amoxicillin treat?
  Amoxicillin treats ?x
  ?x={Bacterial disease, Urinary tract infection, Sinus infection, …}
Flexible enough to query RDF data even without knowing the schema
E.g., can you describe Amoxicillin?
  Amoxicillin ?p ?x
  ?p={treats} ?x={Bacterial disease, Urinary tract infection, Sinus infection, …}
  ?p={hasSideEffects} ?x={Diarrhoea}
  ?p={belongsTo} ?x={β-Lactam antibiotic, Penicillin-class Antibacterial}
  …
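The two questions above are triple patterns in which some positions are variables. The sketch below mimics that matching in plain Python; it is not a SPARQL engine, and the `match` helper is a made-up name. In SPARQL itself the two patterns would read roughly `SELECT ?x WHERE { :Amoxicillin :treats ?x }` and `SELECT ?p ?x WHERE { :Amoxicillin ?p ?x }`, assuming a suitable default prefix.

```python
# Facts listed on the slide, stored as (subject, property, object) triples.
TRIPLES = [
    ("Amoxicillin", "treats", "Bacterial disease"),
    ("Amoxicillin", "treats", "Urinary tract infection"),
    ("Amoxicillin", "treats", "Sinus infection"),
    ("Amoxicillin", "hasSideEffects", "Diarrhoea"),
    ("Amoxicillin", "belongsTo", "β-Lactam antibiotic"),
    ("Amoxicillin", "belongsTo", "Penicillin-class Antibacterial"),
]

def match(pattern, triples=TRIPLES):
    """Return one {variable: value} binding per triple that fits the pattern.

    Terms starting with "?" are variables; any other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results

# "What does Amoxicillin treat?"
treated = [b["?x"] for b in match(("Amoxicillin", "treats", "?x"))]

# "Can you describe Amoxicillin?" No knowledge of the schema is needed.
described = [(b["?p"], b["?x"]) for b in match(("Amoxicillin", "?p", "?x"))]

print(treated)
print(described)
```

The second query works without knowing which properties exist, which is the point the slide makes about schema-less querying.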
Semantic Technologies embrace change
Providing a formal language for conceptual modelling: OWL
E.g., Heart
  A heart is a muscular organ that is part of the circulatory system
  ∀x.[Heart(x) → MuscularOrgan(x) ∧ ∃y.[isPartOf(x,y) ∧ CirculatorySystem(y)]]
OWL is a modular standard that offers different trade-offs: OWL-QL, OWL-RL, OWL-EL
[Figure: one diagram per profile, each labelled Terms and Data]
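One way to read the Heart axiom is as a condition that can be checked against a small set of facts. The sketch below does only that, under a closed-world reading; an OWL reasoner works differently, since it would infer the missing facts instead of reporting them as violations. The individuals h1, h2 and cs1 are invented for the example.

```python
# Class memberships and isPartOf links for a toy set of individuals.
# All individual names are assumptions made for this illustration.
classes = {
    "Heart": {"h1", "h2"},
    "MuscularOrgan": {"h1", "h2"},
    "CirculatorySystem": {"cs1"},
}
is_part_of = {("h1", "cs1")}  # h2 is deliberately left without a link

def satisfies_heart_axiom(x):
    """Heart(x) -> MuscularOrgan(x) and exists y: isPartOf(x, y) and CirculatorySystem(y)."""
    if x not in classes["Heart"]:
        return True  # the implication holds trivially for non-hearts
    part_of_circulatory_system = any(
        subject == x and obj in classes["CirculatorySystem"]
        for subject, obj in is_part_of
    )
    return x in classes["MuscularOrgan"] and part_of_circulatory_system

violations = sorted(x for x in classes["Heart"] if not satisfies_heart_axiom(x))
print(violations)
```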
Semantic Technologies embrace change
Ontology Based Data Access as a prototypical solution to interoperability problems
[Figure, built up in three steps: (1) a standard expressed in OWL sits on top of heterogeneous sources (an RDBMS and XML documents), each behind its own translator; (2) SPARQL queries are posed against the standard; (3) results come back through the translators]
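The architecture in the figure can be sketched with two toy sources and one translator per source. This is a simplified illustration using only the Python standard library: the table, column and element names are invented, and the triples are materialized eagerly, whereas Ontology Based Data Access systems typically rewrite the query into queries over the sources.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Source 1: a relational database (schema invented for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE therapy (drug TEXT, disease TEXT)")
db.execute("INSERT INTO therapy VALUES ('Amoxicillin', 'Sinus infection')")

# Source 2: an XML document (element names invented for this sketch).
xml_doc = ET.fromstring(
    "<drugs><drug name='Amoxicillin'><cures>Bacterial disease</cures></drug></drugs>"
)

def translate_rdbms(connection):
    """Translator: map rows of the therapy table to 'treats' triples."""
    rows = connection.execute("SELECT drug, disease FROM therapy")
    return {(drug, "treats", disease) for drug, disease in rows}

def translate_xml(root):
    """Translator: map <cures> elements to the same shared 'treats' property."""
    return {
        (drug.get("name"), "treats", cures.text)
        for drug in root.iter("drug")
        for cures in drug.iter("cures")
    }

def query(subject, prop):
    """Answer a question phrased in the shared vocabulary across all sources."""
    triples = translate_rdbms(db) | translate_xml(xml_doc)
    return sorted(o for s, p, o in triples if s == subject and p == prop)

results = query("Amoxicillin", "treats")
print(results)
```

The caller only uses the shared term "treats"; the differences between the "therapy" table and the `<cures>` element stay inside the translators.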
Search for Galileo and look to the right
[Figure: the panel on the right of the results page, read as a graph]
  Galileo Galilei, type, Astronomer
  Galileo Galilei, when born, February 15, 1564
  Galileo Galilei, discovered, Callisto
  Galileo Galilei, discovered, Ganymede
Let's try a more complex query
Galileo Galilei discovered ?x
2001: In the beginning was the Semantic Web
“The Semantic Web is not a separate Web, but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
“The Semantic Web”, Scientific American Magazine, May 2001
Semantic interoperability on the functionally interoperable Web
2008: It gained popularity when Linked Data became a standard
View the full talk at http://www.ted.com/talks/view/id/484
2008: It was funded by the USA, the UK and …
2008: Search engines created incentives
[source https://developer.yahoo.com/searchmonkey/siteowner.html ]
[source https://developers.google.com/structured-data/rich-snippets/ ]
2009: Best Buy picked them up
• Since Fall 2009
• 450,000 products
• Using RDFa (= RDF embedded in HTML)
• Pages with RDFa rank higher in Google
• Best Buy claims 30% more traffic!
• Yahoo reports a 15% higher click-through rate
© 2012 Politecnico di Milano, Emanuele Della Valle
RDFa
<div rel="v:hasReview">
  <span property="v:rating" datatype="xsd:string"> 4.8</span> of <span property="v:best">5</span>
</div>
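To see why such markup is easy for a search engine to consume, the sketch below pulls the `property` values out of the snippet with Python's standard `html.parser`. It is a deliberately small illustration, not a conforming RDFa processor: it ignores `rel`, prefixes and datatypes, and it assumes every start tag has a matching end tag. The class name is made up.

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # stack: property name (or None) per open element

    def handle_starttag(self, tag, attrs):
        self._open.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Keep text only when the innermost open element declares a property.
        if self._open and self._open[-1] and data.strip():
            self.pairs.append((self._open[-1], data.strip()))

# The review snippet from the slide, with the <div> closed.
snippet = (
    '<div rel="v:hasReview">'
    '<span property="v:rating" datatype="xsd:string"> 4.8</span> of '
    '<span property="v:best">5</span>'
    "</div>"
)

parser = PropertyExtractor()
parser.feed(snippet)
parser.close()
print(parser.pairs)
```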
Google for Nikon+12.3-Megapixel+Digital+SLR+Camera
https://www.google.com/search?q=Nikon+12.3-Megapixel+Digital+SLR+Camera
[Figure: results page showing the enriched pages next to the Sponsored Links]
2010: The Modigliani test for Linked Data
• Who: Richard MacManus
• When: April 15th, 2010
• Context: Modigliani’s paintings are scattered all over the world
• The challenge: If all museums published their collections as linked data, would it be possible to know the locations of all the original paintings of Modigliani?
• http://readwrite.com/2010/04/15/the_modigliani_test_semantic_web_tipping_point
2010: The results of the Modigliani test for Linked Data
• Who: Atanas Kiryakov (Ontotext AD)
• When: April 25th, 2010
• How: http://factforge.net/ a “reason-able” view to the web of data
• Results: http://bit.ly/ModiglianiTest
• http://readwrite.com/2010/04/25/the_modigliani_test_for_linked_data
Part of my LarKC project http://www.larkc.org/
2010: It went mainstream with Facebook Open Graph
Use RDFa with some FB-specific vocabulary
• og:title - The title of your object, e.g., "The Rock".
• og:type - The type of your object, e.g., "movie".
• og:image - An image URL
• og:url - The permanent ID of your object
• og:description - A one to two sentence description of your object.
• og:site_name - If your object is part of a larger web site, the name which should be displayed for the overall site, e.g., "IMDb".
http://ogp.me/
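In a page, these properties are written as `<meta>` tags in the HTML head. The sketch below generates such tags from a dictionary; the helper name is made up, and the two URLs are placeholders rather than values from the slides.

```python
from html import escape

def open_graph_meta(properties):
    """Render {name: value} as <meta property="og:..."> tags, one per line."""
    return "\n".join(
        f'<meta property="og:{escape(name)}" content="{escape(str(value))}" />'
        for name, value in properties.items()
    )

tags = open_graph_meta({
    "title": "The Rock",
    "type": "movie",
    "site_name": "IMDb",
    # Placeholder URLs, assumed for this example only.
    "url": "https://example.com/the-rock",
    "image": "https://example.com/the-rock.jpg",
})
print(tags)
```

Escaping the values keeps the markup valid when a title contains quotes or ampersands.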
Open Graph Usage Statistics
15 million sites are using Open Graph! 39% of the top 10,000 sites
[Chart: usage in % (axis from 20 to 40) per year, 2010 to 2015]
[Source: http://trends.builtwith.com/docinfo/Open-Graph-Protocol]
2011: It reached its full potential with schema.org
• The core vocabulary currently consists of
  • 597 Types
  • 867 Properties
  • 114 Enumeration values
[Source http://blog.schema.org/2015/11/schemaorg-whats-new.html]
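As an illustration of one type and a few of its properties, the sketch below describes a recipe with the schema.org `Recipe` type and serializes it as JSON-LD, one of the syntaxes schema.org markup can be written in. The slides do not say which syntax publishers use, and the recipe itself is invented for the example.

```python
import json

# A made-up recipe described with schema.org terms.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Tomato bruschetta",
    "recipeIngredient": ["bread", "tomatoes", "olive oil", "basil"],
    "cookTime": "PT10M",  # ISO 8601 duration: ten minutes
}

# The JSON-LD is embedded in the page inside a script element.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(recipe, indent=2)
    + "\n</script>"
)
print(script_tag)
```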
Thanks to schema.org, recipes are also in the Knowledge Graph
Google Knowledge Graph (powered by Semantic Technologies) passes the Modigliani Test