Post on 11-May-2015
The Knowledge Reengineering Bottleneck
Rinke Hoekstra, rinke.hoekstra@vu.nl
VU University Amsterdam/University of Amsterdam
Friday 24 February 2012
Knowledge Engineering
“Critical scientific problem [...] successful applied AI requires that knowledge move from the heads of experts into programs”
FEIGENBAUM, E. A. (1984), Knowledge Engineering. Annals of the New York Academy of Sciences, 426: 91–107. doi: 10.1111/j.1749-6632.1984.tb16513.x

Problems of Knowledge Engineering
‣ The lack of adequate and appropriate hardware
‣ Lack of cumulation of AI methods and techniques
‣ Shortage of trained knowledge engineers
‣ The problem of knowledge acquisition
‣ The development gap

Knowledge Acquisition Bottleneck
“The problem of knowledge acquisition is the critical bottleneck problem in artificial intelligence”
FEIGENBAUM, E. A. (1984), Knowledge Engineering. Annals of the New York Academy of Sciences, 426: 91–107. doi: 10.1111/j.1749-6632.1984.tb16513.x

The Dark Ages
Knowledge Elicitation: Repertory Grids, Think Aloud Method, Card Sorting, ...
MYCIN and GUIDON: Knowledge Types
CommonKADS: Engineering Methodology, Problem Solving Methods, Domain Models
Ontolingua: “Explicit specification of a shared conceptualization”, Sharing ontologies
How to build the right ontology?
Methodologies: Middle Out Approach, Uschold & Gruninger, METHONTOLOGY, KACTUS, SENSUS, (KA)2
[Figure: methodology steps: Specify Guidelines, Identify Purpose and Scope, Motivating Scenarios, Competency Questions, Ontology Capture, Ontology Coding, Ontology Integration, Evaluation, Documentation]
Ontology Types: Top, Generic, Application, Foundation, Domain, Core
[Figure labels: Generic Ontology, Core Ontology, Domain Ontology, Top Ontology, Representation Ontology]
Ontology Reuse: Merging & Alignment, Modularization, Ontology Design Patterns
Principles: OntoClean, Ontology vs. Epistemology
[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/, as of September 2011. Legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]
[Figure: the Linking Open Data cloud diagram (as of September 2011) shown with a chart whose vertical axis runs from 0 to 400 and whose horizontal axis lists the dates 1 May 2007, 8 Oct. 2007, 7 Nov. 2007, 10 Nov. 2007, 28 Feb. 2008, 31 Mar. 2008, 18 Sep. 2008, 5 Mar. 2009, 27 Mar. 2009, 14 Jul. 2009, 22 Sep. 2010, 19 Sep. 2011, 23 Feb. 2012.]
Performance
[Figure: throughput (Ktriples/sec, axis 0 to 700) against input size (billions of statements, axis 0 to 100) for BigOWLIM, Oracle 11g, DAML DB, BigData and WebPIE, annotated “We are here!!”. Source slide dated Monday 10 May 2010.]
Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F. & Bal, H. (2010), OWL reasoning with WebPIE: calculating the closure of 100 billion triples. In Proceedings of ESWC 2010.
2009: WebPIE
2011: QueryPIE. Backward-chaining inference at query time, over 1B triples, in milliseconds, on just 8 parallel machines. Pre-computation takes 8-300 sec, against 1-3 hours in WebPIE.
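The difference between computing a full closure up front and doing backward-chaining inference at query time can be illustrated with a toy sketch. This is not the WebPIE or QueryPIE algorithm: it assumes a tiny in-memory triple set, plain string terms, and only two RDFS-style subclass rules, and the function names `forward_closure` and `has_type` are hypothetical.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def forward_closure(triples: Set[Triple]) -> Set[Triple]:
    """Forward chaining: materialise every triple implied by subClassOf."""
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == "subClassOf"}
        # (x type C), (C subClassOf D)       => (x type D)
        # (B subClassOf C), (C subClassOf D) => (B subClassOf D)
        derived = {
            (s, p, sup)
            for s, p, o in closure
            if p in ("type", "subClassOf")
            for c, sup in sub
            if c == o
        }
        if derived <= closure:  # fixpoint reached, nothing new was derived
            return closure
        closure |= derived


def has_type(triples: Set[Triple], instance: str, cls: str,
             seen: Optional[Set[str]] = None) -> bool:
    """Backward chaining: prove (instance type cls) on demand, without materialising."""
    seen = set() if seen is None else seen
    if (instance, "type", cls) in triples:
        return True
    if cls in seen:  # guard against cycles in the class hierarchy
        return False
    seen.add(cls)
    # The goal holds if the instance belongs to any direct subclass of cls.
    return any(
        has_type(triples, instance, sub, seen)
        for sub, p, sup in triples
        if p == "subClassOf" and sup == cls
    )


# Hypothetical example data, not taken from the slides.
EXAMPLE = {
    ("rex", "type", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
}
print(len(forward_closure(EXAMPLE)), has_type(EXAMPLE, "rex", "Animal"))
```

The forward variant pays once for all derivable triples, while the backward variant only explores the part of the hierarchy that a single query needs.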
Dataset          Size    Terminological Closure   Full Closure   Ratio
FactForge        862M    89 sec                   2h45min        1:111
LinkedLifeData   649M    332 sec                  1h5min         1:11
LUBM             1.1B    8 sec                    1h15min        1:562
Urbani, J., van Harmelen, F., Schlobach, S. & Bal, H. (2011), QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases. In Proceedings of ISWC 2011, Volume 5823, 730-745, Springer.
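The Ratio column appears to be the full closure time divided by the terminological closure time, rounded down. A minimal sketch under that assumption, with hypothetical helper names `to_seconds` and `closure_ratio`, reproduces the three values from the durations in the table:

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '2h45min' or '89 sec' to whole seconds."""
    total = 0
    for pattern, factor in ((r"(\d+)\s*h", 3600),
                            (r"(\d+)\s*min", 60),
                            (r"(\d+)\s*sec", 1)):
        match = re.search(pattern, duration)
        if match:
            total += int(match.group(1)) * factor
    return total


def closure_ratio(terminological: str, full: str) -> str:
    """Ratio of terminological to full closure time, assuming floor rounding."""
    return f"1:{to_seconds(full) // to_seconds(terminological)}"


# Durations copied from the table rows.
for name, term, full in (("FactForge", "89 sec", "2h45min"),
                         ("LinkedLifeData", "332 sec", "1h5min"),
                         ("LUBM", "8 sec", "1h15min")):
    print(name, closure_ratio(term, full))
```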
Period        Knowledge                           Knowledge Sharing
1980 - 1995   Executable Models                   Reusable System Components
1995 - 2005   Task Independent Domain Knowledge   Ontology Reuse
2005 - now    “Semantic” Data                     Data Interoperability
Knowledge Reengineering Bottleneck
The difficulty of the correct and continuous use of preexisting knowledge for a new task

Challenge 1: Data Dependency

Ontology Alignment Evaluation Initiative

Design Patterns

“Data” Driven Knowledge Engineering
[Figure: Semantic Web cube, www.w3.org/Icons/SW/sw-cube-v.svg]

Challenge 2: Complexity

Neats vs. Scruffies
Use
Reuse
Slide by Frank van Harmelen, ISWC 2011 Keynote, http://www.cs.vu.nl/~frankh/spool/ISWC2011Keynote/
Use = 1 - Reuse
Challenge 3: Limited Control

Data is Dirty
‣ Verbose
‣ Inconsistent
‣ Redundant
‣ Disconnected
‣ Stale
Semantically-Interlinked Online Communities

Pedantic Web
http://pedantic-web.org

40,745,554,078 Triples! (1.6 Billion)
Open PHACTS
Data2Semantics
LOD Around the Clock
Challenge 4: Increasing Importance

Semantic Web Good News Quiz
http://slideshare.net/Frank.van.Harmelen/semantic-web-good-news
‣ New stakeholders
‣ No more fooling around
‣ Scary stuff...?
‣ Bridging the development gap
‣ Data publishing licenses
‣ Access policies
‣ Attribution
‣ “Data Hoarding”
The Knowledge Reengineering Bottleneck
‣ The lack of adequate and appropriate hardware
‣ Lack of cumulation of AI methods and techniques
‣ Shortage of trained knowledge engineers
‣ The problem of knowledge acquisition
‣ The development gap
Marks on the slide: V V ? ? ?

http://www.webont.org/owled/2012
@OWLED2012Worksh