
The Semantic Web: there and back again

Tim Finin, University of Maryland, Baltimore County

Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

http://ebiq.org/r/353

LOD 123: Making the semantic web easier to use

Tim Finin, University of Maryland, Baltimore County

Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Semantic Web: then and now

• Ten years ago we developed complex ontologies used to encode and reason over small datasets of 1000s of facts

• Recently the focus has shifted to using simple ontologies and minimal reasoning over very large datasets of 100s of millions of facts

• Major companies are moving: Google Knowledge Graph, Facebook Open Graph, Microsoft Satori, Apple Siri KB, IBM Watson KB. Linked open data, or “Things, not strings”

Linked Open Data (LOD)

• Linked data is just RDF data, lots of it, with a small schema

• RDF data is a graph of triples (subject, predicate, object)
– URI URI String: dbr:Barack_Obama dbo:spouse “Michelle Obama”
– URI URI URI: dbr:Barack_Obama dbo:spouse dbpedia:Michelle_Obama

• Best linked data practice prefers the 2nd pattern, using nodes rather than strings for “entities” – Things, not strings!

• Linked open data is just linked data freely accessible on the Web along with its ontologies
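The difference between the two patterns is easy to see in code. Below is a minimal sketch, assuming the Python rdflib package, that builds both triples: the literal version ends the graph at a string, while the URI version gives us a node we can keep linking from.

    from rdflib import Graph, Literal, Namespace

    dbr = Namespace("http://dbpedia.org/resource/")
    dbo = Namespace("http://dbpedia.org/ontology/")

    g = Graph()
    g.bind("dbr", dbr)
    g.bind("dbo", dbo)

    # Pattern 1 (URI URI String): the object is only a string
    g.add((dbr.Barack_Obama, dbo.spouse, Literal("Michelle Obama")))

    # Pattern 2 (URI URI URI): the object is a node we can say more about
    g.add((dbr.Barack_Obama, dbo.spouse, dbr.Michelle_Obama))

    print(g.serialize(format="turtle"))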

Semantic web technologies allow machines to share data and knowledge using common Web languages and protocols.

~1997: Semantic Web beginning
Use Semantic Web technology to publish shared data & knowledge

2007: Semantic Web => Linked Open Data: LOD beginning
Data is inter-linked to support integration and fusion of knowledge

2008: LOD growing

2009: … and growing

2010: …growing faster
LOD is the new Cyc: a common source of background knowledge

2011: 31B facts in 295 datasets interlinked by 504M assertions on ckan.net

Exploiting LOD not (yet) Easy

• Publishing or using LOD data has inherent difficulties for the potential user

– It’s difficult to explore LOD data and to query it for answers

– It’s challenging to publish data using appropriate LOD vocabularies & link it to existing data

• Problem: O(10^4) schema terms, O(10^11) instances

• I’ll describe two ongoing research projects that are addressing these problems

GoRelations: An Intuitive Query System for Linked Data

Research with Lushan Han

http://ebiq.org/j/93

DBpedia is the Stereotypical LOD

• DBpedia is an important example of Linked Open Data
– Extracts structured data from infoboxes in Wikipedia
– Stores it in RDF using custom ontologies and Yago terms

• The major integration point for the entire LOD cloud

• Explorable as HTML, but harder to query in SPARQL

Browsing DBpedia's page for Mark Twain

Why it’s hard to query LOD

• Querying DBpedia requires a lot of a user (see the sketch below)
– Understand the RDF model
– Master SPARQL, a formal query language
– Understand ontology terms: 320 classes & 1600 properties!
– Know instance URIs (>2M entities!)
– Term heterogeneity (Place vs. PopulatedPlace)

• Querying large LOD datasets is overwhelming

• Natural language query systems are still a research goal
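To make the burden concrete, here is a sketch of a direct DBpedia query using the Python SPARQLWrapper package. The endpoint, dbo:birthPlace, and dbr:Mark_Twain are real DBpedia terms, but the point is what the user must already know to write it: SPARQL syntax, the right property name, and the exact instance URI.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        PREFIX dbr: <http://dbpedia.org/resource/>
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?place WHERE { dbr:Mark_Twain dbo:birthPlace ?place }
    """)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["place"]["value"])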

Goal

• Let users with a basic understanding of RDF query DBpedia and other LOD collections
– Explore what data is in the system
– Get answers to questions
– Create SPARQL queries for reuse or adaptation

• Desiderata
– Easy to learn and to use
– Good accuracy (e.g., precision and recall)
– Fast

Key Idea

Structured keyword queries reduce problem complexity:

– User enters a simple graph, and
– Annotates the nodes and arcs with words and phrases

Structured Keyword Queries

• Nodes are entities; links are binary relations

• Entities are described by two unrestricted terms: a name or value, and a type or concept

• Outputs are marked with ?

• A compromise between a natural language Q&A system and a formal query (illustrated below)
– Users provide the compositional structure of the question
– Users are free to use their own terms to annotate the structure
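As a concrete illustration (the representation below is hypothetical, not the system's actual data structure), a structured keyword query is just a small annotated graph. This one asks for the places where authors of books were born:

    # Nodes carry the user's own type/concept words; '?' outputs are
    # flagged with "output"; arcs are labeled with free-text relations.
    query = {
        "nodes": {
            "n1": {"type": "Place", "output": True},
            "n2": {"type": "Author", "output": False},
            "n3": {"type": "Book", "output": False},
        },
        "edges": [
            ("n2", "born in", "n1"),
            ("n2", "wrote", "n3"),
        ],
    }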

Translation – Step One: finding semantically similar ontology terms

For each graph concept/relation, generate k most semantically similar ontology classes/properties

Lexical similarity metric based on distributional similarity, LSA, and WordNet

Semantic similarity: http://bit.ly/SEMSIM
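As a stand-in for that metric, the sketch below ranks candidate ontology terms for a user term by cosine similarity over word vectors. The three-dimensional vectors are toy values invented for illustration; the real metric combines distributional similarity, LSA, and WordNet.

    import math

    vectors = {                      # hypothetical toy embeddings
        "author": [0.9, 0.2, 0.1],
        "Writer": [0.8, 0.3, 0.1],
        "Person": [0.5, 0.5, 0.3],
        "Book":   [0.1, 0.9, 0.2],
    }

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))

    def top_k(user_term, candidates, k=2):
        # score every candidate ontology term against the user's word
        scored = sorted(((cosine(vectors[user_term], vectors[c]), c)
                         for c in candidates), reverse=True)
        return scored[:k]

    print(top_k("author", ["Writer", "Person", "Book"]))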

Semantic Textual Similarity task

• 2013 Joint Conference on Lexical and Computational Semantics (*SEM)
• Do two sentences have the same meaning (scored 0…5)?

1: “The woman is playing the violin” vs. “The young lady enjoys listening to the guitar”
4: “In May 2010, the troops attempted to invade Kabul” vs. “The US army invaded Kabul on May 7th last year, 2010”

• 2012: 35 teams, 88 runs; 2013: 36 teams, 89 runs
• 2,250 sentence pairs from four domains
• Our three runs placed #1, #2 and #4

Dataset                 PairingWords   Galactus      Saiyan
Headlines (750 pairs)   0.7642 (3)     0.7428 (7)    0.7838 (1)
OnWN (561 pairs)        0.7529 (5)     0.7053 (12)   0.5593 (36)
FNWN (189 pairs)        0.5818 (1)     0.5444 (3)    0.5815 (2)
SMT (750 pairs)         0.3804 (8)     0.3705 (11)   0.3563 (16)
Weighted mean           0.6181 (1)     0.5927 (2)    0.5683 (4)

(rank among all runs in parentheses)

• Assemble best interpretation using statistics of the data

• Use pointwise mutual information (PMI) between RDF terms in the LOD collection

Measures degree to which two RDF terms co-occur in knowledge base

• In a good interpretation, ontology terms associate like their corresponding user terms connect in the query
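The standard PMI definition, with the probabilities estimated from co-occurrence counts of RDF terms over triples in the LOD collection:

    \mathrm{PMI}(x, y) \;=\; \log \frac{p(x, y)}{p(x)\, p(y)}

PMI is positive when x and y co-occur more often than independence would predict, which is why strongly associated ontology terms signal a good interpretation.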

Translation – Step Two: disambiguation algorithm

Three aspects are combined to derive an overall goodness measure for each candidate interpretation

Translation – Step Two: disambiguation algorithm

– Joint disambiguation
– Resolving direction
– Link reasonableness

Translation result
Concepts: Place => Place, Author => Writer, Book => Book
Properties: born in => birthPlace, wrote => author (inverse direction)

The translation of a semantic graph query to SPARQL is straightforward given the mappings

SPARQL Generation

Concepts
• Place => Place
• Author => Writer
• Book => Book

Relations
• born in => birthPlace
• wrote => author
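A sketch of that mechanical rewrite in Python. The node names and the helper function are hypothetical, and a real generator would also emit PREFIX declarations; the point is that once the mappings are fixed, SPARQL falls out directly.

    concepts = {"place": "dbo:Place", "author": "dbo:Writer",
                "book": "dbo:Book"}
    relations = [("author", "dbo:birthPlace", "place"),  # born in => birthPlace
                 ("book", "dbo:author", "author")]       # wrote => author (inverted)
    outputs = ["place"]

    def to_sparql(concepts, relations, outputs):
        # one type triple per concept node, one triple per mapped relation
        lines = ["?%s a %s ." % (n, cls) for n, cls in concepts.items()]
        lines += ["?%s %s ?%s ." % (s, p, o) for s, p, o in relations]
        head = " ".join("?%s" % n for n in outputs)
        return "SELECT %s WHERE {\n  %s\n}" % (head, "\n  ".join(lines))

    print(to_sparql(concepts, relations, outputs))

Note how the inverted “wrote” mapping simply swaps subject and object in the generated triple pattern.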

Evaluation

• 33 test questions from the 2011 Workshop on Question Answering over Linked Data, answerable using DBpedia

• Three human subjects unfamiliar with DBpedia translated the test questions into semantic graph queries

• Compared with two top natural language QA systems: PowerAqua and True Knowledge

Current work

• The baseline system works well for DBpedia; we're testing a second use case now

• Current work
– Better entity matching
– Relaxing the need for type information
– A better Web interface with user feedback & advice

• See http://ebiq.org/93 for more information & try our alpha version at http://ebiq.org/GOR

Generating Linked Data by Inferring the Semantics of Tables

Research with Varish Mulwad

http://ebiq.org/j/96

Goal: Table => LOD*

Name Team Position Height

Michael Jordan Chicago Shooting guard 1.98

Allen Iverson Philadelphia Point guard 1.83

Yao Ming Houston Center 2.29

Tim Duncan San Antonio Power forward 2.11

Example annotations: Team column => http://dbpedia.org/class/yago/NationalBasketballAssociationTeams (cells linked via dbprop:team); Allen Iverson => http://dbpedia.org/resource/Allen_Iverson; Height = player height in meters

* DBpedia

Goal: Table => LOD*

Name Team Position Height

Michael Jordan Chicago Shooting guard 1.98

Allen Iverson Philadelphia Point guard 1.83

Yao Ming Houston Center 2.29

Tim Duncan San Antonio Power forward 2.11

@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix yago: <http://dbpedia.org/class/yago/> .

"Name"@en is rdfs:label of dbo:BasketballPlayer .
"Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .

"Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan .
dbpedia:Michael_Jordan a dbo:BasketballPlayer .

"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls .
dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams .

RDF Linked Data

All this in a completely automated way

* DBpedia
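A sketch of the emission step, assuming rdflib; the two annotation dictionaries are hypothetical stand-ins for the mappings the system infers, and a real emitter would also replace cell strings with canonical labels.

    from rdflib import RDF, RDFS, Graph, Literal, Namespace

    dbpedia = Namespace("http://dbpedia.org/resource/")
    dbo = Namespace("http://dbpedia.org/ontology/")

    # hypothetical stand-ins for the system's inferred annotations
    cell_entity = {"Michael Jordan": dbpedia.Michael_Jordan,
                   "Chicago": dbpedia.Chicago_Bulls}
    column_class = {"Name": dbo.BasketballPlayer}

    g = Graph()
    row = {"Name": "Michael Jordan", "Team": "Chicago"}
    for header, value in row.items():
        entity = cell_entity[value]               # cell => linked entity
        g.add((entity, RDFS.label, Literal(value, lang="en")))
        if header in column_class:                # column => class assertion
            g.add((entity, RDF.type, column_class[header]))

    print(g.serialize(format="turtle"))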

Tables are everywhere !! … yet …

The web: 154 million high-quality relational tables

Fewer than 1% of the 400K tables at data.gov have rich semantic schemas

A Domain Independent Framework

1. Pre-processing modules: sampling, acronym detection
2. Query and generate initial mappings
3. Joint inference/assignment
4. Generate linked RDF; verify (optional); store in a knowledge base & publish as LOD

Query and Rank

Cell strings (e.g., Chicago, Boston, Allen Iverson) are queried against the knowledge base, and candidates are scored by Rank(String Similarity, Popularity).

Possible entities for Chicago:
1. Chicago_Bulls
2. Chicago
3. Judy_Chicago

The knowledge base can be replaced by domain-specific or other LOD knowledge bases.
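A sketch of that ranking step, with Python's difflib standing in for the real string-similarity feature; the candidate list, popularity scores, and the 0.7 weighting are all illustrative.

    from difflib import SequenceMatcher

    def rank(cell, candidates, popularity, w=0.7):
        # score = weighted mix of string similarity and popularity
        def score(ent):
            sim = SequenceMatcher(None, cell.lower(),
                                  ent.replace("_", " ").lower()).ratio()
            return w * sim + (1 - w) * popularity.get(ent, 0.0)
        return sorted(candidates, key=score, reverse=True)

    candidates = ["Chicago_Bulls", "Chicago", "Judy_Chicago"]
    popularity = {"Chicago_Bulls": 0.6, "Chicago": 0.9, "Judy_Chicago": 0.1}
    print(rank("Chicago", candidates, popularity))

On string evidence alone, "Chicago" the city outranks the Bulls; it is the later joint inference over the whole column that corrects this.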

Generating candidate ‘types’ for Columns

Team column instances: Chicago, Philadelphia, Houston, San Antonio. Each cell's candidate entities (e.g., 1. Chicago Bulls, 2. Chicago, 3. Judy Chicago) contribute their classes:

{dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams}
{dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, yago:NationalBasketballAssociationTeams, …}
{…}

The column's candidate types are the union: dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, …
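In code, candidate-type generation is just pooling classes across candidates; the attribution of classes to particular entities below is an illustrative guess at the slide's sets, not data from the system.

    candidate_classes = {
        "Chicago_Bulls": {"dbpedia-owl:Place", "dbpedia-owl:City",
                          "yago:NationalBasketballAssociationTeams"},
        "Chicago": {"dbpedia-owl:Place", "dbpedia-owl:PopulatedPlace",
                    "dbpedia-owl:Film"},
        "Judy_Chicago": {"yago:WomenArtist", "yago:LivingPeople"},
    }

    # the column's candidate types: union over every candidate's classes
    column_candidates = set().union(*candidate_classes.values())
    print(sorted(column_candidates))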

Joint Inference / Assignment

A graphical model for tables: joint inference over the evidence in a table. The variables are the column headers (C1, C2, C3) and the cell values (R11 … R33), e.g. the Team column with Chicago, Philadelphia, Houston, and San Antonio, each needing both a class and an instance assignment.

Parameterized Graphical Model

Variable nodes: the column headers (C1, C2, C3) and the row values (R11 … R33).

Factor nodes (ψ): functions capturing
– the affinity between a column header and its row values
– the interaction between column headers
– the interaction between row values

Standard message passing computes the joint assignment P(C1, C2, C3, R11, R12, R13, R21, R22, R23, R31, R32, R33), which factors into smaller pieces such as P(C1, R11, R12, R13), P(C2, R21, R22, R23), P(C3, R31, R32, R33), and P(R31, R32, R33).

Graphical Models: Exploit Conditional Independences
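One plausible reading of that factor graph, assuming R_ij is the cell in column i, row j, with a factor tying each column header to its cells, one over the headers, and one per row:

    P(C_1,\dots,C_n, R_{11},\dots,R_{nm}) \;\propto\;
      \prod_{i=1}^{n} \psi^{\mathrm{col}}_i(C_i, R_{i1},\dots,R_{im})
      \;\cdot\; \psi^{\mathrm{hdr}}(C_1,\dots,C_n)
      \;\cdot\; \prod_{j=1}^{m} \psi^{\mathrm{row}}_j(R_{1j},\dots,R_{nj})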

Still… a single factor over C1, R11, R12, and R13 (4 variables, each with 25 candidate values) has 25^4 = 390,625 entries!

Semantic message passing

Current assignments: R11:[Michael_I_Jordan], R12:[Yao_Ming], R13:[Allen_Iverson], R21:[Chicago_Bulls], R31:[Shooting_Guard], …; C1:[BasketballPlayer], C2:[NBATeam], C3:[BasketBallPositions]

Instead of numeric messages, factors send semantic ones: “No Change” to assignments the evidence supports (Yao_Ming, Allen_Iverson, Chicago_Bulls, Shooting_Guard, NBATeam, BasketBallPositions) and “Change” to those it does not, e.g. Michael_I_Jordan should change to something related to Chicago_Bulls, to a BasketballPlayer.

Inference – Example

Initial assignments: C1:[Name]; R11:[Michael_I_Jordan], R12:[Allen_Iverson], R13:[Yao_Ming]

The column factor checks column header – row value agreement over (Michael_I_Jordan, Allen_Iverson, Yao_Ming), infers the class “BasketBallPlayer”, and sends “Change” to R11 and “No Change” to R12 and R13.

Candidate entities for R11 “Michael Jordan”: 1. Michael_I_Jordan (Professor) 2. … 3. Michael_Jordan (BasketballPlayer) …

Column header – row value agreement

Each row value's candidate classes get a +1 vote: Michael_I_Jordan votes for LivingPeople and AI_Researchers; Allen_Iverson and Yao_Ming vote for BasketballPlayer, Athlete, and LivingPeople; other candidates contribute classes such as WomenArtist, City, PopulatedPlace, GeoPopulatedPlace, Film, and ArtWork.

Ranked DBpedia classes: 1. Athlete 2. City 3. …
Ranked Yago classes: 1. LivingPeople 2. BasketBallPlayer 3. GeoPopulatedPlace …

Yago tie-breaker/re-order: choose the more ‘descriptive’ class, e.g. BasketBallPlayer is better than LivingPeople.

ClassGranularityScore = 1 − […]

Column header – row value agreement:

1: Majority voting: topClassScore = numberOfVotes / numberOfRows; compute topScoreYago & topScoreDBpedia

2: Choose the top class: Top Yago: BasketBallPlayer, Top DBpedia: Athlete

If (topScoreYago || topScoreDBpedia) >= Threshold, check for alignment: is Athlete a sub/superclass of BasketBallPlayer? If so, Column Header Annotation = BasketBallPlayer, Athlete. If both scores < Threshold, the column gets no annotation (next slide). A sketch of the vote follows below.
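A sketch of the vote-and-threshold logic in Python; the threshold and the per-row class sets are illustrative, and the Yago granularity tie-breaker and alignment check are left out.

    from collections import Counter

    def annotate_column(row_classes, threshold=0.5):
        # every row's candidate classes each get a +1 vote
        votes = Counter(c for classes in row_classes for c in classes)
        top_class, top_votes = votes.most_common(1)[0]
        top_score = top_votes / len(row_classes)  # numberOfVotes / numberOfRows
        return top_class if top_score >= threshold else "No-Annotation"

    rows = [
        {"BasketballPlayer", "Athlete", "LivingPeople"},  # Allen_Iverson
        {"BasketballPlayer", "Athlete", "LivingPeople"},  # Yao_Ming
        {"LivingPeople", "AI_Researchers"},               # Michael_I_Jordan
    ]
    print(annotate_column(rows))  # LivingPeople: 3 of 3 rows vote for it

On these rows LivingPeople wins on raw votes, which is exactly why the granularity tie-breaker prefers the more descriptive BasketBallPlayer.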

If both scores fall below the threshold, update Column Header Annotation = “No-Annotation”. Otherwise, send “Change” to row values whose classes disagree with the column annotation (e.g., Michael_I_Jordan, with LivingPeople and AI_Researchers) and “No-Change” to those that agree (Allen_Iverson and Yao_Ming, with BasketballPlayer, Athlete, LivingPeople), then update the row value entity annotations.

On receiving “CHANGE” from factor ψ3, R11 re-ranks its candidate entities (1. Michael_I_Jordan, classes LivingPeople and AI_Researchers; 2. …; 3. Michael_Jordan, classes BasketBallPlayer and Athlete) against the required entity class, BasketBallPlayer or Athlete. R11 “Michael Jordan” is re-assigned to Michael_Jordan.

Evaluation

• Dataset of 80 tables (Wikipedia tables; part of a larger dataset released by IIT-Bombay)

• Evaluated column header annotation accuracy
– How good was the mapping from Team to NationalBasketballAssociationTeams?

• Evaluated entity linking accuracy
– Mapping Michael Jordan to Michael_Jordan

Column Header Annotation Accuracy

• The system produced a ranked list of Yago & DBpedia classes

• Human judges evaluated each class

• For precision, judges scored each class:
• 1 if the class was accurate
• 0.5 if the class was OK, but not the best (e.g., Place vs. City)
• 0 if it was incorrect

• For recall, they scored 1 if accurate/correct, 0 if incorrect

Judgments over all proposed classes: Accurate 522, Okay 259, Incorrect 422.

Column header annotations, F-measure for the top-k classes:

Yago rank 1: 0.688    DBpedia rank 1: 0.677
Yago rank 2: 0.488    DBpedia rank 2: 0.627
Yago rank 3: 0.438    DBpedia rank 3: 0.613

Baselines: GOOG 0.67, IIT-B 0.56

Entity linking accuracy

http://dbpedia.org/resource/Allen_IversonAllen Iverson

Correctly Linked Entities

3022

Incorrectly Linked Entities

959

Total Entities 3981

Accuracy : 75.91 %

Other Challenges

• Using table captions and other text in associated documents to provide context

• The size of some data.gov tables (> 400K rows!) makes using the full graphical model impractical
– Sample the table and run the model on the subset

• Achieving acceptable accuracy may require human input
– 100% accuracy is unattainable automatically
– How best to let humans offer advice and/or correct interpretations?

Final Conclusions

• Linked data is great for sharing structured and semi-structured data
– Backed by machine-understandable semantics
– Uses successful Web languages and protocols

• Generating and exploring linked data resources is challenging
– Schemas are too large; too many URIs

• New tools for mapping tables to linked data and for translating structured natural-language queries reduce the barriers

http://ebiq.org/