Identification Knowledge Graph - Pace...
Knowledge Graph Identification
J. Pujara, H. Miao, and L. Getoor (UMD) and W. Cohen (CMU)
ISWC 13
● What is the paper about? See video (2min44sec): Knowledge Graph
○ Semantic Web & Knowledge Representation.
○ Information Extraction.
● What is the problem?
● Approach and method (of the solution):
○ Probabilistic Soft Logic
○ KGI using PSL
● Datasets and results
Today’s talk
● Machine-readable information.
○ Structured data: RDF (triples), OWL.
● Semantic search engines:
○ From information engine to knowledge engine (Knowledge Graph).
SemWeb & Knowledge Representation
Triples in the Semantic Web
Subject-Predicate-Object (subject-property-value).
● Triples describe relationships between objects.
● Objects can be anything (a webpage, a person, a book, etc.).
○ Represented as URIs.
Example: Aziz knows Lixin.
uri://people#Aziz
http://xmlns.com/foaf/0.1/knows
uri://people#Lixin
Knows(Aziz, Lixin)
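A minimal Python sketch of how the triple above maps to the logical atom Knows(Aziz, Lixin). The `Triple` type and the helpers `local_name` and `to_atom` are hypothetical names introduced for illustration; they are not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement; every part is a URI string."""
    subject: str
    predicate: str
    obj: str


triple = Triple(
    subject="uri://people#Aziz",
    predicate="http://xmlns.com/foaf/0.1/knows",
    obj="uri://people#Lixin",
)


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


def to_atom(t: Triple) -> str:
    """Render a triple as Predicate(Subject, Object)."""
    pred = local_name(t.predicate)
    pred = pred[:1].upper() + pred[1:]  # capitalize only the first letter
    return f"{pred}({local_name(t.subject)}, {local_name(t.obj)})"


print(to_atom(triple))  # Knows(Aziz, Lixin)
```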
Input:
● Initial ontology (categories and relations):
○ e.g. sportsTeam and playsOnTeam(athlete, sportsTeam)
● 10-15 seed examples of each.
● 500 million web pages + access to the web.
Ongoing output:
● Extract new instances. See: twitter.
○ Learn to read better.
Extracting Information (NELL Project)
Never-Ending Language Learning by CMU.
Concepts
Atoms (atomic concepts):
● teamPlaysSport(sportsTeam, sport)
Ground atoms (instances):
● teamPlaysSport(Nets, Basketball)
Ontology constraints:
● DOM(teamPlaysSport, sportsTeam)
● RNG(teamPlaysSport, sport)
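The domain and range constraints above can be read as type checks on a ground atom. The sketch below is only an illustration: the label table `LBL` and the function `violations` are hypothetical, and it applies the constraints as hard checks, whereas KGI treats them as soft, weighted rules.

```python
# Ontology constraints from the example: domain and range of teamPlaysSport.
DOM = {"teamPlaysSport": "sportsTeam"}
RNG = {"teamPlaysSport": "sport"}

# Hypothetical entity labels, for illustration only.
LBL = {
    "Nets": {"sportsTeam"},
    "Basketball": {"sport"},
}


def violations(relation: str, subj: str, obj: str) -> list[str]:
    """List the domain/range constraints that relation(subj, obj) breaks."""
    problems = []
    if DOM[relation] not in LBL.get(subj, set()):
        problems.append(f"{subj} is not labeled {DOM[relation]}")
    if RNG[relation] not in LBL.get(obj, set()):
        problems.append(f"{obj} is not labeled {RNG[relation]}")
    return problems


print(violations("teamPlaysSport", "Nets", "Basketball"))  # no violations
print(violations("teamPlaysSport", "Basketball", "Nets"))  # both constraints broken
```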
Knowledge Graph
KGs contain 3 types of facts:
● Entities, ENT(E),
● Entity labels, LBL(E, L),
● Relations, REL(E1, E2, R).
What is the problem?
Massive collections of interrelated facts (with a pool of noisy extractions).
What is needed
KGI Solution
Graph Identification as Joint Reasoning: Probabilistic Soft Logic (PSL)
The ultimate goal is to:
● identify a set of atoms.
Proposed solution (approach)
Incorporates 3 components:
1. Uncertain extractions: candidate facts and their extraction confidences.
2. Entity resolution (identify co-referent entities).
3. Ontological constraints.
Method: Probabilistic Soft Logic
Declarative language based on logic for expressing collective probabilistic inference problems.
● Predicate = relationship or property.
● Atom = (continuous) random variable.
● Rule = captures a dependency or constraint.
● Set = defines aggregates.
PSL is for modeling domains that are:
● Probabilistic (e.g. link prediction)
● Relational (e.g. ontology alignment)
PSL combines 2 theories:
● First-order logic.
● Probabilistic graphical models.
PSL uses:
● “Soft” logic (as its logical component).
● Markov networks (as its statistical model).
From: intro to PSL.
Background: PSL
Soft logic:
“Softening” of logical formulas makes the inference problem a polynomial-time (convex) optimization rather than a combinatorial (NP-hard) one.
A ^ B = max(A + B - 1, 0)
A v B = min(A + B, 1)
~A = 1 - A
Lukasiewicz t-norm
similarNames(X, Y) ⇒ sameEntity(X, Y)
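The Lukasiewicz operators and the example rule can be sketched in a few lines of Python. The function names and the truth values 0.75 and 0.5 are illustrative assumptions; implication is written as ~body v head, and the distance to satisfaction is max(body - head, 0), which is one minus the truth value of the rule.

```python
def soft_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction: A ^ B = max(A + B - 1, 0)."""
    return max(a + b - 1.0, 0.0)


def soft_or(a: float, b: float) -> float:
    """Lukasiewicz disjunction: A v B = min(A + B, 1)."""
    return min(a + b, 1.0)


def soft_not(a: float) -> float:
    """Negation: ~A = 1 - A."""
    return 1.0 - a


def soft_implies(body: float, head: float) -> float:
    """Truth value of body => head, rewritten as ~body v head."""
    return soft_or(soft_not(body), head)


def distance_to_satisfaction(body: float, head: float) -> float:
    """How far body => head is from being fully satisfied."""
    return max(body - head, 0.0)


# Hypothetical soft-truth values for one grounding of
# similarNames(X, Y) => sameEntity(X, Y).
similar_names = 0.75
same_entity = 0.5

print(soft_implies(similar_names, same_entity))              # 0.75
print(distance_to_satisfaction(similar_names, same_entity))  # 0.25
```

Because every operator is a piecewise-linear function of values in [0, 1], minimizing a weighted sum of such distances stays a convex problem, which is the point made about softening above.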
The 3 components in PSL program
Uncertain Extractions:
Entity Resolution:
Ontology Constraints:
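As a rough sketch of how the three components might appear as weighted ground rules, the Python below scores one hypothetical rule per component. The rule shapes, the weights, the predicates CandLbl and SameEnt, the entity BrooklynNets, and all truth values are assumptions made for illustration, not the rule set from the paper.

```python
from functools import reduce


def soft_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction."""
    return max(a + b - 1.0, 0.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """Distance to satisfaction of body => head."""
    return max(body - head, 0.0)


# Hypothetical soft-truth values for the ground atoms involved.
truth = {
    "CandLbl(Nets, sportsTeam)": 0.75,   # extractor confidence
    "Lbl(Nets, sportsTeam)": 0.5,
    "SameEnt(Nets, BrooklynNets)": 1.0,  # co-reference
    "Lbl(BrooklynNets, sportsTeam)": 0.25,
    "Dom(teamPlaysSport, sportsTeam)": 1.0,
    "Rel(Nets, Basketball, teamPlaysSport)": 0.75,
}

# (weight, body atoms, head atom); weights are made up for the example.
rules = [
    # Uncertain extraction: a candidate label supports the label.
    (1.0, ["CandLbl(Nets, sportsTeam)"], "Lbl(Nets, sportsTeam)"),
    # Entity resolution: co-referent entities share labels.
    (2.0, ["SameEnt(Nets, BrooklynNets)", "Lbl(Nets, sportsTeam)"],
     "Lbl(BrooklynNets, sportsTeam)"),
    # Ontology constraint: a relation's subject takes the domain label.
    (4.0, ["Dom(teamPlaysSport, sportsTeam)",
           "Rel(Nets, Basketball, teamPlaysSport)"],
     "Lbl(Nets, sportsTeam)"),
]


def weighted_distance(weight, body_atoms, head_atom, truth):
    """Weight times the distance to satisfaction of one ground rule."""
    body = reduce(soft_and, (truth[a] for a in body_atoms), 1.0)
    return weight * distance_to_satisfaction(body, truth[head_atom])


total = sum(weighted_distance(w, b, h, truth) for w, b, h in rules)
print(total)  # 1.75
```

Inference would search for the truth values of the unobserved atoms that make this total as small as possible; here the values are fixed only to show the arithmetic.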
KGI Using PSL
PSL then defines a probability distribution over uncertain knowledge graphs, G:
If I is an interpretation and r is a ground instance of a rule, then the distance to satisfaction φr(I) of r is derived from the rule's soft-truth value under the Lukasiewicz t-norm: φr(I) = max{0, I(body) - I(head)}, i.e. one minus the truth value of r.
Where:
● wr: weight of the potential for ground rule r.
● I: interpretation (assignment of soft-truth values to ground atoms).
● Z: normalization constant (the partition function).
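A form of the distribution consistent with these definitions is the standard PSL density, written here as a sketch. The symbol R for the set of ground rules is an assumption; some formulations also raise each distance to a power p in {1, 2}, which is omitted here.

```latex
P(G) \;=\; f(I) \;=\; \frac{1}{Z} \exp\!\Big[ -\sum_{r \in R} w_r \, \phi_r(I) \Big],
\qquad
Z \;=\; \int_{I} \exp\!\Big[ -\sum_{r \in R} w_r \, \phi_r(I) \Big] \, dI
```

Interpretations that leave heavily weighted rules far from satisfied receive low density, so the most probable knowledge graph is the one that minimizes the weighted sum of distances.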
See: Markov random field (logistic model)
Datasets & results
Evaluation on the NELL dataset from iteration 165:
● 1.7M candidate facts.
● 70K ontological constraints.
Datasets and source code are available on GitHub.
Comparison with variants of KGI
Questions?