Identification Knowledge Graph - Pace...

Knowledge Graph Identification J. Pujara, H. Miao, and L. Getoor (UMD) and W. Cohen (CMU) ISWC 13


Page 1: Identification Knowledge Graph - Pace Universitywebpage.pace.edu/aa10212w/course/2014-spring/CS702/2.PSL-KGI.… · Knowledge Graph Identification J. Pujara, H. Miao, and L. Getoor

Knowledge Graph Identification

J. Pujara, H. Miao, and L. Getoor (UMD) and W. Cohen (CMU)

ISWC 13


Today’s talk

● What is the paper about? See video (2 min 44 sec): Knowledge Graph
  ○ Semantic Web & Knowledge Representation.
  ○ Information Extraction.
● What is the problem?
● Approach and method (of the solution):
  ○ Probabilistic Soft Logic
  ○ KGI using PSL
● Datasets and results


SemWeb & Knowledge Representation

● Machine-readable information.
  ○ Structured data: RDF (triples), OWL.
● Semantic search engines:
  ○ From information engine to knowledge engine (Knowledge Graph).


Triples in the Semantic Web

Subject-Predicate-Object (subject-property-value).
● Triples describe relationships between objects.
● Objects can be anything (a webpage, a person, a book, etc.).
  ○ Represented as URIs.

Example: Aziz knows Lixin.

uri://people#Aziz

http://xmlns.com/foaf/0.1/knows

uri://people#Lixin

Knows(Aziz, Lixin)
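The mapping from the three URIs to the predicate form Knows(Aziz, Lixin) can be sketched in a few lines of Python. This is only an illustration: the `Triple` type and the `local_name` and `to_atom` helpers are hypothetical names introduced here, not part of RDF or of any library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement; each part is a URI string."""
    subject: str
    predicate: str
    obj: str


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'."""
    tail = uri.rsplit("#", 1)[-1]
    return tail.rsplit("/", 1)[-1]


def to_atom(t: Triple) -> str:
    """Render a triple in predicate form, e.g. Knows(Aziz, Lixin)."""
    name = local_name(t.predicate)
    return f"{name[0].upper()}{name[1:]}({local_name(t.subject)}, {local_name(t.obj)})"


# The example from the slide.
triple = Triple(
    "uri://people#Aziz",
    "http://xmlns.com/foaf/0.1/knows",
    "uri://people#Lixin",
)
print(to_atom(triple))
```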


Extracting Information (NELL Project¹)

Never-Ending Language Learning by CMU.

Input:
● Initial ontology (categories and relations):
  ○ e.g. sportsTeam and playsOnTeam(athlete, sportsTeam)
● 10-15 seed examples of each.
● 500 million web pages + access to the web.

Ongoing output:
● Extract new instances. See: twitter.
  ○ Learn to read better.


Concepts

Atoms (atomic concepts):
● teamPlaysSport(sportsTeam, sport)

Ground atoms (instances):
● teamPlaysSport(Nets, Basketball)

Ontology constraints:
● DOM(teamPlaysSport, sportsTeam)
● RNG(teamPlaysSport, sport)
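As a rough illustration of what the DOM and RNG constraints say, the sketch below checks a ground atom against them. It is a hard (Boolean) check over hypothetical toy data; the `labels` table, the "Brooklyn" entry, and the `violations` helper are assumptions made for this example, and KGI itself treats such constraints softly rather than as yes/no tests.

```python
# Hypothetical toy labels; only Nets and Basketball come from the slide.
labels = {
    "Nets": {"sportsTeam"},
    "Basketball": {"sport"},
    "Brooklyn": {"city"},
}

# DOM(teamPlaysSport, sportsTeam) and RNG(teamPlaysSport, sport).
domain = {"teamPlaysSport": "sportsTeam"}
range_ = {"teamPlaysSport": "sport"}


def violations(relation, subj, obj):
    """List the domain/range constraints that relation(subj, obj) breaks."""
    problems = []
    if domain[relation] not in labels.get(subj, set()):
        problems.append(f"DOM: {subj} is not a {domain[relation]}")
    if range_[relation] not in labels.get(obj, set()):
        problems.append(f"RNG: {obj} is not a {range_[relation]}")
    return problems


print(violations("teamPlaysSport", "Nets", "Basketball"))
print(violations("teamPlaysSport", "Nets", "Brooklyn"))
```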


Knowledge Graph

KGs contain 3 types of facts, about:
● Entities, ENT(E)
● Entity labels, LBL(E, L)
● Relations, REL(E1, E2, R)


What is the problem?

Massive collections of interrelated facts (with a pool of noisy extractions).


What is needed


KGI Solution

Graph Identification as Joint Reasoning: Probabilistic Soft Logic (PSL)

The ultimate goal is to:
● Identify a set of atoms.


Proposed solution (approach)

Incorporates 3 components:

1. Uncertain extractions: candidate facts and their extraction confidences.

2. Entity resolution (identify co-referent entities).

3. Ontological constraints.


Method: Probabilistic Soft Logic

A declarative language, based on logic, for expressing collective probabilistic inference problems.¹

● Predicate = a relationship or property.
● Atom = a (continuous) random variable.
● Rule = captures a dependency or constraint.
● Set = defines aggregates.

PSL is for modeling domains that are:
● Probabilistic (e.g. link prediction)
● Relational (e.g. ontology alignment)

PSL combines 2 theories:
● First-order logic.
● Probabilistic graphical models.

PSL uses:
● “Soft” logic (as its logical component).
● Markov networks (as its statistical model).

From: intro to PSL.


Background: PSL

Soft logic:

“Softening” of logical formulas makes the inference problem a polynomial-time (convex) optimization rather than a combinatorial (NP-hard) one.

Lukasiewicz t-norm:

A ∧ B = max(A + B - 1, 0)
A ∨ B = min(A + B, 1)
¬A = 1 - A

similarNames(X, Y) ⇒ sameEntity(X, Y)
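A minimal Python sketch of these soft-logic operators, applied to the rule similarNames(X, Y) ⇒ sameEntity(X, Y). Two things here are assumptions not stated on the slide: the implication is evaluated as ¬body ∨ head, and the distance to satisfaction is taken as one minus the rule's truth value, which works out to max(0, body - head). The function names and the soft-truth values are made up for illustration.

```python
def soft_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction: max(A + B - 1, 0)."""
    return max(a + b - 1.0, 0.0)


def soft_or(a: float, b: float) -> float:
    """Lukasiewicz disjunction: min(A + B, 1)."""
    return min(a + b, 1.0)


def soft_not(a: float) -> float:
    """Negation: 1 - A."""
    return 1.0 - a


def soft_implies(body: float, head: float) -> float:
    """Truth value of body => head, read as (not body) or head (assumption)."""
    return soft_or(soft_not(body), head)


def distance_to_satisfaction(body: float, head: float) -> float:
    """How far the rule is from fully true: 1 - truth = max(0, body - head)."""
    return max(0.0, body - head)


# Illustrative soft-truth values for one grounding of the rule.
similar_names = 0.75
same_entity = 0.25
print(soft_implies(similar_names, same_entity))
print(distance_to_satisfaction(similar_names, same_entity))
```

With these values the names look alike much more strongly than the entities are believed to be the same, so the rule is only partly satisfied and the distance is positive. Raising sameEntity to at least 0.75 would drive the distance to zero.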


The 3 components in PSL program

Uncertain Extractions:

Entity Resolution:

Ontology Constraints:


KGI Using PSL

PSL then defines a probability distribution over uncertain knowledge graphs, G:
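Assuming the standard PSL log-linear form, and using only the symbols defined on this slide, the distribution can be written as follows. The set R of ground rules is notation introduced here for the sum; treat this as a hedged reconstruction rather than a verbatim copy of the slide's formula.

```latex
P(G) = P(I) = \frac{1}{Z} \exp\!\Big[ -\sum_{r \in R} w_r \, \phi_r(I) \Big]
```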

If I is an interpretation and r is a ground instance of a rule, then the distance to satisfaction φ_r(I) of r is computed from its soft-truth value under the Lukasiewicz t-norm.

Where:
● w_r: the weight of the potential for each ground rule r.
● I: interpretation (assignment of soft-truth values to ground atoms).
● Z: normalization constant (the partition function).

See: Markov random field (logistic model)
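To see how weighted ground rules shape such a distribution, here is a small Python sketch that scores two interpretations. It assumes the usual log-linear form, where the density is proportional to exp(-Σ w_r φ_r(I)), and it assumes φ_r(I) = max(0, body - head) for an implication rule. The rule weight, atom names, and truth values are hypothetical, and Z is left out because it cancels when comparing interpretations.

```python
import math


def distance(body: float, head: float) -> float:
    """Distance to satisfaction of body => head (assumed max(0, body - head))."""
    return max(0.0, body - head)


def unnormalized_density(ground_rules, interp) -> float:
    """exp(-sum of w_r * phi_r(I)); the partition function Z is omitted."""
    total = sum(w * distance(interp[b], interp[h]) for w, b, h in ground_rules)
    return math.exp(-total)


# One ground rule as (weight, body atom, head atom); the weight is made up.
rules = [(2.0, "similarNames(a,b)", "sameEntity(a,b)")]

# Two candidate interpretations: soft-truth values for the ground atoms.
consistent = {"similarNames(a,b)": 0.9, "sameEntity(a,b)": 0.9}
inconsistent = {"similarNames(a,b)": 0.9, "sameEntity(a,b)": 0.1}

print(unnormalized_density(rules, consistent))
print(unnormalized_density(rules, inconsistent))
```

The interpretation that satisfies the rule gets the larger score, so inference, which looks for the most probable interpretation, is pushed toward assignments that violate few heavily weighted rules.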


Datasets & results¹

Evaluation on the NELL dataset from iteration 165:
● 1.7M candidate facts.
● 70K ontological constraints.

Datasets and source code are available on GitHub.


Comparison with variants of KGI


Questions?