
Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

CSE 6240: Web Search and Text Mining. Spring 2020

Graph and Knowledge Graph Representation Learning

Prof. Srijan Kumar
http://cc.gatech.edu/~srijan


Today’s Lecture

• Embedding entire graphs
• Introduction to Knowledge Graphs
• Embeddings in Knowledge Graphs
  – TransE
  – TransR


Embedding Entire Graphs

• Goal: How to embed an entire graph 𝐺?

• Tasks:
  – Classifying toxic vs. non-toxic molecules
  – Identifying anomalous graphs

[Figure: a graph $G$ and its embedding $\mathbf{z}_G$]


Approach #1

Simple idea:
• Run a standard graph embedding technique on the (sub)graph $G$
• Then just sum (or average) the node embeddings in the (sub)graph $G$:

  $\mathbf{z}_G = \sum_{v \in G} \mathbf{z}_v$

• Used by Duvenaud et al., 2015 to classify molecules based on their graph structure
  – Convolutional Networks on Graphs for Learning Molecular Fingerprints. NeurIPS 2015
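The sum/average pooling above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular paper: the function name `embed_graph` and the toy 2-dimensional node embeddings are assumptions made for the example.

```python
from typing import Dict, Iterable, List

def embed_graph(node_embeddings: Dict[str, List[float]],
                nodes: Iterable[str],
                average: bool = False) -> List[float]:
    """Sum (or average) the embeddings of the given nodes."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("need at least one node")
    dim = len(node_embeddings[nodes[0]])
    total = [0.0] * dim
    for v in nodes:
        for j, value in enumerate(node_embeddings[v]):
            total[j] += value
    if average:
        total = [x / len(nodes) for x in total]
    return total

# Toy node embeddings (made-up values, assumed to come from any node embedding method).
z = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [1.0, 1.0]}
z_G_sum = embed_graph(z, ["a", "b", "c"])
z_G_avg = embed_graph(z, ["a", "b", "c"], average=True)
```

Summing keeps information about graph size, while averaging makes graphs of different sizes comparable; which one works better depends on the task.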


Approach #2

• Idea: Introduce a “virtual node” to represent the (sub)graph and run a standard graph embedding technique
• Proposed by Li et al., 2016 as a general technique for subgraph embedding
  – Gated Graph Sequence Neural Networks. ICLR 2016
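One possible way to build the virtual node is sketched below. The slide does not say how the virtual node is wired, so connecting it to every node of the (sub)graph is an assumption, as are the helper name `add_virtual_node` and the adjacency-set representation. After this step, any node embedding method can be run on the augmented graph and the embedding of the virtual node is read off as the (sub)graph embedding.

```python
def add_virtual_node(adj, subgraph_nodes, virtual="__virtual__"):
    """Return a copy of `adj` with a virtual node linked to every node in `subgraph_nodes`.

    `adj` maps each node to the set of its neighbours (undirected graph).
    """
    if virtual in adj:
        raise ValueError("virtual node name is already used")
    new_adj = {u: set(nbrs) for u, nbrs in adj.items()}
    new_adj[virtual] = set()
    for u in subgraph_nodes:
        new_adj[u].add(virtual)
        new_adj[virtual].add(u)
    return new_adj

# Path 1 - 2 - 3 plus an isolated node 4; the subgraph of interest is {1, 2, 3}.
adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}
augmented = add_virtual_node(adj, [1, 2, 3])
```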


Approach #3

• Represent a graph as a distribution/set of walks on that graph
• Anonymous Walk Embeddings:
  – Each state in an anonymous walk is the index of the first time the corresponding node was visited in the random walk
  – Anonymous Walk Embeddings, ICML 2018
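The anonymization rule can be made concrete with a short sketch. The function name `anonymize` is an assumption for this example; it replaces every node by the position at which that node was first seen in the walk.

```python
def anonymize(walk):
    """Map a random walk (sequence of node ids) to its anonymous walk."""
    first_seen = {}
    states = []
    for node in walk:
        if node not in first_seen:
            # Nodes are numbered 1, 2, 3, ... in order of first visit.
            first_seen[node] = len(first_seen) + 1
        states.append(first_seen[node])
    return tuple(states)

walk_1 = anonymize(["A", "B", "C", "B", "C"])
walk_2 = anonymize(["C", "D", "B", "D", "B"])
```

Both walks visit different nodes but share the same anonymous walk, which is why anonymous walks describe structure independently of node identities.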


Number of Walks Grows

The number of anonymous walks grows exponentially:
  – There are 5 anonymous walks $a_i$ of length 3: $a_1 = 111$, $a_2 = 112$, $a_3 = 121$, $a_4 = 122$, $a_5 = 123$
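The growth can be checked by enumeration. The sketch below assumes, as in the list above, that "length" counts the states of the walk and that a state may repeat immediately (111 and 112 are allowed); the function name `anonymous_walks` is hypothetical. Each walk starts at state 1, and the next state is either one already used or the next unused index.

```python
def anonymous_walks(length):
    """Enumerate all anonymous walks with `length` states."""
    walks = [(1,)]
    for _ in range(length - 1):
        extended = []
        for w in walks:
            # Next state: any index seen so far, or the first unseen index.
            for nxt in range(1, max(w) + 2):
                extended.append(w + (nxt,))
        walks = extended
    return walks

counts = [len(anonymous_walks(l)) for l in range(1, 7)]
```

For lengths 1 through 6 the counts are 1, 2, 5, 15, 52, 203, so the vector representation in the next idea quickly becomes large.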


Idea #1: Anonymous Walks

• Enumerate all possible anonymous walks $a_i$ of $l$ steps and record their counts
• Represent the graph as a probability distribution over these walks
• For example:
  – Set $l = 3$
  – Then we can represent the graph as a 5-dim vector
    • Since there are 5 anonymous walks $a_i$ of length 3: 111, 112, 121, 122, 123
  – $\mathbf{Z}_G[i]$ = probability of anonymous walk $a_i$ in $G$
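A minimal sketch of this representation, assuming the distribution is estimated by sampling random walks rather than by exact enumeration (the slide does not fix either choice). The function names, the uniform choice of start node, and the triangle graph are assumptions for the example; every node is assumed to have at least one neighbour.

```python
import random
from collections import Counter

WALKS_3 = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (1, 2, 3)]

def anonymize(walk):
    first_seen = {}
    return tuple(first_seen.setdefault(v, len(first_seen) + 1) for v in walk)

def random_walk(adj, start, num_states, rng):
    walk = [start]
    while len(walk) < num_states:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

def graph_vector(adj, num_samples=1000, seed=0):
    """Estimate the 5-dim vector Z_G for anonymous walks with 3 states."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    counts = Counter()
    for _ in range(num_samples):
        start = rng.choice(nodes)
        counts[anonymize(random_walk(adj, start, 3, rng))] += 1
    return [counts[w] / num_samples for w in WALKS_3]

# Triangle graph given as adjacency lists (no self-loops).
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
Z_G = graph_vector(triangle)
```

Because the triangle has no self-loops, only the walks 121 and 123 can occur, so the entries for 111, 112 and 122 are zero.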


Idea #2: Learn Walk Embeddings

• Learn embedding $\mathbf{z}_i$ of every anonymous walk $a_i$
• The embedding of a graph $G$ is then the sum/avg/concatenation of the walk embeddings $\mathbf{z}_i$


Idea #2: Learn Walk Embeddings

How to embed walks?
• Idea: Embed walks such that the next walk starting from the same node can be predicted
  – Set walk embeddings $\mathbf{z}_i$ such that we maximize $P(w_t^u \mid w_{t-\Delta}^u, \dots, w_{t-1}^u) = f(\mathbf{z})$
    • Where $w_t^u$ is the $t$-th random walk starting at node $u$
  – Similar to the word2vec idea


Idea #2: Learn Walk Embeddings

• Run $T$ different random walks from $u$, each of length $l$: $N_R(u) = \{a_1^u, a_2^u, \dots, a_T^u\}$
  – Let $a_i$ be the anonymous version of walk $w_i$
• Learn to predict walks that co-occur in a $\Delta$-size window
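The windowing step can be sketched as follows. This assumes that the context of a walk consists of the $\Delta$ walks sampled just before it from the same start node; the helper name `context_target_pairs` and the zero-based list indexing are choices made for the example.

```python
def context_target_pairs(walks, delta):
    """Pair each walk with the `delta` walks that precede it in the sample order."""
    pairs = []
    for t in range(delta, len(walks)):
        context = tuple(walks[t - delta:t])
        pairs.append((context, walks[t]))
    return pairs

# Four anonymous walks sampled from the same node u (made-up values).
walks = [(1, 2, 1), (1, 2, 3), (1, 2, 3), (1, 2, 1)]
pairs = context_target_pairs(walks, delta=2)
```

Each pair is one training example: the model sees the context walks and is asked to predict the target walk.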


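Idea #2: Learn Walk Embeddings

• Estimate embedding $\mathbf{z}_i$ of anonymous walk $a_i$ of $w_i$:

  $\max \frac{1}{T} \sum_{t=\Delta}^{T} \log P(a_t \mid a_{t-\Delta}, \dots, a_{t-1})$

  where $\Delta$ = context window size
• $P(a_t \mid a_{t-\Delta}, \dots, a_{t-1}) = \dfrac{\exp(f(a_t))}{\sum_{i} \exp(f(a_i))}$, i.e., softmax over all walks
• $f(a_t) = b + U \cdot \left( \frac{1}{\Delta} \sum_{i=1}^{\Delta} \mathbf{z}_i \right)$
  – where $b \in \mathbb{R}$, $U \in \mathbb{R}^D$, $\mathbf{z}_i$ is the embedding of $a_i$ (anonymized version of walk $w_i$)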
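A rough numerical sketch of the softmax above. The slide gives $b$ and $U$ for a single candidate walk; to normalize over all walks, this example assumes one bias `b[i]` and one weight vector `U[i]` per candidate walk, which is an interpretation rather than something stated in the source. The function name and all numbers are made up.

```python
import math

def walk_probabilities(context_embeddings, U, b):
    """Softmax over candidate walks, given the embeddings z_i of the context walks."""
    delta = len(context_embeddings)
    dim = len(context_embeddings[0])
    # Average the context embeddings: (1 / delta) * sum_i z_i
    mean = [sum(z[j] for z in context_embeddings) / delta for j in range(dim)]
    # Score f = b + U . mean for every candidate walk.
    scores = [b_i + sum(u * m for u, m in zip(U_i, mean)) for U_i, b_i in zip(U, b)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

context = [[1.0, 0.0], [0.0, 1.0]]   # embeddings of two context walks
U = [[2.0, 0.0], [0.0, 0.0]]         # one weight vector per candidate walk
b = [0.0, 0.0]                       # one bias per candidate walk
probs = walk_probabilities(context, U, b)
```

Training would adjust the walk embeddings, `U` and `b` by gradient ascent on the log-probability of the observed target walk.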


Summary of Graph Embeddings

We discussed 3 approaches to graph embeddings:
• Approach 1: Embed nodes and sum/average them
• Approach 2: Create a super-node that spans the (sub)graph and then embed that node
• Approach 3: Anonymous Walk Embeddings
  – Idea 1: Represent the graph via the distribution over all the anonymous walks
  – Idea 2: Embed anonymous walks


Knowledge Graphs

• Knowledge in graph form
  – Capture entities, types, and relationships
• Nodes are entities
• Nodes are labeled with their types
• Edges between two nodes capture relationships between entities


Example: Bibliographic networks

• Node types: paper, title, author, conference, year
• Relation types: pubWhere, pubYear, hasTitle, hasAuthor, cite
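As a data structure, such a knowledge graph can be stored as typed nodes plus a list of (head, relation, tail) triples. The sketch below uses the node and relation types of the bibliographic example; the node ids (`p1`, `a1`, and so on) and the helper names are hypothetical.

```python
from collections import defaultdict

# Node id -> node type.
node_type = {"p1": "paper", "p2": "paper", "a1": "author",
             "c1": "conference", "y2013": "year"}

# Facts as (head, relation, tail) triples.
triples = [
    ("p1", "hasAuthor", "a1"),
    ("p1", "pubWhere", "c1"),
    ("p1", "pubYear", "y2013"),
    ("p2", "cite", "p1"),
]

def build_index(triples):
    """Index the outgoing edges of every head entity."""
    out_edges = defaultdict(list)
    for h, r, t in triples:
        out_edges[h].append((r, t))
    return out_edges

def tails(index, head, relation):
    """All tails t such that (head, relation, t) is a stored fact."""
    return [t for r, t in index.get(head, []) if r == relation]

index = build_index(triples)
authors_of_p1 = tails(index, "p1", "hasAuthor")
```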


Example: Social networks

• Node types: account, song, post, food, channel

• Relation types: friend, like, cook, watch, listen


Example: Google Knowledge Graph

[Figure: excerpt of the Google Knowledge Graph, including a paintedBy edge]


Knowledge Graphs in Practice

• Google Knowledge Graph
• Amazon Product Graph
• Facebook Graph API
• IBM Watson
• Microsoft Satori
• Project Hanover/Literome
• LinkedIn Knowledge Graph
• Yandex Object Answer


Applications of Knowledge Graphs

• Serving information
• Question answering and conversation agents


Knowledge Graph Datasets

• Publicly available KGs:
  – Freebase, Wikidata, DBpedia, YAGO, NELL
• Common characteristics:
  – Massive: millions of nodes and edges
  – Incomplete: many true edges are missing

Given a massive KG, enumerating all the possible facts is intractable!

Can we predict plausible BUT missing links?


Example: Freebase

• Freebase
  – ~50 million entities
  – ~38K relation types
  – ~3 billion facts/triples
• FB15k/FB15k-237
  – A complete subset of Freebase, used by researchers to learn KG models

93.8% of persons from Freebase have no place of birth and 78.5% have no nationality!

[1] Paulheim, Heiko. "Knowledge graph refinement: A survey of approaches and evaluation methods." Semantic Web 8.3 (2017): 489-508.
[2] Min, Bonan, et al. "Distant supervision for relation extraction with an incomplete knowledge base." Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013.


Key Task: KG Completion

• Knowledge Graph completion is a link prediction problem
• KG incompleteness can substantially affect the efficiency of systems relying on it
• Main paper: Translating Embeddings for Modeling Multi-relational Data. Bordes, Usunier, Garcia-Duran. NeurIPS 2013.


Key Task: KG Completion

[Figure: example knowledge graph with a missing relation]

• Intuition: a link prediction model that learns from local and global connectivity patterns in the KG, taking into account entities and relationships of different types at the same time
• Models: TransE and TransR


Translating Embeddings: TransE

• Relationships between entities = triplets
  – $h$ (head entity), $l$ (relation, also written $r$ on other slides), $t$ (tail entity) => $(h, l, t)$
• Entities and relations are all embedded in an entity space $\mathbb{R}^d$
• Relations are represented as translations
  – $\mathbf{h} + \mathbf{l} \approx \mathbf{t}$ if the given fact is true; else, $\mathbf{h} + \mathbf{l} \neq \mathbf{t}$


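TransE

• Translation Intuition:
  – For a triple $(h, r, t)$, $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$, $\mathbf{h} + \mathbf{r} = \mathbf{t}$
• Score function: $f_r(h, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$

[Figure: $\mathbf{r}$ (Nationality) drawn as a translation from $\mathbf{h}$ (Obama) to $\mathbf{t}$ (American)]

NOTATION: embedding vectors will appear in boldface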
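The score function is easy to evaluate by hand. The sketch below assumes the L2 norm (the slide does not say which norm is used) and made-up 2-dimensional embeddings; `other_entity` stands for any entity that is not the correct tail.

```python
import math

def transe_score(h, r, t):
    """L2 norm of h + r - t; a lower score means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

obama = [1.0, 2.0]
nationality = [3.0, -1.0]
american = [4.0, 1.0]       # equals obama + nationality
other_entity = [0.0, 4.0]   # some unrelated entity

score_true = transe_score(obama, nationality, american)
score_false = transe_score(obama, nationality, other_entity)
```

The true triple scores 0 because the translation lands exactly on the tail, while the unrelated entity is 5 units away.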


Link Prediction in a KG using TransE

• Who has won the Turing award?
• Who is a Canadian citizen?

[Figure: embedding space with the entities Hinton, Bengio, Pearl, TuringAward, Canada, Trudeau and Bieber and the relations Win and Citizen; for each question a query point $\mathbf{q}$ is formed, and the entities closest to $\mathbf{q}$ are the answers]
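Answering such a query amounts to a nearest-neighbour search around the query point. The sketch below treats "Who has won the Turing award?" as predicting the head of (?, Win, TuringAward), so the query point is $\mathbf{q} = \mathbf{t} - \mathbf{r}$; this reading of the figure, the function name `rank_heads`, and all coordinates are assumptions for illustration.

```python
import math

def rank_heads(relation_vec, tail_vec, entity_vecs):
    """Rank all entities by their distance to the query point q = t - r."""
    q = [t - r for t, r in zip(tail_vec, relation_vec)]

    def dist(name):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(entity_vecs[name], q)))

    return sorted(entity_vecs, key=dist)

# Made-up 2-dimensional embeddings.
entity_vecs = {
    "Hinton": [4.1, 5.0], "Bengio": [3.9, 5.1], "Pearl": [4.0, 4.8],
    "Trudeau": [0.0, 1.0], "Bieber": [0.5, 0.5], "TuringAward": [5.0, 5.0],
}
win = [1.0, 0.0]
ranking = rank_heads(win, entity_vecs["TuringAward"], entity_vecs)
```

In practice the top-ranked entities that are not already linked in the KG are returned as predicted answers.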


TransE Optimization

• Learn embeddings such that $\mathbf{h} + \mathbf{l} = \mathbf{t}$ for real triplets that exist in the knowledge graph, and $\mathbf{h} + \mathbf{l} \neq \mathbf{t}$ for triplets that do not exist
  – Create a positive training set of valid triples
  – Create a negative training set by replacing entities/relations in valid triples
    • Replacement is by random sampling
  – Update embeddings until the distance for positive training triples is minimized and the distance for negative training triples is maximized
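Negative sampling can be sketched as follows. This example corrupts only the head or the tail, as in the original TransE paper, and resamples until the corrupted triple is not a known fact; the function name `corrupt` and the toy entities are assumptions, and the loop assumes that at least one unknown corruption exists.

```python
import random

def corrupt(triple, entities, known, rng):
    """Replace the head or the tail by a random entity, avoiding known triples."""
    h, l, t = triple
    while True:
        e = rng.choice(entities)
        candidate = (e, l, t) if rng.random() < 0.5 else (h, l, e)
        if candidate not in known:
            return candidate

known = {("Hinton", "win", "TuringAward"), ("Bengio", "win", "TuringAward")}
entities = ["Hinton", "Bengio", "Pearl", "TuringAward", "Canada"]
rng = random.Random(0)
negative = corrupt(("Hinton", "win", "TuringAward"), entities, known, rng)
```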


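TransE Training

• Translation Intuition: for a triple $(h, l, t)$, $\mathbf{h} + \mathbf{l} = \mathbf{t}$
• Max-margin loss:

  $\mathcal{L} = \sum_{(h,l,t) \in G,\ (h',l,t') \notin G} \left[ \gamma + d(\mathbf{h} + \mathbf{l}, \mathbf{t}) - d(\mathbf{h}' + \mathbf{l}, \mathbf{t}') \right]_+$

  where $d(\mathbf{h} + \mathbf{l}, \mathbf{t})$ is the distance for the valid triple, $d(\mathbf{h}' + \mathbf{l}, \mathbf{t}')$ is the distance for the negative triple, $[x]_+ = \max(0, x)$, and $\gamma$ is the margin, i.e., the smallest distance tolerated by the model between a valid triple and a corrupted one.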
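A small worked example of the max-margin loss, assuming the L2 distance for $d$ and the hinge $\max(0, \cdot)$ that makes it a margin loss. The function names and the toy embeddings are made up for illustration.

```python
import math

def distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def margin_loss(pairs, entity, relation, gamma):
    """Sum of hinge terms over (valid triple, negative triple) pairs."""
    total = 0.0
    for (h, l, t), (h2, _, t2) in pairs:
        pos = distance([a + b for a, b in zip(entity[h], relation[l])], entity[t])
        neg = distance([a + b for a, b in zip(entity[h2], relation[l])], entity[t2])
        total += max(0.0, gamma + pos - neg)
    return total

entity = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [4.0, 0.0]}
relation = {"r": [1.0, 0.0]}
pairs = [(("A", "r", "B"), ("A", "r", "C"))]  # valid triple, corrupted tail

loss_small_margin = margin_loss(pairs, entity, relation, gamma=1.0)
loss_large_margin = margin_loss(pairs, entity, relation, gamma=5.0)
```

Here the valid triple has distance 0 and the negative one has distance 3, so a margin of 1 is already satisfied (loss 0) while a margin of 5 is violated by 2.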


TransE Learning Algorithm

[Figure: pseudocode of the TransE learning algorithm, with the following annotations]
• Entities and relations are initialized uniformly, and normalized
• Negative sampling with a triplet that does not appear in the KG (each valid sample is paired with a negative sample)
• Comparative loss: favors lower distance values for valid triplets, higher distance values for corrupted ones
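The three annotated steps can be put together in a compact training loop. This is a simplified sketch, not the algorithm from the paper verbatim: it assumes squared L2 distance (so the gradients are simple), plain SGD on one triple at a time instead of mini-batches, and head-or-tail corruption. The function name `train_transe` and all hyperparameter values are illustrative.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def sq_dist(h, l, t):
    return sum((a + b - c) ** 2 for a, b, c in zip(h, l, t))

def train_transe(triples, dim=8, gamma=1.0, lr=0.01, epochs=100, seed=0):
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    relations = sorted({l for _, l, _ in triples})
    known = set(triples)
    bound = 6 / math.sqrt(dim)

    def init():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    # Uniform initialization; relations are normalized once.
    E = {e: init() for e in entities}
    R = {l: normalize(init()) for l in relations}
    for _ in range(epochs):
        for e in E:  # entities are re-normalized at the start of every epoch
            E[e] = normalize(E[e])
        for h, l, t in triples:
            # Negative sample: corrupt head or tail until the triple is not in the KG.
            while True:
                e = rng.choice(entities)
                h2, t2 = (e, t) if rng.random() < 0.5 else (h, e)
                if (h2, l, t2) not in known:
                    break
            if gamma + sq_dist(E[h], R[l], E[t]) - sq_dist(E[h2], R[l], E[t2]) <= 0:
                continue  # margin satisfied, hinge gradient is zero
            for j in range(dim):
                g_pos = 2 * (E[h][j] + R[l][j] - E[t][j])
                g_neg = 2 * (E[h2][j] + R[l][j] - E[t2][j])
                E[h][j] -= lr * g_pos
                E[t][j] += lr * g_pos
                R[l][j] -= lr * (g_pos - g_neg)
                E[h2][j] += lr * g_neg
                E[t2][j] -= lr * g_neg
    return E, R

triples = [("Hinton", "win", "TuringAward"),
           ("Bengio", "win", "TuringAward"),
           ("Trudeau", "citizen", "Canada")]
E, R = train_transe(triples, dim=4, epochs=5)
```

The update lowers the distance of the valid triple and raises the distance of the corrupted one, but only while the margin is violated.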


Complex Relational Patterns

• Symmetric Relations: $r(h, t) \Rightarrow r(t, h) \;\; \forall h, t$
  – Example: Family, Roommate
• Composition Relations: $r_1(x, y) \wedge r_2(y, z) \Rightarrow r_3(x, z) \;\; \forall x, y, z$
  – Example: My mother’s husband is my father.
• 1-to-N, N-to-1 relations: $r(h, t_1), r(h, t_2), \dots, r(h, t_n)$ are all True.
  – Example: $r$ is “StudentsOf”


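Composition in TransE

• Composition Relations: $r_1(x, y) \wedge r_2(y, z) \Rightarrow r_3(x, z) \;\; \forall x, y, z$
  – Example: My mother’s husband is my father.
• In TransE, compositional relations are possible if $\mathbf{r}_3 = \mathbf{r}_1 + \mathbf{r}_2$

[Figure: $\mathbf{r}_1$ translates $\mathbf{x}$ to $\mathbf{y}$, $\mathbf{r}_2$ translates $\mathbf{y}$ to $\mathbf{z}$, and $\mathbf{r}_3$ translates $\mathbf{x}$ directly to $\mathbf{z}$]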
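The composition property follows from vector addition, as this small check with made-up 2-dimensional vectors shows. The relation names in the comments only mirror the slide's example.

```python
def add(u, v):
    return [a + b for a, b in zip(u, v)]

x = [0.0, 0.0]
r1 = [1.0, 2.0]    # e.g. "mother of"
r2 = [3.0, -1.0]   # e.g. "husband of"

y = add(x, r1)     # x + r1 = y
z = add(y, r2)     # y + r2 = z
r3 = add(r1, r2)   # composed relation, e.g. "father of"

reaches_z = add(x, r3) == z
```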


Symmetric Relations in TransE

• Symmetric Relations: $r(h, t) \Rightarrow r(t, h) \;\; \forall h, t$
  – Example: Family, Roommate
• In TransE, symmetric relations are not possible:
  – For TransE to handle a symmetric relation $r$, $r(t, h)$ must also be True for all $h, t$ that satisfy $r(h, t)$.
  – So, $\mathbf{h} + \mathbf{r} - \mathbf{t} = \mathbf{0}$ and $\mathbf{t} + \mathbf{r} - \mathbf{h} = \mathbf{0}$.
  – Then $\mathbf{r} = \mathbf{0}$ and $\mathbf{h} = \mathbf{t}$.
  – However, $h$ and $t$ are two different entities and should be mapped to different locations.
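The step from the two equations to the conclusion can be written out explicitly, using boldface for the embedding vectors:

```latex
\begin{aligned}
\mathbf{h} + \mathbf{r} - \mathbf{t} &= \mathbf{0} \\
\mathbf{t} + \mathbf{r} - \mathbf{h} &= \mathbf{0} \\
\text{adding the two lines:}\quad 2\,\mathbf{r} &= \mathbf{0}
  \;\Rightarrow\; \mathbf{r} = \mathbf{0} \\
\text{substituting back:}\quad \mathbf{h} - \mathbf{t} &= \mathbf{0}
  \;\Rightarrow\; \mathbf{h} = \mathbf{t}
\end{aligned}
```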


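Limitation: N-ary Relations

• 1-to-N, N-to-1, N-to-N relations
  – Example: $(h, r, t_1)$ and $(h, r, t_2)$ both exist in the knowledge graph, e.g., $r$ is “StudentsOf”
• In TransE, $t_1$ and $t_2$ will map to the same vector, although they are different entities.
  – $\mathbf{t}_1 = \mathbf{h} + \mathbf{r} = \mathbf{t}_2$
  – $\mathbf{t}_1 \neq \mathbf{t}_2$ (contradictory!)
• In TransE, N-ary relations are not possible

[Figure: the same translation $\mathbf{r}$ applied to $\mathbf{h}$ would have to reach both $\mathbf{t}_1$ and $\mathbf{t}_2$]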


Solution: TransR

• Learn embeddings for entities and relations in separate spaces
  – Model entities as vectors in the entity space $\mathbb{R}^d$
  – Model a relation as vector $\mathbf{r}$ in relation space $\mathbb{R}^k$
• Learn a relation-specific transformation from the entity space to the relation space, one per relation
  – Train $\mathbf{M}_r \in \mathbb{R}^{k \times d}$ as the projection matrix for vector $\mathbf{r}$
• Reference: “Learning entity and relation embeddings for knowledge graph completion.” Lin et al. AAAI 2015.


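TransR Formulation

• $\mathbf{h}_r = \mathbf{M}_r \mathbf{h}$, $\mathbf{t}_r = \mathbf{M}_r \mathbf{t}$
• $f_r(h, t) = \lVert \mathbf{h}_r + \mathbf{r} - \mathbf{t}_r \rVert$
  – instead of $f_r(h, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$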
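The TransR score can be computed with a plain matrix-vector product. The sketch assumes the L2 norm, an entity space of dimension 3 and a relation space of dimension 2; the function names and all numbers are made up for illustration.

```python
import math

def matvec(M, v):
    """Multiply a k x d matrix (list of rows) by a d-dimensional vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transr_score(h, r, t, M_r):
    """L2 norm of M_r h + r - M_r t."""
    h_r, t_r = matvec(M_r, h), matvec(M_r, t)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r)))

M_r = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]      # projects away the third entity coordinate
h = [1.0, 2.0, 7.0]
r = [1.0, 0.0]
t_good = [2.0, 2.0, -3.0]    # M_r h + r equals M_r t_good
t_bad = [5.0, 6.0, 0.0]

score_good = transr_score(h, r, t_good, M_r)
score_bad = transr_score(h, r, t_bad, M_r)
```

Note that `h` and `t_good` differ strongly in the third coordinate, yet the triple scores 0 because the relation-specific projection ignores that coordinate.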


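Symmetric Relations in TransR

• Symmetric Relations: $r(h, t) \Rightarrow r(t, h) \;\; \forall h, t$
  – Example: Family, Roommate
• For TransR, we can learn $\mathbf{M}_r$ to map $\mathbf{h}$ and $\mathbf{t}$ to the same location in the space of relation $r$: $\mathbf{r} = \mathbf{0}$, $\mathbf{h}_r = \mathbf{M}_r \mathbf{h} = \mathbf{M}_r \mathbf{t} = \mathbf{t}_r$ ✓

[Figure: $\mathbf{M}_r$ projects $\mathbf{h}$ and $\mathbf{t}$ to the same point $\mathbf{h}_r = \mathbf{t}_r$]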
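With $\mathbf{r} = \mathbf{0}$ and $\mathbf{M}_r \mathbf{h} = \mathbf{M}_r \mathbf{t}$, both directions of the relation get a score of zero, which can be checked directly from the TransR score function:

```latex
\begin{aligned}
f_r(h, t) &= \lVert \mathbf{M}_r \mathbf{h} + \mathbf{r} - \mathbf{M}_r \mathbf{t} \rVert
           = \lVert \mathbf{M}_r \mathbf{h} - \mathbf{M}_r \mathbf{t} \rVert = 0 \\
f_r(t, h) &= \lVert \mathbf{M}_r \mathbf{t} + \mathbf{r} - \mathbf{M}_r \mathbf{h} \rVert
           = \lVert \mathbf{M}_r \mathbf{t} - \mathbf{M}_r \mathbf{h} \rVert = 0
\end{aligned}
```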


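N-ary Relations in TransR

• 1-to-N, N-to-1, N-to-N relations
  – Example: $(h, r, t_1)$ and $(h, r, t_2)$ both exist in the knowledge graph.
• We can learn $\mathbf{M}_r$ so that $\mathbf{t}_r = \mathbf{M}_r \mathbf{t}_1 = \mathbf{M}_r \mathbf{t}_2$, even though $\mathbf{t}_1$ does not need to be equal to $\mathbf{t}_2$!

[Figure: $\mathbf{t}_1$ and $\mathbf{t}_2$ are projected to the same point $\mathbf{t}_r$, which is reached from $\mathbf{h}_r$ by the translation $\mathbf{r}$]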
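A tiny numerical illustration of the many-to-one projection, assuming a 2-dimensional entity space and a 1-dimensional relation space. The matrix and vectors are made up; the point is only that two distinct tails can share one projection.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

M_r = [[1.0, 0.0]]                  # hypothetical projection from d = 2 to k = 1
h, r = [1.0, 0.0], [2.0]
t1, t2 = [3.0, 1.0], [3.0, -4.0]    # two different tail entities

h_r = matvec(M_r, h)
residual_1 = [a + b - c for a, b, c in zip(h_r, r, matvec(M_r, t1))]
residual_2 = [a + b - c for a, b, c in zip(h_r, r, matvec(M_r, t2))]
```

Both residuals are zero, so both triples are modelled exactly while `t1` and `t2` stay distinct in the entity space.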


Limitation: Composition in TransR

• Composition Relations: $r_1(x, y) \wedge r_2(y, z) \Rightarrow r_3(x, z) \;\; \forall x, y, z$
  – Example: My mother’s husband is my father.
• Each relation has a different space.
• TransR is not naturally compositional for multiple relations! ✗


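Translation-Based KG Embedding

Embedding | Entity | Relation | $f_r(h, t)$
TransE | $\mathbf{h}, \mathbf{t} \in \mathbb{R}^d$ | $\mathbf{r} \in \mathbb{R}^d$ | $\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$
TransR | $\mathbf{h}, \mathbf{t} \in \mathbb{R}^d$ | $\mathbf{r} \in \mathbb{R}^k$, $\mathbf{M}_r \in \mathbb{R}^{k \times d}$ | $\lVert \mathbf{M}_r \mathbf{h} + \mathbf{r} - \mathbf{M}_r \mathbf{t} \rVert$

Embedding | Symmetry | Composition | One-to-many
TransE | ✗ | ✓ | ✗
TransR | ✓ | ✗ | ✓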
