
Partha Pratim Talukdar* (Search Labs, MSR)

Fernando Pereira (Google)

Experiments in Graph-based Semi-Supervised Learning for Class-Instance Acquisition

ACL 2010 (Uppsala, Sweden)

*Work done while at the Univ. of Pennsylvania

Class-Instance Acquisition

• Given an entity, assign human-readable descriptors to it
– e.g., Toyota is a car manufacturer, Japanese company, multinational company, ...

• Large scale, open domain

• Applications
– Web search, advertising, etc.

Graph-based Class-Instance Acquisition

[Figure: extractions from two sources, Pattern and Table Mining, shown as scored lists (Bob Dylan 0.95, Johnny Cash 0.87, Billy Joel 0.82; Billy Joel 0.75, Johnny Cash 0.73) alongside the class names Musician and Singer. The extractions are turned into a graph: cluster-ID nodes Set 1 and Set 2 are linked to the instance nodes Bob Dylan, Johnny Cash, and Billy Joel by edges weighted with the extraction confidences (0.95, 0.87, 0.82, 0.73, 0.75), and two nodes carry the seed class Musician 1.0.]

Can we infer that Bob Dylan is also a Musician, as that is missing in current extractions?

Graph-based Class-Instance Acquisition [Talukdar et al., EMNLP 2008]

[Figure: label propagation on the graph of cluster nodes (Set 1, Set 2) and instance nodes (Bob Dylan, Johnny Cash, Billy Joel), with edge weights 0.95, 0.87, 0.82, 0.73, 0.75. Initialization: only the seed labels (Musician 1.0) are present. Iteration 1: a predicted label Musician 0.8 appears next to the seed labels. Iteration 2: a further predicted label Musician 0.6 appears.]

Outline

• Graph-based Class-Instance Acquisition
– Overview & previous work

• Review & comparison of three SSL algorithms
– LP-ZGL (Zhu+, 2003)
– Adsorption (Baluja+, 2008)
– Modified Adsorption (MAD) (Talukdar and Crammer, 2009)

• Additional Semantic Constraints

• Summary

Comments on the Constructed Graph

[Figure: the constructed graph. Cluster nodes Set 1 and Set 2 (the initial clusters) are linked to the instance nodes Bob Dylan, Johnny Cash, and Billy Joel by edges weighted 0.95, 0.87, 0.82, 0.73, 0.75; two nodes carry the seed label Musician 1.0.]

• Smoothness: nodes connected by an edge should be assigned similar classes, as enforced by the edge weight.

• Coupling node: (softly) forces all instance nodes connected to it to have similar class labels, exploiting the smoothness requirement.

• Seed classes can be different from the cluster IDs.
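A minimal sketch of how such a graph could be assembled, assuming the table-mining output arrives as (cluster ID, instance, confidence) triples. The function name `build_graph`, the dictionary layout, and the pairing of each weight with a specific edge are illustrative assumptions read off the flattened figure, not details confirmed by the talk.

```python
from collections import defaultdict

# Hypothetical table-mining extractions: (cluster ID, instance, confidence).
# The pairing of weights to edges is an assumption based on the figure.
EXTRACTIONS = [
    ("Set 1", "Bob Dylan", 0.95),
    ("Set 1", "Johnny Cash", 0.87),
    ("Set 1", "Billy Joel", 0.82),
    ("Set 2", "Billy Joel", 0.75),
    ("Set 2", "Johnny Cash", 0.73),
]


def build_graph(extractions):
    """Return an undirected weighted graph as {node: {neighbor: weight}}."""
    graph = defaultdict(dict)
    for cluster, instance, confidence in extractions:
        # Each cluster node acts as a coupling node for its instances.
        graph[cluster][instance] = confidence
        graph[instance][cluster] = confidence
    return dict(graph)


graph = build_graph(EXTRACTIONS)
print(sorted(graph["Set 2"]))  # instances coupled through the Set 2 node
```

Storing both directions of every edge keeps the graph undirected, so a cluster node and its instances can influence each other during propagation.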

LP-ZGL [Zhu et al., ICML 2003]

• m labels

• W: edge weight matrix

• $\hat{Y}_{ul}$: weight of label l on node u

• $Y_{ul}$: seed weight of label l on node u

• S: diagonal matrix, non-zero for seed nodes

$$\arg\min_{\hat{Y}} \sum_{l=1}^{m} \sum_{u,v} W_{uv} \left( \hat{Y}_{ul} - \hat{Y}_{vl} \right)^2 \quad \text{s.t.} \quad S\hat{Y}_l = SY_l$$

• The objective is the smoothness term (Smooth); the constraint matches the seeds exactly: Match Seeds (hard).
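One common way to minimize this objective is to repeatedly replace each non-seed node's scores with the weighted average of its neighbors' scores while holding the seed nodes fixed. The sketch below assumes that iterative scheme; the function name `lp_zgl`, the fixed iteration count, and the edge-to-weight pairing in the toy graph are illustrative assumptions, not the authors' implementation.

```python
def lp_zgl(graph, seeds, labels, iterations=100):
    """Propagate labels with seed scores held fixed (the hard constraint)."""
    # Start from the seed scores; unlabeled nodes start at 0 for every label.
    scores = {
        v: {l: seeds.get(v, {}).get(l, 0.0) for l in labels} for v in graph
    }
    for _ in range(iterations):
        for v, neighbors in graph.items():
            if v in seeds:
                continue  # seed nodes are clamped to their seed scores
            total = sum(neighbors.values())
            if total == 0.0:
                continue  # isolated node: nothing to average
            for l in labels:
                # Weighted neighbor average minimizes the smoothness term
                # with respect to this node alone.
                scores[v][l] = (
                    sum(w * scores[u][l] for u, w in neighbors.items()) / total
                )
    return scores


# Toy graph from the running example, stored symmetrically (assumed pairing).
graph = {
    "Set 1": {"Bob Dylan": 0.95, "Johnny Cash": 0.87, "Billy Joel": 0.82},
    "Set 2": {"Billy Joel": 0.75, "Johnny Cash": 0.73},
    "Bob Dylan": {"Set 1": 0.95},
    "Johnny Cash": {"Set 1": 0.87, "Set 2": 0.73},
    "Billy Joel": {"Set 1": 0.82, "Set 2": 0.75},
}
seeds = {"Johnny Cash": {"Musician": 1.0}, "Billy Joel": {"Musician": 1.0}}
scores = lp_zgl(graph, seeds, ["Musician"])
print(round(scores["Bob Dylan"]["Musician"], 3))
```

Because the only seeds in this toy graph are Musician, every connected node drifts toward Musician 1.0; nothing in this formulation pulls scores back toward a prior.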

Modified Adsorption (MAD) [Talukdar and Crammer, ECML 2009]

$$\arg\min_{\hat{Y}} \sum_{l=1}^{m+1} \left[ \left\| S\hat{Y}_l - SY_l \right\|^2 + \mu_1 \sum_{u,v} M_{uv} \left( \hat{Y}_{ul} - \hat{Y}_{vl} \right)^2 + \mu_2 \left\| \hat{Y}_l - R_l \right\|^2 \right]$$

• m labels, +1 dummy label

• $M = W^\top + W$ is the symmetrized weight matrix

• $\hat{Y}_{vl}$: weight of label l on node v

• $Y_{vl}$: seed weight for label l on node v

• S: diagonal matrix, nonzero for seed nodes

• $R_{vl}$: regularization target for label l on node v

• The three terms, in order: Match Seeds (soft), Smooth, Match Priors (Regularizer)

• MAD has extra regularization compared to LP-ZGL

MAD (contd.)

Inputs: $Y, R$: $|V| \times (|L| + 1)$; $W$: $|V| \times |V|$; $S$: $|V| \times |V|$ diagonal
$\hat{Y} \leftarrow Y$
$M = W + W^\top$
$Z_v \leftarrow S_{vv} + \mu_1 \sum_{u \neq v} M_{vu} + \mu_2 \quad \forall v \in V$
repeat
    for all $v \in V$ do
        $\hat{Y}_v \leftarrow \frac{1}{Z_v} \left( (SY)_v + \mu_1 M_{v\cdot} \hat{Y} + \mu_2 R_v \right)$
    end for
until convergence

• Easily parallelizable: scalable

• Importance of a node can be discounted

• Convergence guarantee under mild conditions
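The update rule above can be sketched in plain Python as follows. This is a simplified reading of the slide's pseudocode, not the authors' code: the function name `mad`, the values of `mu1` and `mu2`, the fixed iteration count, the label name `DUMMY`, the choice of a prior that puts all of its mass on the dummy label, and the toy graph's edge pairing are all assumptions made for illustration.

```python
def mad(weights, seeds, priors, labels, mu1=1.0, mu2=0.01, iterations=200):
    """Jacobi-style MAD updates; every neighbor must also be a key of weights."""
    nodes = list(weights)
    # M = W + W^T, skipping self-loops so the sum over u != v matches Z_v.
    m = {v: {} for v in nodes}
    for v in nodes:
        for u, w in weights[v].items():
            if u != v:
                m[v][u] = m[v].get(u, 0.0) + w
                m[u][v] = m[u].get(v, 0.0) + w
    s = {v: 1.0 if v in seeds else 0.0 for v in nodes}  # diagonal of S
    y = {v: {l: seeds.get(v, {}).get(l, 0.0) for l in labels} for v in nodes}
    r = {v: {l: priors.get(v, {}).get(l, 0.0) for l in labels} for v in nodes}
    z = {v: s[v] + mu1 * sum(m[v].values()) + mu2 for v in nodes}
    y_hat = {v: dict(y[v]) for v in nodes}  # Y_hat <- Y
    for _ in range(iterations):
        # Every node reads only the previous iterate, so the inner loop
        # could be run in parallel.
        y_hat = {
            v: {
                l: (
                    s[v] * y[v][l]
                    + mu1 * sum(w * y_hat[u][l] for u, w in m[v].items())
                    + mu2 * r[v][l]
                ) / z[v]
                for l in labels
            }
            for v in nodes
        }
    return y_hat


# Toy graph from the running example, stored symmetrically (assumed pairing).
graph = {
    "Set 1": {"Bob Dylan": 0.95, "Johnny Cash": 0.87, "Billy Joel": 0.82},
    "Set 2": {"Billy Joel": 0.75, "Johnny Cash": 0.73},
    "Bob Dylan": {"Set 1": 0.95},
    "Johnny Cash": {"Set 1": 0.87, "Set 2": 0.73},
    "Billy Joel": {"Set 1": 0.82, "Set 2": 0.75},
}
seeds = {"Johnny Cash": {"Musician": 1.0}, "Billy Joel": {"Musician": 1.0}}
priors = {v: {"DUMMY": 1.0} for v in graph}  # assumed regularization target
result = mad(graph, seeds, priors, ["Musician", "DUMMY"])
print(result["Bob Dylan"])
```

Unlike the hard-constrained propagation of LP-ZGL, the `mu2` term keeps pulling every node toward its prior, so in this toy run the non-seed node Bob Dylan ends up with a Musician score slightly below that of the seed nodes.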

Experiments with Public Datasets

• Previous research used proprietary datasets
– difficult to reproduce and extend

• Public datasets are used in the paper
– Freebase: relational tables from multiple sources
– TextRunner (Ritter+, 2009): open-domain IE system
– YAGO (Suchanek+, 2007): KB curated from Wikipedia and WordNet
– Gold standard hypernyms: [Pantel+, 2009], WordNet

• Available at: http://www.talukdar.net/datasets/class_inst/


Graph Stats

[Table: statistics of graphs used in experiments]

Evaluation Metric: Mean Reciprocal Rank (MRR)

$$\mathrm{MRR} = \frac{1}{|\text{test-set}|} \sum_{v \in \text{test-set}} \frac{1}{\mathrm{rank}_v(\mathrm{class}(v))}$$

Example: for Billy Joel, the ranked predictions are Linguist 0.4, Musician 0.3, ...

Gold label: Musician, which is ranked second, so MRR = 1/2 = 0.5.
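The metric can be computed with a few lines of Python. This is a small sketch assuming predictions are stored as per-node dictionaries of label scores; the function names are illustrative, and scoring a missing gold label as 0 is an assumption, not something the talk specifies.

```python
def reciprocal_rank(label_scores, gold_label):
    """Return 1 / rank of the gold label among labels sorted by score."""
    ranked = sorted(label_scores, key=label_scores.get, reverse=True)
    if gold_label not in ranked:
        return 0.0  # assumption: an unranked gold label contributes 0
    return 1.0 / (ranked.index(gold_label) + 1)


def mean_reciprocal_rank(predictions, gold_labels):
    """Average the reciprocal rank over every node in the test set."""
    total = sum(
        reciprocal_rank(predictions[node], gold)
        for node, gold in gold_labels.items()
    )
    return total / len(gold_labels)


# The example from the slide: Musician is ranked second for Billy Joel.
predictions = {"Billy Joel": {"Linguist": 0.4, "Musician": 0.3}}
gold_labels = {"Billy Joel": "Musician"}
print(mean_reciprocal_rank(predictions, gold_labels))  # 0.5
```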

Graph-based SSL Comparisons (1)

[Figure: Mean Reciprocal Rank (MRR), y-axis from 0.15 to 0.35, vs. amount of supervision (170 x 2 and 170 x 10) for LP-ZGL, Adsorption, and MAD on the TextRunner Graph with 170 WordNet classes. Graph with 175k nodes, 529k edges.]


[Figure: Mean Reciprocal Rank (MRR), y-axis from 0.25 to 0.39, vs. amount of supervision (192 x 2 and 192 x 10) for LP-ZGL, Adsorption, and MAD on the Freebase-2 Graph with 192 WordNet classes. Graph with 303k nodes, 2.3m edges.]

When is MAD most effective?

[Figure: relative increase in MRR by MAD over LP-ZGL (y-axis, 0 to 0.4) vs. average degree (x-axis, 0 to 30).]

MAD is most effective in dense graphs, where there is greater need for regularization.


Can Additional Semantic Constraints Help?

[Figure: cluster nodes Set 1 and Set 2 linked to the instance nodes Isaac Newton, Johnny Cash, and Bob Dylan; an attribute node, has_attribute-albums, is then added to the graph.]

• Instances with shared attributes are likely to be from the same class.

• Graph-based representation makes it easy to incorporate such constraints!
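As a rough illustration of why this is easy, an attribute can be added as one more node with edges to the instances that have it. Everything specific in the sketch below is an assumption: the function name `add_attribute_edges`, the uniform edge weight of 1.0, the cluster memberships, and the choice of Johnny Cash and Bob Dylan as the instances holding the attribute are not given in the transcript.

```python
def add_attribute_edges(graph, attributes, weight=1.0):
    """Add one node per attribute and link it to each instance having it.

    Modifies graph in place and returns it.
    """
    for instance, attribute_names in attributes.items():
        for attribute in attribute_names:
            graph.setdefault(instance, {})[attribute] = weight
            graph.setdefault(attribute, {})[instance] = weight
    return graph


# Hypothetical cluster memberships; the real edges are not recoverable.
graph = {
    "Set 1": {"Isaac Newton": 1.0, "Johnny Cash": 1.0},
    "Set 2": {"Bob Dylan": 1.0},
    "Isaac Newton": {"Set 1": 1.0},
    "Johnny Cash": {"Set 1": 1.0},
    "Bob Dylan": {"Set 2": 1.0},
}
# Assumed attribute assignments, for illustration only.
attributes = {
    "Johnny Cash": ["has_attribute-albums"],
    "Bob Dylan": ["has_attribute-albums"],
}
add_attribute_edges(graph, attributes)
print(sorted(graph["has_attribute-albums"]))
```

In this hypothetical graph the attribute node gives Johnny Cash and Bob Dylan a shared neighbor even though they sit in different clusters, so a smoothness-based algorithm would push their labels together.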

Better Classes with YAGO Attributes

[Figure: Mean Reciprocal Rank (MRR), y-axis from 0.3 to 0.45, for 170 WordNet classes with 10 seeds per class, using Adsorption, on three graphs:
– TextRunner Graph: constructed from TextRunner (UWash) output, 175k nodes, 529k edges
– YAGO Graph: constructed from output of the YAGO Knowledge Base, 142k nodes, 777k edges
– TextRunner + YAGO Graph: combined graph, with 237k nodes, 1.3m edges]

Additional semantic constraints help improve performance significantly.

Classes for Attributes

• Classes assigned to attribute nodes
– has_isbn: wordnet_book, wordnet_magazine

• Evidence that attributes propagate the right classes.

Summary

• Compared three graph-based SSL algorithms for class-instance acquisition
– MAD is consistently most effective, particularly in dense graphs

• Better class-instance acquisition with additional semantic constraints
– easy to add such constraints in graph-based representation

• Datasets available: www.talukdar.net/datasets/class_inst/
– for reproducibility and extension

Thank You!

http://www.talukdar.net/datasets/class_inst/


