Recent Advances in Distantly Supervised Relation Extractionwilliam/papers/Part1_Distant... ·...

Post on 10-Mar-2020

0 views 0 download

Transcript of Recent Advances in Distantly Supervised Relation Extractionwilliam/papers/Part1_Distant... ·...

Recent Advances in Distantly Supervised Relation Extraction

William WangDepartment of Computer Science

University of California, Santa Barbara

Joint work with Jiawei Wu, Lei Li, Pengda Qin, Weiran Xu.

CIPS Summer School 2018

Beijing, China

1/20

Agenda

• Motivation

• Challenges in Semi-Supervised Learning • Reinforced Co-Training (Wu et al., NAACL 18)• Reinforced Distant Supervision Relation Extraction

(Qin et al., ACL 18a)

• DSGAN (Qin et al., ACL 18b)• Conclusions

2

Motivation

3

• Most of the existing successful stories of deep learning are still based on supervised learning.

• For example, object recognition, machine translation, text classification.

• However, in many applications, it is not realistic to obtain large amount of labeled data.

• We need to leverage unlabeled data.

A Classic Example of Semi-Supervised Learning

• Co-Training (Blum and Mitchell, 1998)

4

Challenges• The two classifiers in co-training have to be

independent.• Choosing highly-confident self-labeled examples

could be suboptimal.• Sampling bias shift is common.

5

Our Approach: Reinforced SSL

• Assumption: not all the unlabeled data are useful.

• Idea: performance-driven semi-supervised learning that learns an unlabeled data selection policy with RL, instead of using random sampling.

• 1. Partition the unlabeled data space• 2. Train a RL agent to select useful unlabeled data• 3. Reward: change in accuracy on the validation set

6

Reinforcement Learning

7

Environment

𝑎𝑡𝑠$%&

𝑟$%&

𝑠$𝑟$

Agent

Agent Environment

DeepQ-Network UnlabeledData

Reinforced Co-Training(Wu et al., NAACL 2018)

8

Deep Q-Learning

9

Experiment 1: Clickbait Detection

10

Experiment 1: Clickbait Detection

11

Experiment 2: Generic Text Classification

12

Experiment 2: Generic Text Classification

13

Conclusion

• We proposed a novel RL framework for semi-supervised learning• Strong results in SSL text classification• Also showed effectiveness in relation extraction

14

Deep Reinforcement Learning for Distantly Supervised Relation Extraction

15

Outline

• Motivation• Algorithm• Experiments• Conclusion

16

Outline

• Motivation• Algorithm• Experiments• Conclusion

17

Plain Text Corpus(Unstructured Info)

Classifier Entity-relation Triple(Structured Info)

Relation Type withLabeled Dataset

Relation Type withoutLabeled Dataset

18

Relation Extraction

“If two entities participate in a relation, any sentencethat contains those two entities might express thatrelation.” (Mintz, 2009)

19

Distant Supervision

Data(x): <Belgium, Nijlen>Label(y): /location/contains

Target Corpus(Unlabeled)

DS Label(y): /location/contains

Nijlen is a municipalitylocated in the Belgianprovince of Antwerp.

20

Distant Supervision

DS Data(x):

v Within-Sentence-Bag Level

v Entity-Pair Level

§ Hoffmann et al., ACL 2011.§ Surdean et al., ACL 2012.§ Zeng et al., ACL 2015. § Li et al., ACL 2016.

§ None

21

Wrong Labeling

§ Place_of_Death

i. Some New York city mayors – William O’Dwyer, Vincent R. Impellitteri and Abraham Beame – were born abroad.

ii. Plenty of local officials have, too, including two New York city mayors, James J. Walker, in 1932, and William O’Dwyer, in 1950.

22

Wrong Labeling

v Entity-Pair Level

v Most of entity pairs only have several sentences

1 Sentence55%2 Sentence

32%

Other4%

23

Wrong Labeling

v Lots of entity pairs have repetitive sentences

Outline

• Motivation• Algorithm• Experiments• Conclusion

24

Sentence-Level Indicator

Entity-Pair Level Wrong Labeling Problem

Learn a Policy to Denoise theTraining Data

25

General Purpose and Offline Process

Without Supervised Information

Requirements

Negative set

Positive set

Negative set

Positive setFalse Positive Policy Based Agent

𝑅𝑒𝑤𝑎𝑟𝑑

𝐴𝑐𝑡𝑖𝑜𝑛

Classifier𝑇𝑟𝑎𝑖𝑛

False Positive

DS Dataset Cleaned Dataset

26

Overview

v State

27

v Action

v Reward§ ???

Deep Reinforcement Learning

§ Sentence vector§ The average vector of previous removed sentences

§ Remove & retain

v One relation type has an agent

28

v Sentence-level

v Split into training set and validation set

Deep Reinforcement Learning

§ Positive: Distantly-supervised positive sentences§ Negative: Randomly sampled

RL Agent Train

29

Deep Reinforcement Learning

TrainRelationClassifier

RelationClassifier

𝐹&34&

𝐹&3× +𝓡3 + ×(−𝓡3)

Noisydataset𝑃$=>3

Cleaneddataset

Cleaneddataset

Removedpart

Removedpart

𝓡3 = 𝛼(𝐹&3 - 𝐹&34& )

RL Agent

Epoch i-1

EpochiNoisy

dataset𝑃$=>3

+𝑁$=>3

𝑁$=>3 +

30

Positive Set Negative Set

§ Accurate

§ Steady

§ Fast

Reward

False Positive

§ Obvious

31

Positive Set Negative Set

RelationClassifier

PreTrain RelationClassifier

Train

Calculate

𝐹&

Epoch 𝑖

Reward

False Positive

False Positive

Positive Negative

Outline

• Motivation• Algorithm• Experiments• Conclusion

32

v Dataset: SemEval-2010 Task 8v True Positive: Cause-Effectv False Positive: Other

Evaluation on a Synthetic NoiseDataset

v True Positive + False Positive: 1331 samples

33

0.64

0.645

0.65

0.655

0.66

0.665

0.67

0.675

0.68

0.685

0 10 20 30 40 50 60 70 80 90 100

F1Score

Epoch

200 FPs in 1331 Samples

Evaluation on a Synthetic NoiseDataset

(198/388)

(197/339)

(195/308)(180/279) (179/260)

34

False PositiveRemoved Part

Evaluation on a Synthetic NoiseDataset

0.68

0.69

0.7

0.71

0.72

0.73

0.74

0.75

0 10 20 30 40 50 60 70 80 90 100

F1Score

Epoch

0 FPs in 1331 samples

(0/258)

(0/150)

(0/121)

(0/59)(0/32)

35

36

v CNN+ONE, PCNN+ONE§ Distant supervision for relation extraction via piecewise

convolutional neural networks. (Zeng et al., 2016)

v CNN+ATT, PCNN+ATT§ Neural relation extraction with selective attention over

instances. (Lin et al., 2016)

Distant Supervisionon NYT Freebase Dataset

37

Distant Supervision

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4

CNN-basedCNN+ONE

CNN+ONE_RL

CNN+ATT

CNN+ATT_RL

38

Distant Supervision

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4

PCNN-based

PCNN+ONE

PCNN+ONE_RL

PCNN+ATT

PCNN+ATT_RL

Outline

• Motivation• Algorithm• Experiments• Conclusion

39

vWe propose a deep reinforcement learningmethod for robust distant supervision relationExtraction.

v Our method is model-agnostic.

v Our method boost the performance of recentlyproposed neural relation extractors.

Conclusion

40

DSGAN: Adversarial Learning for DenoisingDistantly Supervised Relation Extraction

(Qin et al., ACL 2018b)

41

42

DS data space

DS Positive Data

DS Negative Data

Distant Supervision Data Distribution

43

DS data space

DS False PositiveData

DS True Positive Data

DS Negative Data

The Decision Boundaryof DS Data

The DesiredDecision Boundary

Data Distribution

44

Adversarial Training

True Positive

False Positive

Noisy Positive Set

Generator

True Positive

False Positive

Label1

Label1

Label0

Label1

45

DSGAN (Qin et al., ACL 2018b)

Epoch 𝐢

𝐁𝐚𝐠𝐤4𝟏

𝐁𝐚𝐠𝐤

𝐁𝐚𝐠𝐤%𝟏

DSPositiveDataset

𝑠&𝑠I𝑠J

𝑠K

G

𝑝& = 0.57

𝑝I = 0.02

𝑝J = 0.83

𝑝T = 0.26

𝑝V = 0.90

Sampling

𝐥𝐚𝐛𝐞𝐥 = 𝟏

𝐥𝐚𝐛𝐞𝐥 = 𝟎D

𝒓𝒆𝒘𝒂𝒓𝒅

DS positivedataset

𝐥𝐚𝐛𝐞𝐥 = 𝟏

𝐥𝐚𝐛𝐞𝐥 = 𝟎

Pre-training

𝑝K= 0.7

High-confidencesamples

Low-confidencesamples

Generator Discriminator

DS negativedataset

DS negativedataset

v Sentence-Level Noise Reduction

46

v Training Without Supervised Information

v Model-Agnostic

Characteristics

47

Distant Supervision Relation Extraction

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4

CNN-basedCNN+ONECNN+ONE_GANsCNN+ATTCNN+ATT_GANs

48

Distant Supervision Relation Extraction

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4

PCNN-basedPCNN+ONEPCNN+ONE_GANsPCNN+ATTPCNN+ATT_GANs

Conclusion

• We introduce Reinforced Co-Training, a new approach that combines reinforcement learning and semi-supervised learning.• We show that in weakly-supervised relation

extraction, reinforcement learning can be utilized to de-noise the training signals.• Adversarial learning serves as a joint learning

framework, and it can also be applied to de-noising distantly supervised IE data.

49

50

Thanks!

http://nlp.cs.ucsb.edu