Cross-Domain Bootstrapping for Named Entity Recognition

Ang Sun, Ralph Grishman
New York University
July 28, 2011
EOS, SIGIR 2011, Beijing
NYU
Outline
1. Named Entity Recognition (NER)
2. Domain Adaptation Problem for NER
3. Cross-Domain Bootstrapping
   3.1 Feature Generalization with Word Clusters
   3.2 Instance Selection Based on Multiple Criteria
4. Conclusion
1. Named Entity Recognition (NER)
Two missions: identification and classification.

U.S. Defense Secretary Donald H. Rumsfeld discussed the resolution …

Identification:   [U.S.]  [Defense]  [Donald H. Rumsfeld]   (NAME  NAME  NAME)
Classification:    GPE     ORG        PERSON
2. Domain Adaptation Problem for NER
The NER system performs well on in-domain data (F-measure 83.08)
but performs poorly on out-of-domain data (F-measure 65.09).

Source domain (news articles):
  George Bush; Donald H. Rumsfeld; … Department of Defense …
Target domain (reports on terrorism):
  Abdul Sattar al-Rishawi; Fahad bin Abdul Aziz bin Abdul Rahman Al-Saud; … Al-Qaeda in Iraq …
2. Domain Adaptation Problem for NER
1. No annotated data from the target domain.
2. Many words are out-of-vocabulary.
3. Naming conventions are different:
   1. Length: short vs. long
      source: George Bush; Donald H. Rumsfeld
      target: Abdul Sattar al-Rishawi; Fahad bin Abdul Aziz bin Abdul Rahman Al-Saud
   2. Capitalization: weaker in the target domain.
4. Name variation occurs often in the target domain: Shaikh, Shaykh, Sheikh, Sheik, …

We want to automatically adapt the source-domain tagger to the target domain without annotating target-domain data.
3. Cross-Domain Bootstrapping
1. Train a tagger from labeled source data.
2. Tag all unlabeled target data with the current tagger.
3. Select good tagged words and add these to the labeled data.
4. Re-train the tagger.

(Diagram: labeled source data -> trained tagger, with feature generalization -> tags unlabeled target data, e.g. "President Assad" -> instance selection with multiple criteria -> selected instances flow back into the labeled data.)
NYU
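The four-step loop above can be sketched as follows; `train`, `tag`, and `select` are hypothetical stand-ins for the paper's MEMM training, decoding, and multi-criteria instance selection, passed in as functions so the sketch stays self-contained:

```python
def bootstrap(labeled_source, unlabeled_target, train, tag, select, iterations=35):
    """Iteratively adapt a source-trained tagger to the target domain.

    train(labeled)        -> a tagger trained on the labeled instances
    tag(tagger, corpus)   -> tagged target-domain instances
    select(tagged)        -> the subset judged good enough to promote
    """
    labeled = list(labeled_source)
    tagger = train(labeled)                       # 1. train on labeled source data
    for _ in range(iterations):
        tagged = tag(tagger, unlabeled_target)    # 2. tag unlabeled target data
        selected = select(tagged)                 # 3. select good tagged instances
        if not selected:                          # nothing left worth promoting
            break
        labeled.extend(selected)                  # add them to the labeled data
        tagger = train(labeled)                   # 4. re-train the tagger
    return tagger, labeled
```

The loop stops early once selection yields nothing, which is how a real run would avoid promoting ever-noisier instances.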
3.1 Feature Generalization with Word Clusters
The source model:
- A sequential model, assigning name classes to a sequence of tokens.
- Each name type is split into two classes, e.g. B_PER (beginning of PERSON) and I_PER (continuation of PERSON).
- A Maximum Entropy Markov Model (McCallum et al., 2000).
- Customary features.

U.S.    Defense  Secretary  Donald  H.     Rumsfeld
B_GPE   B_ORG    O          B_PER   I_PER  I_PER
3.1 Feature Generalization with Word Clusters
The source/seed model: customary features are extracted from the context window (t_{i-2}, t_{i-1}, t_i, t_{i+1}, t_{i+2}).

U.S.    Defense  Secretary  Donald  H.     Rumsfeld
B_GPE   B_ORG    O          B_PER   I_PER  I_PER

Features for the current token "Donald":
  currentToken            Donald
  wordType_currentToken   initial_capitalized
  previousToken_-1        Secretary
  previousToken_-1_class  O
  previousToken_-2        Defense
  nextToken_+1            H.
  …                       …
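As a sketch, the window features for one token might be extracted like this; the `word_type` shape function is an assumed simplification, and feature names follow the slide:

```python
def word_type(token):
    """Coarse word-shape feature, e.g. 'initial_capitalized' for 'Donald'."""
    if token[:1].isupper() and token[1:].islower():
        return "initial_capitalized"
    if token.isupper():
        return "all_capitalized"
    return "other"

def extract_features(tokens, classes, i):
    """Feature dict for tokens[i]; classes holds the already-assigned tags."""
    feats = {
        "currentToken": tokens[i],
        "wordType_currentToken": word_type(tokens[i]),
    }
    if i >= 1:
        feats["previousToken_-1"] = tokens[i - 1]
        feats["previousToken_-1_class"] = classes[i - 1]
    if i >= 2:
        feats["previousToken_-2"] = tokens[i - 2]
    if i + 1 < len(tokens):
        feats["nextToken_+1"] = tokens[i + 1]
    if i + 2 < len(tokens):
        feats["nextToken_+2"] = tokens[i + 2]
    return feats
```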
3.1 Feature Generalization with Word Clusters
- Build a word hierarchy from a 10M-word corpus (source + target), using the Brown word clustering algorithm.
- Represent each word as a bit string.

Bit string      Examples
110100011       John, James, Mike, Steven
11010011101     Abdul, Mustafa, Abi, Abdel
11010011111     Shaikh, Shaykh, Sheikh, Sheik
111111110       Qaeda, Qaida, qaeda, QAEDA
00011110000     FBI, FDA, NYPD
000111100100    Taliban
3.1 Feature Generalization with Word Clusters
- Add an additional layer of features that include word clusters:
    currentToken = John
    currentPrefix3 = 110  (fires also for target words in the same subtree, such as Abdul)
- To avoid commitment to a single cluster, cut the word hierarchy at different levels (prefixes of different lengths).
NYU
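A minimal sketch of this prefix layer, reusing the slide's example bit strings; the cut lengths (3, 5, 7) are an assumed choice, not necessarily the ones used in the paper:

```python
# Toy Brown-cluster table taken from the slide's examples.
BROWN = {
    "John": "110100011",
    "Abdul": "11010011101",
    "Shaikh": "11010011111",
    "Shaykh": "11010011111",
}

def prefix_features(word, lengths=(3, 5, 7)):
    """Map a word to cluster-prefix features at several hierarchy depths."""
    bits = BROWN.get(word)
    if bits is None:          # out-of-cluster word: contributes no features
        return {}
    return {f"currentPrefix{n}": bits[:n] for n in lengths if len(bits) >= n}
```

With this table, `currentPrefix3` is `110` for both John and Abdul, so a prefix feature learned from source-domain instances of John also fires on target-domain instances of Abdul.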
3.1 Feature Generalization with Word Clusters
Performance on the target domain:
- The test set contains 23K tokens.
- PERSON/ORGANIZATION/GPE: 771/585/559 instances; all other tokens belong to the not-a-name class.
- Word clusters give a 4-point improvement in F-measure:

Model                           P      R      F1
Source_Model                    70.02  61.86  65.69
Source_Model + Word Clusters    72.82  66.61  69.58
3.2 Instance Selection Based on Multiple Criteria
Single-domain bootstrapping uses a confidence measure as the single selection criterion.
In a cross-domain setting, the most confidently labeled instances are highly correlated with the source domain and contain little information about the target domain.

We propose multiple criteria.
Criterion 1: Novelty - prefer target-specific instances (promote Abdul instead of John).
3.2 Instance Selection Based on Multiple Criteria
Criterion 2: Confidence - prefer confidently labeled instances.

Local confidence, based on local features:

    LocalConf(I) = -\sum_{c_i} p(c_i \mid v) \log p(c_i \mid v)

where I is an instance, v is the feature vector for I, and c_i is name class i.

1) Minimum: 0, when one name class is predicted with probability 1, e.g., p(c_i | v) = 1.
2) Maximum: when the predictions are evenly distributed over all the name classes.
3) The lower the value, the more confident the instance is.
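A direct sketch of LocalConf as the entropy of the predicted class distribution; the class names below are illustrative:

```python
import math

def local_conf(p):
    """Entropy -sum_i p_i * log(p_i) of the class distribution p(c_i | v).

    p maps each name class to its predicted probability.
    Lower values mean a more confident prediction.
    """
    return -sum(q * math.log(q) for q in p.values() if q > 0.0)
```

A certain prediction (one class with probability 1) scores 0, the minimum; a uniform distribution over k classes scores log k, the maximum.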
3.2 Instance Selection Based on Multiple Criteria
Criterion 2: Confidence
Global confidence, based on corpus statistics:

 1  Prime Minister Abdul Karim Kabariti    PER
 2  warlord General Abdul Rashid Dostum    PER
 3  President A.P.J. Abdul Kalam will      PER
 4  President A.P.J. Abdul Kalam has       PER
 5  Abdullah bin Abdul Aziz ,              PER
 6  at King Abdul Aziz University          ORG
 7  Nawab Mohammed Abdul Ali ,             PER
 8  Dr Ali Abdul Aziz Al                   PER
 9  Nayef bin Abdul Aziz said              PER
10  leader General Abdul Rashid Dostum     PER

P(Abdul is a PER) = 0.9
3.2 Instance Selection Based on Multiple Criteria
Criterion 2: Confidence

Global confidence:

    GlobalConf(I) = -\sum_{c_i} p(c_i) \log p(c_i)

The lower the entropy, the more confident the instance is.

Combined confidence: the product of local and global confidence.
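A sketch of the global and combined scores: p(c_i) is estimated from how the current tagger labels a word's occurrences across the corpus (as in the "P(Abdul is a PER) = 0.9" example), and the combination by multiplication follows the slide. The helper names are assumptions:

```python
import math
from collections import Counter

def entropy(probs):
    """-sum q * log(q) over a sequence of probabilities."""
    return -sum(q * math.log(q) for q in probs if q > 0.0)

def global_conf(labels):
    """labels: name classes assigned to a word's occurrences in the corpus.

    Estimates p(c_i) by relative frequency, then returns the entropy;
    lower means the corpus labels the word more consistently.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return entropy(c / total for c in counts.values())

def combined_conf(local, labels):
    """Product of local and global confidence (both entropies; lower = better)."""
    return local * global_conf(labels)
```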
3.2 Instance Selection Based on Multiple Criteria
Criterion 3: Density - prefer representative instances, which can be seen as centroid instances.

    Density(i) = \frac{1}{N - 1} \sum_{j=1, j \neq i}^{N} Sim(i, j)

where Density(i) is the average similarity between i and all other instances j, Sim(i, j) is the Jaccard similarity between the feature vectors of the two instances, and N is the total number of instances in the corpus.
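A sketch of the density computation, treating each instance as a set of feature strings (an assumed simplification of the feature vectors):

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def density(i, instances):
    """Average similarity between instances[i] and every other instance."""
    others = [jaccard(instances[i], x)
              for j, x in enumerate(instances) if j != i]
    return sum(others) / len(others) if others else 0.0
```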
3.2 Instance Selection Based on Multiple Criteria
Criterion 4: Diversity - prefer a set of diverse instances instead of similar instances.

Example: ", said * in his"
- a highly confident instance,
- a high-density, representative instance,
- BUT continuing to promote such an instance would not gain additional benefit.

    diff(i, j) = Density(i) - Density(j)

diff(i, j) is the difference between instances i and j. Using a small value for diff(i, j), dense instances still have a higher chance of being selected, while a certain degree of diversity is achieved at the same time.
3.2 Instance Selection Based on Multiple Criteria
Putting all criteria together:
1. Novelty: filter out source-dependent instances.
2. Confidence: rank instances based on confidence; the top-ranked instances form a candidate set.
3. Density: rank instances in the candidate set in descending order of density.
4. Diversity:
   1. Accept the first instance (with the highest density) in the candidate set.
   2. Select the other candidates based on the diff measure.
NYU
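The four steps above might be strung together as follows. This is a sketch under assumptions: instances carry a precomputed `novel` flag, a `conf` score (lower = more confident), and a feature set, and the `top_k` and `min_diff` values are hypothetical:

```python
def select_instances(instances, top_k=50, min_diff=0.01):
    """Apply Novelty, Confidence, Density, and Diversity in order."""
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # 1. Novelty: keep only target-specific instances.
    pool = [x for x in instances if x["novel"]]
    # 2. Confidence: the top_k most confident form the candidate set.
    pool.sort(key=lambda x: x["conf"])
    pool = pool[:top_k]
    # 3. Density: rank candidates by average similarity to the others.
    for x in pool:
        others = [jaccard(x["feats"], y["feats"]) for y in pool if y is not x]
        x["density"] = sum(others) / len(others) if others else 0.0
    pool.sort(key=lambda x: x["density"], reverse=True)
    # 4. Diversity: accept the densest instance, then only candidates whose
    #    density differs enough from every instance accepted so far.
    selected = []
    for x in pool:
        if not selected or all(abs(x["density"] - s["density"]) >= min_diff
                               for s in selected):
            selected.append(x)
    return selected
```

Because `min_diff` is small, dense instances still dominate the selection while near-duplicates of an already-accepted instance are skipped.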
3.2 Instance Selection Based on Multiple Criteria
Results

(Figure: F1 on the target domain (68 to 74) over bootstrapping iterations (0 to 35) for six configurations:
  + Novelty + CombinedConf + Diversity
  + Novelty + CombinedConf + Density
  + Novelty + CombinedConf
  + Novelty + LocalConf
  Generalized seed model (Source_Model + Word Clusters)
  - Novelty + LocalConf
where +/- := with/without the criterion.)
4. Conclusion
- Proposed a general cross-domain bootstrapping algorithm for adapting a model trained only on a source domain to a target domain.
- Improved the source model's F score by around 7 points.
- This is achieved:
  1. without using any annotated data from the target domain;
  2. without explicitly encoding any target-domain-specific knowledge into our system.
- The improvement is largely due to:
  1. the feature generalization of the source model with word clusters;
  2. the multi-criteria-based instance selection method.