PRIS at Slot Filling in KBP 2012: An Enhanced Adaboost Pattern-Matching System Yan Li Beijing...


PRIS at Slot Filling in KBP 2012: An Enhanced Adaboost Pattern-Matching System

Yan Li

Beijing University of Posts and Telecommunications

buptliyan@gmail.com


Outline

Introduction
Preprocessing
Entity Expansion
Pattern Bootstrapping
Post-processing
Evaluation Results
Conclusion


Introduction: the framework


Preprocessing

NLP (the Stanford CoreNLP toolkit)
  POS tagger
  NER
  Date and time expression recognition
  Dependency parser
  Coreference resolution


Preprocessing (cont’d)

Example:

Takeshi Watanabe, the first president of the ADB, died in his native Japan.

The categorization of slots


PER

Domain   Slots
PER      alternate_names; spouses; children; parents; siblings; other_family
ORG      member_of; employee_of
LOC      country/state/city_of_birth/death/residence
DATE     date_of_birth/death
NUM      age
ORI      origin
REL      religion
SCHOOL   schools_attended
CAUSE    cause_of_death
TITLE    titles
CHARGE   charges

ORG

Domain   Slots
PER      alternate_names; members; shareholders; founded_by; top_members/employees
ORG      parents; members; member_of; shareholders; subsidiaries
LOC      member_of; country/state/city_of_headquarters
DATE     founded; dissolved
NUM      number_of_employees/members
URL      website
REL      political/religious_affiliation
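One possible use of such a categorization is to check that a candidate filler's entity type fits the slot being filled. The Python sketch below is only an illustration: the names `PER_SLOT_DOMAINS`, `domain_of` and `type_matches` are hypothetical, only a few rows of the PER table are included, and the slides do not say how the system stores or applies the table.

```python
# Hypothetical lookup table holding a few rows of the PER categorization.
PER_SLOT_DOMAINS = {
    "PER": ["alternate_names", "spouses", "children", "parents",
            "siblings", "other_family"],
    "ORG": ["member_of", "employee_of"],
    "DATE": ["date_of_birth", "date_of_death"],
    "NUM": ["age"],
    "TITLE": ["titles"],
}


def domain_of(slot, table=PER_SLOT_DOMAINS):
    """Return the value domain of a slot, or None if the slot is not listed."""
    for domain, slots in table.items():
        if slot in slots:
            return domain
    return None


def type_matches(slot, candidate_type):
    """True when the candidate's entity type equals the slot's domain."""
    return domain_of(slot) == candidate_type
```

For instance, `type_matches("age", "NUM")` accepts a numeric candidate for the age slot, while a PER-typed candidate for `titles` is rejected.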


Entity Expansion

Coreferences and alternate names of an entity occur in the relevant documents; expanding the entity with them is intended to improve recall.

Scheme 1 (PER & ORG): coreference resolution
  The relation (coreference) chain is produced by Stanford CoreNLP.
  Example:


Entity Expansion (cont’d)

Scheme 2 (PER & ORG): identifying alternate names
  Rule-based information extraction
  Explanatory entities in parentheses
  Example: Starr International Co., known as SICO, ……

Scheme 3 (ORG)
  Removing the corporate suffixes in queries
  Finding the acronyms or full expressions
  Example: Norwegian University of Science and Technology (NTNU)
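The rule-based schemes above could be approximated with a few regular expressions. The following Python sketch is an assumption-laden illustration, not the system's actual rules: the suffix list `CORPORATE_SUFFIXES` and the two patterns ("known as X" and a parenthetical right after the entity) are hypothetical choices.

```python
import re

# Assumed suffix list; the slides do not give the one actually used.
CORPORATE_SUFFIXES = ("Co.", "Corp.", "Inc.", "Ltd.", "LLC")


def strip_corporate_suffix(name):
    """Scheme 3 (sketch): drop a trailing corporate suffix from an ORG query."""
    for suffix in CORPORATE_SUFFIXES:
        if name.endswith(" " + suffix):
            return name[: -len(suffix) - 1].rstrip(",")
    return name


def alternate_names(entity, sentence):
    """Scheme 2 (sketch): names introduced by 'known as' or by parentheses."""
    found = []
    escaped = re.escape(entity)
    # "<entity>, known as X, ..."
    match = re.search(escaped + r",?\s+(?:also\s+)?known as\s+([^,.;()]+)", sentence)
    if match:
        found.append(match.group(1).strip())
    # "<entity> (X)"
    match = re.search(escaped + r"\s*\(([^()]+)\)", sentence)
    if match:
        found.append(match.group(1).strip())
    return found
```

Applied to the two slide examples, the first pattern picks up SICO for Starr International Co. and the second picks up NTNU for the Norwegian University of Science and Technology. Note that NTNU is not the initial letters of the English name, so a naive initial-letter acronym check would miss it.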


Pattern Bootstrapping: Workflow

Ralph Grishman and Bonan Min, “New York University KBP 2010 Slot‐Filling System”, 2010.


Pattern Bootstrapping: Seed Pairs

The KBP English Monolingual Slot Filling Evaluation Data from the past three years
  92 PER entities
  106 ORG entities
  1,627 entity-value pairs


Pattern Bootstrapping: Pattern Generation

Word sequence pattern: the middle context between an entity-value pair
  Example:
    PER:countries_of_residence  <PER> native <LOC>

Dependency path pattern: the shortest dependency path that connects an entity-value pair
  Examples:
    PER:title  <PER> appos <TITLE>
    PER:member_of  <PER> appos president prep_of <ORG>
    PER:country_of_death  <PER> nsubj-1 died prep_in <LOC>
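Both pattern types can be illustrated with a short Python sketch. It assumes the sentence has already been tokenized and parsed, that dependency edges are given as `(head, relation, dependent)` triples over words that are unique within the sentence, and that an edge walked from dependent to head is marked with `-1` as in the slide examples. The function names are hypothetical, and the breadth-first search is one straightforward way to find a shortest path, not necessarily the one the system uses.

```python
from collections import deque


def word_sequence_pattern(tokens, entity_span, value_span, entity_type, value_type):
    """Middle context between entity and value.

    Spans are (start, end) token offsets, end exclusive; the entity is
    assumed to precede the value.
    """
    middle = tokens[entity_span[1]:value_span[0]]
    return " ".join(["<%s>" % entity_type] + middle + ["<%s>" % value_type])


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over (head, relation, dependent) edges.

    Returns a list of (relation, word) steps from source to target,
    or None when the two words are not connected.
    """
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((rel, dep))
        graph.setdefault(dep, []).append((rel + "-1", head))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(rel, nxt)]))
    return None


def path_to_pattern(path, entity_type, value_type):
    """Render a path as '<PER> rel word rel <LOC>', hiding the end points."""
    parts = ["<%s>" % entity_type]
    for i, (rel, word) in enumerate(path):
        parts.append(rel)
        if i < len(path) - 1:
            parts.append(word)
    parts.append("<%s>" % value_type)
    return " ".join(parts)


# Hand-written collapsed dependencies for the example sentence
# "Takeshi Watanabe, the first president of the ADB, died in his native Japan."
EDGES = [
    ("died", "nsubj", "Watanabe"),
    ("Watanabe", "appos", "president"),
    ("president", "prep_of", "ADB"),
    ("died", "prep_in", "Japan"),
    ("Japan", "amod", "native"),
]
```

With these hand-written edges, the path from `Watanabe` to `Japan` renders as the PER:country_of_death pattern shown on the slide, and the path from `Watanabe` to `ADB` renders as the PER:member_of pattern.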


Pattern Bootstrapping: Pattern Evaluation

For the purpose of improving precision
Pattern frequency
Trigger phrase
High-confidence patterns
New entity-value pairs
Iteration
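One iteration of such a loop might look like the Python sketch below. The scoring rule is an assumption made for illustration: a pattern's confidence is taken to be the share of its extracted pairs that are already known, and patterns seen fewer than `min_frequency` times are dropped. The slides do not specify the actual confidence measure, the trigger-phrase check, or the Adaboost weighting named in the title, so none of those are modeled here.

```python
def score_patterns(matches, known_pairs, min_frequency=2):
    """Score each pattern by the fraction of its pairs that are already known.

    matches maps a pattern string to the list of (entity, value) pairs
    it extracted from the corpus.
    """
    scores = {}
    for pattern, pairs in matches.items():
        if len(pairs) < min_frequency:
            continue  # too rare to be trusted
        hits = sum(1 for pair in pairs if pair in known_pairs)
        scores[pattern] = hits / len(pairs)
    return scores


def bootstrap_step(matches, known_pairs, threshold=0.5, min_frequency=2):
    """Keep high-confidence patterns and add the new pairs they extract."""
    scores = score_patterns(matches, known_pairs, min_frequency)
    trusted = {p for p, s in scores.items() if s >= threshold}
    new_pairs = set()
    for pattern in trusted:
        new_pairs.update(pair for pair in matches[pattern] if pair not in known_pairs)
    return trusted, known_pairs | new_pairs
```

Calling `bootstrap_step` repeatedly, each time re-matching the corpus with the enlarged pair set, gives the iteration named on the slide. The thresholds `0.5` and `2` are placeholders.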


Post-processing

For the purpose of improving precision

DATE
  The SUTime module of CoreNLP
  TIMEX2 normalization

PER: spouses, children and parents
  Last name completion
  Example: John Doe’s first wife, Ruth
  “Ruth Doe” is better than “Ruth”.
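The last-name step can be sketched in a few lines of Python. This is a guess at the rule from the single example given: a one-word filler for a spouse, child or parent slot receives the query entity's last name. The names `FAMILY_SLOTS` and `complete_last_name` are hypothetical, and the heuristic will be wrong whenever relatives do not share a surname.

```python
# Slots named on the slide; the set name itself is hypothetical.
FAMILY_SLOTS = {"spouses", "children", "parents"}


def complete_last_name(query_name, filler, slot):
    """Append the query entity's last name to a one-word family filler."""
    if slot not in FAMILY_SLOTS:
        return filler
    if len(filler.split()) != 1:
        return filler  # already more than a bare first name
    last_name = query_name.split()[-1]
    return filler + " " + last_name
```

For the slide example, the query "John Doe" and the filler "Ruth" in the spouses slot yield "Ruth Doe".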


Post-processing (cont’d)

Identifying countries, states/provinces and cities for LOC slots
  A Wikipedia list containing all countries and states or provinces

Adding modifiers to fillers of per:title
  Adjectival modifier: financial minister
  Noun compound modifier: police chief
  Prepositional modifier: chief of military operations
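The modifier step for per:title could be sketched as follows, assuming Stanford-style collapsed dependencies in which `amod`, `nn` and `prep_of` label the three modifier kinds listed above. Edges are `(head_index, relation, dependent_index)` triples over token positions. The function name `expand_title` and the restriction to `prep_of` are assumptions made for this illustration.

```python
MODIFIER_RELATIONS = {"amod", "nn"}


def expand_title(tokens, edges, head):
    """Return the title word at index `head` together with its modifiers.

    The filler is the contiguous token span covering the head, its
    amod/nn dependents, and any prep_of object with that object's own
    amod/nn dependents.
    """
    keep = {head}
    for h, rel, d in edges:
        if h != head:
            continue
        if rel in MODIFIER_RELATIONS:
            keep.add(d)
        elif rel == "prep_of":
            keep.add(d)
            # Also keep direct modifiers of the prepositional object.
            keep.update(d2 for h2, rel2, d2 in edges
                        if h2 == d and rel2 in MODIFIER_RELATIONS)
    return " ".join(tokens[min(keep):max(keep) + 1])
```

Taking the span between the leftmost and rightmost kept token restores the preposition "of", which collapsed dependencies fold into the relation label.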


Evaluation Results PRIS

Summary Statistics   LDC         Top-1        Top-2        Median
Precision            0.9278607   0.6757322    0.48955223   0.11392405
Recall               0.7252106   0.41866493   0.21257292   0.0874919
F1                   0.8141142   0.5170068    0.2964302    0.0989736
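The F1 row is the harmonic mean of the precision and recall rows, F1 = 2PR / (P + R). The short Python check below recomputes it from the reported precision and recall; the dictionary name `SUMMARY` is only a convenience for this example.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for each column of the summary table.
SUMMARY = {
    "LDC": (0.9278607, 0.7252106, 0.8141142),
    "Top-1": (0.6757322, 0.41866493, 0.5170068),
    "Top-2": (0.48955223, 0.21257292, 0.2964302),
    "Median": (0.11392405, 0.0874919, 0.0989736),
}

for name, (p, r, reported) in SUMMARY.items():
    # Print the recomputed value next to the reported one for comparison.
    print(name, round(f1(p, r), 4), reported)
```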


Slot                               non-NIL correct  redundant  inexact  wrong  missing
Alternate names                    6                0          0        0      23
Date of birth                      16               4          0        1      1
Date of death                      17               1          0        4      2
Age                                22               0          0        2      2
Country of birth                   1                0          0        0      1
State or province of birth         8                0          2        3      2
City of birth                      13               1          0        5      2
Country of death                   1                0          0        2      0
State or province of death         13               0          2        1      2
City of death                      17               0          0        4      1
Country of residence               10               2          2        7      3
State or province of residence     22               1          4        5      13
City of residence                  35               1          0        14     8
Origin                             16               2          0        17     0
Cause of death                     18               0          0        1      13
Schools attended                   19               7          0        1      14
Titles                             85               13         8        24     4
Member of                          26               2          4        17     10
Employee of                        7                0          2        5      20
Religion                           4                0          0        1      3
Spouses                            16               5          1        3      10
Children                           73               0          3        10     6
Parents                            21               4          0        1      4
Siblings                           20               0          1        8      3
Other family                       2                0          0        0      7
Charges                            5                0          0        4      2


Slot                               non-NIL correct  redundant  inexact  wrong  missing
Alternate names                    46               4          5        25     5
Political/religious affiliations   7                1          0        6      3
Top members/employees              59               1          2        20     8
Number of employees/members        3                0          0        0      8
Members                            0                0          0        0      4
Member of                          0                0          0        0      7
Subsidiaries                       7                0          0        3      10
Parents                            4                1          0        4      4
Founded by                         5                0          0        3      5
Founded                            5                0          0        1      3
Dissolved                          1                0          0        0      2
Country of headquarters            3                0          0        1      20
State or province of headquarters  1                1          0        7      11
City of headquarters               2                0          0        3      10
Shareholders                       3                0          1        8      0
Website                            7                0          0        1      8


Conclusion

In the slot filling task of KBP 2012, we designed an enhanced pattern-matching system that consists of preprocessing, entity expansion, pattern bootstrapping and post-processing.

Precision and recall are relatively good for some specific slots.

Results on the remaining slots urgently need improvement.


Tips

Adequate preparation
A harmonious team
An active and disciplined environment
Be passionate, patient and hardworking
……


Thank you!