1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga...

26
1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic Representation Meeting University of Maryland

Transcript of 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga...

Page 1: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

1

Penn

English and Chinese PropBanks

Martha PalmerUniversity of Pennsylvania

with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder

April 14, 2005Semantic Representation Meeting

University of Maryland

Page 2: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

2

PennWhat is a PropBank? A PropBank is a corpus annotated with the

predicate-argument structure of the verbs:English Propbank: www.cis.upenn.edu/~ace 3/’04 LDC Kingsbury and Palmer 2002, Palmer, Gildea, Kingsbury, 2005

Wall Street Journal, 1M words, 120K+ predicate instances Brown, 14K predicate instances

Chinese Propbank: www.cis.upenn.edu/~chinese/cpb

Xue and Palmer 2003, Xue 2004

Xinhua (250K words – almost done), Sinorama (250K words – estimated 2007)

Nominalized verbs for English = NomBank/NYU Chinese NomBank?

Page 3: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

3

PennCapturing “neutral” semantic roles

Boyan broke [ Arg1 the LCD-projector.]break (agent(Boyan), patient(LCD-projector)) [Arg1 The windows] were broken by the

hurricane.

[Arg1 The vase] broke into pieces when it toppled over

Page 4: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

4

Penn

Frames File example: give< 4000 Frames for PropBankRoles: Arg0: giver Arg1: thing given Arg2: entity given to

Example: double object The executives gave the chefs a standing ovation. Arg0: The executives REL: gave Arg2: the chefs Arg1: a standing ovation

Page 5: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

5

Penn

Frames File example: givew/ Thematic Role LabelsRoles: Arg0: giver Arg1: thing given Arg2: entity given to

Example: double object The executives gave the chefs a standing ovation. Arg0: Agent The executives REL: gave Arg2: Recipient the chefs Arg1: Theme a standing ovation

VerbNet – based on Levin classes

Page 6: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

6

PennPropBank Exercise Ex.

[He]-Arg1 Theme [will]-MOD [probably]-MOD be

[extradited]-rel [to the U.S]-DIR [for trial under an extradition treaty President Virgilia Barco has revived]-PRP. 

He will probably be extradited to the U.S for trial under [an extradition treaty]-Arg1Theme [President Virgilia Barco]-Arg0Agent has [revived]-rel. 

Page 7: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

7

PennA Chinese Treebank Sentence

国会 /Congress 最近 /recently 通过 /pass 了 /ASP 银行法 /banking law

“The Congress passed the banking law recently.”

(IP (NP-SBJ (NN 国会 /Congress)) (VP (ADVP (ADV 最近 /recently)) (VP (VV 通过 /pass) (AS 了 /ASP) (NP-OBJ (NN 银行法 /banking law)))))

Page 8: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

8

PennThe Same Sentence, PropBanked

通过 (f2) (pass)

arg0 argM arg1

国会 最近 银行法 (law) (congress)

(IP (NP-SBJ arg0 (NN 国会 )) (VP argM (ADVP (ADV 最近 )) (VP f2 (VV 通过 ) (AS 了 ) arg1 (NP-OBJ (NN 银行

法 )))))

Page 9: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

9

PennAnnotation procedure

PTB II – Extract all sentences of a verb Create Frame File for that verb Paul Kingsbury

(3400+ lemmas, 4700 framesets,120K predicates) 1st pass: Automatic tagging Joseph Rosenzweig

2nd pass: Double blind hand correction by verb Inter-annotator agreement 84% (87% Arg#’s)

3rd pass: Adjudication Olga Babko-Malaya

4th pass: Train automatic semantic role labellers Dan Gildea, Sameer Pradhan, Nianwen Xue, Szuting Yi, ….

CoNLL-04 shared task, 2004, 2005, ….

Page 10: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

10

PennPropbank Kappa Statistics

P(A) P(E) Kappa

Role identify .99 .89 .93

Role classify .95 .27 .93

combined .99 .88 .91

Role identificationclassifying tree nodes as argument vs. non-argument

Role classificationclassifying arguments as Arg1 vs. Arg2 vs ArgM-LOC vs. etc…

Kappa = P(A) - P(E) / 1 - P(E)

Page 11: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

11

PennThroughput

Framing: approximately 80-100 verbs/week Annotation: approximately 70 instances/hour Solomonization: approximately 100

instances per hour 100K words (last summer)

~4 months (hardly any new frame files)4-6 part-time annotators (100hrs a week), half-time programmer, half-time project manager,half-time adjudicator, frame file creator

Page 12: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

12

PennApplications

IE – slot filling Question Answering:

What do lobsters like to eat?Answer is NOT people!

Machine TranslationReconciling event descriptions across

languages - See Parallel Prop II

Page 13: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

13

PennWord Senses in PropBank Orders to ignore word sense not feasible for 700+

verbs Mary left the room Mary left her daughter-in-law her pearls in her will

Frameset leave.01 "move away from":Arg0: entity leavingArg1: place left

Frameset leave.02 "give":Arg0: giver Arg1: thing givenArg2: beneficiary

How do these relate to traditional word senses in WordNet?

Page 14: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

14

PennOverlap between Senseval2Groups and Framesets – 95%

WN1 WN2 WN3 WN4

WN6 WN7 WN8 WN5 WN 9 WN10

WN11 WN12 WN13 WN 14

WN19 WN20

Frameset1

Frameset2

develop

Page 15: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

15

PennSense Hierarchy (Palmer, et al, SNLU04 - NAACL04)

PropBank Framesets – ITA >90% coarse grained distinctions

20 Senseval2 verbs w/ > 1 FramesetMaxent WSD system, 73.5% baseline, 90% accuracy

Sense Groups (Senseval-2) - ITA 82% Intermediate level (includes Levin classes) – 69%

WordNet – ITA 71% fine grained distinctions, 60.2%

Tagging w/groups, ITA 89%, 200@hr

Page 16: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

16

PennPropBank II – English/Chinese (100K)

We still need relations between events and entities: Event ID’s with event coreference Selective sense tagging

Tagging nominalizations w/ WordNet senseGrouped WN senses - selected verbs and nouns

Nominal Coreference not names

Clausal Discourse connectives – selected subset

Level of representation that reconciles many surface differences between the languages

Page 17: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

17

PennEvent IDs – Parallel Prop II (1)

Aspectual verbs do not receive event IDs:

今年 /this year 中国 /China 继续 /continue 发挥 /play 其 /it 在 /at 支持 /support 外商 /foreign business 投资 /investment 企业 /enterprise 方面 /aspect 的 /DE 主 /main 渠道 /channel 作用 /role

“This year, the Bank of China will continue to play the main role in supporting foreign-invested businesses.”

Page 18: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

18

PennEvent IDs – Parallel Prop II (2) Nominalized verbs do:

He will probably be extradited to the US for trial. done as part of sense-tagging (all 7 WN senses for “trial” are events.)

随着 /with 中国 /China 经济 /economy 的 /DE 不断 /continued 发展 /development…

“With the continued development of China’s

economy…”

The same events may be described by verbs in English and nouns in Chinese, or vice versa. Event IDs help to abstract away from POS tag

Page 19: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

19

PennEvent reference – Parallel Prop II Pronouns (overt or covert) that refer to events:

[This] is gonna be a word of mouth kind of thing.

这些 /these 成果 /achivements 被 /BEI 企业 /enterprise 用 /apply (e15) 到 /to 生产 /production 上 /on 点石成金 /spin gold from straw , *pro*-e15 大大 /greatly 提高 /improve 了 /le 中国 /China 镍 /nickel 工业 /industry 的 /DE 生产 /production 水平 /level 。

“These achievements have been applied (e15) to production by enterprises to spin gold from straw, which-e15 greatly improved the production level of China’s nickel industry.”

Prerequisites:pronoun classification free trace annotation

Page 20: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

20

PennChinese PB II: Sense tagging Much lower polysemy than English

Avg of 3.5 (Chinese) vs. 16.7 (English) Dang, Chia, Chiou, Palmer, COLING-02

More than 2 Framesets 62/4865 (250K) Ch vs. 294/3635 (1M) English

Mapping Grouped English senses to Chinese (English tagging - 93 verbs/168 nouns, 5000+ instances)

Selected 12 polysemous English words (7 verbs/5 nouns) For 9 (6 verbs/3 nouns), grouped English senses map to unique

Chinese translation sets (synonyms)

Page 21: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

21

PennMapping of Grouped Sense Tagsto Chinese

increase 提高 / ti2gao1

lift, elevate,

orient upwards 仰 / yang3

Collect, levy募集 / mu4ji2筹措 / chou2cuo4筹 ... / chou2…

invoke, elicit, set off 提 / ti4

raise – translations by group

Page 22: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

22

Penn

Discourse connectives: The Penn Discourse TreeBank

WSJ corpus (~1M words, ~2400 texts) http://www.cis.upenn.edu/~pdtb Miltsakaki, Prasad, Joshi and Webber, LREC-04, NAACL-04 Frontiers Prasad, Miltsakaki, Joshi and Webber ACL-04 Discourse Annotation

Chinese: 10 explicit discourse connectives that include subordination conjunctions, coordinate conjunctions, and discourse adverbials.

Argument determination, sense disambiguation

[arg1 学校 /school 不 /not 教 /teach 理财 /finance management] , [conn 结果 /as a result] [arg2 报章 /newspaper 上 /on 的 /DE 各 /all 种 /kind 专栏 /column 就 /then 成为 /become 信息 /information 的 /DE 主要 /main 来源 /source] 。

“The school does not teach finance management. As a result, the different kinds of columns become the main source of information.”

Page 23: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

23

PennSummary of English PropBanksOlga Babko-Malaya, Ben Snyder

Genre Words Frames

Files

Frameset

Tags

Released Prop2

Wall Street Journal*

(Penn TreeBank II)

1000K < 4000 700+ March, 04

English Translation of

Chinese TreeBank *

100K <1500 Dec, 04 Aug, 05

Xinhua News

DOD funding

250K < 6000 200 Dec, 04 Dec, 05

(100K)

Sinorama

NSF-ITR funding

150K < 4000 July, 05

Sinorama, English corpus

NSF-ITR funding

250K <2000 Dec, 06

*DOD funding

Page 24: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

24

PennAnnotation of free traces Free traces – traces which are not linked to an antecedent

in PropBank

ArbitraryLegislation to lift the debt ceiling is ensnarled in the fight over

[*]–ARB cutting capital-gains taxes Event

The department proposed requiring (e4) stronger roofs for light trucks and minivans , [*]-e4 beginning with 1992 models

ImperativeAll right, [*]-IMP shoot.

1K instances of free traces in a 100K corpus

Page 25: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

25

PennClassification of pronouns

'referring' [John Smith] arrived yesterday. [He] said that...

‘bound' [Many companies] raised [their] payouts by more than 10%

‘event‘ [This] is gonna be a word of mouth kind of thing.

‘generic' I like [books]. [They] make me smile.

Page 26: 1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

26

PennMapping of Grouped Sense Tagsto Chinese

Zhejiang| 浙江 zhe4jiang1 will| 将 jiang1 raise| 提高ti2gao1 the level| 水平 shui3ping2 of| 的 de opening up| 开放 kai1fang4 to| 对 dui4 the outside world| 外 wai4. (浙江将提高对外开放的水平。)

I| 我 wo3 raised| 仰 yang3 my| 我的 wo3de head|头 tou2 in expectation| 期望 qi1wang4. (我仰头望去。)

…, raising| 筹措 chou2cuo4 funds| 资金 zi1jin1 of|的 de 15 billion|150 亿 yi1ban3wu3shi2yi4 yuan|元 yuan2 (… 筹措资金 150 亿元。 )

The meeting| 会议 hui4yi4 passed| 通过 tong1guo4 the “decision regarding motions”| 议案 yi4an4 raised| 提 ti4 by 32 NPC| 人大 ren2da4 representatives| 代表 dai4biao3 (会议通过了 32 名人大代表所提的议案。)