HW7 Extracting Arguments for %
Ang Sun (asun@cs.nyu.edu)
March 25, 2012

Outline

• File Format

• Training
  – Generating Training Examples
  – Extracting Features
  – Training of MaxEnt Models

• Decoding

• Scoring

File Format

• Statistics Canada said service-industry <ARG1> output </ARG1> in August <SUPPORT> rose </SUPPORT> 0.4 <PRED class="PARTITIVE-QUANT"> % </PRED> from July .
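A minimal sketch of reading this format, assuming each sentence is a single line of whitespace-separated tokens with inline tags exactly as in the example above. The function name `parse_annotated` is hypothetical, and the real homework files may carry extra columns (POS, chunks) that this sketch does not handle.

```python
import re

# A piece is either a whole tag (which may contain spaces, e.g. the PRED tag)
# or a run of non-space characters (an ordinary token).
PIECE = re.compile(r'<[^>]+>|\S+')
OPEN_TAG = re.compile(r'<([A-Za-z0-9]+)[^>]*>')


def parse_annotated(line):
    """Return (tokens, labels); labels[i] is the enclosing tag name or None."""
    tokens, labels = [], []
    current = None
    for piece in PIECE.findall(line):
        if piece.startswith('</'):
            current = None                      # closing tag ends the span
        elif piece.startswith('<'):
            current = OPEN_TAG.match(piece).group(1)
        else:
            tokens.append(piece)
            labels.append(current)
    return tokens, labels


sentence = ('Statistics Canada said service-industry <ARG1> output </ARG1> '
            'in August <SUPPORT> rose </SUPPORT> 0.4 '
            '<PRED class="PARTITIVE-QUANT"> % </PRED> from July .')
tokens, labels = parse_annotated(sentence)
arg1_index = labels.index('ARG1')
pred_index = labels.index('PRED')
print(tokens[arg1_index], tokens[pred_index])
```

The tags are stripped from the token list, so token indices refer to the plain sentence.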

• Generating Training Examples
  – Positive Example
    • Only one positive example for a sentence
    • The one with the annotation ARG1

• Generating Training Examples
  – Negative Examples
    • Two methods!
    • Method 1: consider any token that has one of the following POSs
      – NN 1150
      – NNS 905
      – NNP 205
      – JJ 25
      – PRP 24
      – CD 21
      – DT 16
      – NNPS 13
      – VBG 2
      – FW 1
      – IN 1
      – RB 1
      – VBZ 1
      – WDT 1
      – WP 1
    • Too many negative examples!

• Generating Training Examples
  – Negative Examples
    • Two methods!
    • Method 2: only consider head tokens
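As a rough illustration of Method 1, the sketch below labels the annotated ARG1 token as the positive example and every other token whose POS is in the listed set as a negative example. The function name `generate_examples`, the choice to skip the PRED token, and the POS tags for tokens not shown on the slides are assumptions made for illustration. Method 2 is not sketched because it needs head information from a chunker or parser.

```python
# POS tags listed for Method 1 (counts omitted).
CANDIDATE_POS = {'NN', 'NNS', 'NNP', 'JJ', 'PRP', 'CD', 'DT', 'NNPS',
                 'VBG', 'FW', 'IN', 'RB', 'VBZ', 'WDT', 'WP'}


def generate_examples(pos_tags, labels):
    """Return (token_index, class) pairs: 'Y' for ARG1, 'N' for other candidates."""
    examples = []
    for i, (pos, label) in enumerate(zip(pos_tags, labels)):
        if label == 'PRED':
            continue                    # assumption: the predicate is never a candidate
        if label == 'ARG1':
            examples.append((i, 'Y'))   # the single positive example
        elif pos in CANDIDATE_POS:
            examples.append((i, 'N'))
    return examples


# Tokens: Statistics Canada said service-industry output in August rose 0.4 % from July .
# POS tags hand-assigned for illustration; only some of them appear on the slides.
pos_tags = 'NNP NNP VBD NN NN IN NNP VBD CD NN IN NNP .'.split()
labels = [None] * 13
labels[4], labels[7], labels[9] = 'ARG1', 'SUPPORT', 'PRED'

examples = generate_examples(pos_tags, labels)
negatives = [i for i, cls in examples if cls == 'N']
print(len(examples), len(negatives))
```

Even on this short sentence most of the examples are negative, which is the imbalance Method 2 is meant to reduce.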

• Extracting Features
  – f: candToken=output
  – f: tokenBeforeCand=service-industry
  – f: tokenAfterCand=in
  – f: tokensBetweenCandPRED=in_August_rose_0.4
  – f: numberOfTokensBetween=4
  – f: exisitVerbBetweenCandPred=true
  – f: exisitSUPPORTBetweenCandPred=true
  – f: candTokenPOS=NN
  – f: posBeforeCand=NN
  – f: posAfterCand=IN
  – f: possBetweenCandPRED=IN_NNP_VBD_CD
  – f: BIOChunkChain=I-NP_B-PP_B-NP_B-VP_B-NP_I-NP
  – f: chunkChain=NP_PP_NP_VP_NP
  – f: candPredInSameNP=False
  – f: candPredInSameVP=False
  – f: candPredInSamePP=False
  – f: shortestPathBetweenCandPred=NP_NP-SBJ_S_VP_NP-EXT
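The token, POS, and chunk features above could be computed along the following lines. This is a sketch under assumptions: the helper names (`collapse_bio`, `extract_features`), the `'NONE'` padding value, the lowercase booleans (the slides mix `true` and `False`), and the POS and chunk tags for tokens outside the candidate-to-predicate span are all illustrative. shortestPathBetweenCandPred is omitted because it requires a full parse tree.

```python
def collapse_bio(bio_tags):
    """Collapse BIO chunk tags into a sequence of chunk types."""
    chunks, prev_type = [], None
    for tag in bio_tags:
        prefix, _, ctype = tag.partition('-')
        ctype = ctype or 'O'              # a bare "O" tag has no type suffix
        if prefix == 'B' or ctype != prev_type:
            chunks.append(ctype)          # a new chunk starts here
        prev_type = ctype
    return chunks


def extract_features(tokens, pos, bio, labels, cand, pred):
    """Build a feature dict for candidate index `cand` and predicate index `pred`."""
    lo, hi = min(cand, pred), max(cand, pred)
    between = range(lo + 1, hi)           # strictly between candidate and predicate

    def at(seq, i):
        return seq[i] if 0 <= i < len(seq) else 'NONE'   # assumed padding value

    feats = {
        'candToken': tokens[cand],
        'tokenBeforeCand': at(tokens, cand - 1),
        'tokenAfterCand': at(tokens, cand + 1),
        'tokensBetweenCandPRED': '_'.join(tokens[i] for i in between),
        'numberOfTokensBetween': len(between),
        'exisitVerbBetweenCandPred':
            str(any(pos[i].startswith('VB') for i in between)).lower(),
        'exisitSUPPORTBetweenCandPred':
            str(any(labels[i] == 'SUPPORT' for i in between)).lower(),
        'candTokenPOS': pos[cand],
        'posBeforeCand': at(pos, cand - 1),
        'posAfterCand': at(pos, cand + 1),
        'possBetweenCandPRED': '_'.join(pos[i] for i in between),
        'BIOChunkChain': '_'.join(bio[lo:hi + 1]),        # includes both endpoints
        'chunkChain': '_'.join(collapse_bio(bio[lo:hi + 1])),
    }
    for ctype in ('NP', 'VP', 'PP'):
        # Same chunk only if no chunk boundary lies between the two tokens.
        same = bio[lo].endswith('-' + ctype) and all(
            bio[i] == 'I-' + ctype for i in range(lo + 1, hi + 1))
        feats['candPredInSame' + ctype] = str(same).lower()
    return feats


tokens = 'Statistics Canada said service-industry output in August rose 0.4 % from July .'.split()
pos = 'NNP NNP VBD NN NN IN NNP VBD CD NN IN NNP .'.split()
bio = 'B-NP I-NP B-VP B-NP I-NP B-PP B-NP B-VP B-NP I-NP B-PP B-NP O'.split()
labels = [None] * 13
labels[4], labels[7], labels[9] = 'ARG1', 'SUPPORT', 'PRED'

feats = extract_features(tokens, pos, bio, labels, cand=4, pred=9)
for name, value in feats.items():
    print(f'{name}={value}')
```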

• Training of MaxEnt Model
  – Each training example is one line
    • candToken=output . . . . . class=Y
    • candToken=Canada . . . . . class=N
  – Put all examples in one file, the training file
  – Use the MaxEnt wrapper or the program you wrote in HW5 to train your relation extraction model
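Writing the training file might look like the sketch below. The space separator, the helper names `format_example` and `write_examples`, and the three-feature examples are assumptions; check the exact line format your MaxEnt wrapper or HW5 program expects before using it.

```python
import io


def format_example(features, label=None):
    """Render one example as a single line of name=value pairs."""
    parts = [f'{name}={value}' for name, value in features.items()]
    if label is not None:
        parts.append(f'class={label}')    # omitted for test examples
    return ' '.join(parts)


def write_examples(examples, handle):
    """Write (features, label) pairs to an open text handle, one per line."""
    for features, label in examples:
        handle.write(format_example(features, label) + '\n')


# Shortened feature sets for illustration only.
examples = [
    ({'candToken': 'output', 'candTokenPOS': 'NN', 'numberOfTokensBetween': 4}, 'Y'),
    ({'candToken': 'Canada', 'candTokenPOS': 'NNP', 'numberOfTokensBetween': 7}, 'N'),
]

buffer = io.StringIO()                    # stands in for the training file
write_examples(examples, buffer)
lines = buffer.getvalue().splitlines()
print(lines[0])
```

Passing `label=None` to `format_example` produces the unlabeled lines needed at decoding time.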

Decoding

• For each sentence
  – Generate testing examples as you did for training
    • One feature line per example (without class=Y/N)
  – Apply your trained model to each of the testing examples
  – Choose the example with the highest probability returned by your model as the ARG1
  – So there must be exactly one ARG1 for each sentence
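The selection step can be sketched as an argmax over the candidates of one sentence. Here `decode_sentence` is a hypothetical helper, and `toy_prob_y` with its made-up scores is only a placeholder for whatever probability of class=Y your trained model returns.

```python
def decode_sentence(candidates, prob_y):
    """Pick the candidate with the highest P(class=Y).

    candidates: list of (token_index, features) for one sentence.
    prob_y: callable mapping a feature dict to a probability.
    """
    best_index, _ = max(candidates, key=lambda cand: prob_y(cand[1]))
    return best_index


# Placeholder scores, NOT output of a trained model.
TOY_SCORES = {'Canada': 0.10, 'output': 0.72, 'August': 0.31}


def toy_prob_y(features):
    return TOY_SCORES.get(features['candToken'], 0.0)


candidates = [
    (1, {'candToken': 'Canada'}),
    (4, {'candToken': 'output'}),
    (6, {'candToken': 'August'}),
]
arg1 = decode_sentence(candidates, toy_prob_y)
print(arg1)
```

Taking the maximum rather than thresholding at 0.5 guarantees one ARG1 per sentence, even when no candidate scores highly.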

Scoring

• Because you are required to tag exactly one ARG1 for each sentence, your system will be evaluated based on accuracy
  – Accuracy = #correct_ARG1s / #sentences
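The accuracy formula translates directly into code. In this sketch the function name `accuracy` and the use of one token index per sentence for predictions and gold answers are assumptions about how you store your results.

```python
def accuracy(predicted, gold):
    """Accuracy = #correct_ARG1s / #sentences, one ARG1 index per sentence."""
    if len(predicted) != len(gold):
        raise ValueError('need exactly one prediction per sentence')
    if not gold:
        return 0.0                        # no sentences to score
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Three sentences, two predicted ARG1 positions match the gold annotation.
score = accuracy([4, 2, 7], [4, 3, 7])
print(f'{score:.3f}')
```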

Good Luck!