Project Presentation

111/07/20. Team Members: Anna Tinnemore, Gabriel Neer, Yow-Ren Chiang. Lin572 Advanced Statistic Methods in NLP.


Transcript of Project Presentation

Page 1: Project Presentation

112/04/19

Project Presentation

Team Members: Anna Tinnemore, Gabriel Neer, Yow-Ren Chiang

Lin572 Advanced Statistic Methods in NLP

Page 2: Project Presentation

PART 3

MaxEnt

(yipee!)

Page 3: Project Presentation

The Good Stuff:

• Simple feature templates and extraction

• Elegant data structures for storage and easy access

• Pretty good results!

Page 4: Project Presentation

The Bad Stuff:

• Hmmm. . . .

Page 5: Project Presentation

Features

• A few short loops collected the most relevant context features

• No long-winded feature templates

• Easy-access hashes
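
The slides do not list the actual features, so the following is only a minimal sketch of what "a few short loops" filling a hash could look like. The sub name `extract_features`, the feature names, the window size, the suffix lengths and the `<PAD>`/`<BOS>` markers are all illustrative assumptions, not the project's own choices.

```perl
use strict;
use warnings;

# Hypothetical feature extractor for position $i of a sentence.
# Feature names, window size and suffix lengths are assumptions.
sub extract_features {
    my ($words, $tags, $i) = @_;
    my %feat;                          # easy-access hash: feature name => 1
    my $word = $words->[$i];
    $feat{"curW=$word"} = 1;

    # Short loop over neighbouring words (two on each side).
    for my $offset (-2, -1, 1, 2) {
        my $j = $i + $offset;
        my $w = ($j < 0 || $j > $#$words) ? '<PAD>' : $words->[$j];
        $feat{ 'w' . $offset . '=' . $w } = 1;
    }

    # Short loop over the two previous tags.
    for my $k (1, 2) {
        my $t = $i - $k >= 0 ? $tags->[ $i - $k ] : '<BOS>';
        $feat{ 't-' . $k . '=' . $t } = 1;
    }

    # Short loop over word suffixes of length 1 to 3.
    for my $len (1 .. 3) {
        next if length($word) < $len;
        $feat{ 'suf=' . substr($word, -$len) } = 1;
    }
    return \%feat;
}

my @words = qw(The dog barked);
my @tags  = qw(DT NN VBD);
my $f = extract_features(\@words, \@tags, 2);
print join(' ', sort keys %$f), "\n";
```

Keeping every feature as a hash key makes a lookup such as `$f->{'suf=ked'}` a single step, which is presumably what "easy-access hashes" refers to.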

Page 6: Project Presentation

Decent Results

• Accuracy rises from the high eighties into the mid-nineties as the size of the training data increases

Result          1K        5K              10K             40K
Accuracy        88.31%    93.55%          94.63%          96.34%
Training Time   24 sec.   2 min 27 sec.   4 min 28 sec.   18 min 34 sec.

Page 7: Project Presentation

PART 4

Task 2: Bagging

Page 8: Project Presentation

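Tie Function

    use Tie::File;
    use Fcntl 'O_RDONLY';

    # Assumed here: input file and number of bags come from the command line
    # (the slide does not show where $file_name and $B are set).
    my ($file_name, $B) = @ARGV;

    # Treat the file as an array of lines (read-only).
    tie my @lines, 'Tie::File', $file_name, mode => O_RDONLY
        or die "Can't tie $file_name: $!";

    for my $bag_num (1 .. $B) {
        # The Nth bag from file foo.txt becomes foo.txt-bagN, etc.
        my $bag_name = "$file_name-bag$bag_num";
        open(BAG, '>', $bag_name)
            or die "Can't open $bag_name for writing: $!";
        for (1 .. @lines) {
            # Pick a random line of the file (sampling with replacement).
            my $line = $lines[ rand @lines ];
            print BAG "$line\n";    # Output to the bag.
        }
        close BAG;
    }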

Page 9: Project Presentation

Combination

• VOTING!!

Page 10: Project Presentation

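Step 1:

    # Loop through the first file and remember the words.
    # Keep them grouped by sentence.
    my @sentences;
    open(FILE, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!";
    while (<FILE>) {
        my @word_tags = split;    # word/tag tokens, assumed whitespace-separated
        my @words;                # fresh array per sentence, so each reference is distinct
        foreach (@word_tags) {
            my @wordtag = split /\//;
            push(@words, $wordtag[0]);
        }
        push(@sentences, \@words);
    }
    close FILE;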

Page 11: Project Presentation

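Step 2:

    # Go through each file and, for each word, increase the count of its tag.
    my @tags;
    for my $file (@ARGV) {
        open(FILE, '<', $file) or die "Can't open $file: $!";
        my $tag_index = 0;
        while (<FILE>) {
            my @word_tags = split;    # word/tag tokens, assumed whitespace-separated
            foreach (@word_tags) {
                my @wordtag = split /\//;
                my $tag = $wordtag[1];
                $tags[$tag_index]->{$tag}++;
                $tag_index++;
            }
        }
        close FILE;
    }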

Page 12: Project Presentation

Step 3:

    # Go through the sentences and print out each word/tag pair.
    my $tag_index = 0;
    foreach my $sent (@sentences) {
        foreach my $word (@$sent) {
            my $tag = max_tag($tags[$tag_index]);
            $tag_index++;
            print "$word/$tag ";
        }
        print "\n";
    }

Page 13: Project Presentation

Finding the “Best Tag”

    # Find the tag with the highest count.
    sub max_tag {
        my $tag_hash = shift;
        (my $tag) = keys %$tag_hash;
        my $tag_count = $tag_hash->{$tag};
        foreach (keys %$tag_hash) {
            if ($tag_hash->{$_} > $tag_count) {
                $tag = $_;
                $tag_count = $tag_hash->{$tag};
            }
        }
        return $tag;
    }

Page 14: Project Presentation

Procedure

1. Creating bootstrap samples
   • The file is treated as an array of lines.
   • N random array indices are selected and each corresponding line is output to a file.

2. Combine_tool.pl
   • Opens the file corresponding to its first argument.
   • Reads in all words, aggregated by sentence.

3. An array of tag hashes is created.
   • For each file in its argument list, the tool opens that file and reads the tags sequentially.
   • The hash item corresponding to the tag at the appropriate index of the tag array is incremented.
   • For each index, the hash label with the highest count is chosen as the correct tag.

4. Re-associate the tags with their words.

5. Print out the word/tag pairs.
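
Steps 2 to 5 above can be sketched in memory, without files, to make the voting logic easy to try out. This is an illustrative sketch, not the team's Combine_tool.pl: the sub names `parse_tagged` and `combine_by_vote`, the three sample outputs, the split on the last slash, and the alphabetical tie-break are all assumptions introduced here.

```perl
use strict;
use warnings;

# Parse one tagger's output (newline-separated sentences of
# whitespace-separated word/tag tokens, an assumed format)
# into a list of sentences, each a list of [word, tag] pairs.
sub parse_tagged {
    my ($text) = @_;
    my @sentences;
    for my $line (split /\n/, $text) {
        my @pairs;
        for my $token (split ' ', $line) {
            # Split on the last slash so words containing "/" stay intact.
            my ($word, $tag) = $token =~ m{^(.*)/([^/]*)$};
            push @pairs, [ $word, $tag ];
        }
        push @sentences, \@pairs;
    }
    return \@sentences;
}

# Per-token majority vote over any number of tagger outputs.
sub combine_by_vote {
    my @parsed = map { parse_tagged($_) } @_;
    my @combined;
    for my $s (0 .. $#{ $parsed[0] }) {
        my @out;
        for my $w (0 .. $#{ $parsed[0][$s] }) {
            my %count;                               # tag => number of votes
            $count{ $_->[$s][$w][1] }++ for @parsed;
            # Highest count wins; ties are broken alphabetically
            # (an assumption, only to make the result deterministic).
            my ($best) = sort { $count{$b} <=> $count{$a} or $a cmp $b }
                         keys %count;
            push @out, $parsed[0][$s][$w][0] . '/' . $best;
        }
        push @combined, join(' ', @out);
    }
    return join("\n", @combined);
}

# Three hypothetical tagger outputs for the same sentence.
my $out_a = "The/DT dog/NN barks/NNS\n";
my $out_b = "The/DT dog/NN barks/VBZ\n";
my $out_c = "The/DT dog/VB barks/VBZ\n";
my $result = combine_by_vote($out_a, $out_b, $out_c);
print "$result\n";    # prints "The/DT dog/NN barks/VBZ"
```

The sketch assumes every output contains the same tokens in the same order, which is also what the index-based counting in the procedure relies on.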

Page 15: Project Presentation

Result

          Training Data
Method    1K                      5K                      10K
Trigram   85.68 / 83.35 / 85.46   92.12 / 90.85 / 91.90   93.44 / 92.78 / 93.32
TBL       90.64 / 90.27 / 91.70   93.97 / 93.75 / 94.91   94.91 / 94.75 / 95.60
MaxEnt    88.31 / 88.23 / 89.85   93.55 / 92.87 / 94.05   94.63 / 94.25 / 95.10
Comb      91.39 / 90.62 / 92.45   94.87 / 94.26 / 95.21   95.61 / 95.17 / 95.55