Project Presentation
112/04/19
Team Members: Anna Tinnemore, Gabriel Neer, Yow-Ren Chiang
Lin572 Advanced Statistic Methods in NLP
PART 3
MaxEnt
(yipee!)
The Good Stuff:
• Simple feature templates and extraction
• Elegant data structures for storage and easy access
• Pretty good results!
The Bad Stuff:
• Hmmm. . . .
Features
• A few short loops collected the most relevant context features
• No long-winded feature templates
• Easy-access hashes
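The slides do not list the actual feature templates, so the following is only a minimal sketch of what "a few short loops" and "easy-access hashes" could look like. The feature names (curW, prevW, nextW, prevT, suf) and the helper extract_features are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical feature templates (not from the slides): current word,
# neighbouring words, previous tag, and a two-letter suffix.
sub extract_features {
    my ($words, $tags, $i) = @_;
    my $word = $words->[$i];
    my %feat = (
        "curW=$word" => 1,
        "prevW=" . ($i > 0        ? $words->[$i - 1] : "BOS") => 1,
        "nextW=" . ($i < $#$words ? $words->[$i + 1] : "EOS") => 1,
        "prevT=" . ($i > 0        ? $tags->[$i - 1]  : "BOS") => 1,
    );
    $feat{"suf=" . substr($word, -2)} = 1 if length($word) > 2;
    return \%feat;    # hash reference: feature name => 1
}

# Toy sentence, invented for this example.
my @words = qw(The dog barks);
my @tags  = qw(DT NN VBZ);
my $features = extract_features(\@words, \@tags, 1);
print join(" ", sort keys %$features), "\n";
```

Storing each context as a hash keeps feature lookup a single key access, which is presumably what "easy-access" refers to.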
Decent Results
• Mid-nineties increasing with the size of the training data
Result           1K         5K              10K             40K
Accuracy         88.31%     93.55%          94.63%          96.34%
Training Time    24 sec.    2 min 27 sec.   4 min 28 sec.   18 min 34 sec.
PART 4
Task 2: Bagging
Tie Function
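use Tie::File;
use Fcntl;

# Assumed setup, omitted on the slide: treat the file as an array of lines.
tie my @lines, 'Tie::File', $file_name, mode => O_RDONLY
    or die "Can't tie $file_name: $!";

for my $bag_num (1 .. $B) {
    # The Nth bag from file foo.txt becomes foo.txt-bagN, etc.
    my $bag_name = "$file_name-bag$bag_num";
    open (BAG, ">$bag_name")
        or die "Can't open $bag_name for writing: $!";
    for (@lines) {
        # Pick random line of file.
        my $line = $lines[ rand @lines ];
        print BAG "$line\n";    # Output to the bag.
    }
    close (BAG);
}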
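The bagging loop above draws as many random lines as the file contains, with replacement. The same idea can be sketched in memory, without Tie::File. The helper name bootstrap_sample, the toy sentences, and the fixed seed are assumptions made for this example only.

```perl
use strict;
use warnings;

# Draw one bootstrap sample: as many lines as the input, chosen with replacement.
sub bootstrap_sample {
    my ($lines) = @_;
    my $n = scalar @$lines;
    my @bag;
    push @bag, $lines->[ int(rand($n)) ] for 1 .. $n;
    return \@bag;
}

srand(42);    # fixed seed only so that repeated runs produce the same bags

# Toy training lines, invented for this example.
my @lines = (
    "The/DT dog/NN barks/VBZ",
    "A/DT cat/NN sleeps/VBZ",
    "Birds/NNS sing/VBP",
);

my $B = 3;    # number of bags, playing the role of $B in the slide's loop
for my $bag_num (1 .. $B) {
    my $bag = bootstrap_sample(\@lines);
    print "bag $bag_num:\n";
    print "  $_\n" for @$bag;
}
```

Because sampling is with replacement, a bag usually repeats some lines and leaves others out, which is what makes the taggers trained on different bags disagree.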
Combination
• VOTING!!
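Step 1:
# Loop through file and remember words. Keep them grouped by sentence.
while (<FILE>) {
    my @word_tags = split;    # assumed: whitespace-separated word/tag tokens
    my @words;                # fresh array per sentence, so each reference is distinct
    foreach (@word_tags) {
        my @wordtag = split /\//;
        push (@words, ($wordtag[0]));
    }
    push (@sentences, (\@words));
}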
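Step 2:
# Go through each file and, for each word, increase the count of its tag.
for my $file (@ARGV) {
    # The open call is omitted on the slide; added here so the loop has input.
    open (FILE, "<", $file) or die "Can't open $file: $!";
    my $tag_index = 0;
    while (<FILE>) {
        my @word_tags = split;    # assumed: whitespace-separated word/tag tokens
        foreach (@word_tags) {
            my @wordtag = split /\//;
            my $tag = $wordtag[1];
            $tags[$tag_index]->{$tag}++;
            $tag_index++;
        }
    }
    close (FILE);
}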
Step 3:
# Go through the sentences and print out each word/tag pair.
my $tag_index = 0;
foreach my $sent (@sentences) {
    foreach my $word (@$sent) {
        my $tag = max_tag($tags[$tag_index]);
        $tag_index++;
        print "$word/$tag ";
    }
    print "\n";
}
Finding the “Best Tag”
# Find the tag with the highest count.
sub max_tag {
    my $tag_hash = shift;
    (my $tag) = keys %$tag_hash;
    my $tag_count = $tag_hash->{$tag};
    foreach (keys %$tag_hash) {
        if ($tag_hash->{$_} > $tag_count) {
            $tag = $_;
            $tag_count = $tag_hash->{$tag};
        }
    }
    return $tag;
}
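One detail of max_tag: Perl does not guarantee the order of keys, so when several tags share the highest count the winner can differ from run to run. The variant below is not from the slides. It is a sketch of one possible fix, with the hypothetical name max_tag_stable, that breaks ties alphabetically.

```perl
use strict;
use warnings;

# Variant of max_tag (an assumption, not the presenters' code):
# highest count wins, and ties go to the alphabetically first tag.
sub max_tag_stable {
    my $tag_hash = shift;
    my ($best) = sort {
        $tag_hash->{$b} <=> $tag_hash->{$a}    # higher count first
            || $a cmp $b                       # then alphabetical order
    } keys %$tag_hash;
    return $best;
}

# Three taggers, three different votes: a three-way tie.
my %tied_votes = (NN => 1, VB => 1, JJ => 1);
print max_tag_stable(\%tied_votes), "\n";

# Two votes for NN against one for VB: a clear majority.
my %majority = (NN => 2, VB => 1);
print max_tag_stable(\%majority), "\n";
```

Alphabetical order is an arbitrary choice. Preferring the vote of the most accurate single tagger would be another reasonable tie-break.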
Procedure
1. Creating bootstrap samples
   • Treating the file as an array of lines.
   • N random array indices are selected and each corresponding line is output to a file.
2. Combine_tool.pl
   • Opens the file corresponding to its first argument.
   • Reads in all words, aggregated by sentence.
3. An array of tag hashes is created.
   • For each file in its arg list, opens that file and reads the tags sequentially.
   • The hash item corresponding to the tag at the appropriate index of the tag array is incremented.
   • For each index, the hash label with the highest count is chosen as the correct tag.
4. Re-associate the tags with their words
5. Print out the word/tag pairs
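Steps 2 to 5 of the procedure can be sketched end to end on in-memory data. This is an illustration under stated assumptions, not Combine_tool.pl itself: the function name combine_by_vote and the toy tagger outputs are invented. Unlike the slide code, it splits each token at the last slash and breaks ties alphabetically.

```perl
use strict;
use warnings;

# Combine several tagged versions of the same text by per-token majority vote.
# Each argument is one tagger's output: a reference to a list of "word/tag" lines.
sub combine_by_vote {
    my @outputs = @_;
    my (@sentences, @tags);

    # Words come from the first output; tag counts come from all outputs.
    for my $n (0 .. $#outputs) {
        my $tag_index = 0;
        for my $line (@{ $outputs[$n] }) {
            my @words;
            for my $pair (split ' ', $line) {
                # Split at the last slash so words containing "/" stay intact.
                my ($word, $tag) = $pair =~ m{^(.*)/([^/]+)$};
                push @words, $word;
                $tags[$tag_index++]{$tag}++;
            }
            push @sentences, \@words if $n == 0;
        }
    }

    # Re-associate the winning tag with each word, sentence by sentence.
    my $tag_index = 0;
    my @result;
    for my $sent (@sentences) {
        my @pairs;
        for my $word (@$sent) {
            my $votes = $tags[$tag_index++];
            my ($best) = sort { $votes->{$b} <=> $votes->{$a} || $a cmp $b }
                         keys %$votes;
            push @pairs, "$word/$best";
        }
        push @result, join ' ', @pairs;
    }
    return @result;
}

# Toy outputs standing in for three taggers (invented data).
my @trigram = ("The/DT dog/NN barks/NNS", "It/PRP runs/VBZ");
my @tbl     = ("The/DT dog/NN barks/VBZ", "It/PRP runs/VBZ");
my @maxent  = ("The/DT dog/VB barks/VBZ", "It/PRP runs/NNS");
print "$_\n" for combine_by_vote(\@trigram, \@tbl, \@maxent);
```

In this toy input each tagger makes one mistake at a different position, so the vote recovers the majority tag everywhere.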
Result
                              Training Data
Method     1K                       5K                       10K
Trigram    85.68 / 83.35 / 85.46    92.12 / 90.85 / 91.90    93.44 / 92.78 / 93.32
TBL        90.64 / 90.27 / 91.70    93.97 / 93.75 / 94.91    94.91 / 94.75 / 95.60
MaxEnt     88.31 / 88.23 / 89.85    93.55 / 92.87 / 94.05    94.63 / 94.25 / 95.10
Comb       91.39 / 90.62 / 92.45    94.87 / 94.26 / 95.21    95.61 / 95.17 / 95.55