
FossConf 2008 Chennai

Natural Language Processing with Perl

G Jaganadh, C-DAC Thiruvananthapuram

Talk Overview

Introduction

Natural Language Processing

Perl

Perl Lingua Modules

Some examples

Towards the future

Introduction

•Objectives of the talk

Introducing NLP techniques for Language Researchers

Natural Language Processing

Introduction to NLP

Subfields of NLP

Perl

•Practical Extraction and Report Language

Free and Open Source

Easy to Learn

Powerful regular expressions for text searching
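
As a small illustration of the regular-expression point above, the following sketch pulls capitalized words out of a string. The sample sentence and the helper name capitalized_words are made up for illustration and are not from the talk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every word that starts with an uppercase
# ASCII letter followed by one or more lowercase letters.
sub capitalized_words {
    my ($text) = @_;
    # In list context, m//g returns all matches of the pattern.
    return $text =~ /\b[A-Z][a-z]+\b/g;
}

my $sample = "Perl was created by Larry Wall in 1987.";
print join(", ", capitalized_words($sample)), "\n";   # prints "Perl, Larry, Wall"
```

The same match operator works line by line on files, which is why one pattern is often enough for a quick corpus search.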

Perl Lingua Modules

Perl Modules for Linguistic Processing

Almost all modules are for English, Dutch, and other European languages

Powerful implementation of different NLP algorithms

Some Examples

Counting words in a text

Pattern Matching

Use of Lingua::EN::Sentence

Use of Lingua::EN::NamedEntity

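Counting words

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read all input (files named on the command line, or STDIN).
    my $text = '';
    while (my $line = <>) {
        $text .= $line;
    }

    # Turn every run of non-letter characters into a single newline,
    # so that each word ends up on its own line (ASCII letters only).
    $text =~ tr/a-zA-Z/\n/cs;

    # Count how often each word occurs.
    my %frequency;
    foreach my $word (split /\n/, $text) {
        next if $word eq '';    # empty field before a leading separator
        $frequency{$word}++;
    }

    # Print the counts in alphabetical order of the words.
    foreach my $word (sort keys %frequency) {
        print "$frequency{$word} $word\n";
    }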

Lingua::EN::Sentence

    #!/usr/local/bin/perl -w
    use Lingua::EN::Sentence qw( get_sentences add_acronyms );

    ## adding support for abbreviations
    add_acronyms('lt','gen');

    ## read the input one paragraph at a time
    $/ = "\n\n";

    while (<>) {
        $sentences = get_sentences($_);
        foreach $s (@$sentences) {
            print "<s> $s </s>\n";
        }
    }

Lingua::EN::NamedEntity

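    #!/usr/bin/perl
    use strict;
    use warnings;
    use Lingua::EN::NamedEntity;

    # Read the whole input at once; each line already ends in a newline,
    # so the lines are joined with the empty string.
    my $str = join '', <>;

    # extract_entities returns a list of hash references, one per entity.
    my @entities = extract_entities($str);
    foreach my $entity (@entities) {
        print $entity->{entity}, "\n";
    }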

Pattern Matching

    while ($line = <>) {
        if ($line =~ m/_____/) {
            print $line;
        }
    }
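
The blank in the pattern above is left for the reader to fill in. As one possible instance, the sketch below keeps only the lines that contain a word ending in "ing". The pattern, the helper name matching_lines, and the sample lines are assumptions chosen for illustration, not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines that match a compiled pattern.
sub matching_lines {
    my ($pattern, @lines) = @_;
    return grep { $_ =~ $pattern } @lines;
}

# Example pattern (an assumption, not from the slides):
# a word of letters that ends in "ing".
my $pattern = qr/\b[A-Za-z]+ing\b/;

my @lines = (
    "Counting words in a text\n",
    "Perl Lingua Modules\n",
    "Pattern Matching\n",
);

# Prints the first and third sample lines.
print matching_lines($pattern, @lines);
```

To filter real input the way the slide does, the sample list can be replaced by the input operator, as in print matching_lines($pattern, <>);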

Towards the future

Lingua Modules for Indian Languages

Useful Stuff

• http://search.cpan.org/search?query=Lingua&mode=all

• http://wiki.christophchamp.com/index.php/Perl/Modules/Lingua

Questions?

[email protected]