LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 15: 10/16.


LING/C SC/PSYC 438/538 Computational Linguistics

Sandiway Fong

Lecture 15: 10/16

Administrivia

• No lecture this Thursday

Today’s Topics

• Midterm review

• Finite State Transducers (FST)

Question 1

• Download the file wsj.txt (~50K lines)
• Write a Perl program that finds all lines containing any possible form of the idiom take ... advantage of ...
• How many are there in wsj.txt?
• Submit your program
• Submit the lines returned by your program

Question 1

• First hit on Google:
– take advantage (of someone) to use someone's weakness to improve your own situation. Mr. Smith often takes advantage of my friendship and leaves the unpleasant tasks for me to do. See also: advantage, take
– take advantage (of something) to use an opportunity to get or achieve something. He took advantage of the prison's education program to earn a college degree. There are peaches and strawberries grown on the farm, and I sure take full advantage of them. Usage notes: often said of someone who has opportunities that others do not have: The rich can take advantage of clever accounting tricks to avoid taxes. See also: advantage, take

– Cambridge Dictionary of American Idioms – Cambridge University Press 2003


Answer 1

1. Investors took advantage of Tuesday 's stock rally
2. Like other forms of arbitrage , it merely seeks to take advantage of momentary discrepancies
3. As usually practiced it takes advantage of a rather basic concept
4. So if index arbitrage is simply taking advantage of thin inefficiencies
5. `` If you could get the rhythm of the program trading , you could take advantage of it . ''
6. Mrs. Gorman took advantage of low prices
7. According to Upjohn 's estimates , only 50 % to 60 % of the 1,100 eligible employees will take advantage of the plan .
8. Nissan has increased earnings more than market share by cutting costs and by taking advantage of a general surge
9. Mr. Peladeau took his first big gamble 25 years ago , when he took advantage of a strike at La Presse
10. In addition , the two companies will develop new steam turbine technology , such as the plants ordered by Florida Power , and even utilize each other 's plants at times to take advantage of currency fluctuations .
11. One of GE 's goals when it bought 80 % of Kidder in 1986 was to take advantage of `` syngeries ''
12. I take advantage of this opportunity given to me by The Wall Street Journal
• And taking more direct action has the advantage of avoiding sharp increases
13. To take advantage of local expertise and custom
14. Several blue-chip companies tapped the new-issue market yesterday to take advantage of falling interest rates .
15. He also noted that a strong sterling market yesterday might have helped cocoa in New York as arbitragers took advantage of the currency move .
16. My kids ' college education looms as perhaps the greatest future opportunity for spending , although I 'll probably have to cash in their toy portfolio to take advantage of it .
17. As the ad 's tone implies , the Texas spirit is pretty xenophobic these days , and Lone Star is n't alone in trying to take advantage of that .
18. IBM , which Gartner Group said generates 22 % of its revenue in this market , should be able to take advantage of its loyal following
19. Erik Keller , a Gartner Group analyst , said organizational changes may still be required to really take advantage of CIM 's capabilities

Answer 1

20. These latter-day scalawags would be ill-advised to take advantage of the situation
21. Most of trading action now is from professional traders who are trying to take advantage of the price swings
22. For instance , First Quadrant Corp. , an asset allocator based in Morristown , N.J. , said it quickly boosted stock positions in its `` aggressive '' accounts to 75 % from 55 % to take advantage of plunging prices Friday .
23. Others are doing `` index arbitrage '' a strategy of taking advantage of price discrepancies
24. The campaign , created by Omnicom Group 's DDB Needham agency , takes advantage of the eye-catching photography
25. According to industry lawyers , the ruling gives pipeline companies an important second chance to resolve remaining disputes and take advantage of the cost-sharing mechanism .
26. Thanks to a new air-traffic agreement and the ability of Irish travel agents to issue Aeroflot tickets , tourists here are taking advantage of Aeroflot 's reasonable prices
27. But , `` You never can tell , '' he added , `` you have to take advantage of opportunities .
28. A broad rally began when several major processors began buying futures contracts , apparently to take advantage of the price dip .
29. `` We hope to take advantage of it , ''
30. And we hope to take advantage of panics
31. To take full advantage of the financial opportunities
32. Specifically , it must understand how real-estate markets overreact to shifts in regional economies and then take advantage of these opportunities .

Answer 1

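• Perl Program:
– a simple way to exclude the case shown earlier ("... has the advantage of ...")

use strict;
use warnings;

open(my $fh, '<', $ARGV[0]) or die "$ARGV[0] not found!\n";
while (<$fh>) {
    # keep lines with a form of "take" followed later by "advantage of",
    # unless the words in between (captured in $2) include "the"
    print if /\b(take|takes|taking|taken|took)\b(.*) advantage of/ && $2 !~ /\bthe\b/;
}
close($fh);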

Question 2

• Give a regular grammar in Prolog notation that accepts strings with an odd number of a’s (#a’s = 1,3,5,...) followed by an even number of b’s (#b’s = 2,4,6,...)
• i.e. a^n b^m, n odd, m even
• Examples:
– aaabb
– abbbb
– aaaaabb
– *aabb
– *aaab
• Submit your program
• Show it works on the given examples

Answer 2

• Regular grammar in Prolog DCG format:

s --> [a], b.
s --> [a], d.
b --> [a], s.
d --> [b], e.
e --> [b].
e --> [b], d.

• Run

| ?- s([a,a,a,b,b],[]).

yes

| ?- s([a,b,b,b,b],[]).

yes

| ?- s([a,a,a,a,a,b,b],[]).

yes

| ?- s([a,a,b,b],[]).

no

| ?- s([a,a,a,b],[]).

no

Question 3

• Using an extra argument with regular grammar rules in Prolog DCG format, give a grammar that accepts
• L = a^n b^m
• n even (n = 2,4,6,...)
• m is the odd number closest to but not exceeding n/2
• Note: L is a non-regular language
• Examples:
– aab
– aaaab
– *aaaabb
– aaaaaabbb
– *aaaaaabbbb
– aaaaaaaabbb
– *aaaaaaaabbbb
– *aaaaaaaabbbbb

• Show your program works on the above examples

Answer 3

• Program

s(X) --> [a], b(s(X)).
b(X) --> [a], c(s(X)).
b(X) --> [a], s(s(X)).
c(s(s(0))) --> [b].
c(s(s(s(s(0))))) --> [b].
c(s(s(X))) --> [b], d(X).
d(s(s(X))) --> [b], c(X).

• Run

| ?- s(0,[a,a,b],[]).
yes
| ?- s(0,[a,a,a,a,b],[]).
yes
| ?- s(0,[a,a,a,a,b,b],[]).
no
| ?- s(0,[a,a,a,a,a,a,b,b,b],[]).
yes
| ?- s(0,[a,a,a,a,a,a,b,b,b,b],[]).
no
| ?- s(0,[a,a,a,a,a,a,a,a,b,b,b],[]).
yes
| ?- s(0,[a,a,a,a,a,a,a,a,b,b,b,b],[]).
no
| ?- s(0,[a,a,a,a,a,a,a,a,b,b,b,b,b],[]).
no

Question 4

• Give a regexp for the language described in Question 2

• a^n b^m, n odd, m even

Answer 4

• a^n b^m, n odd, m even

• a(aa)*(bb)+

Question 5

• Give a regexp for the complement of the following FSA

[Figure: FSA with states 1 to 5 and transitions labeled a and b]

Answer 5

• Original machine is deterministic

• Flip the states (swap final and non-final states)

[Figure: the original FSA, and the same machine with final and non-final states swapped]

Answer 5

• Notice 5 is a dead-end state

• Erase 5

[Figure: the flipped machine before and after erasing state 5, leaving states 1, 2, 3 and 4]

Answer 5

• Eliminated state 5
• Eliminate states 2 and 4

[Figure: the four-state machine, and the machine reduced to states 1 and 3 with arcs labeled ab and ba]

(ab|ba)*

Answer 5

• Eliminated state 5

• Equations
– E1 = aE2 | bE4 | λ
– E2 = bE3
– E4 = aE3
– E3 = aE2 | bE4 | λ

• Eliminate E4
– E1 = aE2 | baE3 | λ
– E3 = aE2 | baE3 | λ

• Eliminate E2
– E1 = abE3 | baE3 | λ
– E3 = abE3 | baE3 | λ

• Group E3
– E1 = (ab|ba)E3 | λ
– E3 = (ab|ba)E3 | λ

• Solve E3 (an equation X = AX | λ has the solution X = A*)
– E3 = (ab|ba)*
– E1 = (ab|ba)(ab|ba)* | λ = (ab|ba)*

[Figure: the four-state machine with states 1, 2, 3 and 4]

Question 6

• Give the deterministic FSA corresponding to:

Answer 6

• Deterministic machine

[Figure: deterministic FSA with states 1 to 8 and transitions labeled a, b and c]

Finite State Transducers

• Just like Finite State Automata (FSA) except for an output tape

• Mealy Machine formulation:
– at each transition, an FST can read an input symbol and output a (different) symbol onto the tape

• Background reading
– Chapter 3 of the textbook

Morphology

• morphology
– words are composed of morphemes
– morpheme: basic semantic unit, e.g. -ee in employee
– Inflectional: no change in category, e.g. V -ed → V
– can carry information about tense, person, number, gender, case etc.
– Derivational: category-changing, e.g. V -able → A
– very productive

Walkers. Standees.

(sign above travelator at Pittsburgh International Airport, © Sandiway Fong)

Today’s Topic

• Finite State Transducers (FST) for morphological processing

– ... also Prolog implementation

Recall Finite State Automata (FSA)

• from lecture 8
– (Q,s,f,Σ,δ)
1. set of states (Q): {s,x,y} must be a finite set
2. start state (s): s
3. end state(s) (f): y
4. alphabet (Σ): {a, b}
5. transition function δ:
signature: character × state → state
1. δ(a,s)=x
2. δ(a,x)=x
3. δ(b,x)=y
4. δ(b,y)=y

[Figure: state diagram with states s, x, y and arcs labeled a, a, b, b as listed above]

Modeling English Adjectives using FSA

– from section 3.2 of textbook

• examples
– big, bigger, biggest, *unbig
– cool, cooler, coolest, coolly
– red, redder, reddest, *redly
– clear, clearer, clearest, clearly, unclear, unclearly
– happy, happier, happiest, happily
– unhappy, unhappier, unhappiest, unhappily
– real, *realer, *realest, unreal, really

• fsa (3.4)

Initial machine is overly simple

need more classes to make finer grain distinctions

e.g. *unbig

Modeling English Adjectives using FSA

• divide adjectives into classes
• examples
– adj-root2: big, bigger, biggest, *unbig
– adj-root2: cool, cooler, coolest, coolly
– adj-root2: red, redder, reddest, *redly
– adj-root1: clear, clearer, clearest, clearly, unclear, unclearly
– adj-root1: happy, happier, happiest, happily
– adj-root1: unhappy, unhappier, unhappiest, unhappily
– adj-root1: real, *realer, *realest, unreal, really

• fsa (3.5)

However...
Examples
uncooler
• Smoking uncool and getting uncooler.
• google: 22,800 (2006), 10,900 (2005)
*realer
• google: 3,500,000 (2006), 494,000 (2005)
*realest
• google: 795,000 (2006), 415,000 (2005)

Modeling English Adjectives using FSA

e.g. *unbig
google: 2,590 hits (2007)

morphology is productive
morphemes carry (compositional) meaning
can be used for dramatic effect: unbig vs. small

The Mapping Problem

• To map between a surface form and the decomposition of a word into its components– e.g. root + (person/number/gender) and other features

• using spelling rules

• Example: (3.11)

Notes:
^ marks a morpheme boundary
# is the end-of-word marker

Stage 1: Lexical → Intermediate Levels

• example:
– f o x +N +PL (lexical)
– f o x ^s# (intermediate)

• lexical level:
– uninflected “dictionary” level

• intermediate level:
– replace abstract morphemes by concrete ones

• key
– +N: noun
• fox can also be a verb,
• but fox +V cannot combine with +PL
– +PL: (abstract) plural morpheme
• realized in English as s (basic case)
– boundary markers ^ and #
• for use by the spelling rule machine (later)

Stage 1: Lexical → Intermediate Levels

• example:
– f o x +N +PL (lexical)
– f o x ^s# (intermediate)

• machine idea
– character-by-character correspondences
– f → f
– o → o
– x → x
– +N → ε (ε = empty string)
– +PL → ^s#

• use a Finite State Machine with input/output mapping
– Finite State Transducer (FST)

Stage 1: Lexical → Intermediate Levels

• Example:
– g o o s e +N +PL (lexical)
– g e e s e # (intermediate)

• Example:
– g o o s e +N +SG (lexical)
– g o o s e # (intermediate)

• Example:
– m o u s e +N +PL (lexical)
– m i c e # (intermediate)

• Example:
– s h e e p +N +PL (lexical)
– s h e e p # (intermediate)

Stage 1: Lexical → Intermediate Levels

• 3.11

Notation:

input : output

f means f:f

Extension to Finite State Transducers (FST)

• [Mealy machine extension to FSA]
– (Q,s,f,Σ,δ)
1. set of states (Q): {s,x,y} must be a finite set
2. start state (s): s
3. end state(s) (f): y
4. alphabet (Σ): pairs I:O
– I = input alphabet, O = output alphabet
– ε may be included in I and O
– transition function (or matrix) δ:
signature: i/o pair × state → state
1. δ(a:b,s)=x
2. δ(a:b,x)=x
3. δ(b:a,x)=y
4. δ(b:ε,y)=y

[Figure: state diagram with states s, x, y and arcs labeled a:b, a:b, b:a, b:ε as listed above]

Finite State Automata (FSA)

• recall: one possible Prolog encoding strategy

– define one predicate for each state
• taking one argument (the input string)
• consume input character
• call next state with remaining input string

– query
• ?- s(L).
call start state s

Finite State Automata (FSA)

– define one predicate for each state
• take one argument (the input string), and consume input character
• call next state with remaining input string

– query
• ?- s(L). i.e. call start state s

– state s: (start state)
• s([a|L]) :- x(L).

– state x:
• x([a|L]) :- x(L).
• x([b|L]) :- y(L).

– state y: (end state)
• y([]).
• y([b|L]) :- y(L).

[Figure: state diagram with states s, x, y and arcs labeled a, a, b, b]

simple extension to FST: each predicate takes two arguments: input and output
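
As a minimal sketch of this two-argument idea, the small transducer with arcs a:b, a:b, b:a and b:ε over states s, x and y might be encoded as below. The predicate names simply reuse the state names, and treating an ε output as "add nothing to the output list" is an assumption of this particular encoding, not something fixed by the slides.

```prolog
% Each state is a predicate of two arguments: input list and output list.

% state s (start state): arc a:b leads to x
s([a|L1], [b|L2]) :- x(L1, L2).

% state x: arc a:b loops on x, arc b:a leads to y
x([a|L1], [b|L2]) :- x(L1, L2).
x([b|L1], [a|L2]) :- y(L1, L2).

% state y (end state): accept when both lists are exhausted;
% arc b:ε consumes a b and leaves the output list untouched
y([], []).
y([b|L1], L2) :- y(L1, L2).
```

With these clauses, the query ?- s([a,a,b,b], Out). gives Out = [b,b,a]: the two a's are rewritten as b, the first b as a, and the final b produces no output.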

Stage 1: Lexical → Intermediate Levels

• example
– s0([f|L1],[f|L2]) :- s1(L1,L2).
– s0([c|L1],[c|L2]) :- s3(L1,L2).

– s1([o|L1],[o|L2]) :- s2(L1,L2).
– s2([x|L1],[x|L2]) :- s5(L1,L2).
– s3([a|L1],[a|L2]) :- s4(L1,L2).
– s4([t|L1],[t|L2]) :- s5(L1,L2).

– s5(['+N'|L1],L2) :- s6(L1,L2).
– s6(['+PL'|L1],[^,s,#|L2]) :- s7(L1,L2).
– s7([],[]). % end state

Stage 1: Lexical → Intermediate Levels

• FST queries
– lexical → intermediate
• ?- s0([f,o,x,'+N','+PL'],X).
– X = [f, o, x, ^, s, #]

– intermediate → lexical
• ?- s0(X,[f,o,x,^,s,#]).
– X = [f, o, x, '+N', '+PL']

– enumerator
• ?- s0(X,Y).
– X = [f, o, x, '+N', '+PL']
– Y = [f, o, x, ^, s, #] ;
– X = [c, a, t, '+N', '+PL']
– Y = [c, a, t, ^, s, #] ;
• No

inversion of a transducer T: T^-1

switch input and output labels

in Prolog, simply change the call
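
The point about inversion can be sketched in a few lines. The transducer t/2 below is a hypothetical toy example (it is not one of the machines in these slides), and t_inv/2 is an assumed name for its inverse. No new machine is constructed: the inverse just calls the original with its two arguments swapped.

```prolog
% t/2: a toy transducer (hypothetical, for illustration only).
% It rewrites every a on the input tape as b on the output tape.
t([], []).
t([a|In], [b|Out]) :- t(In, Out).

% t_inv/2: the inverse transducer, obtained by swapping the two tapes
% in the call rather than by relabeling any arcs.
t_inv(In, Out) :- t(Out, In).
```

Here ?- t([a,a], X). gives X = [b,b], and ?- t_inv([b,b], X). gives X = [a,a]. This works because the clauses of t/2 use only unification, so either argument can be the one that is supplied.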

Stage 1: Lexical → Intermediate Levels

• Figure 3.17 (top half): tape view of input/output pairs

The Mapping Problem

• Example: (3.11)

• (Context-Sensitive) Spelling Rule: (3.5) ε → e / {x,s,z}^ __ s#

ε rewrites to letter e in left context x^ or s^ or z^ and right context s#

• i.e. insert e after the ^ when you see x^s# or s^s# or z^s#

• in particular, we have x^s# → x^es#
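
To make the effect of rule (3.5) concrete, here is a rough sketch that applies it as a direct list rewrite. It is not the finite state transducer of the textbook, and it only runs in the intermediate-to-output direction. The names left_context/1 and insert_e/2 are made up for this illustration, and the boundary symbols are quoted as '^' and '#' purely for portability across Prolog systems.

```prolog
% left_context(C): C is one of the letters allowed before ^ in rule (3.5).
left_context(x).
left_context(s).
left_context(z).

% insert_e(+In, -Out): copy In to Out, inserting e after ^
% wherever the context is {x,s,z} ^ __ s #.
insert_e([C,'^',s,'#'|T], [C,'^',e,s,'#'|T2]) :-
    left_context(C), !,          % rule applies: commit to this clause
    insert_e(T, T2).
insert_e([X|T], [X|T2]) :-       % otherwise pass the symbol through unchanged
    insert_e(T, T2).
insert_e([], []).
```

For example, ?- insert_e([f,o,x,'^',s,'#'], X). gives X = [f,o,x,^,e,s,#], while [c,a,t,'^',s,'#'] is passed through unchanged because t is not in the left context.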

Stage 2: Intermediate → Surface Levels

• also can be implemented using an FST

important! machine is designed to pass input not matching the rule through unmodified (rather than fail)

implements context-sensitive rule
q0 to q2: left context
q3 to q0: right context

Stage 2: Intermediate → Surface Levels

• Example (3.17)

Stage 2: Intermediate → Surface Levels

• Transition table for FST in 3.14

pg.79

Stage 2: Intermediate → Surface Levels

• in Prolog (simplified)
– q0([],[]). % final state
– q0([^|L1],L2) :- !, q0(L1,L2). % ^:ε
– q0([z|L1],[z|L2]) :- !, q1(L1,L2).
– % repeat for s,x
– q0([#|L1],[#|L2]) :- !, q0(L1,L2).
– q0([X|L1],[X|L2]) :- \+ mentioned(X), q0(L1,L2). % other

• ! is known as the “cut” predicate
– it affects how Prolog backtracks for another solution
– it means “cut” the backtracking off
– Prolog will not try any other possible matching rule on backtracking
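
The effect of the cut can be seen by comparing two otherwise identical predicates. In this sketch nocut/2 and withcut/2 are hypothetical names; both are meant to delete the boundary symbol (quoted here as '^') and copy everything else.

```prolog
% nocut/2: delete '^', copy other symbols. There is no cut, so on
% backtracking the catch-all second clause also matches '^'.
nocut(['^'|L1], L2) :- nocut(L1, L2).
nocut([X|L1], [X|L2]) :- nocut(L1, L2).
nocut([], []).

% withcut/2: same clauses, but once the first clause has matched '^'
% the cut stops Prolog from trying the catch-all clause for that symbol.
withcut(['^'|L1], L2) :- !, withcut(L1, L2).
withcut([X|L1], [X|L2]) :- withcut(L1, L2).
withcut([], []).
```

Collecting all solutions shows the difference: ?- findall(Y, nocut([a,'^',b], Y), Ys). gives two answers, [a,b] and the unwanted copy [a,^,b], whereas the same query with withcut/2 gives only [a,b].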

Exercise

• Ungraded exercise:
– Implement 3.14 in Prolog
– Make sure you can do e-insertion and the inverse operation, i.e. go from surface form to intermediate form