Grammar Writing Lecture 6

Carnegie Mellon, School of Computer Science, LTI
11-721 Grammars and Lexicons
Teruko Mitamura
www.cs.cmu.edu/~teruko
Copyright © 2007, Carnegie Mellon. All Rights Reserved.


Schedule: November 26

• Review of Bird2.gra

• Feedback on the 1st assignment

• Project Grading Criteria

• Submit Grammar exercise (mlb.gra/results)

• Start Grammar exercise (Japanese grammar)

Review: Bird 2 Grammar

• Goal: to learn more about unification
• Much better than bird.gra
• Fine to have semantic features in the f-structure
• Some problems:
  – Semantic features that are not scalable
    • ((x0 semclass) = Morris)
    • (meower -) for “bird”
  – Incomplete f-structures
  – Ambiguous f-structures

Feedback on the 1st assignment

Planning
• Make sure to estimate the time for each step (or each type) and the total time
• No schedule was presented
• No duration was presented
• More time for testing and debugging may be needed
• No additional types were presented

Design
• Subj-verb agreement with “be” verb and a regular verb
  – Test sentences must be full sentences, not just a pronoun and a “be” verb.
  – Test sentences should contain all the pronouns.

Feedback on the 1st assignment (2)

• Intransitive/Transitive/Ditransitive with PP
  – Make sure that sentences contain PP.
  – PP attachment may be ambiguous. Grammar should be able to handle both noun attachment and verb attachment.
• Conjunction with PP and NP should be in an object position, not in a subject position
• If you choose Passive
  – Should contain “by” phrase.
• If you choose Sentences with the modified nominal construction
  – Test sentences must be full sentences, not just NP.

Feedback on the 1st assignment (3)

• If you choose Sentences with the Noun-Noun compounds, test sentences must contain full sentences.

• Optional notation doesn’t work with the parser
    NP -> (DET) N
• A quotation mark <N’> doesn’t work inside Lisp.
• Not enough test sentences
• No statement of testing purposes, reasons for pass/failure

Feedback on the 1st assignment (4)

• No feature structures
• No attribute-value pairs
• You don’t need to write attribute-value pairs for each sentence type
• Test sentences were too short
• Some did more work than was required for the 1st assignment
• Overall comment
  – Follow the steps of the grammar writing process (see the slides or handouts)
  – If you have questions, please ask

Pseudo-Unification

• Ordering of test and action equation

• Right and left hand side of equation

• Split f-structure

Grammar Writing Project Grading Criteria

There are 5 parts, each given a separate letter grade

See the introduction slides for references

1. Plan/Design

2. Grammar

3. Test Suites

4. Results

5. Issues/Discussions

1. Plan/Design

• Clear statement of goals, tasks, time estimates and actual time spent

• Set of structures covered

• Trees for each type of construction

• Grammatical functions used

• Feature structures used (attribute values)
  – See the grammar exercise cover page

2. Grammar

• Grammar writing principles followed:
  – Generality
  – Extensibility
  – Simplicity
  – Selectivity
• Quality of grammar: easy to understand
  – Well-organized
  – Well-documented, etc.

3. Test Suites

• Enough sentences to test each type of construction

• Include both grammatical and ungrammatical sentences

• Clear statement of testing purposes, reasons for pass/failure

3. Test Suites (2) Examples

1. Full sentences that contain “be” verb
   Test for person agreement.
     *I are a student
   Test for number agreement.
     *She is teachers
   Test for case agreement.
     *Me is a teacher
   Test for missing determiner.
     *She is teacher
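The four starred examples above can be turned into an automated check. The following Python sketch is only an illustration of the idea, not the course parser: the mini-lexicon, the feature names (person, number, case) and the failure labels are all assumptions made for this example.

```python
# Hypothetical mini-lexicon; feature names and values are illustrative only.
PRONOUNS = {
    "i":    {"person": 1, "number": "sg", "case": "nom"},
    "me":   {"person": 1, "number": "sg", "case": "acc"},
    "she":  {"person": 3, "number": "sg", "case": "nom"},
    "they": {"person": 3, "number": "pl", "case": "nom"},
}
# Each "be" form lists the agreement patterns it is compatible with.
BE = {
    "am":  [{"person": 1, "number": "sg"}],
    "is":  [{"person": 3, "number": "sg"}],
    "are": [{"number": "pl"}, {"person": 2}],
}
NOUNS = {"student": "sg", "students": "pl", "teacher": "sg", "teachers": "pl"}


def unify(a, b):
    """Merge two flat feature dicts; return None if any value clashes."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


def check(sentence):
    """Return "ok" or a short label naming the first failed test."""
    words = sentence.lower().rstrip(".").split()
    if len(words) < 3 or words[0] not in PRONOUNS or words[1] not in BE:
        return "unknown pattern"
    subj = PRONOUNS[words[0]]
    if subj["case"] != "nom":
        return "case"
    agr = {"person": subj["person"], "number": subj["number"]}
    if all(unify(agr, option) is None for option in BE[words[1]]):
        return "subj-verb agreement"
    rest = words[2:]
    has_det = rest[0] == "a"
    if len(rest) != (2 if has_det else 1) or rest[-1] not in NOUNS:
        return "unknown pattern"
    noun_number = NOUNS[rest[-1]]
    if noun_number != subj["number"]:
        return "number agreement"
    if noun_number == "sg" and not has_det:
        return "missing determiner"
    if noun_number == "pl" and has_det:
        return "determiner-number clash"
    return "ok"
```

Recording the label next to each test sentence gives the "reason for pass/failure" that the grading criteria ask for.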

4. Results

• Grammatical sentences – all parsed
• Ungrammatical sentences – all failed
• Well-formed f-structures
  – No NILs in the f-structures
  – No missing f-structures, elements
  – No unnecessary redundancy
  – No unnecessary feature structures
  – No conflicting information in the f-structures
  – No unnecessary ambiguity
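One of these criteria, "No NILs in the f-structures", is easy to check mechanically. The sketch below assumes an f-structure has already been read into a nested Python dict; that representation, the function name find_nils and the sample attributes are assumptions for illustration, not the parser's output format.

```python
def find_nils(fs, path=()):
    """Return the attribute paths whose value is missing (None or "NIL")."""
    missing = []
    for attr, value in fs.items():
        here = path + (attr,)
        if isinstance(value, dict):
            # Descend into embedded f-structures such as the subject.
            missing.extend(find_nils(value, here))
        elif value is None or value == "NIL":
            missing.append(here)
    return missing


# Hypothetical f-structures for "She is a teacher".
good = {"subj": {"root": "she", "person": 3, "number": "sg"},
        "root": "be", "tense": "present"}
bad = {"subj": {"root": "she", "person": None},
       "root": "be", "tense": "NIL"}
```

Here find_nils(good) returns an empty list, while find_nils(bad) reports the paths ("subj", "person") and ("tense",).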

5. Issues/Discussions (What you learned)

• Any interesting findings
• Grammar design issues (decisions and results)
• Grammar writing principles vs. actual implementation issues
• Time estimate vs. actual time spent
• Problems and reasons
  – E.g. Ambiguity: reasons for more than one parse
  – E.g. Any limitations that you encountered
    • Platform, parser, grammar limitations
• Other issues/discussions/future extension

Files to submit

1. Plan and Grammar Design
2. Test Suite
3. Grammar File
   – Specify the location of the grammar file
4. Test Results: two separate files
   1. Grammatical sentences (parsed)
   2. Ungrammatical sentences (failed)
   3. No trace in the results files. Set to (dmode 0)
5. Issues/Discussions
   – If you use English grammar references, list them.

Project Due

• December 7 (Friday) at 3pm

• Late submissions will be downgraded

• Cari will collect the project (hard copy).

• Don’t send the files by email.

• Work alone.

Grammar Exercise (Japanese Grammar)

• Free word-order language
• SOV language
• Case markers determine grammatical relations (ga, wo, ni, de, etc.)
• Grammar file: jpn.gra
• Test files: jpn-test1.lisp
  /afs/cs/project/cmt-55/lti/Lab/Modules/GNL-721/2007/

Japanese Lexicon

nichiyoubi (Sunday)
nyuuyooku (New York)
hoomuran (home run)
itta (went)
utta (hit-past)
Hideki, Ichiro (person’s name)
ga (NOM case)
wo (ACC case)
ni (Time-on)
ni (Loc-to)
e (Loc-to)
de (Loc-at)

Japanese Examples

Nichiyoubi ni Hideki ga Nyuuyooku e itta.
Sunday on Hideki NOM New York to go PAST

“Hideki went to New York on Sunday.”

Nichiyoubi ni Nyuuyooku e Hideki ga itta.
Hideki ga nichiyoubi ni Nyuuyooku e itta.
Hideki ga Nyuuyooku e nichiyoubi ni itta.
Nyuuyooku e Hideki ga nichiyoubi ni itta.
Nyuuyooku e nichiyoubi ni Hideki ga itta.

Japanese Examples (2)

Hideki ga Nyuuyooku e itta
Nyuuyooku e Hideki ga itta
Nichiyoubi ni Nyuuyooku e itta
Nyuuyooku e nichiyoubi ni itta
Hideki ga nichiyoubi ni itta
Nichiyoubi ni Hideki ga itta
Hideki ga itta
Nyuuyooku e itta
Nichiyoubi ni itta

Japanese Example

Nichiyoubi ni Ichiro ga hoomuran wo utta.
Sunday on Ichiro NOM home run ACC hit-PAST

“Ichiro hit a home run on Sunday.”

Ungrammatical Sentences

• You can’t have two nominatives or accusatives in a sentence. (jpn-test1.lisp)

*Hideki ga nichiyoubi ga itta

*Hideki ga Hideki ga itta

*Hideki ga hoomuran ga utta

*Hideki wo hoomuran wo utta

Japanese Grammar

• Use of recursive rules

(<S> <==> (<NP> <S>)

(<S> <==> (<V>)

Recursive Rules

S
  NP
    N  Hideki
    P  ga
  S
    NP
      N  Nyuuyooku
      P  e
    S
      V  itta
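The recursive analysis can be imitated outside the course parser. The following Python sketch applies S -> NP S and S -> V (with NP -> N P) and fills one grammatical-function slot per case marker, so that a second “ga” or “wo” phrase makes the parse fail. It is a rough illustration only: the function names (subj, obj, goal, loc), the f-structure layout and the omission of “ni” are assumptions, not the contents of jpn.gra.

```python
# Hypothetical lexicon built from the slide's word list.
NOUNS = {"hideki", "ichiro", "nyuuyooku", "hoomuran", "nichiyoubi"}
VERBS = {"itta": "go", "utta": "hit"}
# Assumed mapping from case marker to grammatical function ("ni" left out).
PARTICLE_FUNCTION = {"ga": "subj", "wo": "obj", "e": "goal", "de": "loc"}


def parse_s(tokens):
    """Parse with S -> NP S | V; return an f-structure dict or None."""
    # Base case: S -> V
    if len(tokens) == 1:
        verb = tokens[0]
        if verb not in VERBS:
            return None
        return {"pred": VERBS[verb], "tense": "past"}
    # Recursive case: S -> NP S, where NP -> N P
    if len(tokens) < 3:
        return None
    noun, particle = tokens[0], tokens[1]
    if noun not in NOUNS or particle not in PARTICLE_FUNCTION:
        return None
    fs = parse_s(tokens[2:])
    if fs is None:
        return None
    function = PARTICLE_FUNCTION[particle]
    if function in fs:
        # Slot already filled, e.g. two nominatives or two accusatives.
        return None
    fs[function] = noun
    return fs


def parse(sentence):
    """Lower-case and tokenize a romanized sentence, then parse it."""
    return parse_s(sentence.lower().rstrip(".").split())
```

Because each NP only fills a slot, “Hideki ga Nyuuyooku e itta” and “Nyuuyooku e Hideki ga itta” yield the same f-structure, while “Hideki ga Hideki ga itta” returns None.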

Japanese Grammar (2)

• “ni” is ambiguous in Japanese

Time-on, Loc-to

Nichiyoubi ni itta (went on Sunday)

Nyuuyooku ni itta (went to New York)

• “ni” and “e” can be used for Loc-to

Nyuuyooku ni/e itta (went to New York)
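One possible way to handle the ambiguity of “ni” is to let the noun's semantic class select the reading. The slides do not say how jpn.gra resolves it, so the Python sketch below is an assumption throughout: the classes "time" and "place", the table names and the roles function are all invented for illustration.

```python
# Hypothetical semantic classes for two nouns from the lexicon slide.
NOUN_CLASS = {"nichiyoubi": "time", "nyuuyooku": "place"}

# Roles each particle can mark, paired with the noun class each role expects.
PARTICLE_ROLES = {
    "ni": [("time-on", "time"), ("loc-to", "place")],
    "e":  [("loc-to", "place")],
    "de": [("loc-at", "place")],
}


def roles(noun, particle, use_noun_class=True):
    """Return the candidate roles for a noun + particle phrase."""
    candidates = PARTICLE_ROLES.get(particle.lower(), [])
    if not use_noun_class:
        # Without semantic filtering, every reading of the particle survives.
        return [role for role, _ in candidates]
    noun_class = NOUN_CLASS.get(noun.lower())
    return [role for role, expected in candidates if expected == noun_class]
```

With the filter switched off, “Nichiyoubi ni” keeps both the time-on and loc-to readings, which would surface as two parses; with it on, only time-on remains. The cost is that every noun needs a class, which is the kind of scalability concern raised in the Bird 2 review.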

Japanese grammar exercise

• Grammar file: /afs/cs/project/cmt-55/lti/Lab/Modules/GNL721/2007/jpn.gra

• Test file:

jpn-test1.lisp

Questions?