Syntactic Parsing Introduction - cl.lingfil.uu.se
Syntactic Parsing
Introduction
Joakim Nivre and Daniel Dakota
Department of Linguistics and Philology
Partly based on slides from Marco Kuhlmann and Sara Stymne
Today
• Introduction to syntactic parsing
• Course information
What is syntactic parsing?
• Given a natural language sentence, compute a representation of its syntactic structure
Why study syntactic parsing?
• Syntactic structure determines semantic interpretation
Disney acquired Pixar
Pixar acquired Disney
Pixar was acquired by Disney
• Natural language understanding applications
- Information extraction
- Question answering
• Grammar checking
• Modeling human sentence processing
Parsing as Structured Prediction
Parsing is a structured prediction problem:
X → Y
• Input space X: sentences
• Output space Y: syntactic representations
• X and Y are infinite sets of structured objects
• Complexity requires specialised algorithms
Input Space
A sentence x ∈ X is a sequence of tokens x_1, …, x_n
• How do we delimit sentences?
• How do we split sentences into tokens?
- Different writing systems (white space or not)
- Different degrees of morphological complexity
• How do we preprocess tokens before parsing?
- Part-of-speech tagging
- Morphological analysis
- Lemmatization
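The tokenization question can be made concrete with a small sketch. Both functions below are illustrative assumptions, not the tokenizer used in the course: the first splits on white space only, the second also splits off punctuation with a simple regular expression.

```python
import re


def whitespace_tokenize(sentence):
    """Split on white space only; punctuation stays attached to words."""
    return sentence.split()


def regex_tokenize(sentence):
    """Split off punctuation marks as separate tokens (a simplification)."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sentence = "Pixar was acquired by Disney."
print(whitespace_tokenize(sentence))
print(regex_tokenize(sentence))
```

Neither strategy works for writing systems without white space, and neither handles morphologically complex tokens, which is why tokenization is listed as an open question here.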
Output Space
A syntactic representation y ∈ Y is a labeled graph
• Constituency-based representations
- Sentences decomposed into phrases and words
- Focus on internal structure of phrases
• Dependency-based representations
- Words connected by dependency relations
- Focus on functional role of words
• Hybrid representations exist
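As a rough illustration of the two kinds of labeled graph, here is one way to store them in Python. The class name `Constituent`, the category labels (S, NP, VP, V) and the relation labels (root, nsubj, obj) are assumptions chosen for the example, not the annotation scheme of the course.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Constituent:
    """A node in a constituency tree: a label plus children (none for words)."""
    label: str
    children: List["Constituent"] = field(default_factory=list)

    def leaves(self):
        """Return the words (terminal nodes) covered by this node, left to right."""
        if not self.children:
            return [self.label]
        return [word for child in self.children for word in child.leaves()]


# A dependency tree as a list of (head, dependent, label) arcs.
# Words are numbered from 1; 0 is an artificial root node.
Arc = Tuple[int, int, str]

tokens = ["Disney", "acquired", "Pixar"]

constituency = Constituent("S", [
    Constituent("NP", [Constituent("Disney")]),
    Constituent("VP", [
        Constituent("V", [Constituent("acquired")]),
        Constituent("NP", [Constituent("Pixar")]),
    ]),
])

dependency: List[Arc] = [
    (0, 2, "root"),   # root -> acquired
    (2, 1, "nsubj"),  # acquired -> Disney
    (2, 3, "obj"),    # acquired -> Pixar
]

print(constituency.leaves())
```

The constituency tree adds nonterminal nodes above the words, while the dependency tree uses only the words (plus the artificial root) as nodes and puts the labels on the arcs.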
Syntactic Representations
[Figure: constituency tree, annotated step by step]
• Terminal nodes
• Nonterminal nodes
• Root node
[Figure: dependency tree, annotated step by step]
• Nodes = Terminal nodes
• Labeled arcs
• Root node
Mapping
What is the relation between inputs and outputs?
X → Y
• Grammar G generates x ∈ X with analysis y ∈ Y
- Recognition – does G generate x (yes/no)?
- Parsing – derive (all) y assigned to x by G
• Every instance of x has a single interpretation y
- Disambiguation – select the “best” y for x
- Parsing as optimization: y* = argmax_y f(x, y)
Ambiguity
I shot an elephant in my pajamas.
• This sentence is ambiguous. In what way?
• What should happen if we parse the sentence?
Recognition: Yes
Parsing with G
Disambiguation
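The difference between recognition and parsing can be sketched with a CKY-style chart that counts analyses. The grammar and lexicon below are a toy assumption made up for this example (they are not the grammar G from the slides), and the final full stop of the sentence is left out.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (assumed for illustration only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "shot": ["V"], "an": ["Det"], "elephant": ["N"],
    "in": ["P"], "my": ["Det"], "pajamas": ["N"],
}


def count_parses(tokens, start="S"):
    """Return the number of distinct trees the toy grammar assigns to tokens."""
    n = len(tokens)
    chart = defaultdict(int)  # (i, j, category) -> number of trees over tokens[i:j]
    for i, word in enumerate(tokens):
        for category in LEXICON.get(word, ()):
            chart[i, i + 1, category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for parent, left, right in BINARY_RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, start]


tokens = "I shot an elephant in my pajamas".split()
n_trees = count_parses(tokens)
print("Recognition:", n_trees > 0)
print("Number of analyses:", n_trees)
```

With this grammar the sentence gets two analyses: the prepositional phrase attaches either to the verb phrase or to the noun phrase "an elephant". Recognition only asks whether the count is nonzero; disambiguation has to choose between the two trees.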
Parsing Models
A model for parsing (with disambiguation):
• GEN(x) ⊆ Y
- Defines the set of candidate analyses for x
• EVAL(x, y) ∈ ℝ
- Scores an analysis y in relation to x
• Parsing algorithm
- Finds the highest scoring analysis y*
Parsing Models
PCFG Parsing
• GEN(x) = { y | G derives x with y }
• EVAL(x, y) = P_G(x, y) = P_G(y)
• Algorithm: CKY, Earley, …
Dependency Parsing
• GEN(x) = { y | y is a spanning tree for x }
• EVAL(x, y) = S(x, y) = ∑_{(x_i, x_j) ∈ y} S(x_i, x_j)
• Algorithm: Collins, Eisner, …
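The GEN / EVAL / algorithm decomposition can be sketched for the dependency case with a brute-force parser. Everything here is an assumption for illustration: `gen` enumerates every spanning tree by trying all head assignments, `evaluate` sums per-arc scores, and the arc scores are invented numbers. Real parsers replace the enumeration with an efficient algorithm.

```python
from itertools import product


def is_tree(heads):
    """heads[d - 1] is the head of word d (words are 1-based, 0 is the root).
    True if every word reaches the root without running into a cycle."""
    for dependent in range(1, len(heads) + 1):
        seen = set()
        node = dependent
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def gen(n):
    """GEN(x): all spanning trees over n words plus the root, by brute force."""
    for heads in product(range(n + 1), repeat=n):
        if is_tree(heads):
            yield heads


def evaluate(heads, arc_score):
    """EVAL(x, y): sum of the scores of the arcs (head, dependent) in y."""
    return sum(arc_score.get((h, d), 0.0) for d, h in enumerate(heads, start=1))


def parse(n, arc_score):
    """Return the highest-scoring tree y* = argmax_y EVAL(x, y)."""
    return max(gen(n), key=lambda heads: evaluate(heads, arc_score))


# Invented scores for "Disney acquired Pixar" (words 1, 2, 3); unlisted arcs score 0.
scores = {(0, 2): 5.0, (2, 1): 4.0, (2, 3): 4.0}
best = parse(3, scores)
print(best, evaluate(best, scores))
```

The enumeration visits (n + 1)^n head assignments, so it is only usable for very short sentences; it is meant to show the division of labour between the three components, not a practical algorithm.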
Parsing as Search
• Parsing algorithms have to search the space of candidate trees
• In order to search the space of trees we have to build them
• To do parsing efficiently, we have to use “smart” search strategies
How many trees are there?
[Plot: linear, cubic and exponential growth curves; x-axis 1–8, y-axis 0–1600]
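To get a feeling for these growth rates, the sketch below counts binary bracketings of an n-word sentence and prints them next to n and n^3. Restricting the count to unlabeled binary trees is an assumption made for the example; the number of such trees over n words is the Catalan number C(n − 1).

```python
from math import comb


def catalan(k):
    """The k-th Catalan number."""
    return comb(2 * k, k) // (k + 1)


def binary_trees(n):
    """Number of distinct unlabeled binary bracketings of an n-word sentence."""
    return catalan(n - 1)


print("n  linear  cubic  binary trees")
for n in range(1, 9):
    print(n, n, n ** 3, binary_trees(n))
```

For these short sentences the cubic curve is still above the tree count, but the tree count overtakes it at n = 9 (1430 trees against 729) and keeps growing exponentially. Labeled trees and non-binary branching only make the space larger.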
Search Strategies
• Naive enumeration takes exponential time: O(2^n)
• Dynamic programming
- Solve each subproblem only once
- Parsing in polynomial time, often cubic: O(n^3)
- Examples: CKY, Earley, Eisner
• Greedy search and beam search
- Best-first search of a (small) subspace
- Parsing in sub-cubic time, often linear: O(n)
- Example: Transition-based parsing
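The linear-time claim for transition-based parsing can be illustrated with a small sketch. It uses the arc-standard transition system, one common variant chosen here for brevity, and a static oracle that reads the transitions off a known gold tree. A greedy parser would replace the oracle with a trained classifier; the function name and input format are assumptions.

```python
from collections import deque


def oracle_parse(gold_heads):
    """Derive an arc-standard transition sequence for a projective gold tree.

    gold_heads[d] is the head of word d for d = 1..n; index 0 is the
    artificial root and its entry is ignored.
    """
    n = len(gold_heads) - 1
    stack, buffer, arcs, transitions = [0], deque(range(1, n + 1)), [], []
    # Number of dependents each node still has to collect.
    remaining = [0] * (n + 1)
    for d in range(1, n + 1):
        remaining[gold_heads[d]] += 1
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            if below != 0 and gold_heads[below] == top:
                arcs.append((top, below))   # LEFT-ARC: top -> below
                remaining[top] -= 1
                del stack[-2]
                transitions.append("LEFT-ARC")
                continue
            if gold_heads[top] == below and remaining[top] == 0:
                arcs.append((below, top))   # RIGHT-ARC: below -> top
                remaining[below] -= 1
                stack.pop()
                transitions.append("RIGHT-ARC")
                continue
        if not buffer:
            raise ValueError("gold tree is not projective")
        stack.append(buffer.popleft())      # SHIFT
        transitions.append("SHIFT")
    return transitions, arcs


# "Disney acquired Pixar": Disney <- acquired -> Pixar, root -> acquired.
transitions, arcs = oracle_parse([None, 2, 0, 2])
print(transitions)
print(sorted(arcs))
```

Every word is shifted once and attached once, so a sentence of n words takes exactly 2n transitions. That is where the O(n) bound comes from, provided each transition is chosen in constant time.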
Structured Prediction Again
• The algorithmic challenges arising in parsing are typical of structured prediction problems.
• Many of the techniques can be applied directly to other problems, for example, sequence labeling.
• This parsing course has an algorithmic focus in order to teach these techniques.
Treebanks
Syntactically annotated corpora (treebanks) have two main uses in parser development:
• The parameters of the scoring function EVAL(x, y) can be estimated from a training set
- This may involve more or less advanced machine learning methods – not a focus of this course
- Grammars may also be extracted from treebanks
• Parsing accuracy can be evaluated on a test set
- Most evaluation metrics compute partial matches between parse trees and treebank trees
- Test set must be distinct from training set – why?
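As an example of a metric based on partial matches, the sketch below computes unlabeled and labeled attachment scores for dependency trees: the share of words with the correct head, and with the correct head and label. The function name, the input format and the labels are assumptions for illustration.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two analyses of the same sentence(s).

    gold and predicted are equally long lists of (head, label) pairs,
    one pair per word.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty analyses of equal length")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_arcs = sum(g == p for g, p in zip(gold, predicted))
    return correct_heads / len(gold), correct_arcs / len(gold)


gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
predicted = [(2, "nsubj"), (0, "root"), (2, "nsubj")]  # one wrong label
uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

A parse with one wrong label still gets credit for the arcs it got right, which is what "partial matches" means in practice.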
Questions?
Course Information
All course information is in Studium
(TimeEdit schedule is not up to date)
Intended Learning Outcomes
At the end of the course, you should be able to
• explain the standard models and algorithms used in phrase structure and dependency parsing
• implement and evaluate some of these techniques
• critically evaluate scientific publications in the field of syntactic parsing
• design, evaluate, or theoretically analyse the syntactic component of an NLP system
Examination
Examination is continuous and based on
• 3 assignments
- 2 programming assignments
- 1 literature review
• 2 literature seminars
• 1 project
Programming Assignments
Assignment 1: PCFG parsing
• Implement CKY algorithm
• Evaluate parser on treebank data
Assignment 3: Dependency parsing
• Implement arc-eager transition system
• Implement oracle for arc-eager transition system
• Evaluate oracle on treebank data
Literature Review
Assignment 2: Literature review
• Pick two research articles about parsing
- Journal, conference or workshop papers
- Main topic should be parsing methods
• Write a 3-page report
- Summarize, analyze, critically review
Literature Seminars
Seminars to discuss an article in smaller groups
• Preparation:
- Read article
- Go through questions and discussion points
• Active participation is obligatory
- If you miss a seminar or do not participate actively, you have to hand in a written report
Project
Organization:
• Define your own project – suggestions in Studium
• Work individually or in pairs
Activities:
• Project proposal – February 26
- Assignment of supervisor
• Oral discussion (only for pairs) – March 22
• Project report – March 26
Learning Outcomes and Examination
At the end of the course, you should be able to
• explain the standard models and algorithms used in phrase structure and dependency parsing (examined by: Ass 1–3, Seminars)
• implement and evaluate some of these techniques (examined by: Ass 1, 3)
• critically evaluate scientific publications in the field of syntactic parsing (examined by: Ass 2, Seminars)
• design, evaluate, or theoretically analyse the syntactic component of an NLP system (examined by: Project)
Grading
Grading scales:
• Assignments and project: U–G–VG
• Literature seminars: U–G
To achieve G on the course:
• G on all assignments, seminars and project
To achieve VG on the course:
• VG on 3 assignments or 1 assignment + project
Teachers
Joakim Nivre
• Examiner and course coordinator
• Lectures, seminars, project supervision
• Assignments 2 and 3
Daniel Dakota
• Lectures, seminars, project supervision
• Assignments 1 and 2
Teaching
All teaching and tutoring is online
• Monday and Wednesday 10:15–12:
- 10 sessions (90 min, whole class)
  • Live lecture on Zoom
  • Recorded lecture + Q&A
- 2 literature seminars (45 min, small groups)
• Monday 13:15–14 (from February 1):
- Tutoring for assignments and project
CHECK SCHEDULE
Lectures
• Lectures and course books cover basic parsing algorithms in detail
• They touch on more advanced material, but you will need to read up on that independently
• To get the most out of the lectures, prepare by watching recorded lectures and reading the literature in advance
Course Workload
7.5 hp (credits) corresponds to about 200 hours of work:
• ~ 40 h lectures (including preparation)
• 2 h seminars
• 158 h work on your own
- ~ 80 h assignment work (including reading)
- ~ 10 h seminar preparation
- ~ 68 h project work
Deadlines
Assignments and project:
• Assignment 1: PCFG parsing – February 19
• Assignment 2: Literature review – March 12
• Assignment 3: Dependency parsing – March 26
• Project proposal – February 26
• Project report – March 26
• Backup – April 23
Seminars:
• Seminar 1 – February 8, 10
• Seminar 2 – March 1, 3
Reading: Course Books
Daniel Jurafsky and James H. Martin. Speech and Language Processing. 3rd edition. 2020. Chapters 12–15. Available online at: https://web.stanford.edu/~jurafsky/slp3/.
Sandra Kübler, Ryan McDonald, and Joakim Nivre. Dependency Parsing. Morgan and Claypool. 2009. Chapters 1–4, 6. Available as e-book at: ub.uu.se.
Reading: Articles
Seminar 1:
• Mark Johnson. PCFG Models of Linguistic Tree Representations. Computational Linguistics 24(4), 613–632.
Seminar 2:
• Joakim Nivre and Jens Nilsson. Pseudo-Projective Dependency Parsing. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 99–106.
Individual selection of articles for literature review (Ass 2).
Evaluation 2020
• Overall score: 3.7 / 5 (Median 4)
• What did you find valuable to the degree that it should be retained:
- Assignments interesting and suitably challenging. Flexibility in choosing and executing final project was good. Programming assignments were useful (but hard).
- The combination of implementing basic algorithms such as CKY and talking about more advanced approaches was really helpful.
- Very well organised. Great slides and lectures.
• Which course components did you find most in need of revision?
- The workload felt a bit too high for a 7.5 credit course.
- Vague assignment instructions. The first lab was the most difficult and confusing.
Changes 2021
• One assignment fewer than in 2020.
• Assignment 1 redesigned and simplified.
Questions?