Baselines, Evaluations, and Analysis - Spence Green
Transcript of Baselines, Evaluations, and Analysis - Spence Green
![Page 1: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/1.jpg)
Better Arabic ParsingBaselines, Evaluations, and Analysis
Spence Green and Christopher D. Manning
Stanford University
August 27, 2010
![Page 2: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/2.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Common Multilingual Parsing Questions...
Is language X “harder” to parse than language Y?
I Morphologically–rich X
Is treebank X “better/worse” than treebank Y?
Does feature Z “help more” for language X than Y?
I Lexicalization
I Morphological annotations
I Markovization
I etc.
2 / 32
![Page 3: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/3.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Common Multilingual Parsing Questions...
Is language X “harder” to parse than language Y?
I Morphologically–rich X
Is treebank X “better/worse” than treebank Y?
Does feature Z “help more” for language X than Y?
I Lexicalization
I Morphological annotations
I Markovization
I etc.
2 / 32
![Page 4: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/4.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Common Multilingual Parsing Questions...
Is language X “harder” to parse than language Y?
I Morphologically–rich X
Is treebank X “better/worse” than treebank Y?
Does feature Z “help more” for language X than Y?
I Lexicalization
I Morphological annotations
I Markovization
I etc.
2 / 32
![Page 5: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/5.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Common Multilingual Parsing Questions...
Is language X “harder” to parse than language Y?
I Morphologically–rich X
Is treebank X “better/worse” than treebank Y?
Does feature Z “help more” for language X than Y?
I Lexicalization
I Morphological annotations
I Markovization
I etc.
2 / 32
![Page 6: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/6.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Common Multilingual Parsing Questions...
Is language X “harder” to parse than language Y?
I Morphologically–rich X
Is treebank X “better/worse” than treebank Y?
Does feature Z “help more” for language X than Y?
I Lexicalization
I Morphological annotations
I Markovization
I etc.
2 / 32
![Page 7: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/7.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
“Underperformance” Relative to English
English
Chinese
Bulgarian
German
French
Italian
Arabic
Arabic – This Paper
90.1
83.7
81.6
80.1
77.9
75.6
75.8
81.1
Evalb F1 - All Sentence Lengths
(Petrov, 2009)3 / 32
![Page 8: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/8.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Why Arabic / Penn Arabic Treebank (ATB)?
Annotation style similar to PTB
Relatively little segmentation (cf. Chinese)
Richer morphology (cf. English)
More syntactic ambiguity (unvocalized)
4 / 32
![Page 9: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/9.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
ATB DetailsParts 1–3 (not including part 3, v3.2)
Newswire only
I Agence France Presse, Al–Hayat, Al–Nahar
Corpus/experimental characteristics
I 23k trees
I 740k tokens
I Shortened “Bies” POS tags
I Split: 2005 JHU workshop5 / 32
![Page 10: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/10.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Arabic Preliminaries
Diglossia: “Arabic” −→ MSA
Typology: VSO — VOS, SVO, VO also possible
Devocalization
�� �� �� � � �� �� �� � �� ��
���� ����� �
Segmentation: an analyst’s choice!
I ATB uses clitic segmentation
+ +���� �� �� � �
6 / 32
![Page 11: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/11.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Arabic Preliminaries
Diglossia: “Arabic” −→ MSA
Typology: VSO — VOS, SVO, VO also possible
Devocalization
�� �� �� � � �� �� �� � �� ��
���� ����� �
Segmentation: an analyst’s choice!
I ATB uses clitic segmentation
+ +���� �� �� � �
6 / 32
![Page 12: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/12.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Arabic Preliminaries
Diglossia: “Arabic” −→ MSA
Typology: VSO — VOS, SVO, VO also possible
Devocalization
�� �� �� � � �� �� �� � �� ��
���� ����� �
Segmentation: an analyst’s choice!
I ATB uses clitic segmentation
+ +���� �� �� � �
6 / 32
![Page 13: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/13.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Arabic Preliminaries
Diglossia: “Arabic” −→ MSA
Typology: VSO — VOS, SVO, VO also possible
Devocalization
�� �� �� � � �� �� �� � �� ��
���� ����� �
Segmentation: an analyst’s choice!
I ATB uses clitic segmentation
+ +���� �� �� � �
6 / 32
![Page 14: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/14.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Arabic Preliminaries
Diglossia: “Arabic” −→ MSA
Typology: VSO — VOS, SVO, VO also possible
Devocalization
�� �� �� � � �� �� �� � �� ��
���� ����� �
Segmentation: an analyst’s choice!
I ATB uses clitic segmentation
+ +���� �� �� � �
6 / 32
![Page 15: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/15.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Multilingual ParsingArabic
Arabic Preliminaries
Diglossia: “Arabic” −→ MSA
Typology: VSO — VOS, SVO, VO also possible
Devocalization
�� �� �� � � �� �� �� � �� ��
���� ����� �
Segmentation: an analyst’s choice!
I ATB uses clitic segmentation
+ +���� �� �� � �
6 / 32
![Page 16: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/16.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Syntactic Ambiguity in Arabic
This talk:
I Devocalization
I Discourse level coordination ambiguity
Many other types:
I Adjectives / Adjective phrases
I Process nominals — maSdar
I Attachment in annexation constructs (Gabbard and Kulick,
2008)
7 / 32
![Page 17: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/17.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Devocalization: Inna and her Sisters
POS Head Of
�� � inna “indeed” VBP VP
�� � anna “that” IN SBAR
�� � in “if” IN SBAR
�� � an “to” IN SBAR
8 / 32
![Page 18: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/18.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Devocalization: Inna and her Sisters
POS Head Of
�� inna “indeed” VBP VP
�� anna “that” IN SBAR
�� in “if” IN SBAR
�� an “to” IN SBAR
9 / 32
![Page 19: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/19.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Devocalization: Inna and her Sisters
VP
VBD
اضافت
she added
S
VP
PUNC
“
VBP
ان
Indeed
NP
NN
صدام
Saddam
. . .
Reference
VP
VBD
اضافت
she added
SBAR
PUNC
“
IN
ان
Indeed
NP
NN
صدام
Saddam
. . .
Stanford
10 / 32
![Page 20: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/20.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Discourse–level Coordination Ambiguity
S
S
NP VP
CC
و
and
S
VP
CC
و
and
S
NP PP NP
I S < S in 27.0% of dev set trees
I NP < CC in 38.7% of dev set trees
11 / 32
![Page 21: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/21.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Discourse–level Coordination Ambiguity
Leaf Ancestor metric reference chains (Berkeley)
I score ∈ [0,1]
Score # Gold
0.696 34
0.756 170 S < VP < NP < CC
0.768 31
0.796 86
0.804 52
S < S < VP < NP < PRP
S < S < VP < S < VP < PP < IN
S < S < VP < SBAR < IN
S < S < NP < NN
12 / 32
![Page 22: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/22.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Treebank Comparison
Compared ATB gross corpus statistics to:
I Chinese — CTB6
I English — WSJ sect. 2-23
I German — Negra
The ATB isn’t that unusual!
13 / 32
![Page 23: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/23.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Treebank Comparison
Compared ATB gross corpus statistics to:
I Chinese — CTB6
I English — WSJ sect. 2-23
I German — Negra
The ATB isn’t that unusual!
13 / 32
![Page 24: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/24.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Corpus Features in Favor of the ATB
ATB
CTB6
Negra
WSJ
1.04
1.18
0.46
0.82
Non-terminal / Terminal Ratio
16.8%
22.2%
30.5%
13.2%
OOV Rate
14 / 32
![Page 25: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/25.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Corpus Features in Favor of the ATB
ATB
CTB6
Negra
WSJ
1.04
1.18
0.46
0.82
Non-terminal / Terminal Ratio
16.8%
22.2%
30.5%
13.2%
OOV Rate
14 / 32
![Page 26: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/26.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Sentence Length Negatively Affects Parsing
ATB
CTB6
Negra
WSJ
31.5
27.7
17.2
23.8
Avg. Sentence Length
40 words is not a sufficient limit for evaluation!
15 / 32
![Page 27: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/27.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Syntactic AmbiguityAnalysis and Evaluation
Sentence Length Negatively Affects Parsing
ATB
CTB6
Negra
WSJ
31.5
27.7
17.2
23.8
Avg. Sentence Length
40 words is not a sufficient limit for evaluation!
15 / 32
![Page 28: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/28.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Features
Developing a Manually Annotated GrammarKlein and Manning (2003)–style state splits
I Human–interpretable
I Features can inform treebank revision
NP
NN NP
NN NP
DTNN
NP—idafa
NN NP—idafa
NN NP
DTNN
Alternative: automatic splits (Berkeley parser)
16 / 32
![Page 29: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/29.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Features
Developing a Manually Annotated GrammarKlein and Manning (2003)–style state splits
I Human–interpretable
I Features can inform treebank revision
NP
NN NP
NN NP
DTNN
NP—idafa
NN NP—idafa
NN NP
DTNN
Alternative: automatic splits (Berkeley parser)
16 / 32
![Page 30: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/30.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Features
Developing a Manually Annotated GrammarKlein and Manning (2003)–style state splits
I Human–interpretable
I Features can inform treebank revision
NP
NN NP
NN NP
DTNN
NP—idafa
NN NP—idafa
NN NP
DTNN
Alternative: automatic splits (Berkeley parser)16 / 32
![Page 31: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/31.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Features
Feature: markContainsVerbS—hasVerb
NP-SBJ
NN NP
SBAR—hasVerb
IN S—hasVerb
NP-TPC VP—hasVerb
VB. . .
I +1.18 dev set improvement
I 16.1% of dev set trees lack verbs
17 / 32
![Page 32: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/32.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Features
Feature: markContainsVerbS—hasVerb
NP-SBJ
NN NP
SBAR—hasVerb
IN S—hasVerb
NP-TPC VP—hasVerb
VB. . .
I +1.18 dev set improvement
I 16.1% of dev set trees lack verbs17 / 32
![Page 33: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/33.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Features
Feature: splitCCS
S CC—S S
NP
NP
NN CC—noun NN
. . .
I +0.21 dev set improvement
I POS equivalence classes for verb, noun, adjective
18 / 32
![Page 34: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/34.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Features
Feature: splitCCS
S CC—S S
NP
NP
NN CC—noun NN
. . .
I +0.21 dev set improvement
I POS equivalence classes for verb, noun, adjective
18 / 32
![Page 35: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/35.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Gold Experimental Setup
Evaluation on lengths ≤ 70
Pre–processing makes a huge difference
I Maintained non–terminal / terminal ratio vv. priorwork
Models:
I Berkeley
I Bikel — with pre–tagged input
I Stanford — with the manual grammar
19 / 32
![Page 36: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/36.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Learning Curves
Evalb (development set)
75
80
85
5000 10000 15000
Berkeley
Stanford
Bikel
training trees
F1
20 / 32
![Page 37: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/37.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Model Comparison
Berkeley
Bikel
Stanford
Petrov (2009)
82.0
77.5
79.9
81.1
76.5
78.3
75.9 up to 70
All
Evalb F1
21 / 32
![Page 38: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/38.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Raw Text Experimental Setup
Pipeline: MADA + Stanford parser
Lattice parsing – effective for Hebrew (Goldberg and Tsarfaty, 2008)
Evaluation on gold (segmented) lengths ≤ 70
Metric: Evalb without whitespace
I Requires exact character yield (Tsarfaty, 2006)
22 / 32
![Page 39: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/39.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Raw Text Experimental Setup
Pipeline: MADA + Stanford parser
Lattice parsing – effective for Hebrew (Goldberg and Tsarfaty, 2008)
Evaluation on gold (segmented) lengths ≤ 70
Metric: Evalb without whitespace
I Requires exact character yield (Tsarfaty, 2006)
22 / 32
![Page 40: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/40.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Raw Text Experimental Setup
Pipeline: MADA + Stanford parser
Lattice parsing – effective for Hebrew (Goldberg and Tsarfaty, 2008)
Evaluation on gold (segmented) lengths ≤ 70
Metric: Evalb without whitespace
I Requires exact character yield (Tsarfaty, 2006)
22 / 32
![Page 41: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/41.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Raw Text Experimental Setup
Pipeline: MADA + Stanford parser
Lattice parsing – effective for Hebrew (Goldberg and Tsarfaty, 2008)
Evaluation on gold (segmented) lengths ≤ 70
Metric: Evalb without whitespace
I Requires exact character yield (Tsarfaty, 2006)
22 / 32
![Page 42: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/42.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Lattice Parsing — “ABCD EFG”
ABC
ABCD
EFG
EF G
D
A
BC
BCD
23 / 32
![Page 43: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/43.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Lattice Parsing — “ABCD EFG”
ABC EFG
EF G
D
A
BC
BCD
A
ABC
BC
ABCD
24 / 32
![Page 44: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/44.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Lattice Parsing — “ABCD EFG”
ABC EFG
EF G
D
A
BC
BCD
A
ABC
BC
BCD
D
ABCD
EFG
EF G
ABCD
25 / 32
![Page 45: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/45.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Parsing and Segmentation Results
Gold
Pipeline
Lattice
Lattice+LM
81.1
79.2
74.3
76.0
Evalb F1 (Dev set)
26 / 32
![Page 46: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/46.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Parsing and Segmentation Results
Gold
Pipeline
Lattice
Lattice+LM
100.0
97.7
94.1
96.3
Segmentation Accuracy (Dev set)
Comparable segmentation without the effort!
27 / 32
![Page 47: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/47.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Parsing and Segmentation Results
Gold
Pipeline
Lattice
Lattice+LM
100.0
97.7
94.1
96.3
Segmentation Accuracy (Dev set)
Comparable segmentation without the effort!27 / 32
![Page 48: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/48.jpg)
Conclusion
English
Chinese
Bulgarian
German
French
Italian
Arabic
Arabic – This Paper
90.1
83.7
81.6
80.1
77.9
75.6
75.8
81.1
Evalb F1 - All Sentence Lengths
28 / 32
![Page 49: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/49.jpg)
����� ��� ���
Thanks.
http://nlp.stanford.edu/projects/arabic.shtml
![Page 50: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/50.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Mixture of Manual and Automatic Annotation
Basic
Annotated
82.4
82.2
80.9
80.8
up to 70
All
Evalb F1 (Dev)
30 / 32
![Page 51: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/51.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Frequency Matched Strata
Arabic Englishµ freq. % nuclei µ freq. % nuclei
2 2.00 46% 2.00 34%[ 3, 4 ] 3.37 26% 3.37 27%[ 5, 9 ] 6.43 16% 6.49 20%[ 10, 49 ] 18.6 10% 19.0 16%[ 50, 500 ] 110.3 2% 113 3%
31 / 32
![Page 52: Baselines, Evaluations, and Analysis - Spence Green](https://reader034.fdocuments.us/reader034/viewer/2022042802/6268366f8d0d987dec3eb6c1/html5/thumbnails/52.jpg)
MotivationSyntax and AnnotationGrammar Development
Experiments
Gold SegmentationRaw Text
Annotation Consistency Evaluation (Again!)
Nuclei Sample Error %per tree n–grams Type n–gram
WSJ 2–21 0.565 750 16.0% 4.10%ATB 0.830 658 26.0% 4.10%
95% confidence intervals (type–level):
I Arabic: [ 17.4%, 34.6% ]
I English: [ 8.79%, 23.3% ]
32 / 32