UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu...
UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu and Hindi
Tafseer Ahmed, Tina Bögel, Miriam Butt, Annette Hautli, Ghulam Raza, Sebastian Sulger and Veronika Walther
Universität Konstanz
FB Kolloquium, May 2010
Preview
1 Urdu & the UrduGram Project
2 Urdu Transliterator
3 Syntax
4 Semantics
Urdu & the UrduGram Project
Urdu
Urdu is
a South Asian language spoken primarily in Pakistan and India
descended from (a version of) Sanskrit (sister language of Latin)
structurally identical to Hindi (spoken mainly in India)
together with Hindi the fourth most spoken language in the world (∼ 250 million native speakers)
Urdu and Hindi
The two languages are regarded as structurally identical:
syntax/morphology are practically identical
vocabulary is practically identical (Urdu: borrowed from Persian/Arabic; Hindi: borrowed from Sanskrit)
main difference is in the script
→ We are developing a single grammar and lexicon for both of the languages!
Context of Work
Computational LFG grammar in development in Konstanz
Aim: large-scale LFG grammar for parsing Urdu/Hindi
Grammar is part of the ParGram project
Collaborative, world-wide research project
Devoted to developing parallel LFG grammars for a variety of languages
Features and analyses are kept parallel for easy transfer between languages
Languages involved:
→ English, German, French, Japanese, Norwegian, Welsh, Georgian, Hungarian, Turkish, Chinese, Indonesian, Urdu (among many others)
The ParGram Grammar Architecture
The ‘Parallel’ in ParGram
Analysis for a transitive sentence in the English ParGram grammar (F-Structure, “Functional Structure”):
"Nadya saw the book."
[57] PRED 'see<[1:Nadya], [113:book]>'
     SUBJ [1]
          PRED 'Nadya'
          CHECK [_LEX-SOURCE morphology, _PROPER known-name]
          NTYPE [NSEM [PROPER [NAME-TYPE first_name, PROPER-TYPE name]], NSYN proper]
          CASE nom, GEND-SEM female, HUMAN +, NUM sg, PERS 3
     OBJ [113]
          PRED 'book'
          CHECK [_LEX-SOURCE countnoun-lex]
          NTYPE [NSEM [COMMON count], NSYN common]
          SPEC [DET [PRED 'the', DET-TYPE def]]
          CASE obl, NUM sg, PERS 3
     CHECK [_SUBCAT-FRAME V-SUBJ-OBJ]
     TNS-ASP [MOOD indicative, PERF - _, PROG - _, TENSE past]
     CLAUSE-TYPE decl, PASSIVE -, VTYPE main
The ‘Parallel’ in ParGram (cont.)
Analysis for the same transitive sentence in the Urdu ParGram grammar (F-Structure, “Functional Structure”):
"nAdiyah nE kitAb dEkHI"
[42] PRED 'dEkH<[1:nAdiyah], [20:kitAb]>'
     SUBJ [1]
          PRED 'nAdiyah'
          CHECK [_NMORPH obl]
          NTYPE [NSEM [PROPER [PROPER-TYPE name]], NSYN proper]
          SEM-PROP [SPECIFIC +]
          CASE erg, GEND fem, NUM sg, PERS 3
     OBJ [20]
          PRED 'kitAb'
          NTYPE [NSEM [COMMON count], NSYN common]
          CASE nom, GEND fem, NUM sg, PERS 3
     CHECK [_VMORPH [_MTYPE infl], _RESTRICTED -, _SUBCAT-FRAME V-SUBJ-OBJ, _VFORM perf]
     LEX-SEM [AGENTIVE +]
     TNS-ASP [ASPECT perf, MOOD indicative]
     CLAUSE-TYPE decl, PASSIVE -, VTYPE main
→ Analyses are kept parallel where possible
→ Features are kept parallel where possible
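To make the notion of parallel features concrete, the following sketch compares abbreviated versions of the English and Urdu f-structures as nested Python dictionaries. The dictionaries keep only a subset of the attributes from the slides (the CHECK, NTYPE and SPEC features are left out), and the helper names `attribute_paths` and `compare` are illustrative assumptions, not part of XLE or the ParGram tooling.

```python
# Abbreviated f-structures, transcribed by hand from the two analyses.
# Only a subset of attributes is kept; this is an illustration, not XLE output.
english = {
    "PRED": "see<[1:Nadya], [113:book]>",
    "SUBJ": {"PRED": "Nadya", "CASE": "nom", "NUM": "sg", "PERS": "3"},
    "OBJ": {"PRED": "book", "CASE": "obl", "NUM": "sg", "PERS": "3"},
    "TNS-ASP": {"MOOD": "indicative", "TENSE": "past"},
    "CLAUSE-TYPE": "decl",
    "PASSIVE": "-",
    "VTYPE": "main",
}
urdu = {
    "PRED": "dEkH<[1:nAdiyah], [20:kitAb]>",
    "SUBJ": {"PRED": "nAdiyah", "CASE": "erg", "GEND": "fem", "NUM": "sg", "PERS": "3"},
    "OBJ": {"PRED": "kitAb", "CASE": "nom", "GEND": "fem", "NUM": "sg", "PERS": "3"},
    "LEX-SEM": {"AGENTIVE": "+"},
    "TNS-ASP": {"ASPECT": "perf", "MOOD": "indicative"},
    "CLAUSE-TYPE": "decl",
    "PASSIVE": "-",
    "VTYPE": "main",
}


def attribute_paths(fs, prefix=()):
    """Flatten a nested f-structure into {attribute path: atomic value}."""
    paths = {}
    for attr, value in fs.items():
        path = prefix + (attr,)
        if isinstance(value, dict):
            paths.update(attribute_paths(value, path))
        else:
            paths[path] = value
    return paths


def compare(fs_a, fs_b):
    """Return (shared paths, shared paths with equal values, shared paths that differ)."""
    a, b = attribute_paths(fs_a), attribute_paths(fs_b)
    shared = set(a) & set(b)
    same = {p for p in shared if a[p] == b[p]}
    return shared, same, shared - same


shared, same, differing = compare(english, urdu)
for path in sorted(differing):
    print(".".join(path))
```

In this abbreviated form the two analyses share the clause-level attributes (CLAUSE-TYPE, PASSIVE, VTYPE) with identical values, while language-specific facts such as the ergative subject case show up as differing values on a shared attribute.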
Demo: Large-Scale English ParGram Grammar
Computational Grammars - What For?
The Motivation behind ParGram
The ParGram project is working on Deep Grammars
Provide detailed syntactic and semantic analyses
Encode grammatical functions, tense, number etc.
Linguistically motivated
Usually manually constructed (→ linguistic intuition)
Possible Applications
Large-Coverage, Deep Computational Grammars can be useful for:
Meaning-Sensitive Applications
Web-Search
Question-Answering
Knowledge Representation
Text Summarization
Machine Translation
Computer-Assisted Language Learning
powerset.com
“Semantic search engine”
Uses large-scale English LFG
Works on English Wikipedia
Parses query and matches with parsed corpus
→ Can give better results than regular search engines
(Example: ‘X was bought by Y’ vs. ‘Y acquired X’)
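The ‘X was bought by Y’ vs. ‘Y acquired X’ example can be sketched as a normalization step that maps both sentences to the same predicate-argument fact. The sketch below starts from hand-written, already-parsed analyses and does not perform any parsing itself. The `Fact` type, the `CONCEPTS` table and the dictionary keys are hypothetical, chosen only for illustration, and say nothing about how powerset.com was actually implemented.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    predicate: str
    agent: str
    theme: str


# Hypothetical lexical table that maps verbs to a shared concept.
CONCEPTS = {"buy": "acquire", "acquire": "acquire"}


def normalize(analysis: dict) -> Fact:
    """Map a simplified clause analysis to a voice-independent fact."""
    predicate = CONCEPTS.get(analysis["verb"], analysis["verb"])
    if analysis["passive"]:
        # In a passive clause the surface subject is the theme
        # and the by-phrase carries the agent.
        return Fact(predicate, analysis["by_phrase"], analysis["subj"])
    return Fact(predicate, analysis["subj"], analysis["obj"])


# 'X was bought by Y'
passive = {"verb": "buy", "passive": True, "subj": "X", "by_phrase": "Y"}
# 'Y acquired X'
active = {"verb": "acquire", "passive": False, "subj": "Y", "obj": "X"}

print(normalize(passive) == normalize(active))
```

A purely string-based search would treat the two sentences as unrelated, whereas matching on the normalized facts treats them as the same statement.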
Our Overall Architecture
Our parsing architecture currently looks like this:
tokenizer
↓
transliterator (Urdu & Hindi to Roman script)
↓
morphology (fst)
↓
syntax (c- and f-structure) (xle)
↓
semantics (xfr ordered rewriting)
xle is the overall development platform, with the other modules (fst and xfr) being plugged into it.
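The staged architecture can be pictured as a chain of functions, each consuming the output of the previous stage. The sketch below only shows the control flow: every stage body is a placeholder written for this illustration, and none of the function names correspond to actual xle, fst or xfr calls.

```python
from typing import Any, Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Placeholder: split on whitespace.
    return text.split()


def transliterate(tokens: List[str]) -> List[str]:
    # Placeholder: assume the input is already in Roman script.
    return list(tokens)


def analyze_morphology(tokens: List[str]) -> List[dict]:
    # Placeholder: a real analyzer would add stems and morphological tags.
    return [{"surface": token} for token in tokens]


def parse_syntax(words: List[dict]) -> dict:
    # Placeholder for the c- and f-structure analysis.
    return {"words": words}


def build_semantics(parse: dict) -> dict:
    # Placeholder for the rewriting step that derives a semantic representation.
    return {"from_parse": parse}


PIPELINE: List[Tuple[str, Callable[[Any], Any]]] = [
    ("tokenizer", tokenize),
    ("transliterator", transliterate),
    ("morphology", analyze_morphology),
    ("syntax", parse_syntax),
    ("semantics", build_semantics),
]


def run(text: str) -> Tuple[Any, List[str]]:
    """Apply the stages in order and record which stages ran."""
    data: Any = text
    trace = []
    for name, stage in PIPELINE:
        data = stage(data)
        trace.append(name)
    return data, trace


result, trace = run("nAdiyah nE kitAb dEkHI")
print(" -> ".join(trace))
```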
Urdu Transliterator
Aim of the transliterator
Our aim is to build and integrate a transliterator that allows both Urdu and Hindi to be parsed and generated with the same grammar.
couplet by the poet Mirza Ghalib
Urdu Hindi
Romanized Script
(the XLE grammar)
→ Right now we are working on the Urdu-Roman transliterator.
15 / 60
![Page 60: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/60.jpg)
Urdu Transliterator
Transliteration scheme
An excerpt from our scheme table:

| Unicode Urdu character | Latin letter in transliteration scheme | Phoneme |
|---|---|---|
| ب | b | /b/ |
| پ | p | /p/ |
| ت | t | /t/ |
| ٹ | T | /ʈ/ |
| ج | j | /j/ |
| چ | c | /t͡ʃ/ |
16 / 60
![Page 64: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/64.jpg)
Urdu Transliterator
Basic idea of the transliterator
Use a finite-state transducer to allow for both generation and parsing.
Urdu script: با
parsing ↓ ↑ generating
ASCII: bA
The same concept will be used to create a transliterator for Hindi/Devanagari.
This way we can parse Urdu script and generate Hindi script (and vice versa).
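The bidirectional idea can be illustrated with a small Python sketch. This is not the finite-state implementation used in the project: it only shows that a single mapping table can be read in both directions. The table holds the six letters from the scheme excerpt plus alif (assumed here to correspond to `A`, as in `bA`), and the function names `parse` and `generate` are illustrative.

```python
# Toy, bidirectional character mapping (assumption: one Urdu character
# corresponds to exactly one Latin letter; the real scheme is larger).
URDU_TO_ROMAN = {
    "\u0628": "b",  # ب
    "\u067E": "p",  # پ
    "\u062A": "t",  # ت
    "\u0679": "T",  # ٹ
    "\u062C": "j",  # ج
    "\u0686": "c",  # چ
    "\u0627": "A",  # ا (alif, assumed to map to A as in "bA")
}
# Inverting the table yields the generation direction without extra rules.
ROMAN_TO_URDU = {roman: urdu for urdu, roman in URDU_TO_ROMAN.items()}


def parse(urdu_text: str) -> str:
    """Urdu script -> Roman script (the 'parsing' direction)."""
    return "".join(URDU_TO_ROMAN[ch] for ch in urdu_text)


def generate(roman_text: str) -> str:
    """Roman script -> Urdu script (the 'generating' direction)."""
    return "".join(ROMAN_TO_URDU[ch] for ch in roman_text)


if __name__ == "__main__":
    word = "\u0628\u0627"  # با
    print(parse(word))
    print(generate(parse(word)) == word)
```

A second table of the same shape for Devanagari would, under the same assumption, allow Urdu input to be rendered in Hindi script by going through the Roman form.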
17 / 60
![Page 67: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/67.jpg)
Urdu Transliterator
Position of the transliterator
The transliterator is composed with the tokenizer (which separates the words within a sentence).
Tokenizer and transliterator are placed in front of the morphology:
Transliterator: input کتاب → output kitAb
Morphology: input kitAb → output kitAb+Noun+Fem+Sg+Count
XLE: ...
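The ordering of the front-end modules can be sketched in Python as plain function composition. Everything here is a toy stand-in: the character table, the one-entry lexicon, the `+Unknown` fallback tag and the function names are assumptions for illustration, and the input word is assumed to carry the zer diacritic so that it maps to `kitAb` directly.

```python
from typing import Dict, List

# Toy data only: the character table and the one-entry lexicon are assumptions.
CHAR_MAP: Dict[str, str] = {
    "\u06A9": "k",  # ک
    "\u0650": "i",  # zer diacritic (short i)
    "\u062A": "t",  # ت
    "\u0627": "A",  # ا
    "\u0628": "b",  # ب
}
LEXICON: Dict[str, str] = {"kitAb": "kitAb+Noun+Fem+Sg+Count"}


def tokenize(sentence: str) -> List[str]:
    """Separate the words within a sentence (whitespace only, for illustration)."""
    return sentence.split()


def transliterate(token: str) -> str:
    """Map one Urdu-script token to Roman script, character by character."""
    return "".join(CHAR_MAP[ch] for ch in token)


def analyze(roman: str) -> str:
    """Look up the morphological analysis; unknown words get a hypothetical tag."""
    return LEXICON.get(roman, roman + "+Unknown")


def front_end(sentence: str) -> List[str]:
    """Tokenizer and transliterator run first, then the morphology."""
    return [analyze(transliterate(tok)) for tok in tokenize(sentence)]


if __name__ == "__main__":
    print(front_end("\u06A9\u0650\u062A\u0627\u0628"))
```

In the actual system these stages are finite-state transducers composed with each other; the sketch only mirrors the order in which they apply.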
18 / 60
![Page 68: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/68.jpg)
Urdu Transliterator
Example
→ The transliterator at this position works quite well:
(1) laRkE kI  kitAb
    boy   gen book
    ‘The boy’s book’
→ Problem: long sentences or words that are highly ambiguous in the script take some time to parse.
19 / 60
![Page 71: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/71.jpg)
Urdu Transliterator
Problems of the script - an example
The problem of the vowels ...
Diacritics represent short vowels:

| Urdu script | Roman script |
|---|---|
| بَ | ba |
| بِ | bi |
| بُ | bu |

(2) nAdyA nE  yasIn kO  kitAb dEkHnE dI
    Nadya erg Yasin dat book  see    let
    ‘Nadya let Yasin see the book’
    [sentence (2) in Urdu script, written with diacritics]
Unfortunately, these diacritics tend to be left out:
    نادیا نے یسین کو کتاب دیکھنے دی
20 / 60
![Page 74: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/74.jpg)
Urdu Transliterator
Consequences
If the input is without diacritics, e.g. ...

| Urdu script | letter combination | representation | translation |
|---|---|---|---|
| کتاب | ktAb | kitAb | ‘book’ |

... then there are all kinds of possible combinations: kitAb, kutaAb, kitAbu, ikatAubi, ukitAbia, akatAbu, aukatAib ...
(demo)
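The size of the problem can be made concrete with a short Python sketch. It assumes, as a simplification, that every slot around the written letters holds either nothing or exactly one short vowel; some of the examples above contain two vowels in a slot, so the real search space is larger still. The function name `candidates` is illustrative.

```python
from itertools import product

SHORT_VOWELS = ("", "a", "i", "u")  # "" means: no diacritic in this slot


def candidates(skeleton: str) -> set:
    """All readings obtained by putting at most one short vowel in each slot
    (before the first letter, between letters, after the last letter)."""
    slots = len(skeleton) + 1
    results = set()
    for choice in product(SHORT_VOWELS, repeat=slots):
        # Interleave: vowel slot 0, letter 0, vowel slot 1, letter 1, ...
        word = "".join(v + c for v, c in zip(choice, skeleton)) + choice[-1]
        results.add(word)
    return results


if __name__ == "__main__":
    readings = candidates("ktAb")
    print(len(readings))        # 4 options in each of 5 slots
    print("kitAb" in readings)
```

Even under this restriction the four-letter skeleton `ktAb` yields 4^5 = 1024 candidate strings, only one of which is the intended `kitAb`.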
21 / 60
![Page 79: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/79.jpg)
Urdu Transliterator
Solution
In order to restrict this overgeneration, the possible letter combinations need to be constrained:
Which vowels are actually allowed to co-occur?
→ ai, but not ia?
Which consonants are actually allowed to co-occur?
→ initial kr, but not gr?
Certain combinations with semi-vowels or consonants are not allowed:
→ a short vowel followed by v may not be followed by u or i
Certain positions are prohibited:
→ a word can never end in a short vowel or begin with a short vowel that is only represented with a diacritic
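A hedged Python sketch of how such constraints prune the candidate set: it encodes only the two positional prohibitions and the rule about short vowel + v, each as a regular expression, and applies them to a toy candidate generator (at most one short vowel per slot, which is an assumption). The co-occurrence questions marked with "?" above are left out because they are still open. All names are illustrative.

```python
import re
from itertools import product

SHORT = "aiu"


def candidates(skeleton: str) -> set:
    """Toy generator: at most one short vowel per slot (an assumption)."""
    options = ("",) + tuple(SHORT)
    return {
        "".join(v + c for v, c in zip(choice, skeleton)) + choice[-1]
        for choice in product(options, repeat=len(skeleton) + 1)
    }


# Each pattern encodes one prohibition from the list above.
STARTS_WITH_SHORT = re.compile(r"^[aiu]")        # diacritic-only initial vowel
ENDS_WITH_SHORT = re.compile(r"[aiu]$")          # word-final short vowel
SHORT_V_THEN_U_OR_I = re.compile(r"[aiu]v[ui]")  # short vowel + v + u/i


def allowed(word: str) -> bool:
    """True if none of the prohibited patterns occurs in the word."""
    return not (
        STARTS_WITH_SHORT.search(word)
        or ENDS_WITH_SHORT.search(word)
        or SHORT_V_THEN_U_OR_I.search(word)
    )


if __name__ == "__main__":
    survivors = {w for w in candidates("ktAb") if allowed(w)}
    print(len(survivors))
    print("kitAb" in survivors, "kitAbu" in survivors)
```

For `ktAb` the two positional filters alone reduce the toy candidate set from 1024 to 64 strings, which suggests why each additional restriction speeds up parsing.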
22 / 60
![Page 82: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/82.jpg)
Urdu Transliterator
Solution
Write rules and filters based on these constraints and apply them to the transliterator.
(demo)
Problem: these “rules” cannot be found in the literature; they are the product of extensive manual labor.
However, the transliterator works quite well now.
→ Some sentences are still a little slow (but I keep looking for possible restrictions)
→ Next steps: generation of Urdu and the Hindi transliterator
23 / 60
![Page 89: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/89.jpg)
Syntax
Syntax
The syntax component is at the core of the Urdu grammar.
Theoretical background: LFG
a well-studied (∼30 years) framework with computational usability
c- and f-structures are used for syntactic representation
c-structure: basic constituent structure (“tree”) and linear precedence (∼ what parts belong together)
f-structure: encodes syntactic functions and properties
25 / 60
![Page 92: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/92.jpg)
Syntax
Syntax
C-structure (CS 1) for "nAdiyah hansI":

    [ROOT [S [KP [NP [N nAdiyah]]] [VCmain [V hansI]]]]

F-structure for "nAdiyah hansI":

    [18: PRED        'hans<[1:nAdiyah]>'
         SUBJ        [1: PRED     'nAdiyah'
                         NTYPE    [NSEM [PROPER [PROPER-TYPE name]], NSYN proper]
                         SEM-PROP [SPECIFIC +]
                         CASE nom, GEND fem, NUM sg, PERS 3]
         CHECK       [_VMORPH [_MTYPE infl], _RESTRICTED -, _SUBCAT-FRAME V-SUBJ, _VFORM perf]
         LEX-SEM     [VERB-CLASS unerg]
         TNS-ASP     [ASPECT perf, MOOD indicative]
         CLAUSE-TYPE decl, PASSIVE -, VTYPE main]

Current size: 53 phrase-structure rules, annotated for syntactic function (usual size of large-scale grammars: 350–400 rules)
Coverage: basic clauses with free word order, NP syntax, tense and aspect, causative verbs, complex predicates, relative clauses, passives, semantically-based case marking
26 / 60
![Page 93: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/93.jpg)
Syntax
Discontinuous NPs in Urdu
1 Well known discontinuities
2 NP-internal discontinuity in Urdu
3 LFG implementation
4 Conclusion
27 / 60
![Page 94: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/94.jpg)
Syntax
Extraction from DP
(2) a. Er hat viele Bücher über Logik gekauft.
       He has many  books  on   logic bought
       ‘He has bought many books about logic.’
    b. Bücher über Logik hat er viele gekauft.
    c. Über Logik hat er viele Bücher gekauft. (German)
(3) mantiq=par   nidA=nE  Ek  kitAb      xarId-I  he.
    logic=Loc.on Nida=Erg one book.F.3Sg buy-Perf be.Pres
    ‘Nida has purchased a book on logic.’ (Urdu)
28 / 60
![Page 95: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/95.jpg)
Syntax
Quantifier Float
(4) a. They all have bought a car.
b. They have all bought a car.
(5) Am       alI=nE  bahut kHA-E
    mango.Pl Ali=Erg many  eat-Perf
    ‘Ali ate many mangoes.’ (Urdu)
29 / 60
![Page 96: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/96.jpg)
Syntax
Constituent-level discontinuities in Urdu
NP-internal discontinuity
Discontinuous NP
Discontinuous AP
30 / 60
![Page 97: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/97.jpg)
Syntax
When NP-internal discontinuity occurs in Urdu
The NP-internal discontinuity in Urdu can occur when the argument-taking noun is modified by:
argument-taking adjectives
argument-taking specifier nouns
31 / 60
![Page 98: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/98.jpg)
Syntax
Argument-taking adjectives in Urdu
| Nr. | Type of Argument | Example of Adjective Phrase | Gloss | Translation |
|---|---|---|---|---|
| (i) | Dative Marked | sadr=kO hAsil | president=Dat possessed | ‘possessed by the president’ |
| (ii) | Ablative Marked | adliyah=sE xAif | courts=Abl afraid | ‘afraid of courts’ |
| (iii) | Locative Marked | buxAr=mEN mubtalA | fever=Loc.in suffered | ‘suffered with fever’ |
| (iv) | Adpositional | sihat=kE liyE muzir | health=Gen for harmful | ‘harmful for health’ |
32 / 60
![Page 99: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/99.jpg)
Syntax
Simple examples of argument-taking nouns
(6) a. istisnA
       ‘immunity’
    b. muqaddamAt=sE     istisnA
       court-case.Pl=Abl immunity
       ‘immunity from court-cases’
    c. muqaddamAt=sE     AInI           istisnA
       court-case.Pl=Abl constitutional immunity
       ‘constitutional immunity from court-cases’
33 / 60
![Page 100: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/100.jpg)
Syntax
Simple examples of argument-taking nouns
(7) a. barIfiNg
       ‘briefing’
    b. salAmtI=par  barIfiNg
       security=Loc briefing
       ‘briefing on security’
    c. salAmtI=par  tafsIlI  barIfiNg
       security=Loc detailed briefing
       ‘detailed briefing on security’
34 / 60
![Page 101: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/101.jpg)
Syntax
Simple examples of argument-taking nouns
(8) a. mutAlbA
       ‘demand’
    b. ArmI-cIf=sE    mutAlbA
       army-chief=Abl demand
       ‘demand to the army-chief’
    c. ArmI-cIf=sE    qAnUnI mutAlbA
       army-chief=Abl legal  demand
       ‘legal demand to the army-chief’
35 / 60
![Page 102: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/102.jpg)
Syntax
Examples of discontinuous NPs
(9) a1. sadr=kO₁      hAsil₁    muqaddamAt=sE₂  AInI           istisnA₂
        president=Dat possessed court-cases=Abl constitutional immunity
        ‘Constitutional immunity from court-cases possessed by the president’
    a2. [NP [AP [KP sadr=kO] hAsil] [KP muqaddamAt=sE] AInI istisnA]
    b.  muqaddamAt=sE₂ sadr=kO₁ hAsil₁ AInI istisnA₂
    c.  sadr=kO₁ muqaddamAt=sE₂ hAsil₁ AInI istisnA₂
    d.  *hAsil₁ muqaddamAt=sE₂ sadr=kO₁ AInI istisnA₂
36 / 60
![Page 103: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/103.jpg)
Syntax
Hierarchical structure of AP in NP
CS 1:

    [NP [KP [NP [N muqaddamAt]] [K sE]]
        [AP [KP [NP [N s3adr]] [K kO]] [A h2As3il]]
        [AP [A AInI]]
        [N istis2nA]]

Figure: Hierarchical structure of AP in NP
37 / 60
![Page 104: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/104.jpg)
Syntax
Examples of discontinuous NPs
(10) a1. ArmI-cIf=sE₂   salAmtI=par₁    barIfiNg₁=kA mutAlbA₂
         army-chief=Abl security=Loc.on briefing=Gen demand
         ‘The demand to the army chief for briefing on security’
     a2. [NP [KP ArmI-cIf=sE] [KP [NP [KP salAmtI=par] barIfiNg]=kA] mutAlbA]
     b.  salAmtI=par₁ ArmI-cIf=sE₂ barIfiNg₁=kA mutAlbA₂
38 / 60
![Page 105: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/105.jpg)
Syntax
Examples of discontinuous NPs
(11) [NP [KP ArmI-cIf=sE] [KP [NP [KP mulkI salAmtI=par] tafsIlI barIfiNg]=kA] qAnUnI mutAlbA]
     army-chief=Abl of-country security=Loc.on detailed briefing=Gen legal demand
     ‘The legal demand to the army chief for a detailed briefing on security of the country’
39 / 60
![Page 106: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/106.jpg)
Syntax
LFG implementation of NP-internal discontinuity
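Word order in the NP, from left to right:

| Category | KP/PP | A+ | A | N |
|---|---|---|---|---|
| Role | Spec(N)/Arg(N/A) | Arg-taking-adj | Arg-less-adj | Head-noun |

Scrambling of the elements marked by an oval in the original figure is possible, with some constraints.
Figure: Word Order in Noun Phrases of Urdu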
40 / 60
![Page 107: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/107.jpg)
Syntax
Implementation Issues
Free word order in an NP
Relating arguments with corresponding heads
Head last constraint
41 / 60
![Page 108: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/108.jpg)
Syntax
LFG instruments used
Shuffle operator (‘,’): accommodates the free word order of the different elements in the noun phrase.
Non-deterministic operator (‘$’): relates each argument to its corresponding head.
Head precedence operator (‘<h’): ensures that a head does not precede its arguments in the noun phrase.
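The interplay of shuffling and head precedence can be imitated outside XLE with a few lines of Python. This is only an analogy, not the XLE mechanism: it permutes the scrambleable daughters of the NP in example (9) and keeps the orders in which the adjective hAsil does not precede its argument sadr=kO. Keeping AInI and the head noun istisnA fixed at the right edge is a simplifying assumption, and all names are illustrative.

```python
from itertools import permutations

# Scrambleable daughters of the NP in example (9); AInI and the head noun
# istisnA are kept fixed at the right edge (a simplifying assumption).
SCRAMBLEABLE = ["sadr=kO", "hAsil", "muqaddamAt=sE"]
FIXED_TAIL = ["AInI", "istisnA"]

# argument -> head pairs inside the scrambleable part:
# sadr=kO is the argument of the adjective hAsil.
ARG_OF = {"sadr=kO": "hAsil"}


def head_follows_args(order) -> bool:
    """Head precedence: a head must not precede its own argument."""
    return all(order.index(arg) < order.index(head) for arg, head in ARG_OF.items())


def licensed_orders():
    """Shuffle = try every permutation, then keep those passing the constraint."""
    return [
        list(p) + FIXED_TAIL
        for p in permutations(SCRAMBLEABLE)
        if head_follows_args(p)
    ]


if __name__ == "__main__":
    for order in licensed_orders():
        print(" ".join(order))
```

Under these assumptions the sketch licenses exactly the orders (9a), (9b) and (9c) and rejects the starred order (9d).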
42 / 60
![Page 109: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/109.jpg)
Syntax
An excerpt from Grammar Rules
    NP --> KP*: { (^ ADJUNCT $ OBL) = !
                | (^ ADJUNCT $ OBJ-GO) = !
                | (^ OBL) = !
                | (^ OBJ-GO) = ! }
           , “for scrambling”
           AP*: ! $ (^ ADJUNCT)
           N: ^ = !
    __________________________________________
    KP*: { (^ ADJUNCT $ OBL) = !
           (^ ADJUNCT) <h (^ ADJUNCT $ OBL)
         | ..... }
    .......

Figure: Grammar Rules
43 / 60
![Page 110: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/110.jpg)
Syntax
C-structure for a discontinuous NP
CS 1:

    [NP [KP [NP [N s3adr]] [K kO]]
        [KP [NP [N muqaddamAt]] [K sE]]
        [AP [A h2As3il]]
        [AP [A AInI]]
        [N istis2nA]]

Figure: C-structure
44 / 60
![Page 111: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/111.jpg)
Syntax
F-structure for a discontinuous NP
"s3adr kO muqaddamAt sE h2As3il AInI istis2nA"

    [49: PRED    'istis2nA<[34:muqaddamah]>'
         OBL     [34: PRED  'muqaddamah'
                      CHECK [_NMORPH obl]
                      CASE inst, GEND masc, NUM pl]
         ADJUNCT { [39: PRED    'h2As2il<[1:s3adr]>'
                        OBJ-GO  [1: PRED  's3adr'
                                    CHECK [_NMORPH obl]
                                    NTYPE [NSEM [COMMON count], NSYN common]
                                    CASE dat, GEND masc, NUM sg, PERS 3]
                        CHECK   [_RESTRICTED -]
                        LEX-SEM [GOAL +]
                        ATYPE   attributive]
                   [44: PRED  'AInI'
                        ATYPE attributive
                        [39:h2As2il] <s] } ]

Figure: F-structure
45 / 60
![Page 112: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/112.jpg)
Syntax
Summary
Urdu is a typical language in which discontinuous NPs are found both at:
Clause-level
Constituent-level
Constituent-level discontinuity in Urdu can be implemented in the LFG framework by making use of:
Shuffle operator (‘,’)
Non-deterministic operator (‘$’)
Head-precedence operator (‘<h’)
46 / 60
![Page 114: UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu ...ling.uni-konstanz.de/pages/home/sulger/slides/kolloq-talk-urdu.pdf · UrduGram: Towards a Deep, Large-Coverage Grammar](https://reader030.fdocuments.us/reader030/viewer/2022021503/5a7869717f8b9aa2448b821a/html5/thumbnails/114.jpg)
Semantics
Intro
Aim: a large-coverage computational semantic analyzer on the basis of a deep syntactic analysis
use f-structures as the starting point
apply xfr semantic rules → from f-structure facts to a semantic representation (Crouch and King, 2006)
judgment on the semantic well-formedness of a sentence
The girl laughs. → semantically well-formed
#The tree laughs. → semantically ill-formed
we need lexical information about the words in a sentence
1 lexical resource for Urdu verbs
more information on the verb and its arguments
2 general lexical resource for Urdu nouns, adjectives etc.
F-structure for nAdiyah hansI (Nadya laughed).
"nAdiyah hansI"
[ PRED 'hans<[1:nAdiyah]>'
  SUBJ [ PRED 'nAdiyah'
         NTYPE [ NSEM [ PROPER [ PROPER-TYPE name ] ]
                 NSYN proper ]
         SEM-PROP [ SPECIFIC + ]
         CASE nom, GEND fem, NUM sg, PERS 3 ] (index 1)
  CHECK [ _VMORPH [ _MTYPE infl ]
          _RESTRICTED -, _VFORM perf ]
  LEX-SEM [ VERB-CLASS unerg ]
  TNS-ASP [ ASPECT perf, MOOD indicative ]
  CLAUSE-TYPE decl, PASSIVE -, VTYPE main ] (index 18)
xfr semantic rule:
PRED(%1, hans), SUBJ(%1, %subj), -OBJ(%1, %obj)
==>
word(%1, hans, verb), role(Agent, %1, %subj).
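To make the effect of the rule concrete, here is a toy Python re-implementation of this single rule over a flat list of f-structure facts. It is not the XFR system: the tuple encoding of facts, the node names f1 and f18, and the decision to consume the matched facts are all assumptions made for illustration.

```python
from typing import List, Tuple

Fact = Tuple[str, ...]


def apply_laugh_rule(facts: List[Fact]) -> List[Fact]:
    """Toy version of:
    PRED(%1, hans), SUBJ(%1, %subj), -OBJ(%1, %obj)
    ==> word(%1, hans, verb), role(Agent, %1, %subj).
    """
    output = list(facts)
    for fact in facts:
        if fact[0] == "PRED" and fact[2] == "hans":
            node = fact[1]
            subjects = [f[2] for f in facts if f[0] == "SUBJ" and f[1] == node]
            # The negated condition -OBJ(...) requires that no OBJ fact exists.
            has_object = any(f[0] == "OBJ" and f[1] == node for f in facts)
            if subjects and not has_object:
                subj = subjects[0]
                # Assumption: matched input facts are replaced by the output facts.
                output.remove(fact)
                output.remove(("SUBJ", node, subj))
                output.append(("word", node, "hans", "verb"))
                output.append(("role", "Agent", node, subj))
    return output


# Hypothetical fact encoding of the f-structure for "nAdiyah hansI".
facts = [
    ("PRED", "f18", "hans"),
    ("SUBJ", "f18", "f1"),
    ("PRED", "f1", "nAdiyah"),
]
result = apply_laugh_rule(facts)
for fact in result:
    print(fact)
```

If an OBJ fact for the same node is added to the input, the negative condition fails and the facts pass through unchanged.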
Developing an Urdu VerbNet (1)
following the methodology of the English VerbNet (Kipper-Schuler 2006)
categorization of English verbs in 250 classes
information on event structure and argument structure of verbs
provides the general architecture for a VerbNet in any language
e.g. parts of the entry for ‘laugh’ in the English VerbNet
Developing an Urdu VerbNet (2)
Difficulty: resource sparseness of Urdu
Approach 1:
translating the entries in the English VerbNet to Urdu
figure out problematic cases
Approach 2:
fully rely on corpus work
extend tool for automatic subcategorization extraction (Ghulam, 2010)
Can we benefit from a Hindi lexical resource?
Hindi WordNet
Facts:
inspired in methodology and architecture by the English WordNet (Fellbaum 1998)
developed at the Indian Institute of Technology, Bombay, India
separated into four independent “semantic nets”
verbs, nouns, adjectives and adverbs
about 3,900 verbs, 57,000 nouns, 13,700 adjectives and 1,300 adverbs
words are grouped according to their meaning similarity (“synsets”)
Issues
far less specific concepts than in the English WordNet
Hindi WordNet:
TOP 〉 Noun 〉 Inanimate 〉 Object 〉 Artifact 〉 kitAb (‘book’)
TOP 〉 Noun 〉 Inanimate 〉 Object 〉 Artifact 〉 mez (‘table’)
English WordNet:
entity 〉 physical entity 〉 object 〉 whole unit 〉 artifact 〉 creation 〉 product 〉 piece of work 〉 publication 〉 book
entity 〉 physical entity 〉 object 〉 whole unit 〉 artifact 〉 instrumentality 〉 furnishing 〉 piece of furniture 〉 table
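One way to quantify the difference in specificity is to count how many hypernym steps separate two words below their deepest shared ancestor. The Python sketch below does this for the four paths listed on the slide; the helper names are hypothetical, and the paths are typed in by hand rather than read from either WordNet.

```python
from typing import Sequence, Tuple


def shared_prefix_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading hypernyms the two paths have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def distinguishing_steps(a: Sequence[str], b: Sequence[str]) -> Tuple[int, int]:
    """Hypernym levels below the deepest shared ancestor, per path."""
    shared = shared_prefix_length(a, b)
    return len(a) - shared, len(b) - shared


# Hypernym paths copied from the slide (the word itself is left out).
hindi_kitab = ["TOP", "Noun", "Inanimate", "Object", "Artifact"]
hindi_mez = ["TOP", "Noun", "Inanimate", "Object", "Artifact"]
english_book = ["entity", "physical entity", "object", "whole unit", "artifact",
                "creation", "product", "piece of work", "publication"]
english_table = ["entity", "physical entity", "object", "whole unit", "artifact",
                 "instrumentality", "furnishing", "piece of furniture"]

print(distinguishing_steps(hindi_kitab, hindi_mez))       # (0, 0)
print(distinguishing_steps(english_book, english_table))  # (4, 3)
```

In the Hindi hierarchy the two nouns hang directly under the same node, so the net cannot tell a book from a table by category. The English hierarchy separates them by several intermediate concepts.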
Benefits for an Urdu VerbNet
Preliminary experiments for Urdu/Hindi verbs
Resources that we have:
the database from Hindi WordNet
a list of Urdu verbs
out of 3,900 Hindi verbs, we have found 534 verbs in an Urdu verb list (Humayoun, 2006)
complex predicates are included in Hindi WordNet, but not in the Urdu wordlist
total of around 700 Urdu verbs → more than 2/3 of Urdu verbs are found
all found verbs seem to be valid
→ extract verb information from Hindi WordNet for the Urdu VerbNet
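The lookup experiment amounts to a set intersection between the two verb inventories. The sketch below shows the idea in Python on a tiny made-up sample and then checks the arithmetic behind the "more than 2/3" claim using the figures from the slide. The function name and the sample verbs are hypothetical, and the sketch assumes both lists use the same Roman transliteration.

```python
from typing import Set


def coverage(urdu_verbs: Set[str], hindi_verbs: Set[str]) -> float:
    """Share of Urdu verbs that also occur in the Hindi verb inventory."""
    found = urdu_verbs & hindi_verbs
    return len(found) / len(urdu_verbs)


# Tiny illustrative sample, not taken from either resource.
urdu_sample = {"hans", "kHA", "jA", "likH"}
hindi_sample = {"hans", "kHA", "jA", "bEtH"}
sample_coverage = coverage(urdu_sample, hindi_sample)
print(sample_coverage)

# Figures reported on the slide: 534 verbs found out of around 700.
reported_coverage = 534 / 700
print(round(reported_coverage, 3), reported_coverage > 2 / 3)
```

Since 700 is itself an approximate total, the resulting ratio of roughly 0.76 should be read as an estimate.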
Urdu Lexical Semantics
Polysemy:
An extreme case is ‘eat’ expressions in Hindi/Urdu (Hook and Pardeshi, 2009):
employing ‘eat’ in idiomatic expressions
about 160 eat expressions for Hindi/Urdu
variety of uses due to loan translations from Persian
h2asan=ne kEk=ko kHAyA
h2asan.Erg cake.Acc eat.Perf.Sg.Masc
‘Hasan ate the cake.’
eat = 〈Agent, Theme〉
inqilAbI fikar zang kHA jAEgI
revolutionary thought rust eat go.Fut
‘Revolutionary thinking will gather rust.’
eat (gather rust) = 〈Patient, Theme〉
is sAl=kI mandI sheyar-bAzAr kHA gAyI
this year.Gen slowdown.Fem stockmarket eat go.Perf.Fem
‘This year’s slowdown wrecked (lit. devoured) the stock market.’
eat (wreck) = 〈Agent, Theme〉
How do we approach polysemy in the computational semantics?
extensive corpus work to find polysemous verbs
assign different thematic roles to polysemous verbs?
put all combinations in the Urdu VerbNet, but mark the “original”use?
analysis for all sentences, mark idiomatic and semantically ill-formedsentences as such?
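One of the options raised above, listing every use in the lexicon while marking the "original" one, could be represented roughly as follows. This is a hypothetical Python data layout, not the actual Urdu VerbNet format; the class, the field names, and the helper functions are assumptions. The role frames are the ones given for the three ‘eat’ examples.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class VerbSense:
    gloss: str               # English paraphrase of this use
    roles: Tuple[str, ...]   # thematic role frame
    original: bool = False   # True for the basic, non-idiomatic use


# Hypothetical lexicon entry for kHA 'eat', based on the three examples.
LEXICON: Dict[str, List[VerbSense]] = {
    "kHA": [
        VerbSense("eat", ("Agent", "Theme"), original=True),
        VerbSense("gather rust", ("Patient", "Theme")),
        VerbSense("wreck", ("Agent", "Theme")),
    ],
}


def original_sense(verb: str) -> VerbSense:
    """Return the sense marked as the original use of the verb."""
    return next(s for s in LEXICON[verb] if s.original)


def role_frames(verb: str) -> Set[Tuple[str, ...]]:
    """Distinct thematic role frames across all senses of the verb."""
    return {s.roles for s in LEXICON[verb]}


print(original_sense("kHA").gloss)
print(sorted(role_frames("kHA")))
```

With this layout the idiomatic uses stay available to the analyzer, while the flag keeps them distinguishable from the basic meaning.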
Wrap up
What we have talked about:
architecture of the Urdu LFG Grammar
ongoing work
transliteration
discontinuous NPs
computational semantics
challenges ahead
Demo
Thank you!