"Computing support for Pakistani Languages, Challenges & Practices" by Dr. Sarmad Hussain
Posted 20-Oct-2014. Transcript of the presentation slides.
Computing Support for Pakistani Languages – Challenges and Practice
Sarmad Hussain
Center for Language Engineering
Al-Khawarizmi Institute of Computer Science
University of Engineering and Technology
Lahore
Unlocking Information for Human Development
www.CLE.org.pk
Need
• ICTs promise significant socio-economic impact
• The impact depends on the size of the population that can use ICTs
• 180 million citizens need access; 66+ languages are spoken
• 10% understand English; 58% are literate
• 11% have access to computers; 70% have access to mobile phones
• ITU ICT Development Index (IDI): Pakistan ranked 127 of 155 nations
• Human Language Technology is necessary to bridge the gap
Languages of Pakistan
Percent Population of Pakistan by Mother Tongue
Area    Urdu   Punjabi  Sindhi  Pushto  Balochi  Saraiki  Others (60 languages)
Total   7.57   44.15    14.10   15.42   3.57     10.53    4.66
Rural   1.48   42.51    16.46   18.06   3.99     12.97    4.53
Urban   20.22  47.56    9.20    9.94    2.69     5.46     4.93
Economic
Socio-cultural
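A quick arithmetic check on the mother-tongue table: each row should sum to 100%, and applying the national shares to the 180 million figure from the Need slide gives rough speaker counts. This is a sketch under a stated assumption: the slides do not say that the table's shares apply to a population of 180 million, so the speaker estimates are illustrative only.

```python
# Percent of population by mother tongue, as listed in the table above.
LANGUAGES = ["Urdu", "Punjabi", "Sindhi", "Pushto", "Balochi", "Saraiki", "Others"]
SHARES = {
    "Total": [7.57, 44.15, 14.10, 15.42, 3.57, 10.53, 4.66],
    "Rural": [1.48, 42.51, 16.46, 18.06, 3.99, 12.97, 4.53],
    "Urban": [20.22, 47.56, 9.20, 9.94, 2.69, 5.46, 4.93],
}

# Assumption: national shares are applied to the 180 million population
# quoted on the Need slide.
POPULATION_MILLIONS = 180


def row_total(area: str) -> float:
    """Sum of one row, rounded to 2 decimals to hide floating-point noise."""
    return round(sum(SHARES[area]), 2)


def speakers_millions(language: str) -> float:
    """Rough number of mother-tongue speakers, in millions, from the Total row."""
    share = SHARES["Total"][LANGUAGES.index(language)]
    return round(POPULATION_MILLIONS * share / 100, 1)


if __name__ == "__main__":
    for area in SHARES:
        print(f"{area}: {row_total(area)}%")
    for language in LANGUAGES:
        print(f"{language}: ~{speakers_millions(language)} million")
```

All three rows total 100%, and the Urban row makes the slide's point visible: Urdu is the mother tongue of about 20% of urban residents but under 2% of rural ones.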
Languages of Pakistan in Danger (UNESCO)
• Vulnerable
• Definitely endangered
• Severely endangered
How?
USE
Human Language Technology | Linguistic Research
Standards | Applications | Materials
Training
Relevant Content Access | Relevant Content Generation
Adoption
Human Language Technology – Bridging Barriers
• Interfacing
• Assisting
• Enabling
• Empowering
و سخرالشمس والقمر (Arabic-script sample text: "and He subjected the sun and the moon")
Interfacing
– Character Set
• Input Methods
• Writing
• Collation
– Terminology Translation
Language Technology
– Applications
• Fonts
• Keyboards, Keypads and Other Input Methods
• Collation Methods
• Localized Platform
Standards
– National
– International
• ISO 639
• ISO 3166
• ISO 10646/Unicode
– Platforms: Computers and Phones
• Linux/Unix and Symbian
• Microsoft Windows and Windows Phone
• Apple iOS and OS X: iPad, iPhone, MacBook, …
• Google: Gmail, Docs, …, Android
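Character set and collation both appear among the interfacing issues above. A minimal Python sketch of why they matter for Urdu: sorting by raw Unicode code point places پ (pe, U+067E) after ت (te, U+062A), although pe precedes te in the Urdu alphabet, and the same letter can be encoded either precomposed or as a base letter plus a combining mark. The order in the hypothetical `URDU_ORDER` is an assumed, simplified listing (published orderings differ on a few letters), and `urdu_sort_key` is illustrative only; production systems would normally rely on the Unicode Collation Algorithm with an Urdu tailoring (for example through ICU).

```python
import unicodedata

# Assumed, simplified Urdu alphabet order. Published orderings differ on where
# nun ghunna, do-chashmi he and hamza go, so treat this as illustrative only.
URDU_ORDER = (
    "\u0627\u0628\u067E\u062A\u0679\u062B"  # alif, be, pe, te, tte, se
    "\u062C\u0686\u062D\u062E"              # jim, che, bari he, khe
    "\u062F\u0688\u0630"                    # dal, ddal, zal
    "\u0631\u0691\u0632\u0698"              # re, rre, ze, zhe
    "\u0633\u0634\u0635\u0636\u0637\u0638"  # sin, shin, swad, zwad, toe, zoe
    "\u0639\u063A\u0641\u0642"              # ain, ghain, fe, qaf
    "\u06A9\u06AF\u0644\u0645\u0646\u06BA"  # kaf, gaf, lam, mim, nun, nun ghunna
    "\u0648\u06C1\u06BE\u0621"              # wao, choti he, do-chashmi he, hamza
    "\u06CC\u06D2"                          # choti ye, bari ye
)
RANK = {letter: i for i, letter in enumerate(URDU_ORDER)}


def urdu_sort_key(word: str) -> tuple:
    """Rank each character by URDU_ORDER; unlisted characters (diacritics,
    digits, other letters) sort after all listed letters, by code point."""
    return tuple(RANK.get(ch, len(RANK) + ord(ch)) for ch in word)


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # teer, paani, baat
    words = ["\u062A\u06CC\u0631", "\u067E\u0627\u0646\u06CC", "\u0628\u0627\u062A"]
    print(sorted(words))                     # code-point order: baat, teer, paani
    print(sorted(words, key=urdu_sort_key))  # alphabet order: baat, paani, teer
    # Alef with madda: precomposed U+0622 vs. alef U+0627 + combining madda U+0653.
    print("\u0622" == "\u0627\u0653")                   # False
    print(canonically_equal("\u0622", "\u0627\u0653"))  # True
```

The sketch compares at a single level. A real collation also decides how diacritics (aerab) and letter variants are weighted, which is why collation methods appear on the slide as a separate deliverable.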
Software Localization
• SeaMonkey Navigator
• OpenOffice.org Writer
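Localized software is usually selected through a locale identifier that combines an ISO 639 language code with an ISO 3166 country code, two of the standards listed on the Interfacing slide. The sketch builds POSIX-style and BCP 47-style identifiers for the languages in this talk. The helper names are hypothetical, and whether a platform actually ships a locale under a given name (say, Sindhi for Pakistan) varies from system to system.

```python
from typing import Optional

# ISO 639 codes (two-letter ISO 639-1 where one exists). Balochi has no
# ISO 639-1 code ("bal" is ISO 639-2/3); Saraiki is "skr" in ISO 639-3.
LANGUAGE_CODES = {
    "Urdu": "ur",
    "Punjabi": "pa",
    "Sindhi": "sd",
    "Pushto": "ps",
    "Balochi": "bal",
    "Saraiki": "skr",
}
COUNTRY_CODE = "PK"  # ISO 3166-1 alpha-2 code for Pakistan


def posix_locale(language: str, encoding: str = "UTF-8") -> str:
    """POSIX-style locale name; whether the system provides it varies."""
    return f"{LANGUAGE_CODES[language]}_{COUNTRY_CODE}.{encoding}"


def bcp47_tag(language: str, script: Optional[str] = None) -> str:
    """BCP 47 tag: language, optional ISO 15924 script subtag, then region."""
    subtags = [LANGUAGE_CODES[language]]
    if script:
        subtags.append(script)
    subtags.append(COUNTRY_CODE)
    return "-".join(subtags)


if __name__ == "__main__":
    print(posix_locale("Urdu"))          # ur_PK.UTF-8
    print(bcp47_tag("Punjabi", "Arab"))  # pa-Arab-PK (Punjabi in Shahmukhi script)
    print(bcp47_tag("Saraiki"))          # skr-PK
```

The optional script subtag matters for Punjabi, which is written in Shahmukhi (an Arabic-based script) in Pakistan and in Gurmukhi in India.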
Terminology and Content
Assisting
• Text
– Assistive input/auto-complete methods
– Thesaurus, Spelling and Grammar Checking
– Machine Translation, Language Identification, Text Summarization, …
• Speech
– Speech Recognition
– Text to Speech
– Emotion Detection, …
• Image
– Optical Character Recognition (www.UrduOCR.net)
– Handwriting Recognition
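Language identification appears in the Assisting list. Because several Pakistani languages share the Arabic script, a crude first pass can look for letters that Sindhi or Pashto use and Urdu does not. The sketch is only that heuristic, not a statistical identifier (practical systems typically use character n-gram models). The marker sets are partial and hypothetical, the fallback to Urdu (`"ur"`) is an assumption, and Punjabi (Shahmukhi), Saraiki or Balochi text without distinctive letters will also fall through to it.

```python
from collections import Counter

# Hypothetical, partial marker sets: letters in Sindhi or Pashto orthography
# that standard Urdu does not use. Sindhi implosives such as U+067B are left
# out because Saraiki writes them too.
SINDHI_MARKERS = set("\u067A\u067D\u067F\u0680\u0683\u0687\u068A\u068C\u068D\u06A6\u06AA")
PASHTO_MARKERS = set("\u067C\u0681\u0685\u0689\u0693\u0696\u069A\u06AB\u06BC\u06CD\u06D0")


def guess_language(text: str, default: str = "ur") -> str:
    """Return an ISO 639-1 code guessed from distinctive letters, else `default`."""
    votes = Counter()
    for ch in text:
        if ch in SINDHI_MARKERS:
            votes["sd"] += 1
        elif ch in PASHTO_MARKERS:
            votes["ps"] += 1
    if not votes:
        return default
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(guess_language("\u06AA\u062A\u0627\u0628"))  # word with swash kaf (U+06AA)
    print(guess_language("\u069A\u0647"))              # word with U+069A (Pashto)
    print(guess_language("\u06A9\u062A\u0627\u0628"))  # only letters shared with Urdu
```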
Enabling
• Hybrid
– Online Content Sharing Tools: CMS, Social Networks
– Screen Readers
– Book Readers
– Text-based Search Engines
– Dialogue Systems
– Speech-to-Speech Translation
– Multi-modal Search Engines
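A text-based search engine for Urdu has to match words that were typed with different code points, for example the Arabic KAF (U+0643) instead of the Urdu KEHEH (U+06A9), or the Arabic YEH (U+064A) instead of FARSI YEH (U+06CC), as often happens with text entered through an Arabic keyboard layout. The sketch folds those variants with an assumed, deliberately small mapping before building a toy inverted index with whitespace tokenization and AND-query semantics. `FOLD`, `normalize_token` and `InvertedIndex` are hypothetical names.

```python
import unicodedata
from collections import defaultdict

# Assumed folding table: Arabic code points that often stand in for Urdu letters.
FOLD = str.maketrans({
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
})


def normalize_token(token: str) -> str:
    """NFC-normalize a token, then fold variant letters to their Urdu forms."""
    return unicodedata.normalize("NFC", token).translate(FOLD)


class InvertedIndex:
    """Toy index mapping each normalized token to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in text.split():
            self.postings[normalize_token(token)].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query token (AND semantics)."""
        tokens = [normalize_token(t) for t in query.split()]
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "\u06A9\u062A\u0627\u0628 \u0627\u0686\u06BE\u06CC")  # kitab achhi
    index.add(2, "\u0627\u0686\u06BE\u06CC")                          # achhi
    # Query typed with Arabic KAF still finds the document spelled with KEHEH.
    print(index.search("\u0643\u062A\u0627\u0628"))
```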
Dialogue System
Empowering
• ICT for ICT: focused on infrastructure
• ICT for Development: focused on content and applications
• ICT for Human Development: focused on participatory processes
LANGUAGE AND ICT TRAINING
[Figure: Preference for Urdu vs. preference for English among teachers, before and after training, for software and for training material; y-axis: percent of teachers, 0 to 100%]
LANGUAGE AND ICT TRAINING
Icons
Icon Identification by Students (F = female, M = male)
Category                            F     M     Total   Share
Urdu                                691   656   1347    64%
English                             132   198   330     16%
English transliterated into Urdu    150   183   333     16%
Didn't recognize                    49    40    89      4%
Total                                           2099
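The percentages in the icon-identification table follow from the counts. This short sketch recomputes them, along with the female and male totals, as a consistency check; shares are rounded to whole percent, as on the slide. `COUNTS` and `summarize` are names introduced here for illustration.

```python
# (female, male) counts from the icon-identification table.
COUNTS = {
    "Urdu": (691, 656),
    "English": (132, 198),
    "English transliterated into Urdu": (150, 183),
    "Didn't recognize": (49, 40),
}


def summarize(counts):
    """Return grand total, per-gender totals, and each category's whole-percent share."""
    category_totals = {label: f + m for label, (f, m) in counts.items()}
    grand_total = sum(category_totals.values())
    female_total = sum(f for f, _ in counts.values())
    male_total = sum(m for _, m in counts.values())
    shares = {label: round(100 * n / grand_total) for label, n in category_totals.items()}
    return grand_total, female_total, male_total, shares


if __name__ == "__main__":
    total, female, male, shares = summarize(COUNTS)
    print(total, female, male)
    for label, pct in shares.items():
        print(f"{label}: {pct}%")
```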
ACCESSING INFO ONLINE
Preferred Language for Setting a Homepage (students)
Students    Urdu   English   Total
Female      44     2         46
Male        45     2         47
Total       89     4         93
Language Preference for Searching on the Internet
Participants   English   Urdu
Students       0         138
Teachers       5         13
Total          5         151
LANGUAGE IN ONLINE COMMUNICATION
Language used in 1,467 emails and 363 chats: Urdu 89%, English 9%, Punjabi 1%, Others 2%
LANGUAGE FOR CONTENT DEVELOPMENT
Number of websites by language
Website Competition Category                              Urdu   English   Total
School Website (by 10 school teacher teams)               9      1         10
Local Village Website (by 10 school student teams) [1]    8      0         8
Open Category (individual students)                       38     0         38
Total                                                     55     1         56
[1] One school did not participate, and one school website was disqualified because the team received significant external assistance.
CONTENT
Development Process of Human Language Technology
Select Language
• Core Linguistic Analysis and Definition
• Detailed Linguistic Analysis
• Development of Localization Utilities
• Linguistic Data Collection
• Annotation of Linguistic Data
• Localization of Existing Applications
• Development of Linguistic Utilities
• Extension of Localization Applications
• Development of Advanced HLT Applications
• Publishing Language Computing Standards
• Publishing Data Annotation Schema
• Publishing Annotated Linguistic Resources
Status of Human Language Technology: URDU
[Figure: the development stages listed above, each marked by level of support: Reasonable Support, Some Support, Minimal Support]
Status of Human Language Technology: SINDHI
[Figure: the development stages listed above, each marked by level of support: Reasonable Support, Some Support, Minimal Support]
Status of Human Language Technology: PUSHTO
[Figure: the development stages listed above, each marked by level of support: Reasonable Support, Some Support, Minimal Support]
Status of Human Language Technology: PUNJABI
[Figure: the development stages listed above, each marked by level of support: Reasonable Support, Some Support, Minimal Support]
Status of Human Language Technology: BALOCHI
[Figure: the development stages listed above, each marked by level of support: Reasonable Support, Some Support, Minimal Support]
Status of Human Language Technology: SARAIKI
[Figure: the development stages listed above, each marked by level of support: Reasonable Support, Some Support, Minimal Support]
Status of Human Language Technology: OTHERS
[Figure: the development stages listed above, each marked by level of support: Reasonable Support, Some Support, Minimal Support]