APPENDIX A EFFORTS IN PREPARING TAMIL LANGUAGE...


Page 1: APPENDIX A EFFORTS IN PREPARING TAMIL LANGUAGE …shodhganga.inflibnet.ac.in/bitstream/10603/1022/18/18_appendics.pdf · Tamil language restricts the efficient usage ... changed the

APPENDIX - A

EFFORTS IN PREPARING TAMIL LANGUAGE FOR COMPUTER

A.1 Introduction

Multilingual software works with two or more languages. A language has to be reformed in order to suit the demands of multilingual software. Languages can be categorized, based on their degree of reformation for digital form, as:

- Fully reformed languages

- Partially reformed languages

- Minimally or not reformed languages

European languages like French and Spanish are ready for computerization and fall under the fully reformed category. Many Indian languages, such as Tamil and Telugu, are standardized in some aspects, yet many other aspects of those languages remain to be standardized for computerization. Such languages come under the partially reformed category. In contrast, the languages of underdeveloped countries are yet to get their digital form. These languages are grouped under minimally or not reformed languages.

Issues related to the fully reformed category have already been met, whereas issues are yet to be finalized for minimally or not reformed languages. So the focus has to be on the partially reformed languages, with Tamil as an example case. To obtain the digital form of Tamil, the following requirements must be met: character set, coding scheme, keyboard layouts and rendering techniques (Ando Peter, 1994).

Tamil is one of the most popular ancient languages, used by a large community in Tamil Nadu and all over the world. The important reason for the "evergreen" nature of the language is the improving changes made to it over the years. Many revolutionary changes were made to the Tamil script form by Veeramamunivar when he promoted printing technology in Tamil. The Tamil Nadu Government has also recently introduced certain changes in the Tamil script form to improve printing. These examples show that, when the need arises, the traditional language is capable of adapting itself to environmental, cultural and linguistic changes.

Tamil has a large character set of 247 characters and a very lengthy list of grammar rules for framing sentence structure in a variety of literary forms. The large character set and heavy grammar rules do not limit the usage of Tamil; rather, they increase it in a variety of ways.
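The figure of 247 can be checked with a short sketch. The breakdown used here (12 vowels, 18 consonants, their 216 combinations and one special character, aytham) is the traditional one and is not spelled out in the text above. Unicode code points are used only as a convenient way to build the characters; they are not one of the coding schemes proposed in this appendix.

```python
# Traditional Tamil character set, built from Unicode Tamil block code points.
# The 12 + 18 + 216 + 1 breakdown is an assumption added for illustration.
VOWELS = [chr(c) for c in (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
                           0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)]
CONSONANTS = [chr(c) for c in (0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
                               0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
                               0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9)]
AYTHAM = chr(0x0B83)
PULLI = chr(0x0BCD)  # mark that removes the inherent vowel from a consonant

# Dependent vowel signs. The first vowel is inherent in the consonant letter,
# so its sign is the empty string.
VOWEL_SIGNS = [""] + [chr(c) for c in (0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2,
                                       0x0BC6, 0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB,
                                       0x0BCC)]


def tamil_character_set():
    """Return vowels, pure consonants, compound characters and aytham."""
    pure_consonants = [c + PULLI for c in CONSONANTS]
    compounds = [c + sign for c in CONSONANTS for sign in VOWEL_SIGNS]
    return VOWELS + pure_consonants + compounds + [AYTHAM]


characters = tamil_character_set()
print(len(characters))  # 12 + 18 + 18 * 12 + 1 = 247
```

The sketch also shows why the set is large but regular: 216 of the 247 characters are systematic consonant and vowel combinations.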

Considering this and analyzing the language features from the computerization point of view, Tamil offers the following advantages:

- No composite letters

- No uppercase/lowercase letters

- No cursive letters

These advantages encourage one to venture into making a digital form of Tamil. Though the large character set and complex grammar rules put some hurdles in the computerization process, they are not the major ones. The major issues are the standardization of the script form, the coding mechanism and the Tamil keyboard layout.

A.2 Character Set

Many computer experts and Tamil experts felt that the large character set of the Tamil language restricts the efficient use of Tamil in computers, and they have proposed a variety of reduced Tamil character sets. These approaches aim to improve performance in Tamil-based computing and also to simplify the language learning process.


Professor V.C. Kulandaisamy, former Vice Chancellor of Anna University, presented one such solution for reducing the Tamil character set in his popular book "Ariviyal Tamil" (Kulandaisamy, 1950). His approach reduced the character set by separating out the common features of characters; for example, to represent the characters ... he used only ....

A similar approach was presented by Mr. Nellai S. Muthu of Singapore University in 1994 (Nellai S. Muthu, 1994). His character set contains only 34 characters, and special characters like 'Ja' and 'Sha' are also covered. This work is justified by the view that the Tamil language should not be restricted to communication purposes alone; it needs to adapt itself to technological innovations.

One can see many such works in this direction. Even though the design issues have been meticulously addressed, it has been observed that the user, say the common man, is unable to adapt to dramatic changes in the language structure, viz. the reduction of the character set, the character formation, etc. A similar change was made in the Malayalam language for printing purposes (Sujatha, 1994). But people could not accept it and continued to follow the traditional form of Malayalam. This shows that dramatic changes cannot be introduced in any language at once, but only in a phased manner.

The Chinese and Japanese languages have character sets containing thousands of characters. Yet in the Chinese-English microcomputer project (Archar et al., 1988), the complete character set was used and the project was successful. Also, the introduction of high-speed processors overcomes the performance reduction due to the large Tamil character set.

A.3 Coding Scheme

Consider the coding scheme to be used for the internal representation of characters. A character-based coding scheme is better than a glyph-based coding scheme. This is because, when a character-based coding scheme is used, the system treats combinations of language glyphs as characters and assigns a single code to the combination of glyphs that forms a character. It also has knowledge of the collating sequence of characters, whereas a glyph-based coding scheme has no knowledge of the characters of the language and handles the language only in terms of glyphs.

The character-based coding scheme is better in another way. It makes text editing easier, since any addition, deletion or modification takes place in terms of whole characters rather than the individual glyphs of a character. The coding scheme used for representing a character internally should not occupy more than two bytes.
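The difference between the two kinds of scheme can be illustrated with a small sketch. Everything in it is hypothetical: the romanized character names, the glyph names and all code values are invented for illustration and do not come from any of the coding schemes discussed here. It relies only on the fact that a Tamil character such as "ko" is drawn from three glyphs (a sign to the left of the base letter, the base letter and a sign to its right).

```python
import struct

# Hypothetical character-based table: one code per whole character,
# assigned in collating order.
CHAR_CODES = {"ka": 300, "kaa": 301, "ko": 309, "ma": 400}

# Hypothetical glyph-based table: one code per visual piece.
GLYPHS_FOR = {
    "ka": ["KA"],
    "kaa": ["KA", "AA_SIGN"],
    "ko": ["E_SIGN", "KA", "AA_SIGN"],  # left sign + base letter + right sign
    "ma": ["MA"],
}
GLYPH_CODES = {"E_SIGN": 10, "AA_SIGN": 11, "KA": 20, "MA": 30}


def encode_chars(word):
    """Encode a word (list of character names) with one code per character."""
    return [CHAR_CODES[ch] for ch in word]


def encode_glyphs(word):
    """Encode a word with one code per glyph, in visual order."""
    return [GLYPH_CODES[g] for ch in word for g in GLYPHS_FOR[ch]]


def backspace(codes):
    """Delete the last stored code, as a naive editor would."""
    return codes[:-1]


def to_bytes(codes):
    """Store each code in exactly two bytes (big-endian unsigned short)."""
    return b"".join(struct.pack(">H", code) for code in codes)


words = [["ma"], ["ko"], ["ka"]]
print(sorted(words, key=encode_chars))   # collating order: ka, ko, ma
print(sorted(words, key=encode_glyphs))  # "ko" sorts first because of its left sign
print(backspace(encode_chars(["ma", "ko"])))   # the whole character "ko" is removed
print(backspace(encode_glyphs(["ma", "ko"])))  # a fragment of "ko" is left behind
```

With character codes, sorting by code value gives the collating order directly and one backspace removes one character. With glyph codes, both operations need extra knowledge of how glyphs combine.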

Coding schemes are used for standardization and easy reference. Tamil, like the universal language English, has many coding schemes. The following are some of the characteristics of coding schemes:

- For unique identification and better collation

- Coding schemes enable us to understand the character representation

- For standardization purposes

The Tamil coding schemes are basically classified into glyph-based (Anbarasan, 2000) (TSCII, 2000) (TAM, 2000) (TAB, 2000) (Indic, 2000) and character-based coding schemes (Kuppuswami et al., 1999) (Navaneethan et al., 2003) (Mudawwar, 1997). In some coding schemes a character is represented as a group of glyphs, and in others as a single character code. Glyph-based Tamil coding schemes can be further classified into one-byte and two-byte schemes, and character-based coding schemes into single code page and multi code page schemes. Figure A.1 shows the classification of Tamil coding schemes.

[Figure A.1: tree diagram. Glyph based: One Byte, Two Byte. Character based: Single code page, Multi code page. Schemes shown: 1. ISCII, 2. TSCII, 3. Monolingual, 4. Bilingual, 5. Unicode, 6. PONN, 7. Panditham, 8. Multicode.]

Figure A.1: Classification of coding schemes

A.4 Keyboard Layout

To provide a multilingual working environment, users should be permitted to input in the language of their choice and obtain output from the system in the corresponding language. For input, the method of mapping the various language characters to the keyboard has to be decided. For output, the display and printing of the characters have to be decided. For input, two types of keyboard mappings are available (Gopinath and Ayyadurai, 1994):

- Non-phonetic based mapping

- Phonetic based mapping

Either of the above keyboard mappings could be used. For output, the required fonts need to be available for display and printing. By adopting appropriate solutions that are common to all languages, standards for multilingual technology could be achieved. By applying this technology, the incompatibility and language extensibility problems could be solved.

Much Tamil software follows a Tamil typewriter-like keyboard, i.e. QWERTY keyboards. The limitation of this keyboard layout is that many single characters are split into two or more keystrokes. An alternative Tamil keyboard was suggested by Mr. N. Govindasami of Nanyang University, Singapore (N. Govindasami, 1994). He changed the keystrokes and their positions based on phonetics. He analyzed the frequently used keys and placed them in the middle of the keyboard. He carried out research on this keyboard layout and reports that learning it is very simple and that it also improves typing speed. In the meantime, DOE has designed a different keyboard layout to suit all Indian languages; it is a totally new type of layout based on the phonemes of the languages. But Tamil software manufacturers have not adopted this keyboard layout (Tamil Computer, 1995). Again, standardization is a crucial problem for keyboard layouts. Every software package follows its own keyboard layout, which restricts the user to working only with that software. To overcome this, a standardized keyboard layout should be selected from the existing ones, or a new keyboard layout should be defined.
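The idea of placing frequently used keys in the middle of the keyboard can be sketched as follows. This is only an illustration of the principle and not the procedure used for any of the layouts mentioned: the function name, the choice of the QWERTY home row as "the middle" and the sample text are all assumptions.

```python
from collections import Counter

# Home-row keys of a QWERTY keyboard, listed from the centre outwards.
# Treating these as "the middle of the keyboard" is an assumption.
HOME_ROW = ["f", "j", "d", "k", "s", "l", "a", ";"]


def assign_frequent_to_home_row(sample_text, home_row=HOME_ROW):
    """Map the most frequent symbols in sample_text onto the home-row keys.

    Whitespace is ignored. The most frequent symbol gets the first key in
    home_row, the next most frequent the second key, and so on.
    """
    counts = Counter(ch for ch in sample_text if not ch.isspace())
    ranked = [symbol for symbol, _ in counts.most_common(len(home_row))]
    return dict(zip(home_row, ranked))


# Placeholder symbols stand in for Tamil characters here.
print(assign_frequent_to_home_row("AAAA BBB CC D"))
```

A real analysis would count whole Tamil characters (a base letter together with its vowel sign) over a large corpus rather than single symbols in a short string.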

Presently, a lot of Tamil software is available, mainly for word processing applications. Each package adopts its own keyboard layout. This requires the user to learn different keyboard layouts and thus complicates the keying-in process. This is due to the non-availability of a standard Tamil keyboard layout. The Tamil keyboard layouts proposed by various software developers and researchers have been collected and categorized into four groups, which are presented below.

a) Phonetic keyboard layouts

Keyboard layouts designed on the basis of the phonemes and the frequency of usage of Tamil characters are classified as phonetic keyboard layouts. The following layouts fall under this category.

- Nalinam

- DOE

- Krishnamoorthy's

b) Typewriter-like keyboard layouts

Layouts which follow the Tamil typewriter machine keyboard are classified as typewriter-like keyboard layouts. Some of them are:

- Inscript

- Annai

c) Romanized keyboard layouts

In Romanized keyboard layouts, Tamil characters are mapped to the corresponding English characters on a transliteration basis. The following layouts are examples of this category.

- PONN

- Murasu

- Yarzhan

- Chellapan's
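The transliteration principle behind Romanized layouts can be sketched as a longest-match lookup from English key sequences to Tamil characters. The table below is hypothetical and far from complete; it is not taken from PONN, Murasu or any other layout listed above, and the function name is likewise an assumption.

```python
# Hypothetical transliteration table (English key sequence -> Tamil text).
ROMAN_TO_TAMIL = {
    "a": "\u0B85",          # vowel a
    "aa": "\u0B86",         # vowel aa
    "k": "\u0B95\u0BCD",    # pure consonant k
    "ka": "\u0B95",         # k + inherent vowel a
    "kaa": "\u0B95\u0BBE",  # k + vowel sign aa
    "m": "\u0BAE\u0BCD",    # pure consonant m
    "ma": "\u0BAE",         # m + inherent vowel a
    "maa": "\u0BAE\u0BBE",  # m + vowel sign aa
}


def transliterate(keys):
    """Convert typed English keys to Tamil using longest-match lookup.

    Key sequences that are not in the table are passed through unchanged.
    """
    longest = max(len(k) for k in ROMAN_TO_TAMIL)
    out = []
    i = 0
    while i < len(keys):
        for size in range(min(longest, len(keys) - i), 0, -1):
            chunk = keys[i:i + size]
            if chunk in ROMAN_TO_TAMIL:
                out.append(ROMAN_TO_TAMIL[chunk])
                i += size
                break
        else:
            out.append(keys[i])  # unknown key: keep as typed
            i += 1
    return "".join(out)


print(transliterate("ammaa"))  # five keystrokes produce three Tamil characters
```

The example also shows the cost of this kind of mapping: a single Tamil character can need two or three keystrokes.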

d) Others

The keyboard layouts which do not fall under the above categories are grouped as others. Some of them are:

- Ventura

- Deskset

Based on the results obtained from the analysis, the Tamil Computer Standardization Committee recommended keyboard layouts for Tamil computing. This recommendation was accepted by the Government and the keyboard standard was published (Tamil Keyboard, 1999). Phonetic keyboard layouts gave better performance in all three phases of the analysis. Hence it was decided to select the best phonetic keyboard layout, which may be refined further. As large numbers of typewriter-trained personnel are readily available, it was also felt that the best typewriter-like keyboard layout could be selected and improved. For Tamil people living all over the world who use an English keyboard to input Tamil text, a Romanized keyboard layout is essential. Therefore, one keyboard layout can be selected from this category and refined. Hence, an umbrella standard of keyboard layouts has to be made available, with one layout from each category of keyboards.

A.5 Rendering Techniques

In the view of Mr. Ando Peter of Softview Computers, the following are the issues present in the Tamil script form (Ando Peter, 1994):

- Maintaining equal inter-character spacing

- Aligning with the baseline

- Usage of hook characters

Since foreign languages have square-based scripts, maintaining the inter-character space is simple. But in Tamil each character has a different grid size. Due to this, maintaining inter-character spacing in Tamil is difficult. So, in many Tamil DTP software packages, the inter-character spacing is managed by the users. This is tedious and cumbersome, and thus reduces the productivity of both the system and the user.

The different grid sizes of the characters also introduce the following problems:

- Handling the backspace

- Handling the line-end, page-end or window-end

These issues can be solved by adjusting the grid sizes either to the maximum or to the optimum. The first loses the equal inter-character spacing, and in the second the larger characters lose their shape. Hence neither solution is well received.

These issues can be solved by the device driver. It maintains the inter-character spacing in software, keeping knowledge of the last character to solve the backspace key issue and keeping track of the remaining space in a line for the line-end, page-end or window-end issues.

Some letters can be written either below the baseline or above it. As such, there is no uniform representation of these letters in the script form; both representations are widely used. Similarly, hook characters start either from above the letter or to the left of it. Again, both representations are widely used.
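The bookkeeping described for the device driver in this section, remembering the last character for backspace and tracking the space left in a line for line-end handling, can be sketched as below. The class name, the width table and the width units are hypothetical; a real driver would take widths from the font.

```python
class LineTracker:
    """Track variable-width characters so that backspace and line-end
    handling know how much space each character occupies."""

    def __init__(self, line_width, widths):
        self.line_width = line_width  # usable width of one line
        self.widths = widths          # hypothetical width of each character
        self.lines = [[]]             # characters placed on each line

    def remaining(self):
        """Space still free on the current line."""
        used = sum(self.widths[ch] for ch in self.lines[-1])
        return self.line_width - used

    def put(self, ch):
        """Place a character, starting a new line if it does not fit."""
        if self.widths[ch] > self.remaining():
            self.lines.append([])
        self.lines[-1].append(ch)

    def backspace(self):
        """Remove the last character and return the width to erase."""
        if not self.lines[-1] and len(self.lines) > 1:
            self.lines.pop()  # step back to the previous line
        if not self.lines[-1]:
            return 0          # nothing left to delete
        return self.widths[self.lines[-1].pop()]


tracker = LineTracker(line_width=20, widths={"narrow": 4, "wide": 9})
for name in ["narrow", "wide", "wide"]:
    tracker.put(name)
print(tracker.lines)        # the second "wide" wraps to a new line
print(tracker.backspace())  # width of the character that was removed
```

Because the width of each placed character is remembered, neither backspace nor wrapping depends on all characters sharing one grid size.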

A.6 Conclusion

Standards for the above aspects are the main focus. Once the standards are achieved, Tamil can be used globally on computers with ease. Software developers will follow the standards in making Tamil software and overcome the difficulties due to the above-said aspects.

References:

(Anbarasan, 2000) N. Anbarasan (2000), ISCII and Tamil - A Perspective, http://www.tamilnation.org/economiclife/tamilnet99/anarasu.htm

(Ando Peter, 1994) Ando Peter (1994), 'Some Difficulties in Creating Tamil Fonts for Computers', International Conference on 'Tamil and Computers', Anna University, Chennai.

(Archar et al., 1988) N.P. Archar, M.W.L. Chari, S.J. Huang and R.T. Liu (1988), 'A Chinese - English Microcomputer System', Communications of ACM.

(Gopinath and Ayyadurai, 1994) K. Gopinath and M. Ayyadurai (1994), 'Kanitamil - A Computer Language in Tamil', International Conference on 'Tamil and Computers', Anna University, Chennai.

(Indic, 2000) Indic (2000), Indic Scripts in Unicode, http://charts.unicode.org/PDF/U0B80.pdf

(Kulandaisamy, 1950) Kulandaisamy V.C. (1950), Ariviyal Tamil, AIU Publications, New Delhi.

(Kuppuswami et al., 1999) S. Kuppuswami, V. Prasanna Venkatesan and T. Chithralekha (1999), 'SPECS: Friendly Computer System for the Visually Handicapped - a Proposal', Proceedings of the National Conference on Creating Convenient and Friendly Environment for Education and Training of the Handicapped in Technical Institutions, Roorkee.

(Mudawwar, 1997) Muhammad F. Mudawwar (1997), Multicode: A Truly Multilingual Approach to Text Encoding, IEEE Computer, (Vol. 30, No. 4) pp. 37-43.

(N. Govindasami, 1994) N. Govindasami (1994), 'Kanian Keyboard', International Conference on 'Tamil and Computers', Anna University, Chennai.

(Navaneethan et al., 2003) P. Navaneethan, J. Kamesh, Giragadurai and C. Satheesh Kumar (2003), THIRAVIYAM: A Sales Management System in Tamil using Panditham, Tamil Internet, pp. 80-85.

(Nellai S. Muthu, 1994) Nellai S. Muthu (1994), 'Muthu Kanani', International Conference on 'Tamil and Computers', Anna University, Chennai.

(Sujatha, 1994) Sujatha (1994), 'Tamil and Computers', International Conference on 'Tamil and Computers', Anna University, Chennai.

(TAB, 2000) TAB (2000), TABXXX - Bilingual Coding Scheme for Tamil, http://www.tamilnet99.org/font.htm

(TAM, 2000) TAM (2000), TAMXXX - Monolingual Coding Scheme for Tamil, http://www.tamilnet99.org/font.htm

(Tamil Computer, 1995) Tamil Computer (1995), Transliterate Type of Keyboard Layouts, Tamil Computer Magazine.

(Tamil Keyboard, 1999) Tamil Keyboard (1999), Tamil Keyboard Standardization, G.O.Ms.No 17, Information Technology Department, Tamil Nadu.

(TSCII, 2000) TSCII (2000), TSCII - The Tamil Encoding Standard, http://www.tamil.net/tscii/tscii.html


APPENDIX - B

PONN CODING SCHEME AND PONN KEYBOARD LAYOUT

PONN CODING SCHEME

VOWELS

Tamil Character    TASCII Code
அ                  256
ஆ                  257
இ                  258
ஈ                  259
உ                  260
ஊ                  261
எ                  262
ஏ                  263
ஐ                  264
ஒ                  265
ஓ                  266
ஔ                  267


CONSONANTS


VOWELS - CONSONANTS

Tamil characters    TASCII Code
க - கௌ              286 - 297
ங - ஙௌ              298 - 309
ச - சௌ              310 - 321
ஞ - ஞௌ              322 - 333
ட - டௌ              334 - 345
ண - ணௌ              346 - 357
த - தௌ              358 - 369
ந - நௌ              370 - 381
ப - பௌ              382 - 393
ம - மௌ              394 - 405
ய - யௌ              406 - 417
ர - ரௌ              418 - 429
ல - லௌ              430 - 441
வ - வௌ              442 - 453
ழ - ழௌ              454 - 465
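The code ranges listed above follow a regular pattern: each row covers 12 consecutive codes, starting at 286. Assuming that the rows follow the listed consonant order and that the 12 codes in a row follow the order of the 12 vowels (both are assumptions read off the table, not statements from the text), a code can be computed from a consonant index and a vowel index. The function names are hypothetical.

```python
COMPOUND_BASE = 286  # first code in the VOWELS - CONSONANTS table
VOWEL_COUNT = 12     # codes per row, e.g. 286 - 297
ROW_COUNT = 15       # rows listed in the table


def compound_code(consonant_index, vowel_index):
    """Return the code for a consonant row and vowel position (0-based)."""
    if not 0 <= consonant_index < ROW_COUNT:
        raise ValueError("consonant_index outside the listed rows")
    if not 0 <= vowel_index < VOWEL_COUNT:
        raise ValueError("vowel_index must be between 0 and 11")
    return COMPOUND_BASE + consonant_index * VOWEL_COUNT + vowel_index


def split_compound_code(code):
    """Inverse of compound_code: return (consonant_index, vowel_index)."""
    offset = code - COMPOUND_BASE
    if not 0 <= offset < ROW_COUNT * VOWEL_COUNT:
        raise ValueError("code outside the listed ranges")
    return divmod(offset, VOWEL_COUNT)


print(compound_code(0, 0), compound_code(0, 11))    # first row: 286 and 297
print(compound_code(14, 0), compound_code(14, 11))  # last row: 454 and 465
```

Because the layout is regular, a character code can be split back into its consonant and vowel parts with one division, which is convenient for collation and editing.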


TAMIL CHARACTERS WITH EQUIVALENT ENGLISH KEYS


APPENDIX - C

SAMPLE SCREENSHOTS

PONN Desktop in English


KURAL IDE


Multilingual VCD Lending Application - Selection of Movies
