Proposal for a Devanagari Script Root Zone Label Generation Rule-Set (LGR)

LGR Version: 3.0
Date: 2019-04-22
Document version: 6.5
Authors: Neo-Brahmi Generation Panel [NBGP]

1 General Information/ Overview/ Abstract

This document lays down the Label Generation Rule Set for the Devanagari script. The three main components of the Devanagari Script LGR, i.e. code point repertoire, variants and Whole Label Evaluation rules, are described in detail here. All these components have been incorporated in a machine-readable format in the accompanying XML file named "proposal-devanagari-lgr-22apr19-en.xml".

In addition, a document named "devanagari-test-labels-22apr19-en.txt" has been provided. It contains a list of valid and invalid labels as per the Whole Label Evaluation rules laid down in Section 7 of this document. The labels have been tagged as valid or invalid under the specific rules (footnote 1). The file also lists the set of labels which can produce variants as laid down in Section 6 of this document.

2 Script for which the LGR is proposed

ISO 15924 Code: Deva
ISO 15924 Key N°: 315
ISO 15924 English Name: Devanagari (Nagari)
Latin transliteration of native script name: dévanâgarî
Native name of the script: देवनागरी
Maximal Starting Repertoire [MSR] version: 4

1 The categorization of invalid labels under specific rules reflects the NBGP's general understanding of the LGR Tool. During testing with any LGR tool, whether a particular label gets flagged under the same rule or a different one depends entirely on the internal implementation of that tool. In case of a discrepancy, only the fact that the label is invalid should be considered.


Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

3 Background on Script and Principal Languages Using It

The script called Nagari or Devanagari is written from left to right. Historically, it derives from the Brahmi alphabet of the Ashokan inscriptions. Devanagari is currently used for 11 out of 22 scheduled languages of India (Boro/Bodo, Dogri, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Sanskrit, Santali and Sindhi) and around 45 other languages, especially the related Indo-Aryan languages: Bagheli, Bhili, Bhojpuri, Himachali dialects, Magahi, Newar, and Rajasthani and its dialects Marwari, Mewati, Shekhawati, Bagri, Dhundhari, Harauti and Wagdi. Closely associated with Sanskrit and Prakrit, it is an alternative script for Kashmiri (by Hindu speakers), Sindhi and Santali. It is growing popular in use by speakers of tribal languages of Arunachal Pradesh, Bihar, Chattisgarh, Jharkhand, Madhya Pradesh and Andaman & Nicobar Islands. The script is also used in Fiji to represent Fiji Hindi. Hindi is also a language of communication in Mauritius, Malaysia, England, Canada, South Africa and Indonesia, as well as in emigrant communities around the world. The script is also used in Nepal for writing the Nepali language. Nepali is the official language of Nepal as well as one language of the state of Sikkim in India. It is spoken by over 30 million people.

Devanagari is used by over 120 languages in India, Bangladesh, Nepal and in Southeast Asia.

3.1 The Evolution of the Script

It is well known that Devanagari has evolved from the parent script Brahmi, with its earliest historical form, known as Aśokan Brahmi, traced to the 4th century BC. Brahmi was deciphered by Sir James Prinsep in 1837. The study of Brahmi and its development has shown that it has given rise to most of the scripts in India as well as in other countries, viz. Sri Lanka, Myanmar, Cambodia, Thailand, Laos and the region of Tibet, to name a few.

The evolution of Brahmi into present-day Devanagari involved intermediate forms common to other scripts, such as Gupta and its two descendants, Siddhaṃ and Śāradā, in the north, and Grantha and Kadamba in the south. Devanagari can be said to have developed from the Kutila script, a descendant of the Gupta script, which is in turn a descendant of Brahmi. The word kutila, meaning 'crooked', was used as a descriptive term to characterize the curving shapes of the script compared to the straight lines of Brahmi. This inheritance is the reason why some of the characters across the scripts that will be considered under the Neo-Brahmi GP look similar to each other, despite belonging to totally different code blocks of the Unicode Standard.

A look at the development of Devanagari from Brahmi gives an insight into how the Indic scripts have come to be diversified: the handiwork of engravers and writers who used different types of strokes led to different regional styles. The development of the script is outlined below. Figure 1 (Pictorial depiction of evolution of Devanagari) illustrates the stages in the evolution of the script (footnote 2).

Period | Description
300 BCE | Mauryan. Early Brahmi form in the Asokan edicts. Some scholars believe that Brahmi itself evolved from Kharoshthi, a script written right to left.
200 CE | Kushan / Satavahana Dynasties
400 CE | Gupta Dynasty
600 CE | Yasodharman
800 CE | Origins of the present-day Nagari script. Vardhana dynasty in the North and Pallava period in the South.
900 CE | The period of the Chalukyas and Rashtrakutas
1100 CE | Continuation of the Chalukya rule
1300 CE | Yadavas in the north and Kakatiyas in the south
1500 CE | The Vijayanagar empire

Table 1 Evolution of Devanagari

2 http://www.acharya.gen.in:8080/sanskrit/script_dev.php


Figure 1 Pictorial depiction of evolution of Devanagari

3.2 Languages considered

Devanagari is used by over 120 languages, which makes it one of the most used scripts in the world. Languages using Devanagari as their primary script belong to varying geo-political scenarios, as given below:

- designated as official (scheduled) languages of some countries
- used by communities living in urban areas
- used by communities living in rural yet accessible areas
- used by communities living in far-flung areas which are not easily connected either by roads or by communication mechanisms

Information about official (scheduled) languages of countries is easily available. Information about languages used by communities living in urban areas is also easily obtainable. Some effort was needed to cover the languages which are spoken by communities living in rural yet accessible areas. However, it was quite difficult to cover the rest of the languages, spoken by communities living in remote tribal areas which are generally not connected by road or by communication means. Defining the scope of language coverage was hence essential to limit the scope of the work to be undertaken for the analysis of the Devanagari LGR.

NBGP decided to employ the "Expanded Graded Intergenerational Disruption Scale" [EGIDS], which is designed to measure the status of the languages of the world in terms of endangerment or development. The EGIDS consists of 13 levels, with each higher number on the scale representing a greater level of disruption to the intergenerational transmission of the language. NBGP decided to accommodate all the languages belonging to EGIDS Scale 1 to 4 for its analysis, which represents languages that, in one form or the other, are still in usage. Following are the descriptions (footnote 3) of those scales:

Scale | Label | Description
1 | National | The language is widely used between nations in trade, knowledge exchange and international policy.
2 | Provincial | The language is used in education, work, mass media and government at the national level.
3 | Wider Communication | The language is used in education, work, mass media and government within major administrative subdivisions of a nation.
4 | Educational | The language is in vigorous use, with standardization and literature being sustained through a widespread system of institutionally supported education.

Languages belonging to Level 5 and higher are not in widespread usage.

Below is the tabular representation of the languages that have been considered for the Devanagari LGR.

3 https://www.ethnologue.com/about/language-status

EGIDS Scale 1 | Hindi, Nepali
EGIDS Scale 2 | Konkani, Maithili, Marathi, Sindhi
EGIDS Scale 3 | Bhatri, Halbi, Kinnauri, Kukna, Panchpargania, Sadri, Wagdi
EGIDS Scale 4 | Bhojpuri, Chhattisgarhi, Dogri, Kashmiri, Limbu, Magahi, Sanskrit, Santali, Tamang (Eastern), Avadhi, Newar, Saraiki (footnote 4)

Table 2 Languages considered under Devanagari LGR

Despite being classified under EGIDS Scale 5, the Boro language is also considered under the Devanagari LGR, as it is one of the scheduled languages of India and is widely spoken.

Apart from the above-mentioned languages, Braj, Dhundari, Mundari and Kharia have also been considered for the analysis, as the communities using them were accessible and provided their inputs.

3.2.1 Case of Sanskrit

Sanskrit is generally perceived as an archaic language used only in ancient religious texts. However, it is worth noting that there is a quite vibrant and active user community of Sanskrit in India which practices Sanskrit on a day-to-day basis. Sanskrit is still taught in schools under various State and Central educational boards. There is increasing use of Sanskrit on social media as well. The same is reflected in the EGIDS scale, where Sanskrit is categorized in Scale 4, indicating the status of the language as "Educational".

4 Though listed in EGIDS scale 4, Saraiki is not covered by the NBGP. As per Ethnologue, the Devanagari script is no longer in use by the Saraiki community. Ref: https://www.ethnologue.com/language/skr

3.3 The structure of written Devanagari

Devanagari is an alphasyllabary, and the heart of the writing system is the akshar. It is this unit which is instinctively recognized by users of the script. To understand the notion of akshar, a brief overview of the writing system is provided in this section; the akshar itself will be treated in depth in Section 5.4. The writing system of Devanagari could be summed up as composed of the following:

3.3.1 The Consonants

Devanagari consonants have an implicit schwa (footnote 5) /ə/ vowel included in them. As per traditional classification, they are categorized according to their phonetic properties (especially in terms of place plus manner of articulation). There are 5 Varga groups (classes) and one non-Varga group. Each Varga, which corresponds to Stops, contains five consonants classified as per their properties. The first four consonants are classified on the basis of voicing and aspiration, and the last is the corresponding nasal.

Varga | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | क U+0915 | ख U+0916 | ग U+0917 | घ U+0918 | ङ U+0919
Palatal | च U+091A | छ U+091B | ज U+091C | झ U+091D | ञ U+091E
Retroflex | ट U+091F | ठ U+0920 | ड U+0921 | ढ U+0922 | ण U+0923
Dental | त U+0924 | थ U+0925 | द U+0926 | ध U+0927 | न U+0928
Bi-labial | प U+092A | फ U+092B | ब U+092C | भ U+092D | म U+092E

Table 3 Varga classification of consonants

Non-Varga | य U+092F | र U+0930 | ल U+0932 | ळ U+0933 | व U+0935 | श U+0936 | ष U+0937 | स U+0938 | ह U+0939

Table 4 Non-Varga consonants
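The Varga layout of Tables 3 and 4 maps directly onto code point arithmetic, because each Varga occupies five consecutive code points. The following is a minimal Python sketch of that observation; the function name `classify_consonant` and the column labels are illustrative assumptions, not part of the LGR.

```python
# First code point of each Varga row in Table 3; each row spans 5 code points.
VARGA_STARTS = {
    "Velar": 0x0915,
    "Palatal": 0x091A,
    "Retroflex": 0x091F,
    "Dental": 0x0924,
    "Bi-labial": 0x092A,
}
# Column order within a Varga row, as in Table 3.
COLUMNS = ("unvoiced -Asp", "unvoiced +Asp", "voiced -Asp", "voiced +Asp", "nasal")
# Non-Varga consonants from Table 4.
NON_VARGA = {0x092F, 0x0930, 0x0932, 0x0933, 0x0935, 0x0936, 0x0937, 0x0938, 0x0939}


def classify_consonant(ch):
    """Return (group, column) for a consonant of Table 3 or 4, else None."""
    cp = ord(ch)
    for varga, start in VARGA_STARTS.items():
        if start <= cp <= start + 4:
            return varga, COLUMNS[cp - start]
    if cp in NON_VARGA:
        return "Non-Varga", None
    return None


if __name__ == "__main__":
    for ch in "\u0915\u092E\u0939":
        print(f"U+{ord(ch):04X}", classify_consonant(ch))
```

Code points outside the two tables (vowels, signs, or consonants not listed there) return None rather than being guessed at.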

5 Although representing the implicit vowel as /a/ is more correct orthographically, the schwa /ə/, although not part of the orthographic system, has been used, since the /a/ would be misunderstood and read as अआा.

3.3.2 The Implicit Vowel Killer: Halant (footnote 6)

All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant (U+094D). The Halant thus joins two consonants and creates conjuncts, which generally consist of 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical. It is just an observation drawn from the words that have been observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. Hence, in the LGR work, this limit will not be enforced (footnote 7).
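To make the notion of a conjunct size concrete, here is a small Python sketch that measures the longest run of consonants joined by Halant in a label. It only reports the number; in line with the text, it does not enforce any limit. The helper names and the simplified consonant range (U+0915 to U+0939) are assumptions made for illustration.

```python
HALANT = "\u094D"


def is_consonant(ch):
    # Simplifying assumption: treat the contiguous range U+0915..U+0939
    # as the consonants of interest.
    return 0x0915 <= ord(ch) <= 0x0939


def longest_conjunct(label):
    """Length (in consonants) of the longest Halant-joined cluster in label."""
    longest = current = 0
    after_halant = False
    for ch in label:
        if is_consonant(ch):
            # A consonant directly after a Halant extends the cluster;
            # otherwise it starts a new one.
            current = current + 1 if after_halant else 1
            longest = max(longest, current)
            after_halant = False
        else:
            after_halant = (ch == HALANT)
    return longest


if __name__ == "__main__":
    # ka + halant + ssa, and "rashtra" (ra, aa, ssa, halant, tta, halant, ra)
    print(longest_conjunct("\u0915\u094D\u0937"))
    print(longest_conjunct("\u0930\u093E\u0937\u094D\u091F\u094D\u0930"))
```

A checker built this way could log unusually long clusters for review without rejecting them.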

3.3.3 Vowels

Separate symbols exist for all Vowels which are pronounced independently, either at the beginning or after a vowel sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels excepting the अ.

The correlation is shown as follows:

Vowel | Corresponding vowel sign (Matra)
अ U+0905 | -
आ U+0906 | ा U+093E
इ U+0907 | ि U+093F
ई U+0908 | ी U+0940
उ U+0909 | ु U+0941
ऊ U+090A | ू U+0942
ऋ U+090B | ृ U+0943
ए U+090F | े U+0947
ऐ U+0910 | ै U+0948
ओ U+0913 | ो U+094B
औ U+0914 | ौ U+094C
ॳ U+0973 | ऺ U+093A
ॴ U+0974 | ऻ U+093B
ऎ, ऄ U+090E, U+0904 | ॆ U+0946
ऒ U+0912 | ॊ U+094A
ऍ, ॲ U+090D, U+0972 | ॅ U+0945
ॠ U+0960 | ॄ U+0944
ऑ U+0911 | ॉ U+0949
ॵ U+0975 | ॏ U+094F
ॶ U+0976 | ॖ U+0956
ॷ U+0977 | ॗ U+0957

Table 5 Vowels with corresponding Matras

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).

6 Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report both terms have been used to denote the character that suppresses the inherent vowel.
7 This can be the case when a foreign language word which admits a large number of consonants is transliterated into Devanāgarī.
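The vowel-to-Matra correlation of Table 5 can be read as a lookup table. The Python sketch below covers only the ten most common pairs and shows how a consonant plus a vowel becomes a written syllable; the names `MATRA` and `syllable` are illustrative assumptions.

```python
# Independent vowel -> dependent vowel sign (Matra), a subset of Table 5.
MATRA = {
    "\u0906": "\u093E",  # AA
    "\u0907": "\u093F",  # I
    "\u0908": "\u0940",  # II
    "\u0909": "\u0941",  # U
    "\u090A": "\u0942",  # UU
    "\u090B": "\u0943",  # VOCALIC R
    "\u090F": "\u0947",  # E
    "\u0910": "\u0948",  # AI
    "\u0913": "\u094B",  # O
    "\u0914": "\u094C",  # AU
}
LETTER_A = "\u0905"  # the implicit vowel: it has no Matra


def syllable(consonant, vowel):
    """Spell consonant + vowel the way the script writes it."""
    if vowel == LETTER_A:
        return consonant  # the schwa is already built into the consonant
    return consonant + MATRA[vowel]


if __name__ == "__main__":
    ka = "\u0915"
    print([syllable(ka, v) for v in (LETTER_A, "\u0906", "\u0914")])
```

The special case for अ mirrors the statement that every vowel except अ has an equivalent Matra.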

3.3.4 The Anusvara (ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant, the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani prefer the Anusvara to the corresponding Half-nasal (footnote 8).

सन्त vs संत /sənt/ 'saint': U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924
चम्पा vs चंपा /tʃəmpa/ 'a flower belonging to the genus Plumeria family': U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E
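The replacement described above (Nasal Consonant + Halant + Consonant of the same varga becoming Anusvara + Consonant) can be sketched in Python as follows. This is an illustration of the spelling convention only, not a rule of the LGR; the function names are hypothetical, and leaving doubled nasals untouched is an assumption of this sketch.

```python
HALANT = "\u094D"
ANUSVARA = "\u0902"
# First code point of each varga; the nasal is the fifth member (start + 4).
VARGA_STARTS = (0x0915, 0x091A, 0x091F, 0x0924, 0x092A)


def varga_index(ch):
    cp = ord(ch)
    for i, start in enumerate(VARGA_STARTS):
        if start <= cp <= start + 4:
            return i
    return None


def is_varga_nasal(ch):
    i = varga_index(ch)
    return i is not None and ord(ch) == VARGA_STARTS[i] + 4


def prefer_anusvara(text):
    """Rewrite nasal + Halant + same-varga consonant as Anusvara + consonant."""
    out = []
    i = 0
    while i < len(text):
        if (
            i + 2 < len(text)
            and is_varga_nasal(text[i])
            and text[i + 1] == HALANT
            and varga_index(text[i + 2]) == varga_index(text[i])
            # Assumption: a doubled nasal (e.g. na + halant + na) is left alone.
            and not is_varga_nasal(text[i + 2])
        ):
            out.append(ANUSVARA)
            i += 2  # skip the nasal and the Halant
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(prefer_anusvara("\u0938\u0928\u094D\u0924"))        # sant
    print(prefer_anusvara("\u091A\u092E\u094D\u092A\u093E"))  # champa
```

Applied to the two examples above, the function turns the first spelling of each pair into the second.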

3.3.5 Nasalization: Candrabindu (ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख /ãkh/ 'eye' (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (़ - U+093C) (footnote 9)

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ /firoz/ (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ /bədh/ (U+092C U+0922 U+093C)

The web publication "DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION" [109] by the Central Hindi Directorate, Ministry of HRD, Government of India, clearly states such a use of Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय /cay/ 'tea' (U+091A U+093C U+093E U+092F)
छ़ल /chal/ 'wash (imperative)' (U+091B U+093C U+0932)
पॊज़ /póz/ 'fact' (U+092A U+094A U+091C U+093C)

Normally, a Nukta is appended to a Consonant. However, the Santali language uses Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a. आ (U+0906)
b. ओ (U+0913)
c. ा (U+093E)
d. ो (U+094B)

8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding "Varga" through a Halant.
9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics & Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided, as they are available online.
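The Nukta contexts listed in this section can be collected into a single lookup. The Python sketch below checks that every Nukta in a string follows one of the base characters named above. It is only an illustration of the prose, under the assumption that the union of all per-language sets is acceptable; the normative context rules of the LGR are defined elsewhere in the proposal and may differ. All names are hypothetical.

```python
NUKTA = "\u093C"

# Base characters named in this section, grouped by the reason given for them.
NUKTA_BASES = {
    "Perso-Arabic sounds": {0x0915, 0x0916, 0x0917, 0x091C, 0x092B},
    "flapped sounds": {0x0921, 0x0922},
    "Kashmiri affricates": {0x091A, 0x091B, 0x091C},
    "Santali vowels and vowel signs": {0x0906, 0x0913, 0x093E, 0x094B},
}
ALLOWED_BASES = set().union(*NUKTA_BASES.values())


def nukta_bases_ok(text):
    """True if every Nukta in text directly follows a listed base character."""
    for i, ch in enumerate(text):
        if ch == NUKTA:
            if i == 0 or ord(text[i - 1]) not in ALLOWED_BASES:
                return False
    return True


if __name__ == "__main__":
    firoz = "\u092B\u093C\u093F\u0930\u094B\u091C\u093C"
    print(nukta_bases_ok(firoz))           # both Nuktas follow listed bases
    print(nukta_bases_ok("\u0924\u093C"))  # ta + nukta is not listed
```

Keeping the sets grouped by reason makes it easy to restrict the check to a single language community if that is ever needed.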

3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example दुःख /duhkh/ 'sorrow, unhappiness' (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire, as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष /ksha/, which gets formed by क /ka/ + ् (halant) + ष /sha/, gets rendered as क्‌ष when formed by क /ka/ + ् (halant) + Zero Width Non-joiner + ष /sha/. In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character, which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that the constituting consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष /ksha/, which gets formed by क /ka/ + ् (halant) + ष /sha/, does not show the half form of ka joining with sha. However, using ZWJ, the constituting consonants' shapes are preserved in the visual depiction, i.e. क्‍ष, formed by क /ka/ + ् (halant) + Zero Width Joiner + ष /sha/.

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.
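Because ZWNJ and ZWJ are invisible, the three spellings of ksha discussed above are easiest to compare at the code point level. The following Python sketch lists them using the standard `unicodedata` module; the dictionary keys and helper names are illustrative assumptions.

```python
import unicodedata

KA, HALANT, SSA = "\u0915", "\u094D", "\u0937"
ZWNJ, ZWJ = "\u200C", "\u200D"

SEQUENCES = {
    "default conjunct": KA + HALANT + SSA,
    "explicit halant (ZWNJ)": KA + HALANT + ZWNJ + SSA,
    "half form (ZWJ)": KA + HALANT + ZWJ + SSA,
}


def describe(text):
    """List each code point with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


def strip_join_controls(text):
    """Remove ZWNJ and ZWJ, leaving only the visible code points."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


if __name__ == "__main__":
    for label, seq in SEQUENCES.items():
        print(label)
        for line in describe(seq):
            print("   ", line)
```

Stripping the join controls collapses all three spellings into the same underlying sequence, which is why they differ only in rendering and not in the letters they contain.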

4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP has ensured that the fundamental philosophy behind building those LGRs is in sync across all Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code-points in the code-point repertoire across the board for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only or for archival purposes will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are pre-requisites to the Label Generation Rule sets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
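The relationship between U+0958 and its decomposed form can be observed with Python's standard `unicodedata` module. The sketch below only demonstrates Unicode normalization behaviour; it does not implement the IDNA protocol itself, and the variable names are illustrative.

```python
import unicodedata

QA = "\u0958"              # DEVANAGARI LETTER QA (precomposed)
KA_NUKTA = "\u0915\u093C"  # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA

# The canonical decomposition recorded in the Unicode Character Database.
print(unicodedata.decomposition(QA))

# NFC normalization does not keep the precomposed letter: it yields the
# two-code-point sequence that the text above recommends using instead.
normalized = unicodedata.normalize("NFC", QA)
print([f"U+{ord(c):04X}" for c in normalized])
print(normalized == KA_NUKTA)
```

A label that has been NFC-normalized therefore never contains U+0958, which is consistent with the decomposed sequence being the form to use in domain names.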

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where an atypical visual shape of an akshar is desired.

In domain names, due to the absence of space " " or tab "-", there will be cases where the inability to use ZWNJ can pose some issues: where two words need to be joined together, the previous word needs to end in an Explicit Halant and the next word begins with a consonant. In that case a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (deš 'nation') and विदेश (videš 'foreign land') are juxtaposed to each other, the resultant word, i.e. "देश्विदेश" (footnote 10), is not the appropriate way of rendering it. The appropriate rendering would be "देश्‌विदेश", which can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, there may not be much of an impact of the exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes (footnote 11) may not be able to be made; however, there will not be any impact on the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code-block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
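The three successive filters summarized above can be pictured as a chain of checks, each applied only to what the previous one let through. The Python sketch below is a toy model: the two exclusion sets contain only the code points named as examples in this section, standing in for the real IDNA and MSR tables, and the function name is hypothetical.

```python
import unicodedata

# Toy stand-ins built only from the examples given in this section.
IDNA_DISALLOWED = {0x0958}            # DEVANAGARI LETTER QA
MSR_EXCLUDED = {0x093D, 0x200C, 0x200D}  # Avagraha, ZWNJ, ZWJ


def first_failed_filter(cp):
    """Name the first filter that rejects cp, or None if it passes all three."""
    try:
        unicodedata.name(chr(cp))     # filter i: must be an encoded, named character
    except ValueError:
        return "Unicode"
    if cp in IDNA_DISALLOWED:         # filter ii: IDNA protocol
        return "IDNA"
    if cp in MSR_EXCLUDED:            # filter iii: Maximal Starting Repertoire
        return "MSR"
    return None


if __name__ == "__main__":
    for cp in (0x0915, 0x0958, 0x093D, 0x200C):
        print(f"U+{cp:04X}", first_failed_filter(cp))
```

A code point that passes all three checks is still only a candidate; the inclusion and exclusion principles of Section 4.1 decide whether it enters the repertoire.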

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different, and hence it changes the fundamental word.
11 Case of क्ष and क्‍ष: the first is composed with क + ् + ष, while the latter is with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.

4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms (U+0944) and (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.
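Some of the exclusion principles in Sections 4.1.2.2 to 4.1.2.5 line up with Unicode general categories, while others need an explicit list. The Python sketch below combines both approaches using the standard `unicodedata` module. It is an illustration only: the explicit list contains just the code points named in the text, the reason strings are paraphrases, and the function name is hypothetical.

```python
import unicodedata

# Code points the text names explicitly, with the principle they fall under.
EXPLICIT_EXCLUSIONS = {
    0x0970: "symbol or abbreviation",
    0x0960: "rare or obsolete character",
    0x0961: "rare or obsolete character",
    0x0944: "rare or obsolete character",
    0x0963: "rare or obsolete character",
    0x0951: "stress marker",
    0x0952: "stress marker",
}


def exclusion_reason(cp):
    """Return the exclusion principle that applies to cp, or None."""
    if cp in EXPLICIT_EXCLUSIONS:
        return EXPLICIT_EXCLUSIONS[cp]
    category = unicodedata.category(chr(cp))
    if category.startswith("P"):      # Unicode punctuation categories
        return "punctuation mark"
    if category.startswith("S"):      # Unicode symbol categories
        return "symbol or abbreviation"
    return None


if __name__ == "__main__":
    for cp in (0x0964, 0x0965, 0x09FA, 0x0970, 0x0960, 0x0951, 0x0915):
        print(f"U+{cp:04X}", exclusion_reason(cp))
```

The general-category test catches the Danda and Isshar examples automatically, whereas rare letters and stress markers are ordinary letters or marks as far as Unicode categories go, so they have to be listed by hand.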

4.2 Methodology to incorporate the feedback received through Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments called for attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change is not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].

5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2 Devanagari Code Page from [MSR]

Color convention (footnote 12):

- All characters that are included in the [MSR]: yellow background
- PVALID in IDNA2008 but excluded from the MSR for various reasons: pinkish background
- Not PVALID in IDNA2008: white background

12 This document needs to be printed in color for this to be read correctly.

5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled Reference. For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1: Hindi, Nepali | [0][101][102][103][105][108][110][111][112][113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]

8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1: Hindi | [0][101][102][103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1: Hindi | [0][101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4: Kashmiri | [0][105][108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1: Hindi | [0][100][101][102][108][112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4: Kashmiri | [0][105][108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]

20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]

30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]


40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]


50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]


62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the “Eyelash Reph” construct (footnote 13).

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by the normal Ra, U+0930), the term “Reph” is used here.


2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7 Sequences
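The conditional inclusion of U+0931 can be illustrated with a minimal sketch. It assumes that a label is given as a Python string and that U+0931 is acceptable only inside one of the two sequences of Table 7; the function name is hypothetical and is not part of the LGR or of any LGR tool.

```python
import re

RRA = "\u0931"  # DEVANAGARI LETTER RRA

# The two sequences of Table 7: RRA + VIRAMA + (YA or HA).
_ALLOWED_SEQUENCES = re.compile("\u0931\u094D[\u092F\u0939]")


def rra_only_in_sequences(label: str) -> bool:
    """Return True if every RRA in the label belongs to a Table 7 sequence."""
    # Remove the permitted sequences; any RRA left over is a stand-alone use.
    remainder = _ALLOWED_SEQUENCES.sub("", label)
    return RRA not in remainder
```

For example, `rra_only_in_sequences("\u0931\u094D\u092F")` accepts the first sequence, while a bare `"\u0931"` is rejected.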

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:


- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7 the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, instead of their descriptive names, the variable names are used. The second section lists four operators, along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of the akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | अ ं U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ ँ U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | अ ः U+0905 U+0903

Table 9


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क ़ U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क ि U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | क ं U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क ँ U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | क ः U+0915 U+0903
Consonant + Halant | C[H] | क् (k, pure consonant) | क ् U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क ी ं U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क ा ँ U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क ी ः U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (footnote 14): 3*(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In case of Sanskrit it can join up to 5 consonants.

Subsets:

3A. The combination may be followed by M, B, D or X.

Example:


Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. 3*(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.
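The vowel and consonant sequences of Sections 5.5.3 and 5.5.4 can be sketched as a single pattern over the variable names C, M, V, B, D, X, H and N. The sketch below is only an illustration: the code point ranges used to classify characters are simplifying assumptions (contiguous Unicode ranges rather than the exact repertoire of Table 6), and the function names are hypothetical.

```python
import re


def classify(ch: str) -> str:
    """Map a character to its variable name; '?' means outside this sketch."""
    cp = ord(ch)
    if cp == 0x0901:
        return "D"  # Candrabindu
    if cp == 0x0902:
        return "B"  # Anusvara
    if cp == 0x0903:
        return "X"  # Visarga
    if cp == 0x093C:
        return "N"  # Nukta
    if cp == 0x094D:
        return "H"  # Halant / Virama
    if 0x0905 <= cp <= 0x0914:
        return "V"  # independent vowels (assumed range)
    if 0x0915 <= cp <= 0x0939:
        return "C"  # consonants (assumed range)
    if 0x093E <= cp <= 0x094C:
        return "M"  # matras (assumed range)
    return "?"


# V[B|D|X]  |  C[N] then H, or optional M and optional B/D/X  |
# up to three (C[N]H) groups, a final C[N], optional M, optional B/D/X.
AKSHAR = re.compile(r"V[BDX]?|CN?(?:H|M?[BDX]?)|(?:CN?H){1,3}CN?M?[BDX]?")


def is_hindi_akshar(text: str) -> bool:
    """Return True if text is exactly one akshar under the Hindi rules above."""
    classes = "".join(classify(ch) for ch in text)
    return AKSHAR.fullmatch(classes) is not None
```

Under these assumptions the Table 13 example (U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F) is accepted as one akshar, while a conjunct of five consonants is rejected, mirroring the limit of four stated for Hindi.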


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21: Visually confusables, in Appendix A: Visually confusable characters/sequences, lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is the brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed.


Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 Proposed Variants - Set 1 have a typical characteristic, which is that within a variant pair, Variant 1 is a subset of Variant 2; e.g. in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8828]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
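The effect of the context rule can be illustrated with a small sketch that compares two labels after removing the Santali Nukta wherever the rule allows the mapping. This is an index-style simplification written for illustration, not the mechanism of the accompanying XML file; the function names are hypothetical.

```python
import re

NUKTA = "\u093C"
# The four Variant 1 code points of Table 16: AA, O, vowel sign AA, vowel sign O.
BASES = "\u0906\u0913\u093E\u094B"

# A base followed by exactly one Nukta. The negative lookahead expresses the
# condition "not followed by a Nukta" on the resulting sequence.
_SANTALI_NUKTA = re.compile(f"([{BASES}]){NUKTA}(?!{NUKTA})")


def strip_santali_nukta(label: str) -> str:
    """Map every permitted Variant 2 sequence back to its Variant 1 form."""
    return _SANTALI_NUKTA.sub(r"\1", label)


def are_nukta_variants(a: str, b: str) -> bool:
    """Return True if two distinct labels differ only by Table 16 variants."""
    return a != b and strip_santali_nukta(a) == strip_santali_nukta(b)
```

With this sketch, U+0906 and U+0906 U+093C are reported as variants of each other, whereas U+0906 U+093C U+093C is not treated as a variant of U+0906 U+093C, which is the behaviour the condition is meant to guarantee.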

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E:

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A,
- 0906 093C 0901 is added to Set B,
- 0906 093C 0902 is added to Set C, and
- 093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when(follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ U+0901 | ॅं U+0945 U+0902
ाँ U+093E U+0901 | ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18 Proposed Variants - Set 3, the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
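A minimal sketch of this rule follows. It assumes a simplified consonant class (U+0915 to U+0939 plus the four Sindhi letters) rather than the exact repertoire, and the function name is hypothetical.

```python
import re

NUKTA = "\u093C"
CANDRABINDU = "\u0901"
CANDRA_E = "\u0945"
ANUSVARA = "\u0902"

# Simplified consonant class (an assumption of this sketch).
CONSONANT = "[\u0915-\u0939\u097B\u097C\u097E\u097F]"

# Candrabindu preceded by a Consonant, or by a Consonant followed by a Nukta.
# Each lookbehind has a fixed width, as Python's re module requires.
_ELIGIBLE = re.compile(
    f"(?:(?<={CONSONANT})|(?<={CONSONANT}{NUKTA})){CANDRABINDU}"
)


def candrabindu_variant(label: str) -> str:
    """Replace every eligible Candrabindu with Candra E + Anusvara."""
    return _ELIGIBLE.sub(CANDRA_E + ANUSVARA, label)
```

A Candrabindu that follows a Vowel or a Matra is left untouched, because U+0945 could not validly appear in that position.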

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.


There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
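The first-come, blocked disposition can be pictured with a toy registry. The sketch below is purely illustrative: it uses only four single code point pairs from Table 17 and ignores sequences and context rules, and the class and method names are hypothetical.

```python
# Four single code point pairs from Table 17, each mapped to one
# representative so that variant labels share the same index key.
_INDEX = str.maketrans({
    "\u090E": "\u0910",
    "\u0946": "\u0947",
    "\u0975": "\u0914",
    "\u094F": "\u094C",
})


class Registry:
    """Toy registry in which the first applied label blocks its variants."""

    def __init__(self) -> None:
        self._allocated: dict[str, str] = {}

    def apply(self, label: str) -> str:
        key = label.translate(_INDEX)
        holder = self._allocated.get(key)
        if holder is None:
            # No label with this key yet: allocate, without any preference
            # between the two members of a variant pair.
            self._allocated[key] = label
            return "allocated"
        return "already allocated" if holder == label else "blocked"
```

Whichever of two variant labels is applied for first is allocated; the later one is reported as blocked.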

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38


व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B


ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants
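How such a table gives rise to a whole-label cross-script variant can be sketched as follows. The mapping below is a small, one-to-one subset of Table 19 chosen for illustration (the pairs involving U+0946 and U+0947, which each have two Gurmukhi counterparts, are left out), and the function name is hypothetical.

```python
from typing import Optional

# A one-to-one subset of the Devanagari-Gurmukhi pairs in Table 19.
DEVA_TO_GURU = {
    "\u091F": "\u0A1F",  # TTA
    "\u0920": "\u0A20",  # TTHA
    "\u092E": "\u0A38",  # Devanagari MA resembles Gurmukhi SA
    "\u0935": "\u0A15",  # Devanagari VA resembles Gurmukhi KA
    "\u093F": "\u0A3F",  # vowel sign I
    "\u0940": "\u0A40",  # vowel sign II
}


def gurmukhi_variant(label: str) -> Optional[str]:
    """Return the Gurmukhi look-alike label, or None if any character has none."""
    if not label or any(ch not in DEVA_TO_GURU for ch in label):
        return None
    return "".join(DEVA_TO_GURU[ch] for ch in label)
```

A whole-label variant exists only when every character of the label has a counterpart; a single unmapped character means the label has no cross-script variant under this subset.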


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables, in Appendix B: Cross-script Confusables, lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:


a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (footnote 15)

3. M must be preceded by C or CN (footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in Case of V preceded by H)

15 where CN is a C followed by an N
16 where CN is a C followed by an N
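The seven rules above lend themselves to a simple left-to-right check. The following sketch is not the LGR XML and is not tied to any LGR tool; it assumes simplified code point ranges for the categories, leaves out the Eyelash Reph sequences (S) and the variant rules, and uses hypothetical function names.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")


def classify(ch: str) -> str:
    """Map a character to its WLE symbol; '?' means outside this sketch."""
    cp = ord(ch)
    if cp == 0x0901:
        return "D"
    if cp == 0x0902:
        return "B"
    if cp == 0x0903:
        return "X"
    if cp == 0x093C:
        return "N"
    if cp == 0x094D:
        return "H"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977:
        return "V"  # assumed vowel ranges
    if 0x0915 <= cp <= 0x0939 or cp in (0x097B, 0x097C, 0x097E, 0x097F):
        return "C"  # assumed consonant ranges
    if 0x093E <= cp <= 0x094C or cp in (0x093A, 0x093B, 0x094F, 0x0956, 0x0957):
        return "M"  # assumed matra ranges
    return "?"


def violated_rules(label: str) -> list:
    """Return the numbers of the WLE rules (1 to 7) that the label violates."""
    classes = [classify(ch) for ch in label]
    violations = []
    for i, cls in enumerate(classes):
        prev = classes[i - 1] if i > 0 else ""
        prev_char = label[i - 1] if i > 0 else ""
        # True when the preceding context is C, or C followed by N.
        after_c_or_cn = prev == "C" or (
            prev == "N" and i >= 2 and classes[i - 2] == "C"
        )
        if cls == "N" and prev_char not in (C1 | V1 | M1):
            violations.append(1)
        elif cls == "H" and not after_c_or_cn:
            violations.append(2)
        elif cls == "M" and not after_c_or_cn:
            violations.append(3)
        elif cls in ("X", "B", "D") and prev not in ("V", "C", "N", "M"):
            violations.append({"X": 4, "B": 5, "D": 6}[cls])
        elif cls == "V" and prev == "H":
            violations.append(7)
    return violations
```

Under these assumptions, a label such as U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is flagged under rule 7, while the same label without U+094D passes all seven checks.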


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7 Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case “V preceded by H” needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार aːməchaːr, “Mango pickle” (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.


If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | DataXgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | DataXgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

49

Member JijoPappachan DNDomains India MalayalamMember KCTikayatray OdiaBhasaPratisthan India OdiaMember KalyanVasudeo

KaleFormerlyaffiliatedwithUniversityofPune

India Marathi

Member KuldeepPatnaik Visualizethysoul India OdiaMember MukeshSaini EsselGroup India HindiMember NDeivaSundaram NDSLingsoftSolutionsPvtLtd India TamilMember NehaGupta C-DAC India HindiEnglishMember NirajanParajuli NREN Nepal NepaliMember NishitJain C-DAC India HindiEnglishMember PawanChitrakar Gapsco Nepal NepaliMember PrabhakarPandey C-DAC India HindiMember PrasadPK A-onePublishers India MalayalamMember PrateekPathak ISOCMumbai India DevanagariMember RaiomondDoctor NLPConsultant India EnglishHindi

MarathiGujaratiMember RajibChakraborty SocietyforNaturalLanguage

TechnologyResearchIndia Bangla(Bengali)

Member RajivKumar NIXI India Member SManiam InternationalForumITforTamil Singapore TamilMember SanthoshThottingal Wikimediafoundation India Malayalam

SourashtraTamilMember SarojaBhate UniversityofPune India SanskritMember ShambhuKumar

SinghNationalTranslationMissionMysore

India Maithili

Member ShanmugamR C-DAC India TamilMember ShantaramSWarde

WalawalikarIndependentResearcher India Konkani

Member ShashiPathania PGDofDogriUniversityofJammu

India Dogri

Member ShubhamSaran NIXI India Member Sinnathambi

ShanmugarajahUniversityofColomboSchoolofComputing

SriLanka Tamil

Member SujithKartha Digitalkzcom India MalayalamMember SurajAdhikari MercantileCommunications(and

npccTLD)Nepal Nepali

Member SwarnaPrabhaChainary

GuwahatiUniversity India Bodo

Member UBPavanaja httpvishvakannadacom India KannadaMember UmaMaheshwarG CALTSUnivofHyderabad India TeluguMember UttamShrestha

RanaNPNOG Nepal Nepali

Member VeenaSolomon (freelancer) India MalayalamMember VinayMurarka Consultanthttpsमराभारत India Hindi


In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi/Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Diringer, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchinson, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.

2. IS 10315: 7-bit coded character set for information interchange. 1985.

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.

5. ISO 2375: Procedure for registration of escape sequences. 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables
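Since every pair in Table 21 differs only by a Nukta (U+093C) on one of nine base consonants, the relationship can be sketched as a comparison of "skeleton" strings. This is a hedged illustration only: the function names are hypothetical, and such a check is informative rather than normative, because these pairs are explicitly not variants.

```python
NUKTA = "\u093C"

# Base consonants listed in Table 21 (Confusable 1 column)
NUKTA_BASES = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")


def skeleton(label: str) -> str:
    """Drop a Nukta when it directly follows one of the Table 21 bases."""
    out = []
    for ch in label:
        if ch == NUKTA and out and out[-1] in NUKTA_BASES:
            continue  # skip the Nukta, keep the base consonant
        out.append(ch)
    return "".join(out)


def possibly_confusable(a: str, b: str) -> bool:
    """True if two distinct labels differ only by Table 21 Nukta pairs."""
    return a != b and skeleton(a) == skeleton(b)


# U+091C U+093C (ja + nukta) versus U+091C (ja)
print(possibly_confusable("\u091C\u093C", "\u091C"))  # True
```

A registry could use such a check to surface labels for manual review without blocking them.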


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
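The "all constituents must have a confusable" condition can be sketched as follows. The mapping below is a hand-copied subset of the Devanagari/Gurmukhi pairs from Table 22, and the function name is hypothetical; the sketch works per code point, whereas the text speaks of characters/aksharas.

```python
# Devanagari code point -> Gurmukhi confusables, copied from Table 22
GURMUKHI_CONFUSABLES = {
    "\u0920": {"\u0A28", "\u0A30"},
    "\u0921": {"\u0A21", "\u0A24"},
    "\u0922": {"\u0A22"},
    "\u0924": {"\u0A1C"},
    "\u092F": {"\u0A27"},
}


def may_have_cross_script_confusable(label: str, table=GURMUKHI_CONFUSABLES) -> bool:
    """True only if every code point of the label has a confusable in the table."""
    return bool(label) and all(ch in table for ch in label)


# Both code points have Gurmukhi look-alikes
print(may_have_cross_script_confusable("\u0920\u0921"))  # True
# U+0915 has no entry, so the label is visually distinguishable
print(may_have_cross_script_confusable("\u0920\u0915"))  # False
```

A single character without a counterpart is enough to make the whole label distinguishable, which is why `all` rather than `any` is used.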

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see footnote 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


3 Background on Script and Principal Languages Using It

The script called Nagari or Devanagari is written from left to right. Historically, it derives from the Brahmi alphabet of the Ashokan inscriptions. Devanagari is currently used for 11 out of 22 scheduled languages of India (Boro/Bodo, Dogri, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Sanskrit, Santali and Sindhi) and around 45 other languages, especially the related Indo-Aryan languages: Bagheli, Bhili, Bhojpuri, Himachali dialects, Magahi, Newar, and Rajasthani and its dialects Marwari, Mewati, Shekhawati, Bagri, Dhundhari, Harauti and Wagdi. Closely associated with Sanskrit and Prakrit, it is an alternative script for Kashmiri (by Hindu speakers), Sindhi and Santali. It is growing popular in use by speakers of tribal languages of Arunachal Pradesh, Bihar, Chattisgarh, Jharkhand, Madhya Pradesh and Andaman & Nicobar Islands. The script is also used in Fiji to represent Fiji Hindi. Hindi is also a language of communication in Mauritius, Malaysia, England, Canada, South Africa and Indonesia, as well as in emigrant communities around the world. The script is also used in Nepal for writing the Nepali language. Nepali is the official language of Nepal as well as one language of the state of Sikkim in India. It is spoken by over 30 million people.

Devanagari is used by over 120 languages in India, Bangladesh, Nepal and in Southeast Asia.

3.1 The Evolution of the Script

It is well known that Devanagari has evolved from the parent script Brahmi, with its earliest historical form, known as Aśokan Brahmi, traced to the 4th century BC. Brahmi was deciphered by Sir James Prinsep in 1837. The study of Brahmi and its development has shown that it has given rise to most of the scripts in India as well as in other countries, viz. Sri Lanka, Myanmar, Cambodia, Thailand, Laos and the region of Tibet, to name a few.

The evolution of Brahmi into present-day Devanagari involved intermediate forms common to other scripts, such as Gupta and its two generates: Siddaṃ and Śāradā in the north, and Grantha and Kadamba in the South. Devanagari can be said to have developed from the Kutila script, a descendant of the Gupta script, in turn a descendant of Brahmi. The word kutila, meaning 'crooked', was used as a descriptive term to characterize the curving shapes of the script compared to the straight lines of Brahmi. This inheritance is the reason why some of the characters across the scripts that will be considered under the Neo-Brahmi GP look similar to each other despite belonging to totally different code blocks of the Unicode Standard.

A look at the development of Devanagari from Brahmi gives an insight into how the Indic scripts have come to be diversified: the handiwork of engravers and writers who used different types of strokes led to different regional styles. The development of the script is outlined below. Figure 1 (Pictorial depiction of evolution of Devanagari) illustrates the stages in the evolution of the script (see footnote 2).

Period | Description
300 BCE | Mauryan. Early Brahmi form in the Asokan edicts. Some scholars believe that Brahmi itself evolved from Kharoshthi, a script written right to left.
200 CE | Kushan/Satavahana Dynasties
400 CE | Gupta Dynasty
600 CE | Yasodharman
800 CE | Origins of the present-day Nagari Script. Vardhana dynasty in the North and Pallava period in the South
900 CE | The period of the Chalukyas and Rashtrakutas
1100 CE | Continuation of the Chalukya Rule
1300 CE | Yadavas in the north and Kakatiyas in the south
1500 CE | The Vijayanagar empire

Table 1 Evolution of Devanagari

2 http://www.acharya.gen.in:8080/sanskrit/script_dev.php


Figure 1 Pictorial depiction of evolution of Devanagari

3.2 Languages considered

Devanagari is used by over 120 languages, which makes it one of the most used scripts in the world. Languages using Devanagari as their primary script belong to varying geo-political scenarios, as given below:

- designated as official (scheduled) languages of some countries

- used by communities living in urban areas

- used by communities living in rural yet accessible areas

- used by communities living in far-flung areas which are not easily connected either by roads or by communication mechanisms

Information about official (scheduled) languages of countries is easily available. Information about languages used by communities living in urban areas is also easily obtainable. Some effort was needed to cover the languages which are spoken by communities living in rural yet accessible areas. However, it was quite difficult to cover the rest of the languages, spoken by the communities living in remote tribal areas which are generally not connected by road or by communication means. Defining the scope of language coverage was hence essential to limit the scope of the work to be undertaken for the analysis of the Devanagari LGR.


The NBGP decided to employ the "Expanded Graded Intergenerational Disruption Scale" [EGIDS], which is designed to measure the status of the languages of the world in terms of endangerment or development. The EGIDS consists of 13 levels, with each higher number on the scale representing a greater level of disruption to the intergenerational transmission of the language. The NBGP decided to accommodate all the languages belonging to EGIDS Scale 1 to 4 for its analysis, which represents languages that, in one form or the other, are still in usage. Following are the descriptions (see footnote 3) of those scales:

Scale | Label | Description
1 | National | The language is used in education, work, mass media and government at the national level.
2 | Provincial | The language is used in education, work, mass media and government within major administrative subdivisions of a nation.
3 | Wider Communication | The language is used in work and mass media without official status to transcend language differences across a region.
4 | Educational | The language is in vigorous use, with standardization and literature being sustained through a widespread system of institutionally supported education.

Languages belonging to Level 5 and higher are not in widespread usage.

Below is the tabular representation of the languages that have been considered for the Devanagari LGR.

3 https://www.ethnologue.com/about/language-status


EGIDS Scale 1 | EGIDS Scale 2 | EGIDS Scale 3 | EGIDS Scale 4

Hindi

Nepali

Konkani

Maithili

Marathi

Sindhi

Bhatri

Halbi

Kinnauri

Kukna

Panchpargania

Sadri

Wagdi

Bhojpuri

Chhattisgarhi

Dogri

Kashmiri

Limbu

Magahi

Sanskrit

Santali

Tamang, Eastern

Avadhi

Newar

Saraiki (see footnote 4)

Table 2 Languages considered under Devanagari LGR

Despite being classified under EGIDS Scale 5, the Boro language is also considered under the Devanagari LGR, as it is one of the scheduled languages of India and is widely spoken.

Apart from the above-mentioned languages, Braj, Dhundari, Mundari and Kharia have also been considered for the analysis, as the community using them was accessible and they provided their inputs.

3.2.1 Case of Sanskrit

Sanskrit is generally perceived as an archaic language used only in ancient religious texts. However, it is worth noting that there is a quite vibrant and active user community of Sanskrit in India which practices Sanskrit on a day-to-day basis. Sanskrit is still taught in schools under various State and Central educational boards. There is increasing use of Sanskrit on social media as well. The same is reflected in the EGIDS scale, where Sanskrit is categorized in Scale 4, indicating the status of the language as "Educational".

4 Though listed in EGIDS scale 4, Saraiki is not covered by the NBGP. As per Ethnologue, the Devanagari script is no longer in use by the Saraiki community. Ref: https://www.ethnologue.com/language/skr


3.3 The structure of written Devanagari

Devanagari is an alphasyllabary, and the heart of the writing system is the akshar. It is this unit which is instinctively recognized by users of the script. To understand the notion of akshar, a brief overview of the writing system is provided in this section; the akshar itself will be treated in depth in Section 5.4. The writing system of Devanagari could be summed up as composed of the following:

3.3.1 The Consonants

Devanagari consonants have an implicit schwa /ə/ vowel included in them (see footnote 5). As per traditional classification, they are categorized according to their phonetic properties (especially in terms of place plus manner of articulation). There are 5 Varga groups (classes) and one non-Varga group. Each Varga, which corresponds to Stops, contains five consonants classified as per their properties. The first four consonants are classified on the basis of voicing and aspiration, and the last is the corresponding nasal.

Varga | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | क U+0915 | ख U+0916 | ग U+0917 | घ U+0918 | ङ U+0919
Palatal | च U+091A | छ U+091B | ज U+091C | झ U+091D | ञ U+091E
Retroflex | ट U+091F | ठ U+0920 | ड U+0921 | ढ U+0922 | ण U+0923
Dental | त U+0924 | थ U+0925 | द U+0926 | ध U+0927 | न U+0928
Bi-labial | प U+092A | फ U+092B | ब U+092C | भ U+092D | म U+092E

Table 3 Varga classification of consonants
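Because each Varga in Table 3 occupies five consecutive code points, the classification can be sketched as a lookup on the row's starting code point. The names below are hypothetical. Note that the rows are not all contiguous with one another: U+0929 sits between the Dental and Bi-labial rows and is not part of Table 3, so the row starts are listed explicitly rather than computed.

```python
# Starting code point of each Varga row, as listed in Table 3
VARGA_STARTS = {
    "Velar": 0x0915,
    "Palatal": 0x091A,
    "Retroflex": 0x091F,
    "Dental": 0x0924,
    "Bi-labial": 0x092A,  # U+0929 is skipped; it is not in Table 3
}

# Column order within each row of Table 3
COLUMNS = [
    "unvoiced unaspirated",
    "unvoiced aspirated",
    "voiced unaspirated",
    "voiced aspirated",
    "nasal",
]


def classify_varga(ch: str):
    """Return (varga, property) for a Varga consonant, else None."""
    cp = ord(ch)
    for name, start in VARGA_STARTS.items():
        if start <= cp < start + 5:
            return name, COLUMNS[cp - start]
    return None


print(classify_varga("\u0915"))  # ('Velar', 'unvoiced unaspirated')
print(classify_varga("\u092E"))  # ('Bi-labial', 'nasal')
print(classify_varga("\u092F"))  # None (non-Varga consonant)
```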

Non-Varga

य U+092F

र U+0930

ल U+0932

ळ U+0933

व U+0935

श U+0936

ष U+0937

स U+0938

ह U+0939

Table 4 Non-Varga consonants

5Although representing the implicit vowel as a is more correct orthographically the schwa ə although not part of the orthographic system has been used since the a would be misunderstood and read as अआा


3.3.2 The Implicit Vowel Killer: Halant (see footnote 6)

All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant (् U+094D). The Halant thus joins two consonants and creates conjuncts, which can generally be combinations of 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical. It is just an observation drawn from the words that have been observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which may have more than the observed maximum cannot be ruled out. Hence, in the LGR work, this limit will not be enforced (see footnote 7).
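The empirical observation about conjunct length can be explored with a small counter. This is a sketch under stated assumptions: the function names are hypothetical, consonants are approximated as the range U+0915..U+0939, and the sample word is an illustrative five-consonant Sanskrit cluster that does not come from this document.

```python
HALANT = "\u094D"


def is_consonant(ch: str) -> bool:
    # Assumption: treat U+0915..U+0939 as the consonant range
    return "\u0915" <= ch <= "\u0939"


def longest_conjunct(label: str) -> int:
    """Largest number of consonants chained together by Halants."""
    best = run = 0
    prev_halant = False
    for ch in label:
        if is_consonant(ch):
            # extend the chain only if a Halant linked us to the previous consonant
            run = run + 1 if prev_halant else 1
            best = max(best, run)
            prev_halant = False
        elif ch == HALANT and run > 0:
            prev_halant = True
        else:
            run = 0
            prev_halant = False
    return best


# ka + halant + sha: a two-consonant conjunct
print(longest_conjunct("\u0915\u094D\u0937"))  # 2

# Illustrative five-consonant cluster (ra, ta, sa, na, ya joined by Halants)
word = "\u0915\u093E\u0930\u094D\u0924\u094D\u0938\u094D\u0928\u094D\u092F"
print(longest_conjunct(word))  # 5
```

Consistent with the text, such a counter is useful for observation only; the LGR does not enforce any upper limit.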

3.3.3 Vowels

Separate symbols exist for all Vowels which are pronounced independently, either at the beginning or after a vowel sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels excepting the अ.

The correlation is shown as follows:

Vowel | Corresponding vowel sign (Matra)
अ U+0905 | (none)
आ U+0906 | ा U+093E
इ U+0907 | ि U+093F
ई U+0908 | ी U+0940
उ U+0909 | ु U+0941
ऊ U+090A | ू U+0942
ऋ U+090B | ृ U+0943
ए U+090F | े U+0947
ऐ U+0910 | ै U+0948
ओ U+0913 | ो U+094B
औ U+0914 | ौ U+094C
ॳ U+0973 | ऺ U+093A
ॴ U+0974 | ऻ U+093B
ऎ, ऄ U+090E, U+0904 | ॆ U+0946
ऒ U+0912 | ॊ U+094A
ऍ, ॲ U+090D, U+0972 | ॅ U+0945
ॠ U+0960 | ॄ U+0944
ऑ U+0911 | ॉ U+0949
ॵ U+0975 | ॏ U+094F
ॶ U+0976 | ॖ U+0956
ॷ U+0977 | ॗ U+0957

Table 5 Vowels with corresponding Matras

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).

6 Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report both the terms have been used to denote the character that suppresses the inherent vowel.

7 This can be the case when a foreign language word which admits a large number of consonants is transliterated into Devanāgarī.
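The vowel-to-Matra correspondence of Table 5 can be sketched as a lookup table. The names below are hypothetical. For the rows listed in `OFFSET_VOWELS`, the Matra happens to sit exactly 0x38 code points above the independent vowel; the remaining rows (for example U+0960 and U+0973..U+0977) do not follow that pattern and are listed explicitly.

```python
# Vowels from Table 5 whose Matra is the vowel's code point + 0x38
OFFSET_VOWELS = [0x0906, 0x0907, 0x0908, 0x0909, 0x090A, 0x090B, 0x090D,
                 0x090E, 0x090F, 0x0910, 0x0911, 0x0912, 0x0913, 0x0914]

VOWEL_TO_MATRA = {chr(v): chr(v + 0x38) for v in OFFSET_VOWELS}

# Rows of Table 5 that do not follow the offset pattern
VOWEL_TO_MATRA.update({
    "\u0904": "\u0946", "\u0972": "\u0945", "\u0960": "\u0944",
    "\u0973": "\u093A", "\u0974": "\u093B", "\u0975": "\u094F",
    "\u0976": "\u0956", "\u0977": "\u0957",
})

SHORT_A = "\u0905"  # has no Matra: it is the consonant's implicit vowel


def syllable(consonant: str, vowel: str) -> str:
    """Write consonant + vowel, using the Matra instead of the vowel letter."""
    if vowel == SHORT_A:
        return consonant
    return consonant + VOWEL_TO_MATRA[vowel]


# ka + aa -> U+0915 U+093E
print([f"U+{ord(c):04X}" for c in syllable("\u0915", "\u0906")])
# ['U+0915', 'U+093E']
```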

3.3.4 The Anusvara (ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant, the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal (see footnote 8).

सन्त vs संत /sənt/ "saint": U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924

चम्पा vs चंपा /tʃəmpa/ "A flower belonging to the genus Plumeria family": U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E
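The substitution described above (Nasal Consonant + Halant before a consonant of the same varga becomes an Anusvara) can be sketched as a string rewrite. The function names are hypothetical, and the sketch deliberately leaves word-initial clusters and doubled nasals untouched; it only illustrates the orthographic preference and is not part of the LGR rules.

```python
HALANT = "\u094D"
ANUSVARA = "\u0902"

# Each varga as a code point range; the last member is the varga's nasal
VARGAS = [
    range(0x0915, 0x091A),  # Velar
    range(0x091A, 0x091F),  # Palatal
    range(0x091F, 0x0924),  # Retroflex
    range(0x0924, 0x0929),  # Dental
    range(0x092A, 0x092F),  # Bi-labial
]


def _nasal_before_same_varga(nasal: str, consonant: str) -> bool:
    for varga in VARGAS:
        if ord(nasal) == varga[-1] and ord(consonant) in varga[:-1]:
            return True
    return False


def prefer_anusvara(text: str) -> str:
    """Replace nasal + Halant with Anusvara before a same-varga consonant."""
    out = []
    i = 0
    while i < len(text):
        if (i > 0 and i + 2 < len(text) and text[i + 1] == HALANT
                and _nasal_before_same_varga(text[i], text[i + 2])):
            out.append(ANUSVARA)
            i += 2  # skip nasal and Halant; the next consonant is kept
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# The two examples from the text
print(prefer_anusvara("\u0938\u0928\u094D\u0924") == "\u0938\u0902\u0924")  # True
print(prefer_anusvara("\u091A\u092E\u094D\u092A\u093E") == "\u091A\u0902\u092A\u093E")  # True
```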

3.3.5 Nasalization: Candrabindu (ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख /ãkh/ "eye" (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (़ - U+093C) (see footnote 9)

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ /firoz/ (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ /bədh/ (U+092C U+0922 U+093C)

The web publication DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION [109] by the Central Hindi Directorate, Ministry of HRD, Government of India, clearly states such a use of Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय /cay/ "tea" (U+091A U+093C U+093E U+092F)

छ़ल /chal/ "wash" (imperative) (U+091B U+093C U+0932)

पॊज़ /póz/ "fact" (U+092A U+094A U+091C U+093C)

Normally a Nukta is appended to a Consonant. However, the Santali language uses the Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a. आ (U+0906)
b. ओ (U+0913)
c. ा (U+093E)
d. ो (U+094B)

8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding "Varga" through a Halant.

9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided, as they are available online.


3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example दुःख /duhkh/ "sorrow, unhappiness" (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire, as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष /ksha/, which gets formed by क /ka/ + ् (halant) + ष /sha/, gets rendered as क्‌ष when formed by क /ka/ + ् (halant) + Zero Width Non-joiner + ष /sha/ (U+0915 U+094D U+200C U+0937). In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character, which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that the constituting consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष /ksha/, which gets formed by क /ka/ + ् (halant) + ष /sha/, does not show the half form of ka joining with sha. However, using ZWJ, the constituting consonants' shapes are preserved in the visual depiction, i.e. क्‍ष, formed by क /ka/ + ् (halant) + Zero Width Joiner + ष /sha/ (U+0915 U+094D U+200D U+0937).

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building each of those LGRs is in sync with that of all the other Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code points in the code-point repertoire, across the board, for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are pre-requisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
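The relationship between U+0958 and its decomposed form can be observed with Python's standard `unicodedata` module. As a hedged illustration: U+0958 is not preserved by Unicode NFC normalization (it normalizes to the two-code-point sequence), which is consistent with the exclusion described above. The helper name is hypothetical, and this sketch does not implement the IDNA protocol itself.

```python
import unicodedata

QA_PRECOMPOSED = "\u0958"  # DEVANAGARI LETTER QA


def nfc_code_points(text: str) -> list:
    """Return the code points of the NFC-normalized form of text."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


# The precomposed letter normalizes to Ka + Nukta
print(nfc_code_points(QA_PRECOMPOSED))  # ['U+0915', 'U+093C']

# The decomposed sequence is already stable under NFC
print(nfc_code_points("\u0915\u093C"))  # ['U+0915', 'U+093C']
```

Normalizing input to NFC before any repertoire check therefore yields the Ka + Nukta sequence regardless of how the user typed the letter.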

IDNA also imposes restrictions on invisible characters ZeroWidth Non-Joiner (U+200C)

and ZeroWidth Joiner (U+200D) in the formof CONTEXTJ rules These are required in

certaincaseswhereatypicalvisualshapeofanaksharisdesired

In domain names, owing to the absence of a space (" ") or tab ("-"), the inability to use ZWNJ can pose issues where two words need to be joined, the first word needs to end in an explicit Halant, and the next word begins with a consonant. In that case a conjunct is formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (deš: nation) and विदेश (videš: foreign land) are juxtaposed, the resultant word, i.e. "देश्विदेश"¹⁰, is not the appropriate way of rendering it. The appropriate rendering would be "देश्‌विदेश", which can be achieved by adding a ZWNJ between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.
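The two joined forms discussed above differ by a single invisible code point. The sketch below spells out both sequences; the helper names `code_points` and `contains_joiner` are illustrative and not part of the LGR, and the check only mirrors the statement that ZWNJ and ZWJ are outside the MSR.

```python
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER

first = "\u0926\u0947\u0936" + HALANT       # deš written with an explicit Halant
second = "\u0935\u093F\u0926\u0947\u0936"   # videš

joined_conjunct = first + second            # Halant + consonant forms a conjunct
joined_with_zwnj = first + ZWNJ + second    # ZWNJ keeps the Halant visible


def code_points(text: str) -> str:
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def contains_joiner(label: str) -> bool:
    """True if the label uses ZWNJ or ZWJ, which the MSR does not admit."""
    return ZWNJ in label or ZWJ in label


print(code_points(joined_conjunct))
print(code_points(joined_with_zwnj))
```

Only the second sequence gives the desired display, and it is the one a root zone check would have to reject.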

However, the exclusion of the ZWJ from the MSR may not have much of an impact, as better alternatives are already available for depicting the cases for which the ZWJ was earlier used. Some specific shapes¹¹ may not be possible to produce; however, there will not be any impact at the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR is a repertoire of the characters that are going to be used for the creation of root zone TLDs, which in turn are an even more specialized case of domain names. The Root Zone LGR Procedure therefore introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
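The three layers can be pictured as a chain of predicates applied in order. The following is only a sketch: the two exclusion sets are hypothetical stand-ins that contain just the examples named in this section, not the actual IDNA2008 or MSR tables, and the function names are illustrative.

```python
import unicodedata

# Hypothetical, deliberately tiny exclusion sets built from the examples in
# the text; the real IDNA2008 and MSR tables are far larger.
IDNA_EXCLUDED_EXAMPLES = {0x0958}                 # DEVANAGARI LETTER QA
MSR_EXCLUDED_EXAMPLES = {0x093D, 0x200C, 0x200D}  # Avagraha, ZWNJ, ZWJ


def in_unicode(cp: int) -> bool:
    """Filter 1: the code point must be assigned in Unicode."""
    return unicodedata.category(chr(cp)) != "Cn"


def passes_idna(cp: int) -> bool:
    """Filter 2: the code point must not be excluded by IDNA (sketch)."""
    return cp not in IDNA_EXCLUDED_EXAMPLES


def passes_msr(cp: int) -> bool:
    """Filter 3: the code point must not be excluded by the MSR (sketch)."""
    return cp not in MSR_EXCLUDED_EXAMPLES


FILTERS = [("Unicode", in_unicode), ("IDNA", passes_idna), ("MSR", passes_msr)]


def first_rejecting_filter(cp: int):
    """Return the name of the first layer that rejects cp, or None."""
    for name, check in FILTERS:
        if not check(cp):
            return name
    return None


for cp in (0x0915, 0x0958, 0x093D):
    print(f"U+{cp:04X}", first_rejecting_filter(cp))
```

Because the filters are applied in sequence, a code point is reported against the outermost layer that rejects it, which matches the "successive filters" description.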

10 In this particular case, though, it is possible to get the required display by dropping the explicit Halant at the end of the word; however, in that case one can argue that the pronunciation of the two words, i.e. देश् and देश, is different and hence it changes the fundamental word.

11 Case of क्ष and क्‍ष: the first is composed with क + ् + ष, while the latter is composed with क + ् + ZWJ + ष. The pronunciation of both conjuncts is the same.


4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, such as DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms (U+0944) and (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were editorial in nature. Some comments called for attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention¹²:

All characters that are included in the [MSR] - Yellow background

PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background

Not PVALID in IDNA2008 - White background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references for Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

| Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference |
| 1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0][101][102][103][105][108][110][111][112][113] |
| 2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103] |
| 11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0][101] |
| 12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0][105][108] |
| 13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][101][102][108][112] |
| 16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0][105][108] |
| 17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113] |
| 48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113] |
| 51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113] |
| 53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108] |
| 54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108] |
| 55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111] |
| 56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103] |
| 62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108] |
| 63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108] |
| 64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113] |
| 65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113] |
| 66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108] |
| 67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108] |
| 68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113] |
| 69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113] |
| 70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113] |
| 71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108] |
| 72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108] |
| 73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108] |
| 74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112] |
| 75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108] |
| 76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108] |
| 77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108] |
| 78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108] |
| 79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108] |
| 80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104] |
| 81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104] |
| 82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104] |
| 83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104] |

Table 6: Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph"¹³ construct.

| Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference |
| 1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107] |
| 2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107] |

Table 7: Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term "Reph" is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

| Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion |
| 1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language. |
| 2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle. |
| 3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n. |
| 4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l. |
| 5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle. |
| 6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan. |
| 7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language. |

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of forming their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for the different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodating additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first subsection lists the categories of the characters in the form of variables; in the rules, the variable names are used instead of the descriptive names. The second subsection lists four operators, along with their functions, which are assumed while specifying the rules. The following two subsections describe the two major categories of akshar formation, the first of which begins with a vowel and the second with a consonant. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol   Function
|        Alternative
[ ]      Optional
*        Variable Repetition
( )      Sequence Group

Table 8: Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may optionally be followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

| Sequence Description | Sequence | Example | Constituting characters |
| Vowel | V | अ (a) | अ (U+0905) |
| Vowel + Anusvara | V[B] | अं (aṁ) | अ + ं (U+0905 U+0902) |
| Vowel + Candrabindu | V[D] | अँ (aṃ) | अ + ँ (U+0905 U+0901) |
| Vowel + Visarga | V[X] | अः (aḥ) | अ + ः (U+0905 U+0903) |

Table 9
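The pattern V[B|D|X] translates directly into a regular expression over code points. The following is a minimal sketch: the vowel ranges are read off the Vowel rows of Table 6, and the names `VOWELS`, `VOWEL_SEQUENCE` and `is_vowel_sequence` are illustrative rather than part of the LGR.

```python
import re

# Vowel code points taken from the "Vowel" rows of Table 6:
# U+0905-U+090B, U+090D-U+0914 and U+0972-U+0977.
VOWELS = "\u0905-\u090B\u090D-\u0914\u0972-\u0977"

# V[B|D|X]: one vowel, optionally followed by exactly one of
# Candrabindu (U+0901), Anusvara (U+0902) or Visarga (U+0903).
VOWEL_SEQUENCE = re.compile(f"[{VOWELS}][\u0901\u0902\u0903]?")


def is_vowel_sequence(text: str) -> bool:
    """True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None


for sample in ("\u0905", "\u0905\u0902", "\u0905\u0902\u0903"):
    print([f"U+{ord(ch):04X}" for ch in sample], is_vowel_sequence(sample))
```

The last sample, a Visarga after an Anusvara, is rejected, in line with the restriction stated above.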


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may optionally be followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. The consonant sequence can be extended further after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

| Sequence Description | Sequence | Example | Constituting characters |
| Consonant | C | क (ka) | U+0915 <single character> |
| Consonant + Nukta | C[N] | क़ (ḳa) | क + ़ (U+0915 U+093C) |

Table 10

2. A consonant optionally followed by a dependent vowel sign (Matra) [M], or Anusvara [B], Candrabindu [D], Visarga [X] or Halant [H]:

C[M|B|D|X|H]

Examples:

| Sequence Description | Sequence | Example | Constituting characters |
| Consonant + Matra | C[M] | कि (ki) | क + ि (U+0915 U+093F) |
| Consonant + Anusvara | C[B] | कं (kaṁ) | क + ं (U+0915 U+0902) |
| Consonant + Candrabindu | C[D] | कँ (kaṃ) | क + ँ (U+0915 U+0901) |
| Consonant + Visarga | C[X] | कः (kaḥ) | क + ः (U+0915 U+0903) |
| Consonant + Halant | C[H] | क् (k, pure consonant) | क + ् (U+0915 U+094D) |

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

| Sequence Description | Sequence | Example | Constituting characters |
| Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क + ी + ं (U+0915 U+0940 U+0902) |
| Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क + ा + ँ (U+0915 U+093E U+0901) |
| Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क + ी + ः (U+0915 U+0940 U+0903) |

Table 12

3. A sequence of consonants (up to 4) joined by Halant¹⁴:

*3(CH)C

Example:

| Sequence Description | Sequence | Example | Constituting characters |
| Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F |

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In the case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:


| Sequence Description | Sequence | Example | Constituting characters |
| Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940 |
| Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902 |
| Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901 |
| Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903 |

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

| Sequence Description | Sequence | Example | Constituting characters |
| Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902 |
| Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901 |
| Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903 |

Table 15
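Rules 1 to 3B can be folded into one pattern over the variable names of Section 5.5.1. The following is a sketch under stated assumptions: the category assignment is read off Table 6, a Nukta is allowed after any consonant (treating C and C+N as coterminous), and the helper names `category`, `to_categories` and `is_consonant_akshar` are illustrative. It models the Hindi akshar rules of this section only, not the normative WLE rules of Section 7.

```python
import re

# Matra code points from the "Matra" rows of Table 6.
MATRAS = ({0x093A, 0x093B, 0x094F, 0x0956, 0x0957}
          | set(range(0x093E, 0x0944)) | set(range(0x0945, 0x094D)))
SINDHI_CONSONANTS = {0x097B, 0x097C, 0x097E, 0x097F}
NOT_IN_REPERTOIRE = {0x0929, 0x0931, 0x0934}  # gaps inside U+0915-U+0939


def category(ch: str) -> str:
    """Map one character to its variable name from Section 5.5.1."""
    cp = ord(ch)
    if (0x0915 <= cp <= 0x0939 and cp not in NOT_IN_REPERTOIRE) \
            or cp in SINDHI_CONSONANTS:
        return "C"
    if cp in MATRAS:
        return "M"
    return {0x0901: "D", 0x0902: "B", 0x0903: "X",
            0x093C: "N", 0x094D: "H"}.get(cp, "?")


def to_categories(text: str) -> str:
    """Translate a string into a string of variable names."""
    return "".join(category(ch) for ch in text)


# Up to three (C[N]H) groups, a final C[N], then either a Halant
# or an optional Matra followed by an optional B, D or X.
CONSONANT_AKSHAR = re.compile(r"(?:CN?H){0,3}CN?(?:H|M?[BDX]?)")


def is_consonant_akshar(text: str) -> bool:
    """True if the whole string is one consonant sequence."""
    return CONSONANT_AKSHAR.fullmatch(to_categories(text)) is not None


nkrya = "\u0928\u094D\u0915\u094D\u0930\u094D\u092F"  # example from Table 13
print(to_categories(nkrya), is_consonant_akshar(nkrya))
```

Matching on category strings rather than on raw code points keeps the pattern identical in shape to the notation used in the rules.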

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. "Table 21: Visually confusables" in "Appendix A: Visually confusable characters/sequences" lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formation. These can cause confusion even to a careful observer and are hence being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for the positioning of the Nukta character (U+093C), which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar for a majority of the Devanagari user base. This being a unique case of homographic similarity, the following variants are being proposed:


| Variant 1 | Variant 2 |
| आ (U+0906) | आ़ (U+0906 U+093C) |
| ओ (U+0913) | ओ़ (U+0913 U+093C) |
| ा (U+093E) | ा़ (U+093E U+093C) |
| ो (U+094B) | ो़ (U+094B U+093C) |

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)
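The effect of the "not followed by a Nukta" condition can be seen in a few lines of code. The following is an illustrative sketch, not the LGR tool's implementation; the names `nukta_variant_positions` and `add_nukta_at` are hypothetical.

```python
NUKTA = "\u093C"

# Variant 1 characters of Table 16: U+0906, U+0913, U+093E, U+094B.
NUKTA_VARIANT_SOURCES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_positions(label: str) -> list:
    """Indexes where Variant 1 -> Variant 2 applies.

    The context rule requires that the source character is NOT
    already followed by a Nukta.
    """
    return [i for i, ch in enumerate(label)
            if ch in NUKTA_VARIANT_SOURCES and label[i + 1:i + 2] != NUKTA]


def add_nukta_at(label: str, index: int) -> str:
    """Build the variant label by inserting a Nukta after label[index]."""
    return label[:index + 1] + NUKTA + label[index + 1:]


label = "\u0915\u093E"                 # KA + VOWEL SIGN AA
positions = nukta_variant_positions(label)
variant = add_nukta_at(label, positions[0])

# The newly created U+093E is now followed by a Nukta, so the rule does
# not fire again and no Nukta + Nukta sequence can be generated.
print(positions, nukta_variant_positions(variant))
```

Without the context check, the second call would report the same position again, which is the regenerative behaviour described above.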

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

| Set | Mapping |
| Variant Set A | 093E 0901 <--> 0949 0902 |
| Variant Set B | 0906 0901 <--> 0911 0902 |
| Variant Set C | 0906 0902 <--> 0974 |
| Variant Set D | 093B <--> 093E 0902 |

Overlapping variant sets involving 0906 and 093E plus Nukta:

| Source | Glyph | Target | Glyph | Type | Variant Context |
| 0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N |
| 093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N |

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of the Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A;

- 0906 093C 0901 is added to Set B;

- 0906 093C 0902 is added to Set C; and

- 093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

| Set | Mapping | Variant Contextual Rule |
| Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | - |
| Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | - |
| Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end) |
| Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end) |
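Symmetry and transitivity of the extended sets can be checked mechanically. The following is a minimal sketch that stores each set as a list of code point sequences and expands it into directed mappings; the helper names are illustrative, and the context rules of Sets C and D are deliberately left out of the model.

```python
from itertools import permutations

# The four extended variant sets, each member given as a code point sequence.
VARIANT_SETS = {
    "A": ["093E 0901", "093E 093C 0901", "0949 0902"],
    "B": ["0906 0901", "0906 093C 0901", "0911 0902"],
    "C": ["0906 0902", "0906 093C 0902", "0974"],
    "D": ["093B", "093E 0902", "093E 093C 0902"],
}


def directed_mappings(members):
    """Every member maps to every other member of its set."""
    return set(permutations(members, 2))


def is_symmetric(mappings) -> bool:
    """For every source -> target there is a target -> source."""
    return all((t, s) in mappings for s, t in mappings)


def is_transitive(mappings) -> bool:
    """If a -> b and b -> c (with a != c), then a -> c must exist."""
    return all((a, d) in mappings
               for a, b in mappings for c, d in mappings
               if b == c and a != d)


# Set A before the extension: the original pair plus the overlapping
# Nukta pair, without the link between the two outer members.
before_extension = {
    ("093E 0901", "0949 0902"), ("0949 0902", "093E 0901"),
    ("093E 0901", "093E 093C 0901"), ("093E 093C 0901", "093E 0901"),
}

print(is_transitive(before_extension))
print(all(is_transitive(directed_mappings(m)) for m in VARIANT_SETS.values()))
```

The unextended mappings are symmetric but not transitive, which is the gap that adding the Nukta sequences to each set closes.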

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 technically overlaps with the four sets above. However, because of the variant context rule "when (follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set that overlaps with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting variants are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

| Variant 1 | Variant 2 |
| ॳ (U+0973) | अं (U+0905 U+0902) |
| ऺ (U+093A) | ं (U+0902) |
| ॴ (U+0974) | आं (U+0906 U+0902) |
| ऻ (U+093B) | ां (U+093E U+0902) |
| ऎ (U+090E) | ऐ (U+0910) |
| ॆ (U+0946) | े (U+0947) |
| ॵ (U+0975) | औ (U+0914) |
| ॏ (U+094F) | ौ (U+094C) |

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point contexts, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (only a discussion, not proposed as variants)

Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in a Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case for making them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which are rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual variant pairs are as follows:

| Variant 1 | Variant 2 |
| ँ (U+0901) | ॅं (U+0945 U+0902) |
| ाँ (U+093E U+0901) | ॉं (U+0949 U+0902) |
| अँ (U+0905 U+0901) | ॲं (U+0972 U+0902) |
| एँ (U+090F U+0901) | ऍं (U+090D U+0902) |
| आँ (U+0906 U+0901) | ऑं (U+0911 U+0902) |

Table 18: Proposed Variants - Set 3

Asthefirsttwosuggestedpairsarecomposedonlyofthedependentsignstheydonotgetproperly joined in the absence of an independent character like Consonant or a Vowel

beforethesameForthesakeofclarityboththepairsprecededbyaDevanagariLetterKa(क - U+0915)areshownhere

VariantPair1क(U+0901)andक (U+0945U+0902)

VariantPair2का (U+0949U+0902)andकॉ (U+0949U+0902)

IdeallythecaseofU+0945U+0902shouldberenderedas and the case of (U+0949

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

39

U+0902)shouldberenderedas where theAnusvara is clearlyseparated fromtheCandrashapeHowevernotallfontsfollowthesameconventionandmanyofthemrender

thetwoshapesexactly likeaCandrabinduThisgivesrise toanambiguityandhencethevariantpossibility

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
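The context condition above can be illustrated with a minimal Python sketch. It is not taken from the accompanying XML file; the function names are hypothetical, and the consonant test is an assumption that approximates the consonant class by the code point range U+0915..U+0939 rather than by the actual repertoire.

```python
CANDRABINDU = "\u0901"               # DEVANAGARI SIGN CANDRABINDU
NUKTA = "\u093C"                     # DEVANAGARI SIGN NUKTA
CANDRA_E_ANUSVARA = "\u0945\u0902"   # VOWEL SIGN CANDRA E + ANUSVARA


def is_consonant(ch: str) -> bool:
    # Assumption: consonants approximated by the range KA..HA;
    # the real repertoire may be narrower.
    return "\u0915" <= ch <= "\u0939"


def mapping_defined(label: str, i: int) -> bool:
    """True if label[i] is a Candrabindu preceded by C or by C + Nukta."""
    if label[i] != CANDRABINDU or i == 0:
        return False
    prev = label[i - 1]
    if is_consonant(prev):
        return True
    return prev == NUKTA and i >= 2 and is_consonant(label[i - 2])


def candrabindu_variant(label: str) -> str:
    """Replace every eligible Candrabindu with U+0945 U+0902."""
    return "".join(
        CANDRA_E_ANUSVARA if mapping_defined(label, i) else ch
        for i, ch in enumerate(label)
    )
```

With this sketch, a Candrabindu after क is mapped, while the Candrabindu in आँख (preceded by a Vowel) is left untouched because the context condition is not met.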

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be treated as blocked variants.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B
ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 Devanagari Cross-script confusables in Appendix B: Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:
a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:
a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:
a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN, where CN is a C followed by an N

3. M must be preceded by C or CN, where CN is a C followed by an N

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V CANNOT be preceded by H (details in "Case of V preceded by H")
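The seven rules can be sketched as a small Python checker. This is an illustration only and not the normative XML rule set: the names `category` and `violated_rules` are hypothetical, and the category boundaries are assumptions based on Unicode code point ranges rather than on the Table 6 repertoire. Sequences such as the Eyelash Reph are not treated specially.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")
NUKTA_BASES = C1 | V1 | M1

MARKS = {0x0902: "B", 0x0901: "D", 0x0903: "X", 0x094D: "H", 0x093C: "N"}


def category(ch: str) -> str:
    """Assumed mapping from a code point to its WLE symbol."""
    cp = ord(ch)
    if cp in MARKS:
        return MARKS[cp]
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0904 <= cp <= 0x0914 or cp == 0x0960 or 0x0972 <= cp <= 0x0977:
        return "V"
    if 0x093E <= cp <= 0x094C or cp in (0x093A, 0x093B, 0x094F, 0x0956, 0x0957):
        return "M"
    return "?"


def violated_rules(label: str) -> list:
    """Return the numbers of the WLE rules (1-7) violated by the label."""
    violations = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else ""
        prev_cat = category(prev) if prev else ""
        # "C or CN": previous is a consonant, or a Nukta sitting on a consonant.
        after_c_or_cn = prev_cat == "C" or (
            prev_cat == "N" and i >= 2 and category(label[i - 2]) == "C"
        )
        if cat == "N" and prev not in NUKTA_BASES:
            violations.append(1)
        elif cat == "H" and not after_c_or_cn:
            violations.append(2)
        elif cat == "M" and not after_c_or_cn:
            violations.append(3)
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            violations.append({"X": 4, "B": 5, "D": 6}[cat])
        elif cat == "V" and prev_cat == "H":
            violations.append(7)
    return violations
```

For example, भारत yields no violations, whereas a label that places a Vowel directly after a Halant is reported under rule 7.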


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of permissible sequences in Table 7 Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार aːməchaːr "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982

2. IS 10315: 7-bit coded character set for information interchange, 1985

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001

5. ISO 2375: Procedure for registration of escape sequences, 2003

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
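The all-or-nothing condition in the preceding paragraph can be sketched in Python. This is an illustration under stated assumptions: the function name is hypothetical, and the mapping uses only a subset of the single code point Devanagari-Gurmukhi rows of Table 19, leaving out the one-to-many rows and the multi-code-point akshara rows.

```python
# Subset of Table 19 (Devanagari -> Gurmukhi), single code point rows only.
DEVA_TO_GURU = {
    "\u0902": "\u0A02", "\u0907": "\u0A19", "\u0909": "\u0A24",
    "\u0917": "\u0A17", "\u0918": "\u0A2C", "\u091F": "\u0A1F",
    "\u0920": "\u0A20", "\u0922": "\u0A2B", "\u092A": "\u0A27",
    "\u092D": "\u0A2E", "\u092E": "\u0A38", "\u0935": "\u0A15",
    "\u0939": "\u0A35", "\u093C": "\u0A3C", "\u093F": "\u0A3F",
    "\u0940": "\u0A40",
}


def gurmukhi_counterpart(label: str):
    """Return the Gurmukhi look-alike label, or None if any code point
    lacks a counterpart (the label is then visually distinguishable)."""
    mapped = []
    for ch in label:
        if ch not in DEVA_TO_GURU:
            return None  # one distinguishing character is enough
        mapped.append(DEVA_TO_GURU[ch])
    return "".join(mapped)
```

A label such as मठ maps completely and therefore has a cross-script counterpart, while मन does not, because न has no entry in the mapping used here.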

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see footnote 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


the reason why some of the characters across the scripts that will be considered under the Neo-Brahmi GP look similar to each other, despite belonging to totally different code blocks of the Unicode Standard.

A look at the development of Devanagari from Brahmi gives an insight into how the Indic scripts have come to be diversified: the handiwork of engravers and writers who used different types of strokes led to different regional styles. The development of the script is outlined below. Figure 1 Pictorial depiction of evolution of Devanagari illustrates the stages in the evolution of the script (see footnote 2).

Period | Description
300 BCE | Mauryan. Early Brahmi form in the Asokan edicts. Some scholars believe that Brahmi itself evolved from Kharoshthi, a script written right to left.
200 CE | Kushan / Satavahana Dynasties
400 CE | Gupta Dynasty
600 CE | Yasodharman
800 CE | Origins of the present-day Nagari Script. Vardhana dynasty in the North and Pallava period in the South
900 CE | The period of the Chalukyas and Rashtrakutas
1100 CE | Continuation of the Chalukya Rule
1300 CE | Yadavas in the north and Kakatiyas in the south
1500 CE | The Vijayanagar empire

Table 1 Evolution of Devanagari

2 http://www.acharya.gen.in:8080/sanskrit/script_dev.php


Figure 1 Pictorial depiction of evolution of Devanagari

3.2 Languages considered

Devanagari is used by over 120 languages, which makes it one of the most used scripts in the world. Languages using Devanagari as their primary script belong to varying geo-political scenarios, as given below:

- designated as official (scheduled) languages of some countries

- used by communities living in urban areas

- used by communities living in rural yet accessible areas

- used by communities living in far-flung areas which are not easily connected either by roads or by communication mechanisms

Information about official (scheduled) languages of countries is easily available. Information about languages used by communities living in urban areas is also easily obtainable. Some effort was needed to cover the languages which are spoken by communities living in rural yet accessible areas. However, it was quite difficult to cover the rest of the languages, spoken by the communities living in remote tribal areas which are generally not connected by road or by communication means. Defining the scope of language coverage was hence essential to limit the scope of the work to be undertaken for the analysis of the Devanagari LGR.


NBGP decided to employ the "Expanded Graded Intergenerational Disruption Scale" [EGIDS], which is designed to measure the status of the languages of the world in terms of endangerment or development. The EGIDS consists of 13 levels, with each higher number on the scale representing a greater level of disruption to the intergenerational transmission of the language. NBGP decided to accommodate all the languages belonging to EGIDS Scale 1 to 4 for its analysis, which represents languages that, in one form or the other, are still in usage. Following are the descriptions (see footnote 3) of those scales:

Scale | Label | Description
1 | National | The language is used in education, work, mass media and government at the national level.
2 | Provincial | The language is used in education, work, mass media and government within major administrative subdivisions of a nation.
3 | Wider Communication | The language is used in work and mass media without official status to transcend language differences across a region.
4 | Educational | The language is in vigorous use, with standardization and literature being sustained through a widespread system of institutionally supported education.

Languages belonging to Level 5 and higher are not in widespread usage.

Below is the tabular representation of the languages that have been considered for the Devanagari LGR.

3 https://www.ethnologue.com/about/language-status


EGIDS Scale 1 | EGIDS Scale 2 | EGIDS Scale 3 | EGIDS Scale 4

Hindi

Nepali

Konkani

Maithili

Marathi

Sindhi

Bhatri

Halbi

Kinnauri

Kukna

Panchpargania

Sadri

Wagdi

Bhojpuri

Chhattisgarhi

Dogri

Kashmiri

Limbu

Magahi

Sanskrit

Santali

Tamang, Eastern

Avadhi

Newar

Saraiki (see footnote 4)

Table 2 Languages considered under Devanagari LGR

Despite being classified under EGIDS Scale 5, the Boro language is also considered under the Devanagari LGR, as it is one of the scheduled languages of India and is widely spoken.

Apart from the above-mentioned languages, Braj, Dhundari, Mundari and Kharia have also been considered for the analysis, as the communities using them were accessible and provided their inputs.

3.2.1 Case of Sanskrit

Sanskrit is generally perceived as an archaic language used only in ancient religious texts. However, it is worth noting that there is a quite vibrant and active user community of Sanskrit in India which practices Sanskrit on a day-to-day basis. Sanskrit is still taught in schools under various State and Central educational boards. There is increasing use of Sanskrit on social media as well. The same is reflected in the EGIDS scale, where Sanskrit is categorized in Scale 4, indicating the status of the language as "Educational".

4 Though listed in EGIDS scale 4, Saraiki is not covered by the NBGP. As per Ethnologue, the Devanagari script is no longer in use by the Saraiki community. Ref: https://www.ethnologue.com/language/skr


3.3 The structure of written Devanagari

Devanagari is an alphasyllabary, and the heart of the writing system is the akshar. It is this unit which is instinctively recognized by users of the script. To understand the notion of akshar, a brief overview of the writing system is provided in this section, and the akshar itself will be treated in depth in Section 5.4. The writing system of Devanagari could be summed up as composed of the following:

3.3.1 The Consonants

Devanagari consonants have an implicit schwa (see footnote 5) ə vowel included in them. As per traditional classification, they are categorized according to their phonetic properties (especially in terms of place plus manner of articulation). There are 5 Varga groups (classes) and one non-Varga group. Each Varga, which corresponds to Stops, contains five consonants classified as per their properties. The first four consonants are classified on the basis of voicing and aspiration, and the last is the corresponding nasal.

Varga | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | क U+0915 | ख U+0916 | ग U+0917 | घ U+0918 | ङ U+0919
Palatal | च U+091A | छ U+091B | ज U+091C | झ U+091D | ञ U+091E
Retroflex | ट U+091F | ठ U+0920 | ड U+0921 | ढ U+0922 | ण U+0923
Dental | त U+0924 | थ U+0925 | द U+0926 | ध U+0927 | न U+0928
Bi-labial | प U+092A | फ U+092B | ब U+092C | भ U+092D | म U+092E

Table 3 Varga classification of consonants
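Because each Varga occupies five consecutive code points in Table 3, the classification can be computed from the code point alone. The following Python sketch illustrates this; the names `VARGA_START`, `COLUMNS` and `classify` are hypothetical and not part of the LGR.

```python
# First code point of each Varga row in Table 3.
VARGA_START = {
    "Velar": 0x0915,
    "Palatal": 0x091A,
    "Retroflex": 0x091F,
    "Dental": 0x0924,
    "Bi-labial": 0x092A,  # U+0929 is skipped: it is not part of any Varga row
}

# Column meaning by offset within a row: (voicing, aspiration).
COLUMNS = [
    ("Unvoiced", "-Asp"),
    ("Unvoiced", "+Asp"),
    ("Voiced", "-Asp"),
    ("Voiced", "+Asp"),
    ("Nasal", None),
]


def classify(ch: str):
    """Return (varga, voicing, aspiration) or None for non-Varga characters."""
    cp = ord(ch)
    for varga, start in VARGA_START.items():
        if start <= cp < start + 5:
            return (varga,) + COLUMNS[cp - start]
    return None
```

For instance, ग is classified as a voiced unaspirated Velar, म as the Bi-labial nasal, and य (a non-Varga consonant) returns None.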

Non-Varga: य U+092F | र U+0930 | ल U+0932 | ळ U+0933 | व U+0935 | श U+0936 | ष U+0937 | स U+0938 | ह U+0939

Table 4 Non-Varga consonants

5 Although representing the implicit vowel as a is more correct orthographically, the schwa ə, although not part of the orthographic system, has been used, since the a would be misunderstood and read as अ, आ or ा.


3.3.2 The Implicit Vowel Killer: Halant (see footnote 6)

All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant (U+094D). The Halant thus joins two consonants and creates conjuncts, which can generally be combinations of 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical. It is just an observation drawn from the words that have been observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. Hence, in the LGR work, this limit will not be enforced (see footnote 7).
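To make the notion of conjunct size concrete, the sketch below counts the consonants joined by Halants in each conjunct of a label. It only measures the size and, in line with the text above, enforces no limit. The function name is hypothetical, and consonants are assumed to be the range U+0915..U+0939, each optionally followed by a Nukta.

```python
import re

# A consonant, optionally carrying a Nukta (assumed range KA..HA).
CONSONANT = "[\u0915-\u0939]\u093C?"
# A conjunct: consonants chained by Halant (U+094D).
CONJUNCT = re.compile(f"{CONSONANT}(?:\u094D{CONSONANT})*")


def largest_conjunct(label: str) -> int:
    """Number of consonants in the largest Halant-joined cluster (0 if none)."""
    sizes = [m.group().count("\u094D") + 1 for m in CONJUNCT.finditer(label)]
    return max(sizes, default=0)
```

A label such as सन्त contains a two-consonant conjunct, while a label with four Halants in a row between consonants yields 5.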

3.3.3 Vowels

Separate symbols exist for all Vowels which are pronounced independently, either at the beginning or after a vowel sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels excepting the अ.

The correlation is shown as follows:

Vowel | Corresponding vowel sign (Matra)
अ U+0905 | (none)
आ U+0906 | ा U+093E
इ U+0907 | ि U+093F
ई U+0908 | ी U+0940
उ U+0909 | ु U+0941
ऊ U+090A | ू U+0942
ऋ U+090B | ृ U+0943
ए U+090F | े U+0947
ऐ U+0910 | ै U+0948
ओ U+0913 | ो U+094B
औ U+0914 | ौ U+094C
ॳ U+0973 | ऺ U+093A
ॴ U+0974 | ऻ U+093B
ऎ U+090E, ऄ U+0904 | ॆ U+0946
ऒ U+0912 | ॊ U+094A
ऍ U+090D, ॲ U+0972 | ॅ U+0945
ॠ U+0960 | ॄ U+0944
ऑ U+0911 | ॉ U+0949
ॵ U+0975 | ॏ U+094F
ॶ U+0976 | ॖ U+0956
ॷ U+0977 | ॗ U+0957

Table 5 Vowels with corresponding Matras

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).

6 Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report both the terms have been used to denote the character that suppresses the inherent vowel.

7 This can be the case when a foreign language word which admits a large number of consonants is transliterated into Devanāgarī.

3.3.4 The Anusvara (ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant, the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal.8

सन्त vs संत (sənt, "saint"): U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924
चम्पा vs चंपा (tʃəmpa, a flower belonging to the genus Plumeria): U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E
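The replacement of a Nasal Consonant + Halant by the Anusvara before a consonant of the same varga can be sketched in a few lines. This is only an illustration: the `VARGAS` table and the `to_anusvara` helper are names assumed here, not part of the LGR, and the table covers just the four non-nasal stops of each of the five vargas.

```python
HALANT = "\u094D"
ANUSVARA = "\u0902"

# Assumption for illustration: each varga nasal maps to the code point range
# of the four non-nasal stops of its own varga.
VARGAS = {
    "\u0919": range(0x0915, 0x0919),  # NGA: KA, KHA, GA, GHA
    "\u091E": range(0x091A, 0x091E),  # NYA: CA, CHA, JA, JHA
    "\u0923": range(0x091F, 0x0923),  # NNA: TTA, TTHA, DDA, DDHA
    "\u0928": range(0x0924, 0x0928),  # NA: TA, THA, DA, DHA
    "\u092E": range(0x092A, 0x092E),  # MA: PA, PHA, BA, BHA
}

def to_anusvara(text):
    """Replace nasal + Halant with Anusvara before a same-varga consonant."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if (ch in VARGAS and i + 2 < len(text)
                and text[i + 1] == HALANT
                and ord(text[i + 2]) in VARGAS[ch]):
            out.append(ANUSVARA)
            i += 2  # skip the nasal and the Halant, keep the next consonant
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Applied to the two half-nasal spellings above, the helper yields the Anusvara spellings given alongside them; a nasal followed by a consonant of another varga is left untouched.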

3.3.5 Nasalization: Candrabindu (ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख (ãkh, "eye") (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (़ - U+093C)9

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ firoz (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ bədh (U+092C U+0922 U+093C)

The Web Publication "DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION" [109] by the Central Hindi Directorate, Ministry of HRD, Government of India, clearly states such a use of Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय cay, "tea" (U+091A U+093C U+093E U+092F)
छ़ल chal, "wash" (imperative) (U+091B U+093C U+0932)
पॊज़ póz, "fact" (U+092A U+094A U+091C U+093C)

Normally a Nukta is appended to a Consonant. However, the Santali language uses Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a. आ (U+0906)
b. ओ (U+0913)
c. ा (U+093E)
d. ो (U+094B)

8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding "Varga" through a Halant.
9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics & Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided as they are available online.


3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to h, for example दुःख duhkh, "sorrow, unhappiness" (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In case of LGR, the Avagraha is not part of the repertoire as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष ksha, which gets formed by क ka + ् (halant) + ष sha, gets rendered as क्‌ष (U+0915 U+094D U+200C U+0937) when formed by क ka + ् (halant) + Zero Width Non-joiner + ष sha. In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that constituting consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष ksha, which gets formed by क ka + ् (halant) + ष sha, does not show the half form of ka joining with sha. However, using ZWJ, the constituting consonants' shapes are preserved in the visual depiction, e.g. क्‍ष (U+0915 U+094D U+200D U+0937), formed by क ka + ् (halant) + Zero Width Joiner + ष sha.

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync with all other Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code-points in the code-point repertoire across the board for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should be unambiguously understood by the linguistic community with regard to its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are prerequisites to the Label Generation Rule Sets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode, being the character-encoding standard for providing the maximum possible representation of a given script/language, has encoded as far as possible all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
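The relationship between U+0958 and its decomposed form can be observed with Python's standard `unicodedata` module. The sketch below only illustrates Unicode normalization; it does not check IDNA validity, and the variable names are chosen here for illustration.

```python
import unicodedata

QA = "\u0958"  # DEVANAGARI LETTER QA (precomposed)

# U+0958 is a composition exclusion, so even NFC yields the two-code-point form.
decomposed = unicodedata.normalize("NFC", QA)
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]
print(code_points)  # ['U+0915', 'U+093C']
```

A label typed with the precomposed letter therefore ends up, after normalization, as the Ka + Nukta sequence that the repertoire permits.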

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where a typical visual shape of an akshar is desired.

In domain names, due to the absence of space " " or tab "-", there will be cases where the inability to use ZWNJ can pose some issues: two words need to be joined together, the previous word needs to end in an explicit Halant, and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (deš, "nation") and विदेश (videš, "foreign land") are juxtaposed to each other, the resultant word, i.e. "देश्विदेश"10, is not the appropriate way of rendering it. The appropriate rendering would be "देश्‌विदेश", which can be achieved by adding a ZWNJ (U+200C) in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, there may not be much of an impact of the exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes11 may not be able to be made; however, there will not be any impact on the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code-block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
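The three successive filters summarized above amount to a chain of set intersections. The following is a minimal sketch under stated assumptions: the function name and the three tiny sample lists are hypothetical and contain only the code points used as examples in this section, not the real Unicode, IDNA or MSR tables.

```python
def root_zone_candidates(script_block, idna_pvalid, msr):
    """Apply the three filters in order and return the surviving code points."""
    encoded = set(script_block)             # filter i: encoded in Unicode
    idna_ok = encoded & set(idna_pvalid)    # filter ii: allowed by IDNA
    return sorted(idna_ok & set(msr))       # filter iii: present in the MSR

# Illustrative samples only (assumed for this sketch).
SAMPLE_BLOCK = [0x0915, 0x093C, 0x093D, 0x0958]
SAMPLE_PVALID = [0x0915, 0x093C, 0x093D]    # U+0958 is excluded by IDNA
SAMPLE_MSR = [0x0915, 0x093C]               # U+093D is excluded by the MSR

survivors = root_zone_candidates(SAMPLE_BLOCK, SAMPLE_PVALID, SAMPLE_MSR)
```

With these samples only Ka (U+0915) and Nukta (U+093C) survive, which mirrors the Qa and Avagraha examples given in the text.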

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different and hence it changes the fundamental word.
11 Case of क्ष and क्‍ष: the first is composed with क + ् + ष, while the latter is with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.

4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms ॄ (U+0944) and ॣ (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments called for attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change is not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2 Devanagari Code Page from [MSR]

Color convention:12

- All characters that are included in the [MSR]: yellow background
- PVALID in IDNA2008 but excluded from the MSR for various reasons: pinkish background
- Not PVALID in IDNA2008: white background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references for Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0][101][102][103][105][108][110][111][112][113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0][101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0][105][108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][101][102][108][112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0][105][108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]
62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]
75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6 Code point repertoire

Apart from the above individual code-points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph"13 construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7 Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term "Reph" is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, instead of their descriptive names, the variable names are used. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash  → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ a | U+0905
Vowel + Anusvara | V[B] | अं aṁ | अ ं (U+0905 U+0902)
Vowel + Candrabindu | V[D] | अँ aṃ | अ ँ (U+0905 U+0901)
Vowel + Visarga | V[X] | अः aḥ | अ ः (U+0905 U+0903)

Table 9
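The vowel sequence V[B|D|X] maps directly onto a regular expression. The sketch below is an illustration rather than the LGR's own rule encoding: the names are assumed here, and the vowel class is taken to be the code points listed with category Vowel in Table 6.

```python
import re

# Assumption: V is every code point of category "Vowel" in Table 6.
VOWEL = "[\u0905-\u090B\u090D-\u0914\u0972-\u0977]"
# Candrabindu (D), Anusvara (B), Visarga (X); at most one may follow the vowel.
BDX = "[\u0901\u0902\u0903]"

VOWEL_SEQUENCE = re.compile(f"{VOWEL}{BDX}?")

def is_vowel_sequence(text):
    """Return True if text is exactly one V[B|D|X] sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

All four rows of Table 9 are accepted by `is_vowel_sequence`, while an Anusvara followed by a Visarga after the vowel is rejected, in line with the restriction to a single B, D or X.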


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क ka | U+0915 <single character>
Consonant + Nukta | C[N] | क़ ḳa | क ़ (U+0915 U+093C)

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि ki | क ि (U+0915 U+093F)
Consonant + Anusvara | C[B] | कं kaṁ | क ं (U+0915 U+0902)
Consonant + Candrabindu | C[D] | कँ kaṃ | क ँ (U+0915 U+0901)
Consonant + Visarga | C[X] | कः kaḥ | क ः (U+0915 U+0903)
Consonant + Halant | C[H] | क् k (Pure Consonant) | क ् (U+0915 U+094D)

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं kīṁ | क ी ं (U+0915 U+0940 U+0902)
Consonant + Matra + Candrabindu | CM[D] | काँ kāṃ | क ा ँ (U+0915 U+093E U+0901)
Consonant + Matra + Visarga | CM[X] | कीः kīḥ | क ी ः (U+0915 U+0940 U+0903)

Table 12

3. A sequence of consonants (up to 4) joined by Halant:14

*3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य nkrya | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की kkī | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं kkaṁ | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ kkaṃ | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः kkaḥ | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं kkīṁ | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ kkīṃ | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः kkīḥ | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.

14 In case of Sanskrit, it can join up to 5 consonants.
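The consonant sequence rules of Section 5.5.4 can be combined into one pattern for a single akshar. What follows is a sketch under stated assumptions, not the WLE rules of Section 7: the character classes are simplified subsets of Table 6 chosen for illustration, the names are hypothetical, and the limit of four joined consonants is the empirical one from rule 3, which the LGR itself does not enforce.

```python
import re

# Assumed, simplified classes (subsets of Table 6).
C = "[\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939]"  # consonants
N = "\u093C"                        # Nukta
H = "\u094D"                        # Halant
M = "[\u093E-\u0943\u0945-\u094C]"  # common Matras
BDX = "[\u0901-\u0903]"             # Candrabindu, Anusvara, Visarga

# Up to three C[N]H links, a final C[N], then either H or [M][B|D|X].
CONSONANT_AKSHAR = re.compile(
    f"(?:{C}{N}?{H}){{0,3}}{C}{N}?(?:{H}|{M}?{BDX}?)"
)

def is_consonant_akshar(text):
    """Return True if text is one consonant-initial akshar under the sketch."""
    return CONSONANT_AKSHAR.fullmatch(text) is not None
```

The examples in Tables 10 to 15 pass this check. A conjunct of five consonants fails only because of the `{0,3}` bound, which would be relaxed to follow the WLE rules.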


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity
Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String Similarity Assessment Panel) entrusted to deal with such cases. Table 21 Visually confusables, in Appendix A: Visually confusable characters/sequences, lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar for a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:


Variant 1        Variant 2
आ (U+0906)       आ़ (U+0906 U+093C)
ओ (U+0913)       ओ़ (U+0913 U+093C)
ा (U+093E)       U+093E U+093C
ो (U+094B)       U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of the substituted sequence (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → (U+093E U+093C)
ो (U+094B) → (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the reverse variant mappings using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
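The context rule can be illustrated with a minimal Python sketch. It is not taken from the LGR XML file; the function names `mapping_positions` and `add_nukta` are hypothetical, and a label is treated simply as a string of code points.

```python
from __future__ import annotations

NUKTA = "\u093C"
# Variant 1 code points from Table 16; Variant 2 is the same code point plus Nukta.
VARIANT_1 = {"\u0906", "\u0913", "\u093E", "\u094B"}


def mapping_positions(label: str) -> list[int]:
    """Indexes where the Variant 1 -> Variant 2 mapping is defined."""
    positions = []
    for i, ch in enumerate(label):
        followed_by_nukta = label[i + 1 : i + 2] == NUKTA
        if ch in VARIANT_1 and not followed_by_nukta:
            positions.append(i)
    return positions


def add_nukta(label: str, index: int) -> str:
    """Apply the mapping at `index`, or raise if the context rule forbids it."""
    if index not in mapping_positions(label):
        raise ValueError("variant mapping is not defined at this position")
    return label[: index + 1] + NUKTA + label[index + 1 :]


label = "\u0906\u092E"          # U+0906 U+092E
variant = add_nukta(label, 0)   # U+0906 U+093C U+092E
```

Once the Nukta has been added, the position is no longer eligible for the mapping, so the substitution cannot regenerate into a Nukta-Nukta sequence.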

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set              Mapping
Variant Set A    093E 0901 <--> 0949 0902
Variant Set B    0906 0901 <--> 0911 0902
Variant Set C    0906 0902 <--> 0974
Variant Set D    093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source    Glyph    Target       Type          Variant context
0906      आ        0906 093C    ↔ blocked     not followed-by-N
093E      ा        093E 093C    ↔ blocked     not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the corresponding sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,
0906 093C 0901 is added to Set B,
0906 093C 0902 is added to Set C, and
093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set              Mapping                                         Variant contextual rule
Variant Set A    093E 0901 <--> 093E 093C 0901 <--> 0949 0902    -
Variant Set B    0906 0901 <--> 0906 093C 0901 <--> 0911 0902    -
Variant Set C    0906 0902 <--> 0906 093C 0902 <--> 0974         when(followed-by-V-C-or-end)
Variant Set D    093B <--> 093E 0902 <--> 093E 093C 0902         when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences leads to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B: Anusvara) <--> 093A (M: Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
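The symmetry and transitivity that the extended sets are meant to provide can be checked mechanically. The following Python sketch is only an illustration under simplifying assumptions: each set is treated as a plain equivalence class of code point sequences, the contextual rules are ignored, and the helper names are hypothetical.

```python
# Final variant sets from the table above, as tuples of code points.
VARIANT_SETS = {
    "A": [(0x093E, 0x0901), (0x093E, 0x093C, 0x0901), (0x0949, 0x0902)],
    "B": [(0x0906, 0x0901), (0x0906, 0x093C, 0x0901), (0x0911, 0x0902)],
    "C": [(0x0906, 0x0902), (0x0906, 0x093C, 0x0902), (0x0974,)],
    "D": [(0x093B,), (0x093E, 0x0902), (0x093E, 0x093C, 0x0902)],
}


def build_mappings(sets):
    """Expand every set into all ordered (source, target) pairs."""
    mappings = set()
    for members in sets.values():
        for src in members:
            for dst in members:
                if src != dst:
                    mappings.add((src, dst))
    return mappings


def is_symmetric(mappings):
    """Every mapping a -> b has a reverse mapping b -> a."""
    return all((b, a) in mappings for a, b in mappings)


def is_transitive(mappings):
    """Whenever a -> b and b -> d exist (with a != d), a -> d exists too."""
    return all(
        (a, d) in mappings
        for a, b in mappings
        for c, d in mappings
        if b == c and a != d
    )


MAPPINGS = build_mappings(VARIANT_SETS)
```

Each set of three sequences expands to six directed mappings, so the four sets yield 24 mappings in total; removing any single one of them breaks the symmetry check.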

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1        Variant 2
ॳ (U+0973)       अ (U+0905 U+0902)
U+093A           U+0902
ॴ (U+0974)       आ (U+0906 U+0902)
ऻ (U+093B)       ा (U+093E U+0902)
ऎ (U+090E)       ऐ (U+0910)
U+0946           U+0947
ॵ (U+0975)       औ (U+0914)
ॏ (U+094F)       ौ (U+094C)

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point contexts, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1          Variant 2
U+0901             U+0945 U+0902
U+093E U+0901      U+0949 U+0902
U+0905 U+0901      U+0972 U+0902
U+090F U+0901      U+090D U+0902
U+0906 U+0901      U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow the same convention and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as the mapping (U+0901) -> (U+0945 U+0902).
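A minimal Python sketch of this context rule follows. It is an illustration only: the consonant test assumes the contiguous range U+0915..U+0939 rather than the normative repertoire, and the function names are hypothetical.

```python
from __future__ import annotations

CANDRABINDU = "\u0901"
CANDRA_E = "\u0945"
ANUSVARA = "\u0902"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    # Assumption: basic consonant range U+0915..U+0939, not the LGR repertoire.
    return "\u0915" <= ch <= "\u0939"


def preceded_by_c_or_cn(label: str, i: int) -> bool:
    """True if position i follows a Consonant or a Consonant + Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def replace_candrabindu(label: str) -> str:
    """Replace each Candrabindu by U+0945 U+0902 where the rule allows it."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E + ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

For example, U+0915 U+0901 maps to U+0915 U+0945 U+0902, whereas U+0905 U+0901 (Candrabindu after a Vowel) and U+0915 U+093E U+0901 (Candrabindu after a Matra) are left unchanged because the mapping is not defined in those contexts.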

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under the NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under the NBGP.

The NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under the NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

The NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari                                Gurmukhi
U+0902                                    U+0A02
इ (U+0907)                                ਙ (U+0A19)
उ (U+0909)                                ਤ (U+0A24)
ग (U+0917)                                ਗ (U+0A17)
घ (U+0918)                                ਬ (U+0A2C)
ट (U+091F)                                ਟ (U+0A1F)
ठ (U+0920)                                ਠ (U+0A20)
ढ (U+0922)                                ਫ (U+0A2B)
प (U+092A)                                ਧ (U+0A27)
भ (U+092D)                                ਮ (U+0A2E)
म (U+092E)                                ਸ (U+0A38)
व (U+0935)                                ਕ (U+0A15)
ह (U+0939)                                ਵ (U+0A35)
U+093A                                    U+0A02
U+093C                                    U+0A3C
ि (U+093F)                                ਿ (U+0A3F)
ी (U+0940)                                ੀ (U+0A40)
U+0945                                    U+0A71
U+0946                                    U+0A47
U+0946                                    U+0A4B
U+0947                                    U+0A47
U+0947                                    U+0A4B
U+0948                                    U+0A48
U+0956                                    U+0A41
U+0957                                    U+0A42
प्टि (U+092A U+094D U+091F U+093F)        ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940)        ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947)        ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946)        ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924)                ਜ (U+0A1C)

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari       Bengali
म (U+092E)       ম (U+09AE)
ि (U+093F)       ি (U+09BF)

Table 20: Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (Required in Santali language)
   b. ओ (U+0913) (Required in Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (Required in Santali language)
   b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN, where CN is a C followed by an N.

3. M must be preceded by C or CN, where CN is a C followed by an N.

4. X must be preceded by either of V, C, N or M.

5. B must be preceded by either of V, C, N or M.

6. D must be preceded by either of V, C, N or M.

7. V can NOT be preceded by H (details in "Case of V preceded by H").
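The seven rules above can be sketched as a small Python checker. This is an illustration under stated assumptions, not the normative LGR: the categories C, V and M are approximated by contiguous Unicode ranges instead of the repertoire of Table 6, the Eyelash Reph sequences are not modelled, and the function name `wle_violations` is hypothetical.

```python
from __future__ import annotations

# Assumption: simplified categories from contiguous Unicode ranges;
# the normative sets are those of the LGR repertoire (Table 6).
C = {chr(cp) for cp in range(0x0915, 0x093A)}  # consonants U+0915..U+0939
V = {chr(cp) for cp in range(0x0905, 0x0915)}  # vowels U+0905..U+0914
M = {chr(cp) for cp in range(0x093E, 0x094D)}  # matras U+093E..U+094C
D, B, X, N, H = "\u0901", "\u0902", "\u0903", "\u093C", "\u094D"

# Sets C1, V1 and M1 exactly as listed under rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}


def wle_violations(label: str) -> list[tuple[int, int]]:
    """Return (index, rule number) for every violation of rules 1 to 7."""
    found = []
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        after_c_or_cn = prev in C or (prev == N and prev2 in C)
        after_v_c_n_m = prev in V or prev in C or prev == N or prev in M
        if ch == N and prev not in (C1 | V1 | M1):
            found.append((i, 1))
        elif ch == H and not after_c_or_cn:
            found.append((i, 2))
        elif ch in M and not after_c_or_cn:
            found.append((i, 3))
        elif ch in (X, B, D) and not after_v_c_n_m:
            found.append((i, {X: 4, B: 5, D: 6}[ch]))
        elif ch in V and prev == H:
            found.append((i, 7))
    return found
```

Under these assumptions, the label U+0939 U+093F U+0902 U+0926 U+0940 passes all rules, while U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is reported at index 3 under rule 7, because the Vowel U+0905 follows a Halant.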


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1).

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2).

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7 (Sequences), it gets permitted only within those specific sequences.

2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D. Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S. Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A. Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K. T. | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K. C. Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N. Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P. K. | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S. Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R. | C-DAC | India | Tamil
Member | Shantaram S. Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U. B. Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G. | CALTS, Univ. of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K. P. Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC 8228] Freytag, A., "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1     Confusable 2
क (U+0915)       क़ (U+0915 U+093C)
ख (U+0916)       ख़ (U+0916 U+093C)
ग (U+0917)       ग़ (U+0917 U+093C)
च (U+091A)       च़ (U+091A U+093C)
छ (U+091B)       छ़ (U+091B U+093C)
ज (U+091C)       ज़ (U+091C U+093C)
ड (U+0921)       ड़ (U+0921 U+093C)
ढ (U+0922)       ढ़ (U+0922 U+093C)
फ (U+092B)       फ़ (U+092B U+093C)

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar, as a part of a domain name or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even a single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
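The "all constituents must have an equivalent" condition can be sketched in Python. The mapping below is only a partial, illustrative subset of single code point rows from Table 19; akshar sequences and code points with several targets are left out, and the function name `gurmukhi_variant` is hypothetical.

```python
from __future__ import annotations

# A few single code point rows from Table 19 (Devanagari -> Gurmukhi).
DEVA_TO_GURU = {
    "\u0917": "\u0A17",
    "\u091F": "\u0A1F",
    "\u0920": "\u0A20",
    "\u093F": "\u0A3F",
    "\u0940": "\u0A40",
}


def gurmukhi_variant(label: str) -> str | None:
    """Return the Gurmukhi whole-label variant, or None if any code point
    of the label lacks a cross-script counterpart."""
    if not label or not all(ch in DEVA_TO_GURU for ch in label):
        return None
    return "".join(DEVA_TO_GURU[ch] for ch in label)
```

With this subset, U+091F U+0940 yields U+0A1F U+0A40, whereas U+091F U+0915 yields no variant because U+0915 has no counterpart in the mapping and therefore provides the visual distinction.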

Devanagari confusable    Other script confusable    From script
U+0903                   ઃ (U+0A83)                 Gujarati
U+0903                   U+0C03                     Telugu
U+0903                   U+0C83                     Kannada
U+0903                   ഃ (U+0D03)                 Malayalam
U+0903                   ඃ (U+0D83)                 Sinhala
U+0903                   ঃ (U+0983) (17)            Bengali
U+0909                   ও (U+0993)                 Bengali
U+0918                   ঘ (U+0998)                 Bengali
U+0920                   U+0A28                     Gurmukhi
U+0920                   U+0A30                     Gurmukhi
U+0921                   U+0A21                     Gurmukhi
U+0921                   U+0A24                     Gurmukhi
ढ (U+0922)               U+0A22                     Gurmukhi
U+0924                   U+0A1C                     Gurmukhi
U+092F                   U+0A27                     Gurmukhi
U+0945                   U+0981                     Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that the two look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


Figure 1 Pictorial depiction of evolution of Devanagari

3.2 Languages considered

Devanagari is used by over 120 languages, which makes it one of the most used scripts in the world. Languages using Devanagari as their primary script belong to varying geo-political scenarios, as given below:

- designated as official (scheduled) languages of some countries
- used by communities living in urban areas
- used by communities living in rural yet accessible areas
- used by communities living in far-flung areas which are not easily connected either by roads or by communication mechanisms

Information about official (scheduled) languages of countries is easily available. Information about languages used by communities living in urban areas is also easily obtainable. Some effort was needed to cover the languages which are spoken by communities living in rural yet accessible areas. However, it was quite difficult to cover the rest of the languages, spoken by the communities living in remote tribal areas which are generally not connected by road or by communication means. Defining the scope of language coverage was hence essential to limit the scope of the work to be undertaken for the analysis of the Devanagari LGR.


The NBGP decided to employ the "Expanded Graded Intergenerational Disruption Scale" [EGIDS], which is designed to measure the status of the languages of the world in terms of endangerment or development. The EGIDS consists of 13 levels, with each higher number on the scale representing a greater level of disruption to the intergenerational transmission of the language. The NBGP decided to accommodate all the languages belonging to EGIDS Scale 1 to 4 for its analysis, which represent languages that are, in one form or the other, still in usage. Following are the descriptions (see footnote 3) of those scales:

Scale | Label | Description
1 | National | The language is widely used between nations in trade, knowledge exchange, and international policy.
2 | Provincial | The language is used in education, work, mass media, and government at the national level.
3 | Wider Communication | The language is used in education, work, mass media, and government within major administrative subdivisions of a nation.
4 | Educational | The language is in vigorous use, with standardization and literature being sustained through a widespread system of institutionally supported education.

Languages belonging to Level 5 and higher are not in widespread usage.

Below is the tabular representation of the languages that have been considered for the Devanagari LGR.

3 https://www.ethnologue.com/about/language-status


EGIDS Scale 1 | EGIDS Scale 2 | EGIDS Scale 3 | EGIDS Scale 4
Hindi | Konkani | Bhatri | Bhojpuri
Nepali | Maithili | Halbi | Chhattisgarhi
 | Marathi | Kinnauri | Dogri
 | Sindhi | Kukna | Kashmiri
 | | Panchpargania | Limbu
 | | Sadri | Magahi
 | | Wagdi | Sanskrit
 | | | Santali
 | | | Tamang, Eastern
 | | | Avadhi
 | | | Newar
 | | | Saraiki⁴

Table 2 Languages considered under Devanagari LGR

Despite being classified under EGIDS Scale 5, the Boro language is also considered under the Devanagari LGR, as it is one of the scheduled languages of India and is widely spoken.

Apart from the above-mentioned languages, Braj, Dhundari, Mundari and Kharia have also been considered for the analysis, as the communities using them were accessible and provided their inputs.

3.2.1 Case of Sanskrit

Sanskrit is generally perceived as an archaic language used only in ancient religious texts. However, it is worth noting that there is a quite vibrant and active user community of Sanskrit in India which practices Sanskrit on a day-to-day basis. Sanskrit is still taught in schools under various State and Central educational boards. There is increasing use of Sanskrit on social media as well. The same is reflected in the EGIDS scale, where Sanskrit is categorized in Scale 4, indicating the status of the language as “Educational”.

4 Though listed in EGIDS scale 4, Saraiki is not covered by the NBGP. As per Ethnologue, the Devanagari script is no longer in use by the Saraiki community. Ref: https://www.ethnologue.com/language/skr


3.3 The structure of written Devanagari

Devanagari is an alphasyllabary, and the heart of the writing system is the akshar. It is this unit which is instinctively recognized by users of the script. To understand the notion of akshar, a brief overview of the writing system is provided in this section, and the akshar itself will be treated in depth in Section 5.4. The writing system of Devanagari could be summed up as composed of the following:

3.3.1 The Consonants

Devanagari consonants have an implicit schwa⁵ /ə/ vowel included in them. As per traditional classification, they are categorized according to their phonetic properties (especially in terms of place plus manner of articulation). There are 5 Varga groups (classes) and one non-Varga group. Each Varga, which corresponds to Stops, contains five consonants classified as per their properties. The first four consonants are classified on the basis of voicing and aspiration, and the last is the corresponding nasal.

Varga | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | क U+0915 | ख U+0916 | ग U+0917 | घ U+0918 | ङ U+0919
Palatal | च U+091A | छ U+091B | ज U+091C | झ U+091D | ञ U+091E
Retroflex | ट U+091F | ठ U+0920 | ड U+0921 | ढ U+0922 | ण U+0923
Dental | त U+0924 | थ U+0925 | द U+0926 | ध U+0927 | न U+0928
Bi-labial | प U+092A | फ U+092B | ब U+092C | भ U+092D | म U+092E

Table 3 Varga classification of consonants

Non-Varga: य U+092F | र U+0930 | ल U+0932 | ळ U+0933 | व U+0935 | श U+0936 | ष U+0937 | स U+0938 | ह U+0939

Table 4 Non-Varga consonants
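The Varga grid in Table 3 follows code point order, so the classification can be derived arithmetically. The following is a minimal Python sketch under that observation; the function name `classify_varga` and the label strings are illustrative assumptions, not part of the LGR. U+0929 is skipped because it does not appear in Table 3.

```python
# Illustrative sketch: classify a Varga consonant from Table 3 by its code point.
VARGAS = ["Velar", "Palatal", "Retroflex", "Dental", "Bi-labial"]

# Column order within each Varga row, as in Table 3.
SLOTS = [
    ("unvoiced", "unaspirated"),
    ("unvoiced", "aspirated"),
    ("voiced", "unaspirated"),
    ("voiced", "aspirated"),
    ("nasal", None),
]

# The 25 Varga consonants in table order: U+0915..U+0928, then U+092A..U+092E.
VARGA_CODE_POINTS = list(range(0x0915, 0x0929)) + list(range(0x092A, 0x092F))


def classify_varga(ch):
    """Return (varga, voicing, aspiration) for a Varga consonant, else None."""
    cp = ord(ch)
    if cp not in VARGA_CODE_POINTS:
        return None  # non-Varga consonants (Table 4) and all other characters
    index = VARGA_CODE_POINTS.index(cp)
    voicing, aspiration = SLOTS[index % 5]
    return VARGAS[index // 5], voicing, aspiration


if __name__ == "__main__":
    for ch in "\u0915\u0918\u092e\u092f":
        print("U+%04X" % ord(ch), classify_varga(ch))
```

Non-Varga consonants such as य (U+092F) fall outside the grid, so the sketch returns `None` for them.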

5 Although representing the implicit vowel as /a/ is more correct orthographically, the schwa /ə/, although not part of the orthographic system, has been used, since the /a/ would be misunderstood and read as अ/आ/ा.


3.3.2 The Implicit Vowel Killer: Halant⁶

All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant ् (U+094D). The Halant thus joins two consonants and creates conjuncts, which can generally be combinations of 2 to 4 consonants. In rare cases, it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical. It is just an observation drawn from the words that have been observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. Hence, in the LGR work, this limit will not be enforced⁷.

3.3.3 Vowels

Separate symbols exist for all Vowels which are pronounced independently, either at the beginning or after a vowel sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels excepting the अ.

The correlation is shown as follows:

Vowel | Corresponding vowel sign (Matra)
अ U+0905 | (none)
आ U+0906 | ा U+093E
इ U+0907 | ि U+093F
ई U+0908 | ी U+0940
उ U+0909 | ु U+0941
ऊ U+090A | ू U+0942
ऋ U+090B | ृ U+0943
ए U+090F | े U+0947
ऐ U+0910 | ै U+0948
ओ U+0913 | ो U+094B
औ U+0914 | ौ U+094C
ॳ U+0973 | ऺ U+093A
ॴ U+0974 | ऻ U+093B
ऎ U+090E, ऄ U+0904 | ॆ U+0946
ऒ U+0912 | ॊ U+094A
ऍ U+090D, ॲ U+0972 | ॅ U+0945
ॠ U+0960 | ॄ U+0944
ऑ U+0911 | ॉ U+0949
ॵ U+0975 | ॏ U+094F
ॶ U+0976 | ॖ U+0956
ॷ U+0977 | ॗ U+0957

Table 5 Vowels with corresponding Matras

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).

6 Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report, both terms have been used to denote the character that suppresses the inherent vowel.
7 This can be the case when a foreign language word which admits a large number of consonants is transliterated into Devanāgarī.

3.3.4 The Anusvara (ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant, the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal⁸.

सन्त vs संत /sənt/ 'saint': U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924
चम्पा vs चंपा /tʃəmpa/ 'a flower belonging to the genus Plumeria family': U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E
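The substitution described above (Nasal Consonant + Halant + Consonant of the same varga, written instead with an Anusvara) can be sketched in Python as follows. This is only an illustration of the orthographic pattern; the function names are hypothetical, the rule is applied mechanically wherever the pattern matches, and no claim is made that every match is a spelling a given language community would actually use.

```python
# Illustrative sketch: rewrite Varga nasal + Halant + same-varga consonant
# using the Anusvara, as in the two examples in the text.
HALANT = "\u094d"
ANUSVARA = "\u0902"

# The 25 Varga consonants in Table 3 order (five per varga, nasal last).
VARGA = list(range(0x0915, 0x0929)) + list(range(0x092A, 0x092F))


def varga_row(ch):
    """Index of the varga (0..4) the consonant belongs to, or None."""
    cp = ord(ch)
    return VARGA.index(cp) // 5 if cp in VARGA else None


def is_varga_nasal(ch):
    """True for the fifth (nasal) member of a varga."""
    cp = ord(ch)
    return cp in VARGA and VARGA.index(cp) % 5 == 4


def nasal_conjunct_to_anusvara(text):
    out = []
    i = 0
    while i < len(text):
        if (
            i + 2 < len(text)
            and is_varga_nasal(text[i])
            and text[i + 1] == HALANT
            and varga_row(text[i + 2]) == varga_row(text[i])
        ):
            out.append(ANUSVARA)
            i += 2  # drop nasal and Halant; the next consonant is copied as usual
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(nasal_conjunct_to_anusvara("\u0938\u0928\u094d\u0924"))
    print(nasal_conjunct_to_anusvara("\u091a\u092e\u094d\u092a\u093e"))
```

The two calls in the `__main__` block use the code point sequences given in the examples above.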

3.3.5 Nasalization: Candrabindu (ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख ãkh 'eye' (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (़ - U+093C)⁹

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ firoz (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ bədh (U+092C U+0922 U+093C)

8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding “Varga” through a Halant.
9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics & Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided, as they are available online.

The web publication DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION [109] by the Central Hindi Directorate, Ministry of HRD, Government of India, clearly states such a use of Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय cay 'tea' (U+091A U+093C U+093E U+092F)
छ़ल chal 'wash' (imperative) (U+091B U+093C U+0932)
पॊज़ póz 'fact' (U+092A U+094A U+091C U+093C)

Normally, a Nukta is appended to a consonant. However, the Santali language uses Nukta in a unique way: the Nukta is adjoined to the following vowels and vowel signs:

a. आ (U+0906)
b. ओ (U+0913)
c. ा (U+093E)
d. ो (U+094B)


3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example दुःख duhkh 'sorrow, unhappiness' (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire, as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted, and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष ksha, which gets formed by क ka + ् (halant) + ष sha, gets rendered with a visible Halant on क followed by a full ष when formed by क ka + ् (halant) + Zero Width Non-joiner + ष sha. In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character, which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that the constituting consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष ksha, which gets formed by क ka + ् (halant) + ष sha, does not show the half form of ka joining with sha. However, using ZWJ, the constituting consonants' shapes are preserved in the visual depiction, with the half form of क joined to ष, when formed by क ka + ् (halant) + Zero Width Joiner + ष sha.

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync across all the Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code points in the code point repertoire across the board for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are pre-requisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where an atypical visual shape of an akshar is desired.

In domain names, due to the absence of space “ ” or tab “-”, there will be cases where the inability to use ZWNJ can pose some issues: two words need to be joined together, where the previous word needs to end in an explicit Halant and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (deš, 'nation') and विदेश (videš, 'foreign land') are juxtaposed to each other, the resultant word, i.e. “देश्विदेश”¹⁰, is not the appropriate way of rendering it. The appropriate rendering, in which the Halant on श stays visible and व keeps its full form, can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire, if necessary.

However, there may not be much of an impact from the exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes¹¹ may not be able to be made; however, there will not be any impact on the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
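The Letter Qa example given under the IDNA Protocol item can be observed with Unicode normalization: U+0958 does not survive Normalization Form C and is replaced by the sequence U+0915 U+093C. The Python sketch below shows only this normalization behaviour using the standard `unicodedata` module; it is not a full IDNA2008 validity check, and the helper name `is_nfc_stable` is an assumption made for illustration.

```python
import unicodedata

QA_PRECOMPOSED = "\u0958"        # DEVANAGARI LETTER QA
KA_PLUS_NUKTA = "\u0915\u093c"   # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA


def is_nfc_stable(text):
    """True if the text is unchanged by Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


if __name__ == "__main__":
    normalized = unicodedata.normalize("NFC", QA_PRECOMPOSED)
    # Print the code points that NFC produces for the precomposed letter.
    print(" ".join("U+%04X" % ord(ch) for ch in normalized))
    print(is_nfc_stable(QA_PRECOMPOSED), is_nfc_stable(KA_PLUS_NUKTA))
```

A check of this kind covers only one of the successive filters described above; whether a code point is permitted by IDNA or by the [MSR] has to be taken from those specifications themselves.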

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different, and hence it changes the fundamental word.
11 Case of the two renderings of the conjunct of क and ष: the first is composed with क + ् + ष, while the latter is with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.


4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms ॄ (U+0944) and ॣ (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments called for attention to the normative section of the document. The NBGP at large, and the Devanagari team specifically, went through the comments in detail and decided, for each of the individual comments received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2 Devanagari Code Page from [MSR]

Color convention¹²:

- All characters that are included in the [MSR]: yellow background
- PVALID in IDNA2008 but excluded from the MSR for various reasons: pinkish background
- Not PVALID in IDNA2008: white background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled Reference. For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 (Hindi, Nepali) | [0][101][102][103][105][108][110][111][112][113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0][101][102][103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 (Hindi) | [0][101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 (Kashmiri) | [0][105][108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 (Hindi) | [0][100][101][102][108][112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 (Kashmiri) | [0][105][108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 (Nepali) | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 (Hindi) | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0][101][102][103]
62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 (Hindi) | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 (Kashmiri) | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 (Hindi) | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 (Kashmiri) | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 (Kashmiri) | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 (Konkani, Marathi) | [9][100][102][108][112]
75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 (Sindhi) | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 (Sindhi) | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 (Sindhi) | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 (Sindhi) | [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the “Eyelash Reph”¹³ construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by normal Ra, U+0930), the term “Reph” is used here.

Table 7 Sequences
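As a rough illustration of how Table 6 and Table 7 combine, the Python sketch below encodes the 83 code points as ranges and accepts U+0931 only as the start of one of the two listed sequences. The names `REPERTOIRE`, `RRA_SEQUENCES` and `uses_only_repertoire` are assumptions made for this example. The check covers repertoire membership only; it does not implement variants or the Whole Label Evaluation rules, and the accompanying XML file remains the normative definition.

```python
# Inclusive code point ranges transcribed from Table 6 (83 code points in total).
_RANGES = [
    (0x0901, 0x0903), (0x0905, 0x090B), (0x090D, 0x0928), (0x092A, 0x0930),
    (0x0932, 0x0933), (0x0935, 0x093C), (0x093E, 0x0943), (0x0945, 0x094D),
    (0x094F, 0x094F), (0x0956, 0x0957), (0x0972, 0x0977), (0x097B, 0x097C),
    (0x097E, 0x097F),
]
REPERTOIRE = {cp for first, last in _RANGES for cp in range(first, last + 1)}

# Table 7: U+0931 (RRA) is admitted only inside these two sequences.
RRA_SEQUENCES = ("\u0931\u094d\u092f", "\u0931\u094d\u0939")


def uses_only_repertoire(label):
    """True if every code point is in Table 6 or part of a Table 7 sequence."""
    i = 0
    while i < len(label):
        if any(label.startswith(seq, i) for seq in RRA_SEQUENCES):
            i += 3  # consume the whole RRA + VIRAMA + (YA | HA) sequence
        elif ord(label[i]) in REPERTOIRE:
            i += 1
        else:
            return False
    return True


if __name__ == "__main__":
    print(len(REPERTOIRE))
    print(uses_only_repertoire("\u0926\u0947\u0936"))   # U+0926 U+0947 U+0936
    print(uses_only_repertoire("\u093d"))               # Avagraha, not in Table 6
```

Encoding the repertoire as ranges keeps the sketch short; a real tool would read the code points from the LGR XML instead of hard-coding them.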

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section, there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, instead of their descriptive names, the variable names are used. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta

5.5.2 Operators used

Symbol   Function
|   Alternative
[ ]   Optional
*   Variable Repetition
( )   Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may optionally be followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples

Sequence Description   Sequence   Example   Constituting characters
Vowel   V   अ (a)   U+0905
Vowel + Anusvara   V[B]   अं (aṁ)   U+0905 U+0902
Vowel + Candrabindu   V[D]   अँ (aṃ)   U+0905 U+0901
Vowel + Visarga   V[X]   अः (aḥ)   U+0905 U+0903

Table 9
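The vowel sequence V[B|D|X] can be sketched as a regular expression over code points. This is only a minimal illustration: the `VOWELS` string is an assumed three-letter subset (U+0905, U+0906, U+0913) rather than the full vowel category of the repertoire, and the name `is_vowel_sequence` is hypothetical.

```python
import re

# Assumption: an illustrative subset of the vowel category (V), not the full repertoire.
VOWELS = "\u0905\u0906\u0913"
# B (Anusvara), D (Candrabindu) and X (Visarga), as listed in Table 9.
ANUSVARA, CANDRABINDU, VISARGA = "\u0902", "\u0901", "\u0903"

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{VOWELS}][{ANUSVARA}{CANDRABINDU}{VISARGA}]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

For instance, U+0905 U+0902 is accepted, while U+0905 U+0902 U+0903 is rejected because only one of B, D or X may follow the vowel.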


5.5.4 The Consonant Sequence

A consonant sequence begins with a consonant. It may optionally be followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. The consonant sequence can be extended further after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the consonant along with the Nukta sign wherever such a case is pertinent.)

Examples

Sequence Description   Sequence   Example   Constituting characters
Consonant   C   क (ka)   U+0915 <single character>
Consonant + Nukta   C[N]   क़ (ḳa)   U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples

Sequence Description   Sequence   Example   Constituting characters
Consonant + Matra   C[M]   कि (ki)   U+0915 U+093F
Consonant + Anusvara   C[B]   कं (kaṁ)   U+0915 U+0902
Consonant + Candrabindu   C[D]   कँ (kaṃ)   U+0915 U+0901
Consonant + Visarga   C[X]   कः (kaḥ)   U+0915 U+0903
Consonant + Halant   C[H]   क् (k, pure consonant)   U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example

Sequence Description   Sequence   Example   Constituting characters
Consonant + Matra + Anusvara   CM[B]   कीं (kīṁ)   U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu   CM[D]   काँ (kāṃ)   U+0915 U+093E U+0901
Consonant + Matra + Visarga   CM[X]   कीः (kīḥ)   U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14):

3*(CH)C

Example

Sequence Description   Sequence   Example   Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant   CHCHCHC   न्क्र्य (nkrya)   U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In the case of Sanskrit, up to 5 consonants can be joined.

Subsets

3A. The combination may be followed by M, B, D or X.

Example

Sequence Description   Sequence   Example   Constituting characters
Consonant + Halant + Consonant + Matra   CHC[M]   क्की (kkī)   U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara   CHC[B]   क्कं (kkaṁ)   U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu   CHC[D]   क्कँ (kkaṃ)   U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga   CHC[X]   क्कः (kkaḥ)   U+0915 U+094D U+0915 U+0903

Table 14

3B. 3*(CH)CM may be followed by a B, D or X.

Example

Sequence Description   Sequence   Example   Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara   CHCM[B]   क्कीं (kkīṁ)   U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu   CHCM[D]   क्कीँ (kkīṃ)   U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga   CHCM[X]   क्कीः (kkīḥ)   U+0915 U+094D U+0915 U+0940 U+0903

Table 15
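The consonant sequence rules 1 to 3B can be folded into one pattern over category letters (C, N, M, B, D, X, H), as a rough sketch. The consolidated pattern is an interpretation and not a formula given in the text: it assumes that every consonant in a conjunct may carry a Nukta, and that a conjunct may end in a Halant by applying rule 2 to its last consonant. The name `is_consonant_sequence` is hypothetical.

```python
import re

# Input is a string of category letters from Section 5.5.1, e.g. "CHCM" for
# Consonant + Halant + Consonant + Matra.
# (?:CN?H){0,3}  up to three "consonant (+ Nukta) + Halant" prefixes, so that
#                at most four consonants are joined (rule 3)
# CN?            the final consonant, optionally with a Nukta (rule 1)
# (?:H|M?[BDX]?) a Halant, or an optional Matra followed by at most one of
#                B, D or X (rules 2, 2A, 3A and 3B)
CONSONANT_SEQUENCE = re.compile(r"(?:CN?H){0,3}CN?(?:H|M?[BDX]?)")


def is_consonant_sequence(categories: str) -> bool:
    """Return True if the category string is a single consonant-initial akshar."""
    return CONSONANT_SEQUENCE.fullmatch(categories) is not None
```

Under these assumptions `"CHCHCHC"` (the Table 13 example) is accepted, while `"CHCHCHCHC"` is rejected because it joins five consonants, and `"CBX"` is rejected because only one of B, D or X may follow.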

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules when one takes into account digits, punctuation and special standalone characters such as the Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.

6 Variants

There are no characters or character sequences in Devanagari that can be created using the characters permitted by the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants into two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (the String Similarity Assessment Panel) entrusted with dealing with such cases. Table 21: Visually confusables, in Appendix A: Visually confusable characters/sequences, lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formation. They can cause confusion even to a careful observer and are hence being proposed as variants. A brief description of these variants follows, with the variants themselves given in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for the positioning of the Nukta character (U+093C) which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated that the Whole Label Evaluation rules (given in Section 7) be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar for a majority of the Devanagari user base. This being a unique case of homographic similarity, the following variants are being proposed:

Variant 1   Variant 2
आ (U+0906)   आ़ (U+0906 U+093C)
ओ (U+0913)   ओ़ (U+0913 U+093C)
ा (U+093E)   ा़ (U+093E U+093C)
ो (U+094B)   ो़ (U+094B U+093C)

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the result contains a new instance of आ (U+0906). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all of the above Nukta variants, as given below.

Rule: As per Table 16 Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character.

Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
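The effect of the "not followed by a Nukta" condition on both directions of the Set 1 mappings can be sketched as follows. This is a simplified illustration that considers only the Table 16 pairs and ignores every other variant set and all WLE rules; the function name `nukta_variants` is hypothetical.

```python
from itertools import product

NUKTA = "\u093C"
# Variant 1 code points of Table 16 (Set 1).
SET1_BASES = ("\u0906", "\u0913", "\u093E", "\u094B")


def nukta_variants(label: str) -> set[str]:
    """Return the labels reachable by adding or dropping a Nukta after a Set 1 base."""
    choices = []
    i = 0
    while i < len(label):
        ch = label[i]
        # A unit is either the bare code point or the code point plus one Nukta.
        width = 2 if label[i + 1:i + 2] == NUKTA else 1
        # Context rule: the unit must not itself be followed by a Nukta.
        followed_by_nukta = label[i + width:i + width + 1] == NUKTA
        if ch in SET1_BASES and not followed_by_nukta:
            choices.append((ch, ch + NUKTA))  # forward and reverse mapping
            i += width
        else:
            choices.append((ch,))
            i += 1
    return {"".join(parts) for parts in product(*choices)} - {label}
```

With this condition applied in both directions, U+0906 yields only U+0906 U+093C, U+0906 U+093C yields only U+0906, and the invalid sequence U+0906 U+093C U+093C yields no variants at all.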

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set   Mapping
Variant Set A   093E 0901 <--> 0949 0902
Variant Set B   0906 0901 <--> 0911 0902
Variant Set C   0906 0902 <--> 0974
Variant Set D   093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source   Glyph   Target   Glyph   Type   Variant Context
0906   आ   0906 093C   आ़   ↔ blocked   not followed-by-N
093E   ा   093E 093C   ा़   ↔ blocked   not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E respectively in the left-hand-side sequences of Variant Sets A, B and C and in the right-hand-side sequence of Variant Set D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A
0906 093C 0901 is added to Set B
0906 093C 0902 is added to Set C, and
093E 093C 0902 is added to Set D

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set   Mapping   Variant Contextual Rule
Variant Set A   093E 0901 <--> 093E 093C 0901 <--> 0949 0902   -
Variant Set B   0906 0901 <--> 0906 093C 0901 <--> 0911 0902   -
Variant Set C   0906 0902 <--> 0906 093C 0902 <--> 0974   when(followed-by-V-C-or-end)
Variant Set D   093B <--> 093E 0902 <--> 093E 093C 0902   when(followed-by-V-C-or-end)
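The symmetry and transitivity that the extended sets are meant to guarantee can be checked mechanically. The sketch below encodes the four extended sets from the table above and derives all pairwise mappings; it leaves out the contextual rules, and the helper names (`to_text`, `build_mappings`, `is_symmetric`, `is_transitive`) are hypothetical.

```python
# Extended variant sets A to D, written as space-separated hexadecimal code points.
VARIANT_SETS = {
    "A": ["093E 0901", "093E 093C 0901", "0949 0902"],
    "B": ["0906 0901", "0906 093C 0901", "0911 0902"],
    "C": ["0906 0902", "0906 093C 0902", "0974"],
    "D": ["093B", "093E 0902", "093E 093C 0902"],
}


def to_text(hex_sequence: str) -> str:
    """Convert '0906 093C' into the corresponding string of code points."""
    return "".join(chr(int(cp, 16)) for cp in hex_sequence.split())


def build_mappings(sets: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every member of a set to all other members of the same set."""
    mappings: dict[str, set[str]] = {}
    for members in sets.values():
        texts = [to_text(m) for m in members]
        for source in texts:
            mappings.setdefault(source, set()).update(t for t in texts if t != source)
    return mappings


def is_symmetric(mappings: dict[str, set[str]]) -> bool:
    """Every mapping a -> b has a reverse mapping b -> a."""
    return all(a in mappings.get(b, set()) for a, targets in mappings.items() for b in targets)


def is_transitive(mappings: dict[str, set[str]]) -> bool:
    """Whenever a -> b and b -> c (with c != a), a -> c also exists."""
    return all(
        c == a or c in targets
        for a, targets in mappings.items()
        for b in targets
        for c in mappings[b]
    )
```

Dropping one of the added sequences from a set and declaring only the original pairwise mappings would make such a check fail, which is the gap the extension closes.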

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences leads to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1   Variant 2
ॳ (U+0973)   अं (U+0905 U+0902)
ऺ (U+093A)   ं (U+0902)
ॴ (U+0974)   आं (U+0906 U+0902)
ऻ (U+093B)   ां (U+093E U+0902)
ऎ (U+090E)   ऐ (U+0910)
ॆ (U+0946)   े (U+0947)
ॵ (U+0975)   औ (U+0914)
ॏ (U+094F)   ौ (U+094C)

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (only a discussion, not proposed as variants)

Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in a Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case for making them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual variant pairs are as follows:

Variant 1   Variant 2
U+0901   U+0945 U+0902
U+093E U+0901   U+0949 U+0902
U+0905 U+0901   U+0972 U+0902
U+090F U+0901   U+090D U+0902
U+0906 U+0901   U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow this convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. U+0945. While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for U+0945, which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the character preceding (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping.

For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
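The context rule for the Candrabindu mapping can be illustrated with a short sketch. It assumes, purely for illustration, that the consonant category is the contiguous Unicode range U+0915 (KA) to U+0939 (HA); the actual repertoire excludes some code points in that range. The function names are hypothetical.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093C"
CANDRA_E_ANUSVARA = "\u0945\u0902"


def is_consonant(ch: str) -> bool:
    # Assumption: approximate the C category by the Unicode range KA..HA.
    return len(ch) == 1 and "\u0915" <= ch <= "\u0939"


def candrabindu_variant(label: str) -> str:
    """Replace each Candrabindu by U+0945 U+0902 where the context rule allows it."""
    out = []
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        after_c = is_consonant(prev)
        after_cn = prev == NUKTA and is_consonant(prev2)
        if ch == CANDRABINDU and (after_c or after_cn):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

A Candrabindu after a Vowel or a Matra is left untouched, since a sequence starting with the Matra U+0945 could not validly appear in that position.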

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the blocked disposition.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under the NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under the NBGP.

The NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under the NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are deemed to be equivalents of each other, semantically or otherwise. They are grouped only on the basis of possible visual confusability.

The NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari   Gurmukhi
U+0902   U+0A02
इ U+0907   ਙ U+0A19
उ U+0909   ਤ U+0A24
ग U+0917   ਗ U+0A17
घ U+0918   ਬ U+0A2C
ट U+091F   ਟ U+0A1F
ठ U+0920   ਠ U+0A20
ढ U+0922   ਫ U+0A2B
प U+092A   ਧ U+0A27
भ U+092D   ਮ U+0A2E
म U+092E   ਸ U+0A38
व U+0935   ਕ U+0A15
ह U+0939   ਵ U+0A35
U+093A   U+0A02
U+093C   U+0A3C
ि U+093F   ਿ U+0A3F
ी U+0940   ੀ U+0A40
U+0945   U+0A71
U+0946   U+0A47
U+0946   U+0A4B
U+0947   U+0A47
U+0947   U+0A4B
U+0948   U+0A48
U+0956   U+0A41
U+0957   U+0A42
प्टि U+092A U+094D U+091F U+093F   ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940   ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947   ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946   ਏ U+0A0F
त्त U+0924 U+094D U+0924   ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari   Bengali
म U+092E   ম U+09AE
ि U+093F   ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables, in Appendix B: Cross-script Confusables, lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6: Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)

3. M must be preceded by C or CN (see footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N

16 where CN is a C followed by an N
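Rules 1 to 7 are all "preceded by" checks, so they can be sketched as a single pass over a label. The sets C1, V1 and M1 are taken from the lists above; `DEMO_CATEGORIES` is a small illustrative table that stands in for the full repertoire (Table 6), and the function name `wle_violations` is hypothetical. This is a sketch of the rules as listed, not the implementation used by any LGR tool.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")
NUKTA_BASES = C1 | V1 | M1

# Assumption: a tiny category table for demonstration only.
DEMO_CATEGORIES = {
    "\u0915": "C", "\u091A": "C", "\u092E": "C", "\u0930": "C",
    "\u0905": "V", "\u0906": "V",
    "\u093E": "M", "\u093F": "M",
    "\u0902": "B", "\u0901": "D", "\u0903": "X",
    "\u094D": "H", "\u093C": "N",
}


def wle_violations(label: str, category_of: dict[str, str] = DEMO_CATEGORIES) -> list[tuple[int, int]]:
    """Return (index, rule number) for every code point that breaks rules 1 to 7."""
    cats = [category_of[ch] for ch in label]
    violations = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i >= 1 else ""
        # "C or CN": a consonant, or a Nukta that itself follows a consonant.
        after_c_or_cn = prev == "C" or (prev == "N" and i >= 2 and cats[i - 2] == "C")
        if cat == "N":
            ok, rule = (i >= 1 and label[i - 1] in NUKTA_BASES), 1
        elif cat == "H":
            ok, rule = after_c_or_cn, 2
        elif cat == "M":
            ok, rule = after_c_or_cn, 3
        elif cat in ("X", "B", "D"):
            ok, rule = prev in ("V", "C", "N", "M"), {"X": 4, "B": 5, "D": 6}[cat]
        elif cat == "V":
            ok, rule = prev != "H", 7
        else:  # "C": the list above places no preceding-context rule on consonants
            ok, rule = True, 0
        if not ok:
            violations.append((i, rule))
    return violations
```

Applied to U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930, the function flags the Vowel at index 3 under rule 7, because it follows a Halant.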

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7: Sequences, it is permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni, and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position   Name   Organization   Country   Language Expertise
Co-Chair   Ajay Data   Data Xgen Technologies   India   Hindi, English
Co-Chair   Mahesh D Kulkarni   C-DAC   India   Marathi, Hindi, English
Co-Chair   Udaya Narayana Singh   Visva-Bharati, Santiniketan, West Bengal   India   Bengali, Maithili, Hindi, English
Member   Abhijit Dutta   Wikimedia   India   Bengali, Hindi
Member   Akshat S Joshi (Editor)   C-DAC   India   Hindi, Marathi, English
Member   Anivar A Aravind   Indic Project   India   Malayalam
Member   Anupam Agrawal   Tata Consultancy Service   India   Hindi, Bengali
Member   Arvind Bhandari   Gujarat University   India   Gujarati
Member   Ashish Modi   Data Xgen Technologies   India   Hindi
Member   Atiur Rahman Khan   C-DAC   India   Bangla
Member   Bal Krishna Bal   Kathmandu University   Nepal   Nepali
Member   Balaram Prasain   Tribhuvan University   Nepal   Nepali
Member   BASANTA KUMAR PANDA   Regional Institute of Education (NCERT)   India   Odia
Member   Bhim Dhoj Shrestha   Consultant   Nepal   Nepali, Newar
Member   Chitrita Chatterjee   Internet and Mobile Association of India (IAMAI)   India   Multiple languages represented by members of IAMAI
Member   DEBAJIT SHARMA   Anundoram Borooah Institute of Language, Art and Culture   India   Assamese
Member   Dev Dass Manandhar   Consultant   Nepal   Nepali, Newar
Member   Dhanalakshmi K T   Northern Trust   India   Kannada
Member   Ganesh Murmu   Ranchi University   India   Santali
Member   Gangadhar Panday   Babul Films Society   India   Telugu
Member   Ghanashyam Nepal   Benares Hindu University & University of North Bengal   India   Nepali
Member   Girish Chandra Mishra   Language Technology Centre, Ravenshaw University   India   Odia
Member   Gurpreet Singh Lehal   Punjabi University, Patiala   India   Panjabi
Member   Harish Chowdhary   NIXI   India   Hindi
Member   Hempal Shrestha   Nepal Entrepreneurs Hub (NEHUB)   Nepal   Nepali, Newar
Member   Jay Paudyal   Consultant   India   Hindi
Member   Jijo Pappachan   DNDomains   India   Malayalam
Member   K C Tikayatray   Odia Bhasa Pratisthan   India   Odia
Member   Kalyan Vasudeo Kale   Formerly affiliated with University of Pune   India   Marathi
Member   Kuldeep Patnaik   Visualizethysoul   India   Odia
Member   Mukesh Saini   Essel Group   India   Hindi
Member   N Deiva Sundaram   NDS Lingsoft Solutions Pvt Ltd   India   Tamil
Member   Neha Gupta   C-DAC   India   Hindi, English
Member   Nirajan Parajuli   NREN   Nepal   Nepali
Member   Nishit Jain   C-DAC   India   Hindi, English
Member   Pawan Chitrakar   Gapsco   Nepal   Nepali
Member   Prabhakar Pandey   C-DAC   India   Hindi
Member   Prasad P K   A-one Publishers   India   Malayalam
Member   Prateek Pathak   ISOC Mumbai   India   Devanagari
Member   Raiomond Doctor   NLP Consultant   India   English, Hindi, Marathi, Gujarati
Member   Rajib Chakraborty   Society for Natural Language Technology Research   India   Bangla (Bengali)
Member   Rajiv Kumar   NIXI   India
Member   S Maniam   International Forum IT for Tamil   Singapore   Tamil
Member   Santhosh Thottingal   Wikimedia foundation   India   Malayalam, Sourashtra, Tamil
Member   Saroja Bhate   University of Pune   India   Sanskrit
Member   Shambhu Kumar Singh   National Translation Mission, Mysore   India   Maithili
Member   Shanmugam R   C-DAC   India   Tamil
Member   Shantaram S Warde Walawalikar   Independent Researcher   India   Konkani
Member   Shashi Pathania   PGD of Dogri, University of Jammu   India   Dogri
Member   Shubham Saran   NIXI   India
Member   Sinnathambi Shanmugarajah   University of Colombo School of Computing   Sri Lanka   Tamil
Member   Sujith Kartha   Digitalkzcom   India   Malayalam
Member   Suraj Adhikari   Mercantile Communications (and np ccTLD)   Nepal   Nepali
Member   Swarna Prabha Chainary   Guwahati University   India   Bodo
Member   U B Pavanaja   http://vishvakannada.com   India   Kannada
Member   Uma Maheshwar G   CALTS, Univ of Hyderabad   India   Telugu
Member   Uttam Shrestha Rana   NPNOG   Nepal   Nepali
Member   Veena Solomon   (freelancer)   India   Malayalam
Member   Vinay Murarka   Consultant httpsमराभारत   India   Hindi

In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name   Language/Script Expertise
Ajit Kumar   Awadhi, Braj Language
Amar Tumyahang   Limbu Language
Amrit Yonjan   Tamang Language
Aprana Kulkarni   Hindi, Marathi
Basil Baa   Sadri Language
Basil Kiro   Kharia Language
Biswa Limbu   Limbu Language
Devdass Manandhar   Newar
Devendra Kumar Devesh   Bhojpuri Language
Dinbandhu Mahto   Panchpargania Language
Dipika Sangma Narzary   Bodo Language
Dr K P Lekhwani   Sindhi
Dr Birendra Kumar Soy   Mundari Language
Dr Dinesh Kumar Shrivastav   Magahi Language
Dr Harvinder Kaur   Gurmukhi Script
Dr Laxmi Prasad Khatiwada   Nepali Language
Harihar Vaishnav   Halbi
Indra Kumar Tamang   Tamang Language
Jagannath Singh   Panchpargania Language
Narendra Kumar Negi   Kinnauri Language
Prateek Harshwal   Wagdi and Dhundhari Language
Rayem Olem Dungdung   Sadri Language
Tej Man Angdembe   Limbu Language
Urmila Harshwal   Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb. 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov. 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec. 2017)

[RFC 8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec. 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb. 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb. 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb. 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec. 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec. 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec. 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec. 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct. 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct. 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct. 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct. 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct. 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct. 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct. 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov. 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov. 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec. 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec. 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec. 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec. 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb. 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Diringer, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchinson, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2): 139-156, 2007.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.

2. IS 10315: 7-bit coded character set for information interchange, 1985.

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.

5. ISO 2375: Procedure for registration of escape sequences, 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क (U+0915) | क़ (U+0915 U+093C)
ख (U+0916) | ख़ (U+0916 U+093C)
ग (U+0917) | ग़ (U+0917 U+093C)
च (U+091A) | च़ (U+091A U+093C)
छ (U+091B) | छ़ (U+091B U+093C)
ज (U+091C) | ज़ (U+091C U+093C)
ड (U+0921) | ड़ (U+0921 U+093C)
ढ (U+0922) | ढ़ (U+0922 U+093C)
फ (U+092B) | फ़ (U+092B U+093C)

Table 21: Visually confusables
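Every pair in Table 21 differs only by a Nukta (U+093C) following one of nine base consonants. The following is a minimal Python sketch of how a reviewer might flag two labels as a Table 21 look-alike pair. The helper names `strip_table21_nukta` and `may_be_confused` are hypothetical and are not part of the LGR, which treats these pairs as informative only, not as variants.

```python
NUKTA = "\u093C"
# Base consonants listed in Table 21: क ख ग च छ ज ड ढ फ
TABLE_21_BASES = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")


def strip_table21_nukta(label: str) -> str:
    """Drop a Nukta only when it directly follows a Table 21 base consonant."""
    out = []
    for ch in label:
        if ch == NUKTA and out and out[-1] in TABLE_21_BASES:
            continue
        out.append(ch)
    return "".join(out)


def may_be_confused(a: str, b: str) -> bool:
    """True when two different labels become identical once Table 21 Nuktas are removed."""
    return a != b and strip_table21_nukta(a) == strip_table21_nukta(b)
```

A Nukta after any other character (for example the Santali use after आ) is left untouched, so such labels are never reported by this sketch.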


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in a given context (e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
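The all-or-nothing rule above can be sketched in Python as follows. This is an illustration under stated assumptions: the mapping `CONFUSABLE_SCRIPTS` is a hand-copied summary of Table 22 (which scripts offer a look-alike for each Devanagari code point), the function name is hypothetical, and the check works on individual code points rather than on full aksharas.

```python
# Devanagari code point -> scripts that have a visually confusable counterpart (per Table 22).
CONFUSABLE_SCRIPTS = {
    "\u0903": {"Gujarati", "Telugu", "Kannada", "Malayalam", "Sinhala", "Bengali"},
    "\u0909": {"Bengali"},
    "\u0918": {"Bengali"},
    "\u0920": {"Gurmukhi"},
    "\u0921": {"Gurmukhi"},
    "\u0922": {"Gurmukhi"},
    "\u0924": {"Gurmukhi"},
    "\u092F": {"Gurmukhi"},
    "\u0945": {"Bengali"},
}


def scripts_with_full_confusable(label: str) -> set:
    """Return the scripts in which EVERY code point of the label has a confusable."""
    if not label:
        return set()
    common = None
    for ch in label:
        scripts = CONFUSABLE_SCRIPTS.get(ch, set())
        common = set(scripts) if common is None else common & scripts
    return common
```

A label such as ठत (U+0920 U+0924) yields `{"Gurmukhi"}`, whereas adding a single character without a counterpart, e.g. क (U+0915), yields an empty set, which mirrors the statement that one distinguishing character makes the whole string non-confusable.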

Devanagari confusable | Other script confusable | From script
ः (U+0903) | ઃ (U+0A83) | Gujarati
ः (U+0903) | ః (U+0C03) | Telugu
ः (U+0903) | ಃ (U+0C83) | Kannada
ः (U+0903) | ഃ (U+0D03) | Malayalam
ः (U+0903) | ඃ (U+0D83) | Sinhala
ः (U+0903) | ঃ (U+0983) (see footnote 17) | Bengali
उ (U+0909) | ও (U+0993) | Bengali
घ (U+0918) | ঘ (U+0998) | Bengali
ठ (U+0920) | ਨ (U+0A28) | Gurmukhi
ठ (U+0920) | ਰ (U+0A30) | Gurmukhi
ड (U+0921) | ਡ (U+0A21) | Gurmukhi
ड (U+0921) | ਤ (U+0A24) | Gurmukhi
ढ (U+0922) | ਢ (U+0A22) | Gurmukhi
त (U+0924) | ਜ (U+0A1C) | Gurmukhi
य (U+092F) | ਧ (U+0A27) | Gurmukhi
ॅ (U+0945) | ঁ (U+0981) | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


NBGP decided to employ the "Expanded Graded Intergenerational Disruption Scale" [EGIDS], which is designed to measure the status of the languages of the world in terms of endangerment or development. The EGIDS consists of 13 levels, with each higher number on the scale representing a greater level of disruption to the intergenerational transmission of the language. NBGP decided to accommodate all the languages belonging to EGIDS Scale 1 to 4 for its analysis, i.e. languages that, in one form or another, are still in use. Following are the descriptions (see footnote 3) of those scales:

Scale | Label | Description
1 | National | The language is widely used between nations in trade, knowledge exchange, and international policy.
2 | Provincial | The language is used in education, work, mass media, and government at the national level.
3 | Wider Communication | The language is used in education, work, mass media, and government within major administrative subdivisions of a nation.
4 | Educational | The language is in vigorous use, with standardization and literature being sustained through a widespread system of institutionally supported education.

Languages belonging to Level 5 and higher are not in widespread usage.
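The selection criterion can be expressed as a small filter. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the sample levels are placeholders chosen for the example rather than data taken from Ethnologue.

```python
# Labels for the EGIDS levels that the NBGP accommodates (scales 1 to 4).
EGIDS_LABELS = {
    1: "National",
    2: "Provincial",
    3: "Wider Communication",
    4: "Educational",
}


def in_scope(egids_level: int) -> bool:
    """True when a language's EGIDS level falls within scales 1 to 4."""
    return egids_level in EGIDS_LABELS


# Placeholder input: language name -> EGIDS level (sample values for illustration).
sample = {"Language A": 1, "Language B": 4, "Language C": 6}
selected = sorted(name for name, level in sample.items() if in_scope(level))
```

Exceptions decided by the panel on other grounds are not modelled here; they would need an explicit allow-list alongside this filter.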

Below is the tabular representation of the languages that have been considered for the Devanagari LGR:

3 https://www.ethnologue.com/about/language-status

EGIDS Scale | Languages
EGIDS Scale 1 | Hindi, Nepali
EGIDS Scale 2 | Konkani, Maithili, Marathi, Sindhi
EGIDS Scale 3 | Bhatri, Halbi, Kinnauri, Kukna, Panchpargania, Sadri, Wagdi
EGIDS Scale 4 | Bhojpuri, Chhattisgarhi, Dogri, Kashmiri, Limbu, Magahi, Sanskrit, Santali, Tamang (Eastern), Avadhi, Newar, Saraiki (see footnote 4)

Table 2: Languages considered under Devanagari LGR

Despite being classified under EGIDS Scale 5, the Boro language is also considered under the Devanagari LGR as it is one of the scheduled languages of India and is widely spoken.

Apart from the above-mentioned languages, Braj, Dhundari, Mundari and Kharia have also been considered for the analysis, as the communities using them were accessible and provided their inputs.

3.2.1 Case of Sanskrit

Sanskrit is generally perceived as an archaic language used only in ancient religious texts. However, it is worth noting that there is a quite vibrant and active user community of Sanskrit in India which practices Sanskrit on a day-to-day basis. Sanskrit is still taught in schools under various State and Central educational boards. There is increasing use of Sanskrit on social media as well. The same is reflected in the EGIDS scale, where Sanskrit is categorized in Scale 4, indicating the status of the language as "Educational".

4 Though listed in EGIDS scale 4, Saraiki is not covered by the NBGP. As per Ethnologue, the Devanagari script is no longer in use by the Saraiki community. Ref: https://www.ethnologue.com/language/skr


3.3 The structure of written Devanagari

Devanagari is an alphasyllabary and the heart of the writing system is the akshar. It is this unit which is instinctively recognized by users of the script. To understand the notion of akshar, a brief overview of the writing system is provided in this section; the akshar itself will be treated in depth in Section 5.4. The writing system of Devanagari could be summed up as composed of the following:

3.3.1 The Consonants

Devanagari consonants have an implicit schwa /ə/ vowel (see footnote 5) included in them. As per traditional classification, they are categorized according to their phonetic properties (especially in terms of place plus manner of articulation). There are 5 Varga groups (classes) and one non-Varga group. Each Varga, which corresponds to Stops, contains five consonants classified as per their properties. The first four consonants are classified on the basis of voicing and aspiration, and the last is the corresponding nasal.

Varga | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | क (U+0915) | ख (U+0916) | ग (U+0917) | घ (U+0918) | ङ (U+0919)
Palatal | च (U+091A) | छ (U+091B) | ज (U+091C) | झ (U+091D) | ञ (U+091E)
Retroflex | ट (U+091F) | ठ (U+0920) | ड (U+0921) | ढ (U+0922) | ण (U+0923)
Dental | त (U+0924) | थ (U+0925) | द (U+0926) | ध (U+0927) | न (U+0928)
Bi-labial | प (U+092A) | फ (U+092B) | ब (U+092C) | भ (U+092D) | म (U+092E)

Table 3: Varga classification of consonants
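Because the Varga consonants occupy consecutive code points in Table 3 (with U+0929 skipped between the Dental and Bi-labial rows), the row and column of a consonant can be derived arithmetically. The following Python sketch shows this; the function name `classify` is hypothetical and the lookup only covers the 25 code points listed in Table 3.

```python
VARGAS = ["Velar", "Palatal", "Retroflex", "Dental", "Bi-labial"]
COLUMNS = ["Unvoiced -Asp", "Unvoiced +Asp", "Voiced -Asp", "Voiced +Asp", "Nasal"]


def classify(ch: str):
    """Return (varga, column) for a Table 3 consonant, or None for anything else."""
    cp = ord(ch)
    if not 0x0915 <= cp <= 0x092E or cp == 0x0929:
        return None
    index = cp - 0x0915
    if cp > 0x0929:
        index -= 1  # U+0929 is not listed in Table 3, so close the gap
    return VARGAS[index // 5], COLUMNS[index % 5]
```

For example, `classify("\u091C")` (ज) gives `("Palatal", "Voiced -Asp")`, and a non-Varga consonant such as य (U+092F) gives `None`.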

Non-Varga: य (U+092F), र (U+0930), ल (U+0932), ळ (U+0933), व (U+0935), श (U+0936), ष (U+0937), स (U+0938), ह (U+0939)

Table 4: Non-Varga consonants

5 Although representing the implicit vowel as "a" is more correct orthographically, the schwa /ə/, although not part of the orthographic system, has been used, since the "a" would be misunderstood and read as अ/आ/ा.


3.3.2 The Implicit Vowel Killer: Halant (see footnote 6)

All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant (् U+094D). The Halant thus joins two consonants and creates conjuncts, which generally combine 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical. It is just an observation drawn from the words that have been observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. Hence, in the LGR work this limit will not be enforced (see footnote 7).
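The role of the Halant as a joiner can be illustrated with a short Python sketch. The helper names are hypothetical; the consonant test uses the code point range U+0915 to U+0939 as a simplification, and the function merely measures cluster length without enforcing any limit, in line with the decision above.

```python
HALANT = "\u094D"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    """Simplified test: code points U+0915..U+0939 are treated as consonants."""
    return 0x0915 <= ord(ch) <= 0x0939


def make_conjunct(*consonants: str) -> str:
    """Join consonants with Halant, e.g. क + ् + ष."""
    return HALANT.join(consonants)


def longest_cluster(text: str) -> int:
    """Largest number of consonants chained together by Halant in the text."""
    best = run = 0
    after_halant = False
    for ch in text:
        if is_consonant(ch):
            run = run + 1 if after_halant else 1
            best = max(best, run)
            after_halant = False
        elif ch == NUKTA:
            continue  # a Nukta modifies the consonant and does not break the cluster
        elif ch == HALANT and run:
            after_halant = True
        else:
            run = 0
            after_halant = False
    return best
```

For the word राष्ट्र (U+0930 U+093E U+0937 U+094D U+091F U+094D U+0930), `longest_cluster` reports 3, because ष, ट and र are chained by two Halants.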

3.3.3 Vowels

Separate symbols exist for all Vowels which are pronounced independently, either at the beginning or after a vowel sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels except अ.

The correlation is shown as follows:

Vowel | Corresponding vowel sign (Matra)
अ (U+0905) | (none)
आ (U+0906) | ा (U+093E)
इ (U+0907) | ि (U+093F)
ई (U+0908) | ी (U+0940)
उ (U+0909) | ु (U+0941)
ऊ (U+090A) | ू (U+0942)
ऋ (U+090B) | ृ (U+0943)
ए (U+090F) | े (U+0947)
ऐ (U+0910) | ै (U+0948)
ओ (U+0913) | ो (U+094B)
औ (U+0914) | ौ (U+094C)
ॳ (U+0973) | ऺ (U+093A)
ॴ (U+0974) | ऻ (U+093B)
ऎ / ऄ (U+090E / U+0904) | ॆ (U+0946)
ऒ (U+0912) | ॊ (U+094A)
ऍ / ॲ (U+090D / U+0972) | ॅ (U+0945)
ॠ (U+0960) | ॄ (U+0944)
ऑ (U+0911) | ॉ (U+0949)
ॵ (U+0975) | ॏ (U+094F)
ॶ (U+0976) | ॖ (U+0956)
ॷ (U+0977) | ॗ (U+0957)

Table 5: Vowels with corresponding Matras

6 Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report both terms have been used to denote the character that suppresses the inherent vowel.

7 This can be the case when a foreign language word which admits a large number of consonants is transliterated into Devanāgarī.

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).
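The vowel-to-Matra correlation lends itself to a lookup table. The Python sketch below covers only the ten most common pairs from Table 5; the names `VOWEL_TO_MATRA` and `syllable` are hypothetical, and the function simply concatenates code points without validating that its first argument is a consonant.

```python
INHERENT_VOWEL = "\u0905"  # अ has no Matra: it is already built into the consonant

# Independent vowel -> dependent vowel sign (subset of Table 5).
VOWEL_TO_MATRA = {
    "\u0906": "\u093E",  # आ -> ा
    "\u0907": "\u093F",  # इ -> ि
    "\u0908": "\u0940",  # ई -> ी
    "\u0909": "\u0941",  # उ -> ु
    "\u090A": "\u0942",  # ऊ -> ू
    "\u090B": "\u0943",  # ऋ -> ृ
    "\u090F": "\u0947",  # ए -> े
    "\u0910": "\u0948",  # ऐ -> ै
    "\u0913": "\u094B",  # ओ -> ो
    "\u0914": "\u094C",  # औ -> ौ
}


def syllable(consonant: str, vowel: str) -> str:
    """Write consonant + vowel as the consonant followed by the vowel's Matra."""
    if vowel == INHERENT_VOWEL:
        return consonant
    return consonant + VOWEL_TO_MATRA[vowel]
```

Thus `syllable("\u0915", "\u0908")` produces की (U+0915 U+0940), while pairing क with अ returns the bare consonant.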

3.3.4 The Anusvara (ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal (see footnote 8).

सन्त vs संत /sənt/ "saint": U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924

चम्पा vs चंपा /tʃəmpa/ "a flower belonging to the genus Plumeria family": U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E
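The replacement described above (Nasal Consonant + Halant + Consonant of the same varga becoming Anusvara + Consonant) can be sketched in Python. This is a simplified illustration: `prefer_anusvara` is a hypothetical name, the varga ranges follow Table 3, and orthographic exceptions (for example doubled nasals) are deliberately not handled.

```python
ANUSVARA = "\u0902"
HALANT = "\u094D"
# (first, last) code point of each varga; the last one is the varga's nasal.
VARGA_RANGES = [
    (0x0915, 0x0919),
    (0x091A, 0x091E),
    (0x091F, 0x0923),
    (0x0924, 0x0928),
    (0x092A, 0x092E),
]


def varga_of(ch: str):
    """Index of the varga a consonant belongs to, or None for non-varga characters."""
    cp = ord(ch)
    for i, (lo, hi) in enumerate(VARGA_RANGES):
        if lo <= cp <= hi:
            return i
    return None


def prefer_anusvara(text: str) -> str:
    """Rewrite nasal + Halant + same-varga consonant as Anusvara + consonant."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        v = varga_of(ch)
        is_nasal = v is not None and ord(ch) == VARGA_RANGES[v][1]
        if (is_nasal and i + 2 < len(text)
                and text[i + 1] == HALANT and varga_of(text[i + 2]) == v):
            out.append(ANUSVARA)
            i += 2  # skip the nasal and the Halant; the consonant is copied next
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Applied to the two examples in this section, the sketch turns सन्त into संत and चम्पा into चंपा. The two spellings remain distinct code point sequences, so how they relate within a label is a matter for the variant rules rather than for this helper.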

3.3.5 Nasalization: Candrabindu (ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख /ãkh/ "eye" (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (़ - U+093C) (see footnote 9)

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.:

फ़िरोज़ firoz (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.:

बढ़ bədh (U+092C U+0922 U+093C)

The web publication "DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION" [109] by the Central Hindi Directorate, Ministry of HRD, Government of India clearly states such a use of Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय cay "tea" (U+091A U+093C U+093E U+092F)

छ़ल chal "wash" (imperative) (U+091B U+093C U+0932)

पॊज़ póz "fact" (U+092A U+094A U+091C U+093C)

Normally a Nukta is appended to a Consonant. However, the Santali language uses Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a. आ (U+0906)
b. ओ (U+0913)
c. ा (U+093E)
d. ो (U+094B)

8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding "Varga" through a Halant.

9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics and Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided as they are available online.


3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example दुःख duhkh "sorrow, unhappiness" (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + ् (halant) + ष (sha), gets rendered as क्‌ष when formed by क (ka) + ् (halant) + Zero Width Non-joiner + ष (sha). In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that the constituting consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + ् (halant) + ष (sha), does not show the half form of ka joining with sha. However, using ZWJ, the constituting consonants' shapes are preserved in the visual depiction, i.e. क्‍ष, formed by क (ka) + ् (halant) + Zero Width Joiner + ष (sha).

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.
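Since ZWNJ and ZWJ are invisible, the three forms of the ksha example are easiest to tell apart by listing their code points. The following Python sketch builds the three sequences described above; the helper names `code_points` and `strip_joiners` are hypothetical, and actual rendering depends on the font and shaping engine in use.

```python
ZWNJ = "\u200C"
ZWJ = "\u200D"
HALANT = "\u094D"
KA = "\u0915"
SSA = "\u0937"


def code_points(text: str) -> str:
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def strip_joiners(text: str) -> str:
    """Remove ZWNJ and ZWJ, leaving the underlying consonant sequence."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


default_conjunct = KA + HALANT + SSA          # fused conjunct form
explicit_halant = KA + HALANT + ZWNJ + SSA    # Halant shown explicitly
half_form = KA + HALANT + ZWJ + SSA           # half form of ka kept visible
```

All three strings reduce to the same sequence once the joiners are stripped, which is why the presence or absence of these characters cannot be detected by eye in a rendered label.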


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync across all Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code points in the code point repertoire across the board for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
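The relationship between U+0958 and its decomposed form can be observed with Unicode normalization. The Python sketch below uses the standard `unicodedata` module; the helper names are hypothetical. It illustrates only the normalization step (Normalization Form C maps U+0958 to the two-code-point sequence) and is not a full IDNA validity check.

```python
import unicodedata


def to_nfc(label: str) -> str:
    """Normalize a label to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", label)


def code_points(text: str) -> str:
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


qa_precomposed = "\u0958"        # DEVANAGARI LETTER QA
qa_sequence = "\u0915\u093C"     # KA followed by NUKTA

normalized = to_nfc(qa_precomposed)
```

Here `code_points(normalized)` is `U+0915 U+093C`, and the sequence form is left unchanged by the same normalization, so both inputs end up as the Ka + Nukta sequence that the repertoire permits.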

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where an atypical visual shape of an akshar is desired.

In domain names, due to the absence of space " " or tab "-", there will be cases where the inability to use ZWNJ can pose some issues: two words need to be joined together, where the previous word needs to end in an Explicit Halant and the next word begins with a consonant. In that case a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if two words देश् (deš, "nation") and विदेश (videš, "foreign land") are juxtaposed to each other, the resultant word, i.e. "देश्विदेश" (see footnote 10), is not the appropriate way of rendering it. Appropriate rendering of the same would be "देश्‌विदेश", which can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire, if necessary.

However, there may not be much of an impact of the exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes (see footnote 11) may not be able to be made; however, there will not be any impact on the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
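The three successive filters can be pictured as nested sets. The Python sketch below is purely illustrative: the three sets are tiny stand-ins containing only the code points used as examples in this section (the real Unicode, IDNA2008 and MSR sets are far larger), and the function name is hypothetical.

```python
from typing import Optional

# Illustrative stand-ins only, built from the examples in this section.
ENCODED_IN_UNICODE = {0x0915, 0x093C, 0x093D, 0x0958}
IDNA_ALLOWED = {0x0915, 0x093C, 0x093D}   # U+0958 is not allowed in domain names
MSR = {0x0915, 0x093C}                    # U+093D is excluded by the MSR


def first_failed_filter(cp: int) -> Optional[str]:
    """Name the first filter that rejects a code point, or None if it passes all three."""
    if cp not in ENCODED_IN_UNICODE:
        return "Unicode"
    if cp not in IDNA_ALLOWED:
        return "IDNA"
    if cp not in MSR:
        return "MSR"
    return None
```

With these sample sets, क (U+0915) passes every stage, क़ (U+0958) stops at the IDNA stage, and ऽ (U+093D) stops at the MSR stage, matching the two examples given in the text.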

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different, and hence it changes the fundamental word.

11 Case of क्ष and क्‍ष: the first is composed with क + ् + ष while the latter is with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.


4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms ॄ (U+0944) and ॣ (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments called for attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each of the individual comments received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention (see footnote 12):

All characters that are included in the [MSR] - Yellow background

PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background

Not PVALID in IDNA2008 - White background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not an exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 (Hindi, Nepali) | [0][101][102][103][105][108][110][111][112][113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]

8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0][101][102][103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 (Hindi) | [0][101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 (Kashmiri) | [0][105][108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 (Hindi) | [0][100][101][102][108][112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 (Kashmiri) | [0][105][108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]

20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]

30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]

40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 (Nepali) | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]

50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 (Hindi) | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0][101][102][103]


62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, in order to support the “Eyelash Reph” construct (see footnote 13).

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by the normal Ra, U+0930), the term “Reph” is used here.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7 Sequences
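The conditional inclusion of U+0931 can be illustrated with a short check. This is only a sketch under the assumption that a label is given as a Python string of code points; the function name `rra_only_in_sequences` is hypothetical and is not part of the LGR or of any LGR tool.

```python
RRA = "\u0931"      # DEVANAGARI LETTER RRA
HALANT = "\u094D"   # DEVANAGARI SIGN VIRAMA
# Third code point of the two sequences listed in Table 7: YA and HA.
ALLOWED_AFTER_RRA_HALANT = {"\u092F", "\u0939"}


def rra_only_in_sequences(label: str) -> bool:
    """Return True if every U+0931 in the label starts a Table 7 sequence."""
    for i, ch in enumerate(label):
        if ch != RRA:
            continue
        # Slices return "" past the end of the string, which never matches.
        if label[i + 1:i + 2] != HALANT:
            return False
        if label[i + 2:i + 3] not in ALLOWED_AFTER_RRA_HALANT:
            return False
    return True
```

A label that uses U+0931 anywhere else (for example directly before a Matra) is rejected by this check, which mirrors the idea that the letter enters the repertoire only through the two sequences.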

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per the conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing the Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing the Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per the conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of forming their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the Hindi language when written in the Devanagari script. These rules need slight additions for the other languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required to accommodate additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

Section 7 gives the Whole Label Evaluation (WLE) rules, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first subsection lists the categories of characters in the form of variables; the rules use these variable names instead of the descriptive names. The second subsection lists four operators, along with their functions, which are assumed while specifying the rules. The two subsections after that describe the two major categories of akshar formation: the first begins with a vowel and the second with a consonant. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol   Function
|        Alternative
[ ]      Optional
*        Variable Repetition
( )      Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may optionally be followed by an Anusvara (B), Candrabindu (D) or Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since that combination is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | अ (U+0905)
Vowel + Anusvara | V[B] | अं (aṁ) | अ + ं (U+0905 U+0902)
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ + ँ (U+0905 U+0901)
Vowel + Visarga | V[X] | अः (aḥ) | अ + ः (U+0905 U+0903)

Table 9
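The vowel sequence V[B|D|X] maps directly onto a regular expression. The sketch below is illustrative only: the vowel list is a small assumed subset of category V rather than the full repertoire, and the name `is_vowel_sequence` is hypothetical.

```python
import re

# Assumption: a small illustrative subset of the Vowel (V) category.
VOWELS = "\u0905\u0906\u0907\u0908\u0909\u090A\u090F\u0910\u0913\u0914"
ANUSVARA = "\u0902"      # B
CANDRABINDU = "\u0901"   # D
VISARGA = "\u0903"       # X

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{VOWELS}][{ANUSVARA}{CANDRABINDU}{VISARGA}]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

Because the optional class matches at most one character, a Visarga following an Anusvara or Candrabindu is rejected, in line with the restriction stated above.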


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may optionally be followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. The consonant sequence can be extended further after the N, M and H. Each of these cases is discussed below.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क + ़ (U+0915 U+093C)

Table 10

2. A consonant optionally followed by a dependent vowel sign (Matra [M]), Anusvara [B], Candrabindu [D], Visarga [X] or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क + ि (U+0915 U+093F)
Consonant + Anusvara | C[B] | कं (kaṁ) | क + ं (U+0915 U+0902)
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क + ँ (U+0915 U+0901)
Consonant + Visarga | C[X] | कः (kaḥ) | क + ः (U+0915 U+0903)
Consonant + Halant | C[H] | क् (k, pure consonant) | क + ् (U+0915 U+094D)

Table 11


2A. A CM sequence can optionally be followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क + ी + ं (U+0915 U+0940 U+0902)
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क + ा + ँ (U+0915 U+093E U+0901)
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क + ी + ः (U+0915 U+0940 U+0903)

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14): *3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

14 In the case of Sanskrit, it can join up to 5 consonants.

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
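Rules 1 to 3B for the consonant sequence can be combined into one pattern. The following is a minimal sketch, not the panel's implementation: the consonant and Matra classes are approximated by code point ranges (an assumption that also admits a few code points excluded from the repertoire), the Nukta is treated as coterminous with its consonant, and the name `is_consonant_sequence` is hypothetical.

```python
import re

C = "[\u0915-\u0939]"            # assumption: consonants approximated by a range
N = "\u093C"                     # Nukta
M = "[\u093E-\u094C]"            # assumption: Matras approximated by a range
BDX = "[\u0902\u0901\u0903]"     # Anusvara, Candrabindu, Visarga
H = "\u094D"                     # Halant

CN = f"{C}{N}?"                  # a consonant with its optional Nukta

CONSONANT_SEQUENCE = re.compile(
    # Up to three (CH) groups, a final consonant, then an optional Matra
    # and an optional B, D or X (rules 1, 2, 2A, 3, 3A and 3B).
    f"(?:{CN}{H}){{0,3}}{CN}(?:{M}?{BDX}?)"
    # A single consonant followed by a Halant (rule 2, C[H]).
    f"|{CN}{H}"
)


def is_consonant_sequence(text: str) -> bool:
    """Return True if the whole string is one Hindi consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The `{0,3}` bound reflects the limit of four consonants given for Hindi in rule 3; as noted above, the WLE rules in Section 7 do not impose that limit.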

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules once one takes into account digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.


6 Variants

There are no characters or character sequences in Devanagari which can be created using the characters permitted by the [MSR] and which look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants into two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed as variants, since another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. They are listed in Table 21: Visually confusables, in Appendix A: Visually confusable characters/sequences.

Cases which belong to Group 2, however, are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve some deviation from the widely accepted norms of Devanagari akshar formation. They can confuse even a careful observer and are hence proposed as variants. A brief description of these variants follows, with the variants themselves listed in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for positioning of the Nukta character (U+093C) which is not common in other Devanagari-based languages: Santali requires the Nukta to follow certain Vowels and Matras. Complete representation of these Santali combinations required the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This makes it possible to create certain labels that can be deceptively similar for a majority of the Devanagari user base. As this is a unique case of homographic similarity, the following variants are being proposed:


Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). In theory this implies a regenerative tendency: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the reverse mapping using the same condition (see also [RFC 8228]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to the context rules on the Nukta, the condition is only required for the first reason in this case.
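The effect of the "not followed by a Nukta" condition can be seen in a small sketch of the forward (Variant 1 to Variant 2) mapping. The function name `forward_nukta_variants` is hypothetical, and the sketch only applies one substitution at a time; it is meant to show why the context rule stops the regeneration described above, not to reproduce an LGR tool.

```python
NUKTA = "\u093C"
# Variant 1 code points from Table 16.
VARIANT1 = {"\u0906", "\u0913", "\u093E", "\u094B"}


def forward_nukta_variants(label: str) -> list[str]:
    """Labels obtained by applying one Variant 1 -> Variant 2 mapping.

    The mapping is only defined where the Variant 1 code point is
    not already followed by a Nukta (the variant context rule).
    """
    variants = []
    for i, ch in enumerate(label):
        followed_by_nukta = label[i + 1:i + 2] == NUKTA
        if ch in VARIANT1 and not followed_by_nukta:
            variants.append(label[:i + 1] + NUKTA + label[i + 1:])
    return variants
```

Applying the function to U+0906 yields U+0906 U+093C, and applying it again to that result yields nothing, so the invalid U+0906 U+093C U+093C is never generated.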

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence or absence of a Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A
- 0906 093C 0901 is added to Set B
- 0906 093C 0902 is added to Set C
- 093E 093C 0902 is added to Set D

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when (followed-by-V-C-or-end), which matches the intersection of these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)
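The reason for extending each set to three members can be checked mechanically. The sketch below is an illustration under stated assumptions: variant sets are represented as lists of code point strings, context rules are ignored, and the helper names (`all_mappings`, `is_symmetric`, `is_transitive`) are hypothetical.

```python
from itertools import permutations

# The extended variant sets from the table above (context rules omitted).
VARIANT_SETS = {
    "A": ["093E 0901", "093E 093C 0901", "0949 0902"],
    "B": ["0906 0901", "0906 093C 0901", "0911 0902"],
    "C": ["0906 0902", "0906 093C 0902", "0974"],
    "D": ["093B", "093E 0902", "093E 093C 0902"],
}


def all_mappings(members):
    """Every ordered pair of distinct members of one variant set."""
    return set(permutations(members, 2))


def is_symmetric(pairs):
    """Each mapping a -> b has a reverse mapping b -> a."""
    return all((b, a) in pairs for a, b in pairs)


def is_transitive(pairs):
    """Whenever a -> b and b -> d exist (a != d), a -> d exists too."""
    return all((a, d) in pairs
               for a, b in pairs
               for c, d in pairs
               if b == c and a != d)


# Original Set A plus the overlapping Nukta mapping, before the extension:
# 0949 0902 -> 093E 0901 -> 093E 093C 0901 has no direct mapping.
UNEXTENDED_A = {
    ("093E 0901", "0949 0902"), ("0949 0902", "093E 0901"),
    ("093E 0901", "093E 093C 0901"), ("093E 093C 0901", "093E 0901"),
}
```

With all three members mapped to one another, each set has six directed mappings and passes both checks, whereas `UNEXTENDED_A` is symmetric but not transitive.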

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when (follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as the second element have V or M as the first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mapping targets will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the Halant functions as a vowel killer, many users tend to ignore the phonetic effect of its presence or absence at the end of a word. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there are also some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case for making them variant pairs. In the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which are rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1 | Variant 2
ँ U+0901 | ॅं U+0945 U+0902
ाँ U+093E U+0901 | ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, like a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by a Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow this convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence to the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) requires that, to create a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). Note that this sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by a Consonant, Vowel, Nukta or Matra, the same is not true for ॅ (U+0945), which is a Matra: a Matra can only be preceded by a Consonant or a Consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the character preceding ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as the mapping (U+0901) -> (U+0945 U+0902).
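The context condition on the Candrabindu mapping can be sketched as follows. This is an illustration, not the LGR's XML rule: consonants are approximated by the range U+0915 to U+0939 (an assumption), only the pair U+0901 ↔ U+0945 U+0902 is handled, and the function names are hypothetical.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093C"
CANDRA_E_ANUSVARA = "\u0945\u0902"


def is_consonant(ch: str) -> bool:
    # Assumption: consonants approximated by a contiguous code point range.
    return len(ch) == 1 and "\u0915" <= ch <= "\u0939"


def candrabindu_variants(label: str) -> list[str]:
    """Labels obtained by mapping one U+0901 to U+0945 U+0902.

    The mapping is defined only when the Candrabindu is preceded by a
    consonant, or by a consonant followed by a Nukta.
    """
    variants = []
    for i, ch in enumerate(label):
        if ch != CANDRABINDU:
            continue
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        if is_consonant(prev) or (prev == NUKTA and is_consonant(prev2)):
            variants.append(label[:i] + CANDRA_E_ANUSVARA + label[i + 1:])
    return variants
```

A Candrabindu after a vowel or a Matra produces no variant here; those positions are covered by the other pairs in Table 18, which this sketch does not model.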

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen first, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the case where a label in one script can be composed in such a way that it resembles an entire label in a different script.

Every individual LGR under the NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under the NBGP.

The NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under the NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Table 19 lists the variants proposed as cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 lists the cases proposed as cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

The NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B
ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a set of possible cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. They are listed in Table 22: Devanagari Cross-script confusables, in Appendix B: Cross-script Confusables.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)
3. M must be preceded by C or CN (see footnote 16)
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V can NOT be preceded by H (details in “Case of V preceded by H”)

15 where CN is a C followed by an N
16 where CN is a C followed by an N


Additional rules are used only for variants where a Nukta maps to null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)
• Variant is undefined if it is not followed by V or C (including RRA) or the end of the label (see 6.1.2)
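WLE rules 1 to 7 are all "must (not) be preceded by" conditions, so they can be checked with a single pass over a label. The sketch below makes several assumptions that should be read as such: the category sets C, V and M are approximated from code point ranges rather than taken from the XML file, repertoire membership and the Table 7 sequences are not checked, and the function name `violated_rules` is hypothetical.

```python
def chars(*codes):
    """Build a set of one-character strings from code point numbers."""
    return {chr(c) for c in codes}


B, D, X, H, N = "\u0902", "\u0901", "\u0903", "\u094D", "\u093C"

# Assumption: category sets approximated by ranges, not the exact repertoire.
C = {chr(c) for c in range(0x0915, 0x093A)} | chars(0x097B, 0x097C, 0x097E, 0x097F)
V = {chr(c) for c in range(0x0905, 0x0915)} | {chr(c) for c in range(0x0972, 0x0978)}
M = {chr(c) for c in range(0x093E, 0x094D)} | chars(0x093A, 0x093B, 0x094F, 0x0956, 0x0957)

# Sets named in rule 1.
C1 = chars(0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B)
V1 = chars(0x0906, 0x0913)
M1 = chars(0x093E, 0x094B)


def violated_rules(label: str) -> list[int]:
    """Return the sorted numbers of the WLE rules (1-7) that the label breaks."""
    bad = set()
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        after_c_or_cn = prev in C or (prev == N and prev2 in C)
        after_v_c_n_m = prev in V or prev in C or prev == N or prev in M
        if ch == N and prev not in (C1 | V1 | M1):
            bad.add(1)
        elif ch == H and not after_c_or_cn:
            bad.add(2)
        elif ch in M and not after_c_or_cn:
            bad.add(3)
        elif ch == X and not after_v_c_n_m:
            bad.add(4)
        elif ch == B and not after_v_c_n_m:
            bad.add(5)
        elif ch == D and not after_v_c_n_m:
            bad.add(6)
        elif ch in V and prev == H:
            bad.add(7)
    return sorted(bad)
```

For instance, the multi-word label U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is reported as breaking rule 7 only, because its second vowel directly follows a Halant.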

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.
2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether each of these can follow any valid akshar-ending character. Only the case “V preceded by H” needs special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार aːməchaːr “Mango pickle” (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is a case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H is considered sufficient to represent the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence it is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | httpvishvakannadacom | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)


[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)


[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate, Ministry of HRD, Govt. of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_orgin/devnagarithesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1 Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1 Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2 Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3 Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]

4 Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.

5 Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6 Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7 Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8 Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9 Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10 Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).


11 Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12 Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13 McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14 McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15 McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16 McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17 Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.

18 Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19 Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20 Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21 Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22 Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23 Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1 IS 10401: 8-bit code for information interchange, 1982

2 IS 10315: 7-bit coded character set for information interchange, 1985


3 IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987

4 ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001

5 ISO 2375: Procedure for registration of escape sequences, 2003

6 ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001

7 IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in a given context (e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
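The "all constituents must be confusable" condition above can be sketched as a simple check. This is only an illustration under stated assumptions: the table holds a subset of the Gurmukhi rows of Table 22, the check works per code point rather than per akshara, and the names `GURMUKHI_CONFUSABLES` and `may_have_cross_script_variant` are hypothetical, not part of the LGR.

```python
# Illustrative subset of Table 22: Devanagari code point -> Gurmukhi confusables.
# The names below are hypothetical and are not defined by the LGR.
GURMUKHI_CONFUSABLES = {
    0x0920: {0x0A28, 0x0A30},
    0x0921: {0x0A21, 0x0A24},
    0x0922: {0x0A22},
    0x0924: {0x0A1C},
    0x092F: {0x0A27},
}


def may_have_cross_script_variant(label, table):
    """Return True only if every code point of the label has a confusable in the other script."""
    return bool(label) and all(ord(ch) in table for ch in label)


all_listed = "\u0924\u092F"   # U+0924 U+092F: both appear in the table
one_missing = "\u0924\u0915"  # U+0915 has no entry, so the label stays distinguishable

print(may_have_cross_script_variant(all_listed, GURMUKHI_CONFUSABLES))
print(may_have_cross_script_variant(one_missing, GURMUKHI_CONFUSABLES))
```

A single code point without an entry is enough to make the function return False, which mirrors the "even one single character" wording of the appendix.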

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 ¹⁷ | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


EGIDS Scale 1: Hindi, Nepali
EGIDS Scale 2: Konkani, Maithili, Marathi, Sindhi
EGIDS Scale 3: Bhatri, Halbi, Kinnauri, Kukna, Panchpargania, Sadri, Wagdi
EGIDS Scale 4: Bhojpuri, Chhattisgarhi, Dogri, Kashmiri, Limbu, Magahi, Sanskrit, Santali, Tamang (Eastern), Avadhi, Newar, Saraiki⁴

Table 2: Languages considered under Devanagari LGR

Despite being classified under EGIDS Scale 5, the Boro language is also considered under the Devanagari LGR, as it is one of the scheduled languages of India and is widely spoken.

Apart from the above-mentioned languages, Braj, Dhundari, Mundari and Kharia have also been considered for the analysis, as the communities using them were accessible and provided their inputs.

3.2.1 Case of Sanskrit

Sanskrit is generally perceived as an archaic language used only in ancient religious texts. However, it is worth noting that there is a quite vibrant and active user community of Sanskrit in India which practices Sanskrit on a day-to-day basis. Sanskrit is still taught in schools under various State and Central educational boards. There is increasing use of Sanskrit on social media as well. The same is reflected in the EGIDS scale, where Sanskrit is categorized in Scale 4, indicating the status of the language as "Educational".

4 Though listed in EGIDS scale 4, Saraiki is not covered by the NBGP. As per Ethnologue, the Devanagari script is no longer in use by the Saraiki community. Ref: https://www.ethnologue.com/language/skr


3.3 The structure of written Devanagari

Devanagari is an alphasyllabary, and the heart of the writing system is the akshar. It is this unit which is instinctively recognized by users of the script. To understand the notion of akshar, a brief overview of the writing system is provided in this section, and the akshar itself will be treated in depth in Section 5.4. The writing system of Devanagari could be summed up as composed of the following:

3.3.1 The Consonants

Devanagari consonants have an implicit schwa⁵ /ə/ vowel included in them. As per traditional classification, they are categorized according to their phonetic properties (especially in terms of place plus manner of articulation). There are 5 Varga groups (classes) and one non-Varga group. Each Varga, which corresponds to Stops, contains five consonants classified as per their properties. The first four consonants are classified on the basis of voicing and aspiration, and the last is the corresponding nasal.

Varga | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | क U+0915 | ख U+0916 | ग U+0917 | घ U+0918 | ङ U+0919
Palatal | च U+091A | छ U+091B | ज U+091C | झ U+091D | ञ U+091E
Retroflex | ट U+091F | ठ U+0920 | ड U+0921 | ढ U+0922 | ण U+0923
Dental | त U+0924 | थ U+0925 | द U+0926 | ध U+0927 | न U+0928
Bi-labial | प U+092A | फ U+092B | ब U+092C | भ U+092D | म U+092E

Table 3: Varga classification of consonants

Non-Varga: य U+092F | र U+0930 | ल U+0932 | ळ U+0933 | व U+0935 | श U+0936 | ष U+0937 | स U+0938 | ह U+0939

Table 4: Non-Varga consonants

5 Although representing the implicit vowel as "a" is more correct orthographically, the schwa /ə/, although not part of the orthographic system, has been used, since the "a" would be misunderstood and read as अ/आ/ा.


3.3.2 The Implicit Vowel Killer: Halant⁶

All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant (U+094D). The Halant thus joins two consonants and creates conjuncts, which generally combine 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical: it is just an observation drawn from the words that have been observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. Hence, in the LGR work this limit will not be enforced⁷.
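As an illustration of how the Halant links consonants, the following minimal sketch builds a conjunct and measures the longest Halant-linked consonant run in a string. It assumes that the consonants are the code points U+0915 to U+0939 and that a Nukta may follow a consonant; the helper names are hypothetical, and no upper limit is enforced, in line with the text above.

```python
HALANT = "\u094D"
NUKTA = "\u093C"


def is_consonant(ch):
    # Assumption for this sketch: Devanagari consonants occupy U+0915..U+0939.
    return "\u0915" <= ch <= "\u0939"


def make_conjunct(consonants):
    """Join consonants with the Halant, e.g. KA + HALANT + SSA."""
    return HALANT.join(consonants)


def longest_conjunct(text):
    """Length (in consonants) of the longest Halant-linked consonant run."""
    best = run = 0
    prev = ""
    for ch in text:
        if is_consonant(ch):
            # A consonant extends the run only if a Halant immediately precedes it.
            run = run + 1 if (prev == HALANT and run > 0) else 1
            best = max(best, run)
        elif ch not in (HALANT, NUKTA):
            run = 0  # a vowel sign or any other mark ends the run
        prev = ch
    return best


ksha = make_conjunct(["\u0915", "\u0937"])
print([f"U+{ord(c):04X}" for c in ksha], longest_conjunct(ksha))
```

Because the function only reports the observed length, a caller could log unusually long conjuncts without rejecting them.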

3.3.3 Vowels

Separate symbols exist for all Vowels which are pronounced independently, either at the beginning or after a vowel sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels excepting the अ.

The correlation is shown as follows:

Vowel | Corresponding vowel sign (Matra)
अ U+0905 | (none)
आ U+0906 | ा U+093E
इ U+0907 | ि U+093F
ई U+0908 | ी U+0940
उ U+0909 | ु U+0941
ऊ U+090A | ू U+0942
ऋ U+090B | ृ U+0943
ए U+090F | े U+0947
ऐ U+0910 | ै U+0948
ओ U+0913 | ो U+094B
औ U+0914 | ौ U+094C
ॳ U+0973 | ऺ U+093A
ॴ U+0974 | ऻ U+093B
ऎ, ऄ U+090E, U+0904 | ॆ U+0946
ऒ U+0912 | ॊ U+094A
ऍ, ॲ U+090D, U+0972 | ॅ U+0945
ॠ U+0960 | ॄ U+0944
ऑ U+0911 | ॉ U+0949
ॵ U+0975 | ॏ U+094F
ॶ U+0976 | ॖ U+0956
ॷ U+0977 | ॗ U+0957

Table 5: Vowels with corresponding Matras

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).

6 Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report both terms have been used to denote the character that suppresses the inherent vowel.
7 This can be the case when a foreign language word which admits a large number of consonants is transliterated into Devanāgarī.

3.3.4 The Anusvara (ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant, the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal⁸.

सन्त vs संत /sənt/ "saint": U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924
चम्पा vs चंपा /tʃəmpa/ "a flower belonging to the genus Plumeria family": U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E
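The replacement described above (Nasal Consonant + Halant + Consonant of the same varga becoming Anusvara + Consonant) can be sketched as follows. This is a naive illustration, not an orthographic rule of the LGR: it assumes the five vargas occupy the contiguous code point ranges shown in Table 3, it leaves doubled nasals untouched, and all names are hypothetical.

```python
HALANT = "\u094D"
ANUSVARA = "\u0902"
# First code point of each varga row in Table 3; each row has five letters, the last being the nasal.
VARGA_STARTS = [0x0915, 0x091A, 0x091F, 0x0924, 0x092A]


def varga_of(ch):
    """Index of the varga the character belongs to, or None for non-varga characters."""
    cp = ord(ch)
    for index, start in enumerate(VARGA_STARTS):
        if start <= cp <= start + 4:
            return index
    return None


def is_varga_nasal(ch):
    v = varga_of(ch)
    return v is not None and ord(ch) == VARGA_STARTS[v] + 4


def prefer_anusvara(text):
    """Rewrite nasal + Halant before a same-varga stop as Anusvara."""
    out = []
    i = 0
    while i < len(text):
        if (i + 2 < len(text)
                and is_varga_nasal(text[i])
                and text[i + 1] == HALANT
                and varga_of(text[i + 2]) == varga_of(text[i])
                and not is_varga_nasal(text[i + 2])):
            out.append(ANUSVARA)
            i += 2  # skip the nasal and the Halant; the following stop is copied next
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


for word in ("\u0938\u0928\u094D\u0924", "\u091A\u092E\u094D\u092A\u093E"):
    print([f"U+{ord(c):04X}" for c in prefer_anusvara(word)])
```

Applied to the two examples above, the sketch yields the Anusvara spellings listed on the right-hand side of each pair.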

3.3.5 Nasalization: Candrabindu (ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख /ãkh/ "eye" (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (़ - U+093C)⁹

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ firoz (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ bədh (U+092C U+0922 U+093C)

The web publication "DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION" [109] by the Central Hindi Directorate, Ministry of HRD, Government of India, clearly states such a use of the Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय cay "tea" (U+091A U+093C U+093E U+092F)
छ़ल chal "wash" (imperative) (U+091B U+093C U+0932)
पॊज़ póz "fact" (U+092A U+094A U+091C U+093C)

Normally a Nukta is appended to a consonant. However, the Santali language uses the Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a. आ (U+0906)
b. ओ (U+0913)
c. ा (U+093E)
d. ो (U+094B)

8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding "Varga" through a Halant.
9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided, as they are available online.


3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example दुःख duhkh "sorrow, unhappiness" (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire, as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after a Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + ् (halant) + ष (sha), is rendered with a visible Halant on क when formed by क (ka) + ् (halant) + Zero Width Non-joiner + ष (sha). In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character, used in certain cases (mostly after a Halant) in which a particular conjunct combination gets rendered such that the constituent consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + ् (halant) + ष (sha), does not show the half form of ka joining with sha. However, using the ZWJ, the constituent consonants' shapes are preserved in the visual depiction, i.e. the half form of क followed by ष, formed by क (ka) + ् (halant) + Zero Width Joiner + ष (sha).

Earlier, the ZWJ was recommended by the Unicode Consortium to generate certain special conjuncts like the Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of the ZWJ is no longer encouraged.
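The three code point sequences discussed in this section differ only in the invisible joiner placed after the Halant. The short sketch below spells them out and shows that removing the joiners collapses all three to the default conjunct; the dictionary keys and the helper names are hypothetical and serve illustration only.

```python
KA, SSA = "\u0915", "\u0937"
HALANT = "\u094D"
ZWNJ, ZWJ = "\u200C", "\u200D"

FORMS = {
    "default conjunct": KA + HALANT + SSA,
    "explicit Halant (ZWNJ)": KA + HALANT + ZWNJ + SSA,
    "half form (ZWJ)": KA + HALANT + ZWJ + SSA,
}


def code_points(text):
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def strip_joiners(text):
    """Remove ZWNJ and ZWJ, leaving the underlying consonant sequence."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


for name, sequence in FORMS.items():
    print(f"{name}: {code_points(sequence)}")
```

Since the joiners carry no visible glyph of their own, printing the code points is the most reliable way to tell the three forms apart.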


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building each LGR is in sync with that of all the other Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scales 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
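The relationship between U+0958 and its decomposed form can be observed with Unicode normalization. The sketch below uses Python's standard `unicodedata` module and relies on the fact that U+0958 is excluded from composition, so Normalization Form C yields the two-code-point sequence; the helper name `code_points` is an assumption introduced here for display only.

```python
import unicodedata


def code_points(text):
    return [f"U+{ord(ch):04X}" for ch in text]


QA = "\u0958"  # DEVANAGARI LETTER QA (precomposed)
normalized = unicodedata.normalize("NFC", QA)

# NFC does not keep the precomposed letter: it becomes KA followed by NUKTA.
print(code_points(normalized))

# The decomposed sequence is already stable under NFC.
KA_NUKTA = "\u0915\u093C"
print(unicodedata.normalize("NFC", KA_NUKTA) == KA_NUKTA)
```

This is why a label typed with the precomposed letter and one typed with the Ka + Nukta sequence end up as the same code point sequence once normalized.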

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where an atypical visual shape of an akshar is desired.

In domain names, due to the absence of space (" ") or tab ("-"), there will be cases where the inability to use the ZWNJ can pose some issues: two words need to be joined together, the previous word needs to end in an explicit Halant, and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (deš, "nation") and विदेश (videš, "foreign land") are juxtaposed to each other, the resultant word, i.e. "देश्विदेश"¹⁰, is not the appropriate way of rendering it. The appropriate rendering, in which the Halant on श remains visible before विदेश, can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, the exclusion of the ZWJ from the MSR may not have much of an impact, as there are better alternatives already available for depicting the cases for which the ZWJ was earlier used. Some specific shapes¹¹ may not be able to be made; however, there will not be any impact on the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for the creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
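The successive filtering summarized above can be pictured as a small pipeline. The sketch below is illustrative only: the three sample entries take their flags from the examples given in this section, the `Candidate` record and the `excluded_by` function are hypothetical names, and real implementations would consult the Unicode, IDNA and MSR data files rather than hand-written flags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    code_point: int
    in_unicode: bool   # encoded in the Unicode Standard
    idna_pvalid: bool  # permitted by the IDNA protocol
    in_msr: bool       # included in the Maximal Starting Repertoire


def excluded_by(candidate: Candidate) -> Optional[str]:
    """Name of the first filter that rejects the candidate, or None if it passes all three."""
    if not candidate.in_unicode:
        return "Unicode"
    if not candidate.idna_pvalid:
        return "IDNA"
    if not candidate.in_msr:
        return "MSR"
    return None


SAMPLES = [
    Candidate(0x0915, True, True, True),    # KA: passes all filters
    Candidate(0x0958, True, False, False),  # QA: not allowed by IDNA
    Candidate(0x093D, True, True, False),   # AVAGRAHA: allowed by IDNA, barred by the MSR
]

for c in SAMPLES:
    print(f"U+{c.code_point:04X}: {excluded_by(c) or 'eligible'}")
```

Checking the filters in this fixed order reflects the layering described in the text, where each later filter only sees what the earlier one admitted.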

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different, and hence it changes the fundamental word.
11 Case of क्ष and its half-form rendering: the first is composed with क + ् + ष, while the latter is composed with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.


4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms (U+0944) and (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments required attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change is not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention¹²:

All characters that are included in the [MSR]: Yellow background
PVALID in IDNA2008 but excluded from the MSR for various reasons: Pinkish background
Not PVALID in IDNA2008: White background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [110], [111], [112], [113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]


8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0], [101], [102], [103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0], [101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0], [105], [108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0], [100], [101], [102], [108], [112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0], [105], [108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]


20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]


30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]

40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0] [102] [103] [110] [112] [113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]

50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0] [101] [105] [108] [110] [109] [111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0] [101] [102] [103]

62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0] [100] [101] [108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0] [105] [108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0] [100] [108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0] [105] [108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant / Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0] [105] [108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9] [100] [102] [108] [112]

75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8] [104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8] [104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8] [104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8] [104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of DEVANAGARI LETTER RRA in the repertoire, so that the "Eyelash Reph"13 construct can be included.

13 Unicode uses the term "Eyelash Ra" instead. Since the construct formed by this sequence is a special form of Reph (which is otherwise formed by the normal Ra, U+0930), the term "Reph" is used here.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106] [107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106] [107]

Table 7 Sequences
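
The conditional inclusion of U+0931 can be illustrated with a short check. The following is only a minimal sketch, not part of the LGR itself: the function name `rra_only_in_sequences` is hypothetical, and the check looks only at the two sequences of Table 7, ignoring every other repertoire and WLE constraint.

```python
RRA = "\u0931"
# The two sequences from Table 7: RRA + VIRAMA + YA and RRA + VIRAMA + HA.
ALLOWED_SEQUENCES = ("\u0931\u094D\u092F", "\u0931\u094D\u0939")


def rra_only_in_sequences(label):
    """Return True if every U+0931 in the label starts a Table 7 sequence."""
    position = label.find(RRA)
    while position != -1:
        # str.startswith accepts a tuple of prefixes and a start offset.
        if not label.startswith(ALLOWED_SEQUENCES, position):
            return False
        position = label.find(RRA, position + 1)
    return True
```

A label that contains U+0931 anywhere outside these two sequences is rejected by this check, which mirrors the idea that the letter has no standalone entry in the repertoire.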

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of forming their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodating additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first subsection lists the categories of characters in the form of variables; the rules use these variable names instead of the descriptive names. The second subsection lists four operators, along with their functions, which are assumed while specifying the rules. The following two subsections describe the two major categories of akshar formation, the first of which begins with vowels and the second with consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta

5.5.2 Operators used

Symbol    Function
|         Alternative
[]        Optional
*         Variable Repetition
()        Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may optionally be followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X that can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | अ + ं (U+0905 U+0902)
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ + ँ (U+0905 U+0901)
Vowel + Visarga | V[X] | अः (aḥ) | अ + ः (U+0905 U+0903)

Table 9

5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may optionally be followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. The consonant sequence can be extended further after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क + ़ (U+0915 U+093C)

Table 10

2. A consonant optionally followed by a dependent vowel sign (Matra [M]), Anusvara [B], Candrabindu [D], Visarga [X] or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क + ि (U+0915 U+093F)
Consonant + Anusvara | C[B] | कं (kaṁ) | क + ं (U+0915 U+0902)
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क + ँ (U+0915 U+0901)
Consonant + Visarga | C[X] | कः (kaḥ) | क + ः (U+0915 U+0903)
Consonant + Halant | C[H] | क् (k, pure consonant) | क + ् (U+0915 U+094D)

Table 11

2A. A CM sequence can optionally be followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क + ी + ं (U+0915 U+0940 U+0902)
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क + ा + ँ (U+0915 U+093E U+0901)
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क + ी + ः (U+0915 U+0940 U+0903)

Table 12

3. A sequence of consonants (up to 4) joined by Halant14:

3*(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In the case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. 3*(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.
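
The Hindi akshar grammar of Sections 5.5.3 and 5.5.4 can be written as a single regular expression over the category symbols. The following is only a minimal sketch under stated assumptions: the `CATEGORY` table is a small illustrative subset of Table 6, the names `AKSHAR`, `categories` and `is_akshar` are hypothetical, and the pattern encodes the "up to 4 consonants" limit of rule 3, which the WLE rules of Section 7 do not impose.

```python
import re

# Illustrative subset only; the full category assignment is given in Table 6.
CATEGORY = {
    "\u0905": "V", "\u0906": "V",                                  # A, AA
    "\u0915": "C", "\u0928": "C", "\u0930": "C", "\u092F": "C",    # KA, NA, RA, YA
    "\u093E": "M", "\u093F": "M", "\u0940": "M",                   # matras AA, I, II
    "\u0902": "B", "\u0901": "D", "\u0903": "X",                   # Anusvara, Candrabindu, Visarga
    "\u094D": "H", "\u093C": "N",                                  # Halant, Nukta
}

# Vowel sequence:      V[B|D|X]
# Consonant sequence:  up to three (C[N] H) prefixes, then C[N],
#                      then either H, or an optional M and an optional B, D or X.
AKSHAR = re.compile(r"V[BDX]?|(?:CN?H){0,3}CN?(?:H|M?[BDX]?)")


def categories(text):
    """Translate a string into its category symbols, e.g. KA + I -> 'CM'."""
    return "".join(CATEGORY[ch] for ch in text)


def is_akshar(text):
    """Return True if the whole string forms exactly one akshar."""
    return AKSHAR.fullmatch(categories(text)) is not None
```

For instance, the Table 13 example U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F translates to CHCHCHC and is accepted, while a Matra on its own is rejected because no alternative of the pattern can start with M.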


6 Variants

There are no characters or character sequences in Devanagari that can be created using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formation. They can cause confusion even to a careful observer and hence are being proposed as variants. A brief description of these variants follows, with the variants themselves listed in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for positioning the Nukta character (U+093C), which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated that the Whole Label Evaluation rules (given in the Section 6.1) be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar for a majority of the Devanagari user base. This being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ (U+0906) | आ़ (U+0906 U+093C)
ओ (U+0913) | ओ़ (U+0913 U+093C)
ा (U+093E) | ा़ (U+093E U+093C)
ो (U+094B) | ो़ (U+094B U+093C)

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). In theory this implies a regenerative tendency: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)
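
The effect of the "not followed by a Nukta" condition can be seen in a short sketch. This is an illustration only, with hypothetical names (`SET_1_SOURCES`, `add_nukta`); it applies the Variant 1 to Variant 2 mapping at every eligible position at once, whereas an actual LGR tool would enumerate the individual variant labels.

```python
NUKTA = "\u093C"
# Variant 1 characters of Table 16: AA, O, vowel sign AA, vowel sign O.
SET_1_SOURCES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def add_nukta(label):
    """Map each Variant 1 character to Variant 2 unless a Nukta already follows it."""
    result = []
    for index, ch in enumerate(label):
        result.append(ch)
        followed_by_nukta = label[index + 1:index + 2] == NUKTA
        if ch in SET_1_SOURCES and not followed_by_nukta:
            result.append(NUKTA)
    return "".join(result)
```

Because of the context condition, applying the function to its own output changes nothing, so the invalid sequence U+0906 U+093C U+093C is never produced.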

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to the context rules on the Nukta, the condition is only required for the first reason in this case.

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | blocked (↔) | not followed-by-N
093E | ा | 093E 093C | ा़ | blocked (↔) | not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the left-hand-side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence or absence of Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)
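
The extended sets can be held in a small data structure in which symmetry and transitivity follow from set membership. This is a sketch only: the names `VARIANT_SETS` and `variants_of` are hypothetical, and the contextual rules listed for Sets C and D are deliberately not modelled.

```python
# Final variant sets of Section 6.1.2, as tuples of code points.
VARIANT_SETS = {
    "A": [(0x093E, 0x0901), (0x093E, 0x093C, 0x0901), (0x0949, 0x0902)],
    "B": [(0x0906, 0x0901), (0x0906, 0x093C, 0x0901), (0x0911, 0x0902)],
    "C": [(0x0906, 0x0902), (0x0906, 0x093C, 0x0902), (0x0974,)],
    "D": [(0x093B,), (0x093E, 0x0902), (0x093E, 0x093C, 0x0902)],
}


def variants_of(sequence):
    """Return every other member of the variant set containing the sequence."""
    for members in VARIANT_SETS.values():
        if sequence in members:
            return {member for member in members if member != sequence}
    return set()
```

Since every member of a set maps to every other member, a lookup from either side of a mapping returns the remaining two sequences of that set.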

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ (U+0973) | अं (U+0905 U+0902)
ऺ (U+093A) | ं (U+0902)
ॴ (U+0974) | आं (U+0906 U+0902)
ऻ (U+093B) | ां (U+093E U+0902)
ऎ (U+090E) | ऐ (U+0910)
ॆ (U+0946) | े (U+0947)
ॵ (U+0975) | औ (U+0914)
ॏ (U+094F) | ौ (U+094C)

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point contexts, so that both the source and the target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape that resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1 | Variant 2
ँ (U+0901) | ॅं (U+0945 U+0902)
ाँ (U+093E U+0901) | ॉं (U+0949 U+0902)
अँ (U+0905 U+0901) | ॲं (U+0972 U+0902)
एँ (U+090F U+0901) | ऍं (U+090D U+0902)
आँ (U+0906 U+0901) | ऑं (U+0911 U+0902)

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, U+0945 U+0902 and U+0949 U+0902 should each be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow this convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) requires that, to create a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that this sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by a Consonant, Vowel, Nukta or Matra, the same is not true for (U+0945), which is a Matra: a Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the character preceding (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
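
The "preceded by a Consonant or a Consonant followed by a Nukta" condition can be sketched as follows. All names here are hypothetical, and `is_consonant` uses approximate code point ranges as a stand-in for the Consonant category of Table 6, so it is an assumption rather than the repertoire itself.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch):
    """Approximate test for the Consonant category (assumed ranges, not Table 6)."""
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x097B <= cp <= 0x097F


def preceded_by_c_or_cn(label, index):
    """True if label[index] follows a consonant, or a consonant plus Nukta."""
    if index >= 1 and is_consonant(label[index - 1]):
        return True
    return index >= 2 and label[index - 1] == NUKTA and is_consonant(label[index - 2])


def candrabindu_variant(label):
    """Replace U+0901 with U+0945 U+0902 wherever the context rule holds."""
    result = []
    for index, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, index):
            result.append(CANDRA_E_ANUSVARA)
        else:
            result.append(ch)
    return "".join(result)
```

A Candrabindu after a Vowel or a Matra is left untouched by this mapping; those positions are covered by the other pairs of Table 18, which this sketch does not implement.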

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under the NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under the NBGP.

The NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under the NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed as cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed as cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed equivalents of each other, semantically or otherwise. They are grouped only on the basis of possible visual confusability.

The NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
ं (U+0902) | ਂ (U+0A02)
इ (U+0907) | ਙ (U+0A19)
उ (U+0909) | ਤ (U+0A24)
ग (U+0917) | ਗ (U+0A17)
घ (U+0918) | ਬ (U+0A2C)
ट (U+091F) | ਟ (U+0A1F)
ठ (U+0920) | ਠ (U+0A20)
ढ (U+0922) | ਫ (U+0A2B)
प (U+092A) | ਧ (U+0A27)
भ (U+092D) | ਮ (U+0A2E)
म (U+092E) | ਸ (U+0A38)
व (U+0935) | ਕ (U+0A15)
ह (U+0939) | ਵ (U+0A35)
ऺ (U+093A) | ਂ (U+0A02)
़ (U+093C) | ਼ (U+0A3C)
ि (U+093F) | ਿ (U+0A3F)
ी (U+0940) | ੀ (U+0A40)
ॅ (U+0945) | ੱ (U+0A71)
ॆ (U+0946) | ੇ (U+0A47)
ॆ (U+0946) | ੋ (U+0A4B)
े (U+0947) | ੇ (U+0A47)
े (U+0947) | ੋ (U+0A4B)
ै (U+0948) | ੈ (U+0A48)
ॖ (U+0956) | ੁ (U+0A41)
ॗ (U+0957) | ੂ (U+0A42)
प्टि (U+092A U+094D U+091F U+093F) | ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940) | ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947) | ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946) | ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924) | ਜ (U+0A1C)

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म (U+092E) | ম (U+09AE)
ि (U+093F) | ি (U+09BF)

Table 20 Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.
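
A whole-label cross-script variant exists only when every element of a label has a counterpart in the other script. The sketch below illustrates this with a handful of one-to-one pairs taken from Table 19; the names `DEVANAGARI_TO_GURMUKHI` and `gurmukhi_lookalike` are hypothetical, and the one-to-many and sequence entries of the table are left out for brevity.

```python
# A subset of the one-to-one pairs of Table 19 (Devanagari -> Gurmukhi).
DEVANAGARI_TO_GURMUKHI = {
    "\u091F": "\u0A1F",  # TTA
    "\u0920": "\u0A20",  # TTHA
    "\u092E": "\u0A38",  # Devanagari MA resembles Gurmukhi SA
    "\u0935": "\u0A15",  # Devanagari VA resembles Gurmukhi KA
    "\u0939": "\u0A35",  # Devanagari HA resembles Gurmukhi VA
    "\u093F": "\u0A3F",  # vowel sign I
    "\u0940": "\u0A40",  # vowel sign II
}


def gurmukhi_lookalike(label):
    """Return the Gurmukhi look-alike label, or None if any character has no pair."""
    if label and all(ch in DEVANAGARI_TO_GURMUKHI for ch in label):
        return "".join(DEVANAGARI_TO_GURMUKHI[ch] for ch in label)
    return None
```

A label containing even one character without a listed counterpart has no whole-label variant in this sketch, which is why the function returns None rather than a partial mapping.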

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant / Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN15

3. M must be preceded by C or CN16

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N
16 where CN is a C followed by an N
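
The seven rules above translate directly into a check on each character and its predecessor. The following is only a sketch, not the normative XML rules: `category` uses approximate code point ranges as a stand-in for the category column of Table 6 (it does not verify repertoire membership), and the names `category` and `wle_violations` are hypothetical.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")


def category(ch):
    """Return the WLE category symbol of a character (assumed ranges)."""
    cp = ord(ch)
    if cp == 0x0901:
        return "D"
    if cp == 0x0902:
        return "B"
    if cp == 0x0903:
        return "X"
    if cp == 0x093C:
        return "N"
    if cp == 0x094D:
        return "H"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977:
        return "V"
    if 0x0915 <= cp <= 0x0939 or 0x097B <= cp <= 0x097F:
        return "C"
    if 0x093A <= cp <= 0x094F or cp in (0x0956, 0x0957):
        return "M"
    raise ValueError(f"U+{cp:04X} has no category in this sketch")


def wle_violations(label):
    """Return a list of (index, rule) pairs for every rule the label breaks."""
    cats = [category(ch) for ch in label]
    problems = []
    for i, cat in enumerate(cats):
        prev_ch = label[i - 1] if i else ""
        prev = cats[i - 1] if i else ""
        # "C or CN": previous is a consonant, or a Nukta that follows a consonant.
        after_c_or_cn = prev == "C" or (prev == "N" and i >= 2 and cats[i - 2] == "C")
        if cat == "N" and prev_ch not in C1 | V1 | M1:
            problems.append((i, "rule 1"))
        elif cat == "H" and not after_c_or_cn:
            problems.append((i, "rule 2"))
        elif cat == "M" and not after_c_or_cn:
            problems.append((i, "rule 3"))
        elif cat in ("X", "B", "D") and prev not in ("V", "C", "N", "M"):
            problems.append((i, {"X": "rule 4", "B": "rule 5", "D": "rule 6"}[cat]))
        elif cat == "V" and prev == "H":
            problems.append((i, "rule 7"))
    return problems
```

An empty result means that no rule was broken. As noted in footnote 1 of this document, which rule a given invalid label is reported under depends on the tool, so only the valid/invalid outcome should be relied upon.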

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it gets permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

CaseofVprecededbyH

Asanyvalidakshar inDevanagari begins eitherwith aConsonantor aVowel in caseofmulti-words domains it was necessary to check the compatibility of both of these tosucceed any of the validakshar ending character It is to be noted that only the case ldquoVprecededbyHrdquoneedsaspecialdiscussionasgivenbelow

There couldbe cases involvingmulti-worddomainswhereVmayneed tobe allowed tofollow anH egआमअचार aːməchaːrMango pickle (U+0906 U+092EU+094DU+0905U+091AU+093EU+0930)

ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendsinanHandthesecondwordbeginswithaVSomesectionsofthelinguisticcommunityrequiretheexplicitpresenceofHforfullrepresentationofthesoundintendedHoweverbyandlarge

theformofthefirstwordwithoutanHisconsideredenoughforfullrepresentationofthesoundintendedforthefirstword

ThisisauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-

joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoireOtherwiseV isneverrequiredtobeallowedtofollowanHPermittingthismaycreateaperceptivesimilarityamongtwolabels(withandwithoutH)formajorityofthelinguisticcommunity

hencethisisexplicitlyprohibitedbytheNBGP

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

47

If required in future depending on the prevailing requirements by the community the

NBGPmayconsiderrevisitingthisrule

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.

2. IS 10315: 7-bit coded character set for information interchange. 1985.

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.

5. ISO 2375: Procedure for registration of escape sequences. 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in a given context (e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing) they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ¹⁷ U+0983 | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
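The "all constituent characters" condition for cross-script confusability stated in this appendix can be sketched as follows. This is an illustration only: the dictionary holds just the Gurmukhi and Bengali rows of Table 22, the function name is hypothetical, and the check works per code point, whereas the text speaks of characters/aksharas.

```python
# Devanagari code point -> confusable code points, per target script
# (Gurmukhi and Bengali rows of Table 22 only).
CONFUSABLES = {
    "Gurmukhi": {
        0x0920: {0x0A28, 0x0A30},
        0x0921: {0x0A21, 0x0A24},
        0x0922: {0x0A22},
        0x0924: {0x0A1C},
        0x092F: {0x0A27},
    },
    "Bengali": {
        0x0903: {0x0983},
        0x0909: {0x0993},
        0x0918: {0x0998},
        0x0945: {0x0981},
    },
}


def may_be_cross_script_confusable(label: str, script: str) -> bool:
    """True only if every code point of the label has a confusable in `script`."""
    table = CONFUSABLES[script]
    return len(label) > 0 and all(ord(ch) in table for ch in label)


# U+0921 U+0922: both code points have Gurmukhi look-alikes in Table 22.
print(may_be_cross_script_confusable("\u0921\u0922", "Gurmukhi"))
# U+0921 U+0915: U+0915 has no listed look-alike, so the label stays distinct.
print(may_be_cross_script_confusable("\u0921\u0915", "Gurmukhi"))
```

A single code point without a listed counterpart is enough for the function to return False, which mirrors the argument that one distinguishing character gives the whole string a visual distinction.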


3.3 The structure of written Devanagari

Devanagari is an alphasyllabary, and the heart of the writing system is the akshar. It is this unit which is instinctively recognized by users of the script. To understand the notion of akshar, a brief overview of the writing system is provided in this section; the akshar itself will be treated in depth in Section 5.4. The writing system of Devanagari could be summed up as composed of the following:

3.3.1 The Consonants

Devanagari consonants have an implicit schwa⁵ /ə/ vowel included in them. As per traditional classification, they are categorized according to their phonetic properties (especially in terms of place plus manner of articulation). There are 5 Varga groups (classes) and one non-Varga group. Each Varga, which corresponds to Stops, contains five consonants classified as per their properties. The first four consonants are classified on the basis of voicing and aspiration, and the last is the corresponding nasal.

Varga | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | क U+0915 | ख U+0916 | ग U+0917 | घ U+0918 | ङ U+0919
Palatal | च U+091A | छ U+091B | ज U+091C | झ U+091D | ञ U+091E
Retroflex | ट U+091F | ठ U+0920 | ड U+0921 | ढ U+0922 | ण U+0923
Dental | त U+0924 | थ U+0925 | द U+0926 | ध U+0927 | न U+0928
Bi-labial | प U+092A | फ U+092B | ब U+092C | भ U+092D | म U+092E

Table 3: Varga classification of consonants

Non-Varga
य U+092F | र U+0930 | ल U+0932 | ळ U+0933 | व U+0935 | श U+0936 | ष U+0937 | स U+0938 | ह U+0939

Table 4: Non-Varga consonants

5 Although representing the implicit vowel as /a/ is more correct orthographically, the schwa /ə/, although not part of the orthographic system, has been used, since the /a/ would be misunderstood and read as अ/आ/ा.


3.3.2 The Implicit Vowel Killer: Halant⁶

All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant (U+094D). The Halant thus joins two consonants and creates conjuncts, which generally combine 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical. It is just an observation drawn from the words that have been observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. Hence, in the LGR work this limit will not be enforced⁷.
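The observation about conjunct length can be checked on sample text with a small sketch. It is an illustration under stated assumptions, not part of the LGR: the function names are hypothetical, and a consonant is taken to be any code point in U+0915..U+0939, optionally followed by a Nukta.

```python
HALANT = "\u094D"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    # U+0915..U+0939 is the consonant range of the Devanagari block.
    return "\u0915" <= ch <= "\u0939"


def conjunct_sizes(text: str) -> list[int]:
    """Return, for each consonant cluster, how many consonants the Halant joins."""
    sizes = []
    i, n = 0, len(text)
    while i < n:
        if not is_consonant(text[i]):
            i += 1
            continue
        count = 1
        i += 1
        if i < n and text[i] == NUKTA:
            i += 1
        # Extend the cluster while a Halant is directly followed by a consonant.
        while i + 1 < n and text[i] == HALANT and is_consonant(text[i + 1]):
            count += 1
            i += 2
            if i < n and text[i] == NUKTA:
                i += 1
        sizes.append(count)
    return sizes


print(conjunct_sizes("\u0915\u094D\u0937"))              # KA + Halant + SSA
print(conjunct_sizes("\u0938\u094D\u0924\u094D\u0930"))  # SA + Halant + TA + Halant + RA
```

Because the function only counts and never rejects, it stays consistent with the decision not to enforce an upper limit on conjunct size.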

3.3.3 Vowels

Separate symbols exist for all Vowels which are pronounced independently, either at the beginning or after a vowel sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels except अ.

The correlation is shown as follows:

Vowel | Corresponding vowel sign (Matra)
अ U+0905 | (none)
U+0906 | U+093E
U+0907 | ि U+093F
U+0908 | U+0940
U+0909 | U+0941
U+090A | U+0942
U+090B | U+0943
U+090F | U+0947
U+0910 | U+0948
U+0913 | U+094B
U+0914 | U+094C
U+0973 | U+093A
U+0974 | U+093B
ऎ / ऄ U+090E / U+0904 | U+0946
U+0912 | U+094A
ऍ / ॲ U+090D / U+0972 | U+0945
U+0960 | U+0944
U+0911 | U+0949
U+0975 | U+094F
U+0976 | U+0956
U+0977 | U+0957

Table 5: Vowels with corresponding Matras

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).

6 Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report both terms have been used to denote the character that suppresses the inherent vowel.

7 This can be the case when a foreign language word which admits a large number of consonants is transliterated into Devanāgarī.
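The vowel-to-Matra correlation of Table 5 lends itself to a lookup table. The sketch below covers only a subset of the rows and uses a hypothetical helper name; it is meant to show how the inherent vowel needs no sign while every other vowel is written with its Matra.

```python
# Independent vowel -> Matra, for a subset of the rows of Table 5.
VOWEL_TO_MATRA = {
    "\u0906": "\u093E",
    "\u0907": "\u093F",
    "\u0908": "\u0940",
    "\u0909": "\u0941",
    "\u090A": "\u0942",
    "\u090B": "\u0943",
    "\u090F": "\u0947",
    "\u0910": "\u0948",
    "\u0913": "\u094B",
    "\u0914": "\u094C",
}
INHERENT_VOWEL = "\u0905"  # the built-in schwa has no Matra


def consonant_with_vowel(consonant: str, vowel: str) -> str:
    """Attach the Matra for `vowel` to `consonant`; the inherent vowel adds nothing."""
    if vowel == INHERENT_VOWEL:
        return consonant
    return consonant + VOWEL_TO_MATRA[vowel]


ka = "\u0915"
print([f"U+{ord(c):04X}" for c in consonant_with_vowel(ka, "\u0908")])
print([f"U+{ord(c):04X}" for c in consonant_with_vowel(ka, INHERENT_VOWEL)])
```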

3.3.4 The Anusvara (ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant, the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal⁸.

सन्त vs संत /sənt/ "saint": U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924

चम्पा vs चंपा /tʃəmpa/ "a flower belonging to the genus Plumeria family": U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E
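The replacement described above (Nasal Consonant + Halant + Consonant of the same varga becoming Anusvara + Consonant) can be sketched with the varga rows of Table 3. The function name is hypothetical, and the sketch assumes that only the four non-nasal stops of a varga trigger the replacement; it illustrates the spelling preference and is not an LGR rule.

```python
HALANT = "\u094D"
ANUSVARA = "\u0902"

# Varga nasal -> the four non-nasal stops of the same varga (Table 3).
VARGA_STOPS = {
    "\u0919": "\u0915\u0916\u0917\u0918",  # velar
    "\u091E": "\u091A\u091B\u091C\u091D",  # palatal
    "\u0923": "\u091F\u0920\u0921\u0922",  # retroflex
    "\u0928": "\u0924\u0925\u0926\u0927",  # dental
    "\u092E": "\u092A\u092B\u092C\u092D",  # bi-labial
}


def prefer_anusvara(text: str) -> str:
    """Rewrite nasal + Halant + same-varga stop as Anusvara + stop."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if (
            ch in VARGA_STOPS
            and i + 2 < len(text)
            and text[i + 1] == HALANT
            and text[i + 2] in VARGA_STOPS[ch]
        ):
            out.append(ANUSVARA)
            i += 2  # skip the nasal and the Halant; the stop is copied next
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The two word pairs given in the text.
print(prefer_anusvara("\u0938\u0928\u094D\u0924") == "\u0938\u0902\u0924")
print(prefer_anusvara("\u091A\u092E\u094D\u092A\u093E") == "\u091A\u0902\u092A\u093E")
```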

3.3.5 Nasalization: Candrabindu (ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख /ãkh/ "eye" (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (़ - U+093C)⁹

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ /firoz/ (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ /bədh/ (U+092C U+0922 U+093C)

The web publication DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION [109] by the Central Hindi Directorate, Ministry of HRD, Government of India, clearly states such a use of Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय /cay/ "tea" (U+091A U+093C U+093E U+092F)

छ़ल /chal/ "wash" (imperative) (U+091B U+093C U+0932)

पॊज़ /póz/ "fact" (U+092A U+094A U+091C U+093C)

Normally a Nukta is appended to a Consonant. However, the Santali language uses the Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a. आ (U+0906)
b. ओ (U+0913)
c. ा (U+093E)
d. ो (U+094B)

8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding "Varga" through a Halant.

9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided, as they are available online.
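The per-language Nukta usage described in this section can be captured in a small lookup, sketched below. The language sets contain only the bases named explicitly in the text (the Hindi set combines the Perso-Arabic and flapped cases, which is an interpretation), the function name is hypothetical, and the check is descriptive: the LGR itself is defined per script, not per language.

```python
NUKTA = "\u093C"

# Bases that the text names as taking a Nukta, per language (illustrative only).
NUKTA_BASES = {
    "Hindi": set("\u0915\u0916\u0917\u091C\u092B\u0921\u0922"),
    "Bodo": {"\u0921"},
    "Maithili": set("\u0915\u091C\u0921\u0922"),
    "Sindhi": set("\u0916\u0917\u091C\u092B\u0921\u0922"),
    "Kashmiri": set("\u091A\u091B\u091C"),
    "Santali": set("\u0906\u0913\u093E\u094B"),
}


def unexpected_nukta_bases(text: str, language: str) -> list[tuple[int, str]]:
    """List (index of Nukta, base character) pairs not described for `language`."""
    allowed = NUKTA_BASES[language]
    return [
        (i, text[i - 1])
        for i in range(1, len(text))
        if text[i] == NUKTA and text[i - 1] not in allowed
    ]


firoz = "\u092B\u093C\u093F\u0930\u094B\u091C\u093C"  # Hindi example from the text
cay = "\u091A\u093C\u093E\u092F"                      # Kashmiri example from the text

print(unexpected_nukta_bases(firoz, "Hindi"))
print(unexpected_nukta_bases(cay, "Hindi"))
print(unexpected_nukta_bases(cay, "Kashmiri"))
```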


3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example दुःख /duhkh/ "sorrow, unhappiness" (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire, as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष /ksha/, which gets formed by क /ka/ + ् (halant) + ष /sha/, gets rendered with a visible Halant on the क when formed by क /ka/ + ् (halant) + Zero Width Non-joiner + ष /sha/ (U+0915 U+094D U+200C U+0937). In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character, which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that the constituent consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष /ksha/, which gets formed by क /ka/ + ् (halant) + ष /sha/, does not show the half form of ka joining with sha. However, using ZWJ, the constituent consonants' shapes are preserved in the visual depiction, i.e. the form produced by क /ka/ + ् (halant) + Zero Width Joiner + ष /sha/ (U+0915 U+094D U+200D U+0937).

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync across all Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code points in the code point repertoire, across the board for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only or for archival purposes will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case, up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
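The relationship between U+0958 and its decomposed form can be observed with Python's standard `unicodedata` module. The snippet below is only an illustration of Unicode normalization, which is consistent with the statement above; it does not implement IDNA validation.

```python
import unicodedata

QA = "\u0958"  # DEVANAGARI LETTER QA (precomposed form)

# NFC does not recompose this letter: normalizing the precomposed code point
# yields DEVANAGARI LETTER KA followed by DEVANAGARI SIGN NUKTA.
normalized = unicodedata.normalize("NFC", QA)
print([f"U+{ord(c):04X}" for c in normalized])

# The decomposed sequence is already stable under NFC.
decomposed = "\u0915\u093C"
print(unicodedata.normalize("NFC", decomposed) == decomposed)
```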

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These characters are required in certain cases where an atypical visual shape of an akshar is desired.

In domain names, due to the absence of the space " " or the hyphen "-", there will be cases where the inability to use ZWNJ can pose some issues: two words need to be joined together, where the previous word needs to end in an explicit Halant and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if two words देश् (deš, "nation") and विदेश (videš, "foreign land") are juxtaposed to each other, the resultant word, i.e. "देश्विदेश"¹⁰, is not the appropriate way of rendering it. The appropriate rendering would keep the explicit Halant visible on the first word, which can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, there may not be much of an impact from the exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes¹¹ may not be possible to produce; however, there will not be any impact at the phonetic level.

iii. Maximal Starting Repertoire

The Root Zone LGR being a repertoire of the characters which are going to be used for the creation of root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
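The three successive filters summed up above can be pictured as set intersections. The sketch below uses toy stand-in sets that contain only the code points discussed in this section; they are assumptions for illustration and do not reproduce the real IDNA2008 tables or the MSR.

```python
DEVANAGARI_BLOCK = range(0x0900, 0x0980)

# Toy stand-ins (NOT the real tables): U+0958 is left out of the IDNA set and
# U+093D is left out of the MSR set, as described in the text.
TOY_IDNA_ALLOWED = {0x0915, 0x093C, 0x093D}
TOY_MSR = {0x0915, 0x093C}


def apply_filters(candidates, idna_allowed, msr):
    """Return the surviving code points after each of the three filters."""
    in_block = {cp for cp in candidates if cp in DEVANAGARI_BLOCK}
    after_idna = in_block & idna_allowed
    after_msr = after_idna & msr
    return in_block, after_idna, after_msr


stages = apply_filters({0x0915, 0x0958, 0x093D, 0x200C}, TOY_IDNA_ALLOWED, TOY_MSR)
for name, stage in zip(("code block", "IDNA", "MSR"), stages):
    print(name, sorted(f"U+{cp:04X}" for cp in stage))
```

Each stage can only shrink the set produced by the previous one, which is the sense in which the filters are "successive".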

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different, and hence it changes the fundamental word.

11 Case of the conjunct क्ष and its ZWJ form: the first is composed with क + ् + ष, while the latter is composed with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.

4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970) etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms (U+0944) and (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were editorial in nature. Some comments called for attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed with a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention¹²:

All characters that are included in the [MSR]: yellow background

PVALID in IDNA2008 but excluded from the MSR for various reasons: pinkish background

Not PVALID in IDNA2008: white background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references for Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not an exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 (Hindi, Nepali) | [0], [101], [102], [103], [105], [108], [110], [111], [112], [113]
2 | 0902 | | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0], [101], [102], [103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 (Hindi) | [0], [101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 (Kashmiri) | [0], [105], [108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 (Hindi) | [0], [100], [101], [102], [108], [112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 (Kashmiri) | [0], [105], [108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]

28 091E ञ DEVANAGARILETTERNYA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

29 091F ट DEVANAGARILETTERTTA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

21

30 0920 ठ DEVANAGARILETTERTTHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

31 0921 ड DEVANAGARILETTERDDA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

32 0922 ढ DEVANAGARILETTERDDHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

33 0923 ण DEVANAGARILETTERNNA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

34 0924 त DEVANAGARILETTERTA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

35 0925 थ DEVANAGARILETTERTHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

36 0926 द DEVANAGARILETTERDA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

37 0927 ध DEVANAGARILETTERDHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

38 0928 न DEVANAGARILETTERNA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

39 092A प DEVANAGARILETTERPA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

22

40 092B फ DEVANAGARILETTERPHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

41 092C ब DEVANAGARILETTERBA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

42 092D भ DEVANAGARILETTERBHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

43 092E म DEVANAGARILETTERMA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

44 092F य DEVANAGARILETTERYA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

45 0930 र DEVANAGARILETTERRA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

46 0932 ल DEVANAGARILETTERLA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

47 0933 ळ DEVANAGARILETTERLLA Consonant

BodoKonkaniMarathiNepali

Sanskrit1Nepali

[0][102][103][110][112]

[113]

48 0935 व DEVANAGARILETTERVA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

49 0936 श DEVANAGARILETTERSHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

23

50 0937 ष DEVANAGARILETTERSSA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

51 0938 स DEVANAGARILETTERSA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

52 0939 ह DEVANAGARILETTERHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

53 093A DEVANAGARI VOWEL SIGN OE Matra Kashmiri 4Kashmiri [11][105][108]

54 093B ऻ DEVANAGARI VOWEL SIGN OOE Matra Kashmiri 4Kashmiri [11][105][108]

55 093C DEVANAGARISIGNNUKTA Nukta

BodoHindiKashmiriMaithiliSantaliSindhi

1Hindi[0][101][105][108][110][109][111]

56 093E ा DEVANAGARIVOWELSIGNAA Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

57 093F ि DEVANAGARIVOWELSIGNI Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

58 0940 ी DEVANAGARIVOWELSIGNII Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

59 0941 DEVANAGARIVOWELSIGNU Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

60 0942 DEVANAGARIVOWELSIGNUU Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

61 0943

DEVANAGARIVOWELSIGNVOCALICR

Matra HindiMarathiSanskrit 1Hindi [0][101][102]

[103]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

24

62 0945

DEVANAGARIVOWELSIGNCANDRAE=candra

MatraHindiKonkaniMarathiSanskrit

Kashmiri1Hindi [0][100][101]

[108]

63 0946

DEVANAGARIVOWELSIGNSHORTE

Matra Kashmiri 4Kashmiri [0][105][108]

64 0947 DEVANAGARIVOWELSIGNE Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][105][108][113]

65 0948 DEVANAGARIVOWELSIGNAI Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

66 0949 ॉ DEVANAGARIVOWELSIGNCANDRAO

Matra HindiKonkaniMarathiKashmiri 1Hindi [0][100][108]

67 094A ॊ DEVANAGARIVOWELSIGNSHORTO

Matra Kashmiri 4Kashmiri [0][105][108]

68 094B ो DEVANAGARIVOWELSIGNO Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][105][108][113]

69 094C ौ DEVANAGARIVOWELSIGNAU Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][105][108][113]

70 094D DEVANAGARISIGN

VIRAMAHalantVirama

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][105][108][113]

71 094F ॏ DEVANAGARIVOWELSIGNAW Matra Kashmiri 4Kashmiri [0][105][108]

72 0956 DEVANAGARIVOWELSIGNUE Matra Kashmiri 4Kashmiri [11][105][108]

73 0957 DEVANAGARIVOWELSIGNUUE Matra Kashmiri 4Kashmiri [11][105][108]

74 0972 ॲ

DEVANAGARILETTERCANDRA

AVowel KonkaniMarathi

Kashmiri2KonkaniMarathi

[9][100][102][108][112]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

25

75 0973 ॳ DEVANAGARILETTEROE Vowel Kashmiri 4Kashmiri [11][105][108]

76 0974 ॴ DEVANAGARILETTEROOE Vowel Kashmiri 4Kashmiri [11][105][108]

77 0975 ॵ DEVANAGARILETTERAW Vowel Kashmiri 4Kashmiri [11][105][108]

78 0976 ॶ DEVANAGARILETTERUE Vowel Kashmiri 4Kashmiri [11][105][108]

79 0977 ॷ DEVANAGARILETTERUUE Vowel Kashmiri 4Kashmiri [11][105][108]

80 097B ॻ DEVANAGARILETTERGGA Consonant Sindhi 2Sindhi [8][104]

81 097C ॼ DEVANAGARILETTERJJA Consonant Sindhi 2Sindhi [8][104]

82 097E ॾ DEVANAGARILETTERDDDA Consonant Sindhi 2Sindhi [8][104]

83 097F ॿ DEVANAGARILETTERBBA Consonant Sindhi 2Sindhi [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the “Eyelash Reph” construct (footnote 13).

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106], [107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106], [107]

Table 7 Sequences

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by normal Ra, U+0930), the term “Reph” is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | अ ं (U+0905 U+0902)
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ ँ (U+0905 U+0901)
Vowel + Visarga | V[X] | अः (aḥ) | अ ः (U+0905 U+0903)

Table 9
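The pattern V[B|D|X] maps almost directly onto a regular expression. The following Python sketch is only an illustration: the helper name is_vowel_sequence is hypothetical, and the vowel class is an assumed subset of Table 6 (the independent vowels listed there for Hindi), not the normative repertoire.

```python
import re

# Assumption: independent vowels that Table 6 lists for Hindi
# (A..VOCALIC R, CANDRA E, E, AI, CANDRA O, O, AU).
V = "\u0905-\u090B\u090D\u090F-\u0911\u0913\u0914"
B = "\u0902"  # Anusvara (Bindu)
D = "\u0901"  # Candrabindu
X = "\u0903"  # Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{V}][{B}{D}{X}]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

Because the optional class admits at most one character, a Visarga after an Anusvara or Candrabindu is rejected, as the rule above requires.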


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क ़ (U+0915 U+093C)

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क ि (U+0915 U+093F)
Consonant + Anusvara | C[B] | कं (kaṁ) | क ं (U+0915 U+0902)
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क ँ (U+0915 U+0901)
Consonant + Visarga | C[X] | कः (kaḥ) | क ः (U+0915 U+0903)
Consonant + Halant | C[H] | क् (k, pure consonant) | क ् (U+0915 U+094D)

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क ी ं (U+0915 U+0940 U+0902)
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क ा ँ (U+0915 U+093E U+0901)
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क ी ः (U+0915 U+0940 U+0903)

Table 12

3. A sequence of consonants (up to 4) joined by Halant (footnote 14): *3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
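Rules 1 to 3B above can be folded into a single pattern. The sketch below is a hedged illustration in Python, not part of the proposal: the name is_consonant_sequence is hypothetical, the consonant class is read off Table 6 (KA to HA, without the excluded code points), and the Matra class is an assumed Hindi subset. A consonant followed by a Nukta is treated as one consonant unit, as rule 1 states.

```python
import re

# Assumption: consonants KA..HA from Table 6 (U+0929, U+0931, U+0934 left out).
C = "\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939"
# Assumption: Matras used for Hindi (Kashmiri-only signs left out).
M = "\u093E-\u0943\u0945\u0947-\u0949\u094B\u094C"
N = "\u093C"           # Nukta
H = "\u094D"           # Halant / Virama
BDX = "\u0901-\u0903"  # Candrabindu, Anusvara, Visarga

CN = f"[{C}]{N}?"      # a consonant, optionally carrying a Nukta

CONSONANT_SEQUENCE = re.compile(
    f"(?:{CN}{H}"                  # rule 2: pure consonant, C[H]
    f"|(?:{CN}{H}){{0,3}}{CN}"     # rules 1 and 3: *3(CH)C
    f"[{M}]?[{BDX}]?)"             # rules 2, 2A, 3A, 3B: optional M, then B, D or X
)


def is_consonant_sequence(text: str) -> bool:
    """Return True if the whole string is one Hindi consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The bound of three repetitions of CH mirrors the "up to 4 consonants" limit of the Hindi rules; the WLE rules in Section 7 do not impose that limit, so a validator built on them would drop the bound.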

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity
Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning, which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in the Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ (U+0906) | आ़ (U+0906 U+093C)
ओ (U+0913) | ओ़ (U+0913 U+093C)
ा (U+093E) | ा़ (U+093E U+093C)
ो (U+094B) | ो़ (U+094B U+093C)

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination आ़़ (U+0906 U+093C U+093C), where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 set character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8828]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
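The effect of the "not followed by Nukta" condition can be seen in a small Python sketch. It is an illustration only: the function names are hypothetical, and it covers just the four Variant 1 code points of Table 16, without any other LGR processing.

```python
NUKTA = "\u093C"
# Variant 1 code points from Table 16: AA, O, vowel sign AA, vowel sign O.
NUKTA_VARIANT_SOURCES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_positions(label: str) -> list:
    """Indices where the Variant 1 -> Variant 2 mapping is defined,
    i.e. a source character that is not already followed by a Nukta."""
    return [
        i for i, ch in enumerate(label)
        if ch in NUKTA_VARIANT_SOURCES and label[i + 1:i + 2] != NUKTA
    ]


def add_nukta_everywhere(label: str) -> str:
    """Build the variant label that applies the mapping at every allowed position."""
    out = []
    for i, ch in enumerate(label):
        out.append(ch)
        if ch in NUKTA_VARIANT_SOURCES and label[i + 1:i + 2] != NUKTA:
            out.append(NUKTA)
    return "".join(out)
```

Applying add_nukta_everywhere twice gives the same result as applying it once, which is the point of the context rule: the substitution cannot regenerate itself into a Nukta followed by another Nukta.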

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,
0906 093C 0901 is added to Set B,
0906 093C 0902 is added to Set C, and
093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when (follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
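As a rough illustration of how the four extended sets and the followed-by-V-C-or-end condition interact, the Python sketch below looks up which set members match at a given position of a label. The names seq, VARIANT_SETS and variants_at are hypothetical, and the vowel/consonant test is a simplifying assumption based on code point ranges rather than the exact repertoire of Table 6.

```python
def seq(hexes: str) -> str:
    """Turn '0906 0902' into the corresponding string of code points."""
    return "".join(chr(int(h, 16)) for h in hexes.split())


# (members, needs followed-by-V-C-or-end context), taken from the table above.
VARIANT_SETS = {
    "A": ([seq("093E 0901"), seq("093E 093C 0901"), seq("0949 0902")], False),
    "B": ([seq("0906 0901"), seq("0906 093C 0901"), seq("0911 0902")], False),
    "C": ([seq("0906 0902"), seq("0906 093C 0902"), seq("0974")], True),
    "D": ([seq("093B"), seq("093E 0902"), seq("093E 093C 0902")], True),
}


def is_v_or_c(ch: str) -> bool:
    # Assumption: coarse ranges for independent vowels and consonants (incl. RRA).
    cp = ord(ch)
    return 0x0905 <= cp <= 0x0939 or 0x0972 <= cp <= 0x097F


def followed_by_v_c_or_end(label: str, end: int) -> bool:
    return end == len(label) or is_v_or_c(label[end])


def variants_at(label: str, i: int) -> list:
    """Return (set name, matched member, other members) for matches at index i."""
    found = []
    for name, (members, needs_context) in VARIANT_SETS.items():
        for m in members:
            if not label.startswith(m, i):
                continue
            if needs_context and not followed_by_v_c_or_end(label, i + len(m)):
                continue  # the variant mapping is undefined in this context
            found.append((name, m, [x for x in members if x != m]))
    return found
```

For instance, a label consisting only of U+0974 matches Set C, while U+0974 followed by a Candrabindu does not, because the context condition fails.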

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ (U+0973) | अं (U+0905 U+0902)
ऺ (U+093A) | ं (U+0902)
ॴ (U+0974) | आं (U+0906 U+0902)
ऻ (U+093B) | ां (U+093E U+0902)
ऎ (U+090E) | ऐ (U+0910)
ॆ (U+0946) | े (U+0947)
ॵ (U+0975) | औ (U+0914)
ॏ (U+094F) | ौ (U+094C)

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ (U+0901) | ॅं (U+0945 U+0902)
ाँ (U+093E U+0901) | ॉं (U+0949 U+0902)
अँ (U+0905 U+0901) | ॲं (U+0972 U+0902)
एँ (U+090F U+0901) | ऍं (U+090D U+0902)
आँ (U+0906 U+0901) | ऑं (U+0911 U+0902)

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both the pairs are shown here preceded by a Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)
Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered such that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by either of Consonants, Vowels, Nukta or a Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
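The context condition of this rule can be sketched in Python as follows. This is an illustrative reading of the rule, not the normative LGR: the function names are hypothetical, and the consonant test uses coarse code point ranges as a simplifying assumption.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    # Assumption: coarse consonant ranges (KA..HA and the Sindhi letters).
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x097B <= cp <= 0x097F


def preceded_by_c_or_cn(label: str, i: int) -> bool:
    """True if position i follows a Consonant, or a Consonant plus Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def candra_variant(label: str) -> str:
    """Replace each Candrabindu by U+0945 U+0902 wherever the rule allows it."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

A Candrabindu that follows a Vowel or a Matra is left untouched, since the Matra U+0945 could not validly occupy that position.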

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
ं (U+0902) | ਂ (U+0A02)
इ (U+0907) | ਙ (U+0A19)
उ (U+0909) | ਤ (U+0A24)
ग (U+0917) | ਗ (U+0A17)
घ (U+0918) | ਬ (U+0A2C)
ट (U+091F) | ਟ (U+0A1F)
ठ (U+0920) | ਠ (U+0A20)
ढ (U+0922) | ਫ (U+0A2B)
प (U+092A) | ਧ (U+0A27)
भ (U+092D) | ਮ (U+0A2E)
म (U+092E) | ਸ (U+0A38)
व (U+0935) | ਕ (U+0A15)
ह (U+0939) | ਵ (U+0A35)
ऺ (U+093A) | ਂ (U+0A02)
़ (U+093C) | ਼ (U+0A3C)
ि (U+093F) | ਿ (U+0A3F)
ी (U+0940) | ੀ (U+0A40)
ॅ (U+0945) | ੱ (U+0A71)
ॆ (U+0946) | ੇ (U+0A47)
ॆ (U+0946) | ੋ (U+0A4B)
े (U+0947) | ੇ (U+0A47)
े (U+0947) | ੋ (U+0A4B)
ै (U+0948) | ੈ (U+0A48)
ॖ (U+0956) | ੁ (U+0A41)
ॗ (U+0957) | ੂ (U+0A42)
प्टि (U+092A U+094D U+091F U+093F) | ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940) | ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947) | ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946) | ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924) | ਜ (U+0A1C)

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म (U+092E) | ম (U+09AE)
ि (U+093F) | ি (U+09BF)

Table 20 Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (footnote 15)
3. M must be preceded by C or CN (footnote 16)
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V can NOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N 16 where CN is a C followed by an N


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1).
• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2).
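The seven context rules above can be sketched as a single left-to-right scan over a label. This is only an illustration, not the normative XML LGR: the function names are hypothetical, and the category ranges for C, V and M are rough approximations of the repertoire (the authoritative membership is Table 6). The Eyelash Reph sequences and the variant-only rules are not modelled.

```python
# Sets named in rule 1 of the WLE section.
C1 = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B}
V1 = {0x0906, 0x0913}
M1 = {0x093E, 0x094B}

def category(cp):
    """Return the WLE symbol for a code point (approximate ranges, assumption)."""
    singles = {0x0902: "B", 0x0901: "D", 0x0903: "X", 0x094D: "H", 0x093C: "N"}
    if cp in singles:
        return singles[cp]
    if 0x0905 <= cp <= 0x0914:
        return "V"
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x093E <= cp <= 0x094C:
        return "M"
    raise ValueError(f"U+{cp:04X} is outside this simplified repertoire")

def first_violated_rule(label):
    """Return the number (1-7) of the first WLE rule violated, or None."""
    prev_cp = prev_cat = prev_prev_cat = None
    for ch in label:
        cp = ord(ch)
        cat = category(cp)
        # CN means a consonant followed by a Nukta (footnotes 15 and 16).
        after_c_or_cn = prev_cat == "C" or (prev_cat == "N" and prev_prev_cat == "C")
        if cat == "N" and prev_cp not in (C1 | V1 | M1):
            return 1
        if cat == "H" and not after_c_or_cn:
            return 2
        if cat == "M" and not after_c_or_cn:
            return 3
        if cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            return {"X": 4, "B": 5, "D": 6}[cat]
        if cat == "V" and prev_cat == "H":
            return 7
        prev_prev_cat, prev_cat, prev_cp = prev_cat, cat, cp
    return None
```

As a usage note, the label U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 discussed under "Case of V preceded by H" is flagged under rule 7 by this sketch, while the same label without U+094D passes.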

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7, Sequences, it gets permitted only with the specific sequences.
2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार, aːməchaːr, "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi


Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_org/indevnagari/thesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).


11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.
2. IS 10315: 7-bit coded character set for information interchange. 1985.


3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.
5. ISO 2375: Procedure for registration of escape sequences. 2003.
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters and character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables
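Every pair in Table 21 differs only by a trailing Nukta (U+093C). As a rough illustration of how a reviewer might flag such label pairs, the sketch below compares two labels after removing every Nukta. The function names are hypothetical, and the check is purely mechanical: it says nothing about whether a pair is actually confusing in a given font.

```python
NUKTA = "\u093C"  # DEVANAGARI SIGN NUKTA

def strip_nukta(label):
    """Remove every Nukta from a label."""
    return label.replace(NUKTA, "")

def differ_only_by_nukta(label_a, label_b):
    """True if the labels are distinct but identical once Nuktas are removed."""
    return label_a != label_b and strip_nukta(label_a) == strip_nukta(label_b)
```

For example, ज (U+091C) and ज़ (U+091C U+093C) are reported as differing only by a Nukta, while ज and ड are not.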


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing) they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (footnote 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
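The "all constituent characters" condition stated in this appendix can be sketched as a simple membership test. The dictionary below copies only the Devanagari-Gurmukhi rows of Table 22; the names `GURMUKHI_CONFUSABLES` and `fully_confusable` are hypothetical, and the check works per code point rather than per akshara, which is a simplification.

```python
# Devanagari code point -> Gurmukhi code points listed against it in Table 22.
GURMUKHI_CONFUSABLES = {
    0x0920: {0x0A28, 0x0A30},
    0x0921: {0x0A21, 0x0A24},
    0x0922: {0x0A22},
    0x0924: {0x0A1C},
    0x092F: {0x0A27},
}

def fully_confusable(label, table=GURMUKHI_CONFUSABLES):
    """True only if every code point in the label has a confusable in the table."""
    return len(label) > 0 and all(ord(ch) in table for ch in label)
```

A label made of U+0920 U+0921 satisfies the condition, whereas U+0920 U+0915 does not, because U+0915 has no entry and therefore provides a visual distinction.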


3.3.2 The Implicit Vowel Killer: Halant (footnote 6)

All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant (U+094D). The Halant thus joins two consonants and creates conjuncts, which can generally be combinations of 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical. It is just an observation drawn from the words that have been observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. Hence, in the LGR work this limit will not be enforced (footnote 7).
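To make the notion of conjunct size concrete, the sketch below counts the longest run of consonants joined by Halants in a label. It is illustrative only: the consonant range U+0915 to U+0939 is an approximation, the function name is hypothetical, and, in line with the text, no upper limit is enforced.

```python
HALANT = "\u094D"
NUKTA = "\u093C"

def is_consonant(ch):
    # Approximate consonant range of the Devanagari block (assumption).
    return 0x0915 <= ord(ch) <= 0x0939

def longest_conjunct(label):
    """Return the largest number of consonants joined by Halants in the label."""
    longest = current = 0
    after_halant = False
    for ch in label:
        if is_consonant(ch):
            # A consonant extends the run only if it directly follows a Halant.
            current = current + 1 if after_halant else 1
            longest = max(longest, current)
            after_halant = False
        elif ch == HALANT:
            after_halant = True
        elif ch == NUKTA:
            continue  # a Nukta modifies the consonant and does not break the run
        else:
            current = 0
            after_halant = False
    return longest
```

For instance, क्ष (U+0915 U+094D U+0937) gives 2 and स्त्री (U+0938 U+094D U+0924 U+094D U+0930 U+0940) gives 3.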

3.3.3 Vowels

Separate symbols exist for all Vowels which are pronounced independently, either at the beginning or after a vowel sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels excepting the अ.

The correlation is shown as follows:

Vowel | Corresponding vowel sign (Matra)
अ U+0905 | (none)
U+0906 | U+093E
U+0907 | ि U+093F
U+0908 | U+0940
U+0909 | U+0941
U+090A | U+0942
U+090B | U+0943
U+090F | U+0947
U+0910 | U+0948
U+0913 | U+094B
U+0914 | U+094C
U+0973 | U+093A
U+0974 | U+093B
ऎ, ऄ U+090E, U+0904 | U+0946
U+0912 | U+094A
ऍ, ॲ U+090D, U+0972 | U+0945
U+0960 | U+0944
U+0911 | U+0949
U+0975 | U+094F
U+0976 | U+0956
U+0977 | U+0957

Table 5 Vowels with corresponding Matras

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).

6 Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report both the terms have been used to denote the character that suppresses the inherent vowel.
7 This can be the case when a foreign language word which admits a large number of consonants is transliterated into Devanāgarī.
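The correlation in Table 5 is a plain lookup from an independent vowel to its Matra. A minimal sketch, with the hypothetical names `VOWEL_TO_MATRA` and `matra_for`, and with the code points copied from the table:

```python
# Independent vowel -> dependent vowel sign (Matra), as listed in Table 5.
VOWEL_TO_MATRA = {
    0x0906: 0x093E, 0x0907: 0x093F, 0x0908: 0x0940, 0x0909: 0x0941,
    0x090A: 0x0942, 0x090B: 0x0943, 0x090F: 0x0947, 0x0910: 0x0948,
    0x0913: 0x094B, 0x0914: 0x094C, 0x0973: 0x093A, 0x0974: 0x093B,
    0x090E: 0x0946, 0x0904: 0x0946, 0x0912: 0x094A, 0x090D: 0x0945,
    0x0972: 0x0945, 0x0960: 0x0944, 0x0911: 0x0949, 0x0975: 0x094F,
    0x0976: 0x0956, 0x0977: 0x0957,
}

def matra_for(vowel):
    """Return the Matra for an independent vowel, or None if it has none."""
    cp = VOWEL_TO_MATRA.get(ord(vowel))
    return chr(cp) if cp is not None else None
```

The vowel अ (U+0905) is deliberately absent: it corresponds to the inherent schwa, so `matra_for` returns None for it.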

3.3.4 The Anusvara (ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal (footnote 8).

सन्त vs संत, sənt, "saint" (U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924)

चम्पा vs चंपा, tʃəmpa, "A flower belonging to the genus Plumeria family" (U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E)
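The replacement of "Nasal Consonant + Halant + Consonant of the same varga" by the Anusvara can be illustrated with a small rewriting function. This is a sketch under assumptions: the five vargas are taken as the standard consecutive code point ranges with the nasal last, geminated nasals are left untouched, and the function name is hypothetical. It is not part of the LGR, which treats the two spellings as distinct labels.

```python
HALANT, ANUSVARA = "\u094D", "\u0902"

# Each varga as a code point range; the last member of each is its nasal.
VARGAS = [
    range(0x0915, 0x091A),  # ka-varga, nasal U+0919
    range(0x091A, 0x091F),  # ca-varga, nasal U+091E
    range(0x091F, 0x0924),  # tta-varga, nasal U+0923
    range(0x0924, 0x0929),  # ta-varga, nasal U+0928
    range(0x092A, 0x092F),  # pa-varga, nasal U+092E
]

def to_anusvara(text):
    """Rewrite nasal + Halant before a same-varga consonant as Anusvara."""
    out, i = [], 0
    while i < len(text):
        replaced = False
        if i + 2 < len(text) and text[i + 1] == HALANT:
            for varga in VARGAS:
                nasal, others = varga[-1], varga[:-1]
                if ord(text[i]) == nasal and ord(text[i + 2]) in others:
                    out.append(ANUSVARA)
                    i += 2  # skip the nasal and the Halant
                    replaced = True
                    break
        if not replaced:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Applied to the two examples above, the function maps U+0938 U+0928 U+094D U+0924 to U+0938 U+0902 U+0924, and U+091A U+092E U+094D U+092A U+093E to U+091A U+0902 U+092A U+093E.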

3.3.5 Nasalization: Candrabindu (ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख, ãkh, "eye" (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (़ - U+093C) (footnote 9)

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ firoz (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ bədh (U+092C U+0922 U+093C)

The web publication DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION [109] by the Central Hindi Directorate, Ministry of HRD, Government of India clearly states such a use of Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय cay, "tea" (U+091A U+093C U+093E U+092F)

छ़ल chal, "wash - Imperative" (U+091B U+093C U+0932)

पॊज़ póz, "fact" (U+092A U+094A U+091C U+093C)

Normally a Nukta is appended to a Consonant. However, the Santali language uses the Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a आ (U+0906)
b ओ (U+0913)
c ा (U+093E)
d ो (U+094B)

8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding "Varga" through a Halant.
9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided, as they are available online.


3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to h, for example दुःख, duhkh, "sorrow, unhappiness" (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + ् (halant) + ष (sha), gets rendered as क with a visible Halant followed by a full ष when formed by क (ka) + ् (halant) + Zero Width Non-joiner + ष (sha). In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that the constituting consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + ् (halant) + ष (sha), does not show the half form of ka joining with sha. However, using ZWJ, the constituting consonants' shapes are preserved in the visual depiction, i.e. a half-form ka followed by ष, formed by क (ka) + ् (halant) + Zero Width Joiner + ष (sha).

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.
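The three ksha sequences described in this section differ only in the invisible joiner that follows the Halant. The sketch below builds them from code points and adds a hypothetical helper, `contains_joiner`, of the kind a registry-side check might use to spot ZWNJ or ZWJ in a candidate label. How each sequence is actually drawn depends on the font and rendering engine.

```python
KA, HALANT, SSA = "\u0915", "\u094D", "\u0937"
ZWNJ, ZWJ = "\u200C", "\u200D"

KSHA_FORMS = {
    "default conjunct": KA + HALANT + SSA,
    "explicit halant": KA + HALANT + ZWNJ + SSA,
    "half-form ka": KA + HALANT + ZWJ + SSA,
}

def contains_joiner(label):
    """True if the label contains ZWNJ (U+200C) or ZWJ (U+200D)."""
    return any(ch in (ZWNJ, ZWJ) for ch in label)

def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)
```

Printing `code_points` for each entry makes the otherwise invisible difference between the three forms explicit.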


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync across all the Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code points in the code-point repertoire, across the board, for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are pre-requisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case, up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
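The U+0958 example can be checked with Unicode normalization. U+0958 is listed among the Unicode composition exclusions, so Normalization Form C, which IDNA2008 expects labels to be in, rewrites it as the two-code-point sequence mentioned above and never recomposes it. The short sketch below uses only the Python standard library; the helper name `code_points` is an assumption made for this illustration.

```python
import unicodedata

def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

qa_precomposed = "\u0958"        # DEVANAGARI LETTER QA
qa_sequence = "\u0915\u093C"     # KA followed by NUKTA

# NFC decomposes U+0958 and, because it is composition-excluded,
# leaves it as KA + NUKTA.
normalized = unicodedata.normalize("NFC", qa_precomposed)
print(code_points(normalized))
```

The sequence form is stable under NFC, which is one reason it, rather than the precomposed letter, is the form usable in labels.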

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where an atypical visual shape of an akshar is desired.

In domain names, due to the absence of space " " or tab "-", there will be cases where the inability to use ZWNJ can pose some issues: where two words need to be joined together, the previous word needs to end in an Explicit Halant and the next word begins with a consonant. In that case a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (deš, nation) and विदेश (videš, foreign land) are juxtaposed to each other, the resultant word, i.e. "देश्विदेश" (footnote 10), is not the appropriate way of rendering it. An appropriate rendering of the same would keep the Halant visible between the two words, as in "देश् विदेश", which can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, there may not be much of an impact from the exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes (footnote 11) may not be able to be made; however, there will not be any impact on the phonetic level.

iii Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for the creation of root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
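The three successive filters summarized above can be pictured as a short chain of checks. In the sketch below the two exclusion sets contain only the examples named in this section (U+0958 for IDNA, U+093D for the MSR); they are deliberately incomplete placeholders, and the function and stage names are hypothetical. The authoritative data are the IDNA2008 tables and the [MSR] itself.

```python
def in_devanagari_block(cp):
    """Stage i: the code point must be encoded in the Devanagari block."""
    return 0x0900 <= cp <= 0x097F

# Placeholder sets holding only the examples given in the text (assumption).
NOT_PVALID_IN_IDNA = {0x0958}   # Devanagari Letter Qa
EXCLUDED_BY_MSR = {0x093D}      # Devanagari Sign Avagraha

def first_failing_filter(cp):
    """Return the name of the first filter that rejects cp, or None."""
    if not in_devanagari_block(cp):
        return "script block"
    if cp in NOT_PVALID_IN_IDNA:
        return "IDNA"
    if cp in EXCLUDED_BY_MSR:
        return "MSR"
    return None
```

With these placeholder sets, U+0958 stops at the IDNA stage, U+093D stops at the MSR stage, and U+0915 passes all three.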

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different and hence it changes the fundamental word.
11 Case of क्ष rendered as a fused conjunct versus a half-form क followed by ष: the first is composed with क + ् + ष, while the latter is composed with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.


4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970) etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms (U+0944) and (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.
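The exclusion principles in Sections 4.1.2.2 to 4.1.2.5 name specific code points, which can be collected into a lookup for illustration. The table below lists only those named examples, not the full set of excluded characters, and the names `EXCLUDED_EXAMPLES` and `exclusion_reasons` are hypothetical.

```python
# Example code points named in Sections 4.1.2.2 to 4.1.2.5 and the principle cited.
EXCLUDED_EXAMPLES = {
    0x0964: "punctuation (Danda)",
    0x0965: "punctuation (double Danda)",
    0x09FA: "symbol (Isshar)",
    0x0970: "abbreviation sign",
    0x0960: "rare or obsolete (letter vocalic RR)",
    0x0961: "rare or obsolete (letter vocalic LL)",
    0x0944: "rare or obsolete (Matra form)",
    0x0963: "rare or obsolete (Matra form)",
    0x0951: "stress marker (Udatta)",
    0x0952: "stress marker (Anudatta)",
}

def exclusion_reasons(label):
    """Return (code point, reason) pairs for listed characters found in label."""
    return [(f"U+{ord(ch):04X}", EXCLUDED_EXAMPLES[ord(ch)])
            for ch in label if ord(ch) in EXCLUDED_EXAMPLES]
```

An empty result only means that none of the listed examples occur in the label; it does not by itself establish that the label is within the repertoire.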

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments demanded attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each of the individual comments received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change is not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention (footnote 12):

All characters that are included in the [MSR] - Yellow background
PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background
Not PVALID in IDNA2008 - White background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled Reference. For the entire coverage of Devanagari code points, references for Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not an exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [110] [111] [112] [113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0] [101] [102] [103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0] [101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0] [105] [108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0] [100] [101] [102] [108] [112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0] [105] [108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0] [102] [103] [110] [112] [113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0] [101] [105] [108] [110] [109] [111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0] [101] [102] [103]
62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0] [100] [101] [108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0] [105] [108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0] [100] [108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0] [105] [108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0] [105] [108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9] [100] [102] [108] [112]
75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8] [104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8] [104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8] [104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8] [104]

Table 6: Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph" construct (see footnote 13).

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106] [107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106] [107]

Table 7: Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by normal Ra, U+0930), the term "Reph" is used here.
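The conditional inclusion described above can be illustrated with a minimal sketch. It assumes a label is handled as a plain Python string of code points; the function name `rra_usage_ok` and the constants are hypothetical and are not taken from the accompanying XML file.

```python
# Hypothetical helper: U+0931 (RRA) is acceptable only as the first
# code point of one of the two sequences listed in Table 7.
RRA = "\u0931"
ALLOWED_RRA_SEQUENCES = (
    "\u0931\u094D\u092F",  # RRA + VIRAMA + YA
    "\u0931\u094D\u0939",  # RRA + VIRAMA + HA
)


def rra_usage_ok(label: str) -> bool:
    """Return True if every U+0931 in the label starts a Table 7 sequence."""
    for i, ch in enumerate(label):
        # str.startswith accepts a tuple of prefixes and a start offset.
        if ch == RRA and not label.startswith(ALLOWED_RRA_SEQUENCES, i):
            return False
    return True
```

A label that contains U+0931 anywhere outside these two sequences is rejected by this check; all other validity rules are out of scope for the sketch.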

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7 the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta

5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence/Group

Table 8: Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | अ + ं: U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ + ँ: U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | अ + ः: U+0905 U+0903

Table 9


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क + ़: U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], Visarga [X] or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क + ि: U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | क + ं: U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क + ँ: U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | क + ः: U+0915 U+0903
Consonant + Halant | C[H] | क् (k, pure consonant) | क + ्: U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क + ी + ं: U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क + ा + ँ: U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क + ी + ः: U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14):

3*(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X.

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

14 In the case of Sanskrit it can join up to 5 consonants.

3B. 3*(CH)CM may be followed by a B, D or X.

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.
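The vowel and consonant sequences of Sections 5.5.3 and 5.5.4 can be approximated with a regular expression. The following is only a sketch under stated assumptions: the character classes are built from the Category column of Table 6 (so they are wider than Hindi alone), a consonant is treated as optionally carrying a Nukta, and the name `is_akshar` is hypothetical. It tests a single akshar, not a whole label, and it is not the normative WLE logic of Section 7.

```python
import re

# Character classes taken from the Category column of Table 6 (assumption:
# the whole repertoire is used, not only the Hindi subset).
C = "[\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939\u097B\u097C\u097E\u097F]"
V = "[\u0905-\u090B\u090D-\u0914\u0972-\u0977]"
M = "[\u093A\u093B\u093E-\u0943\u0945-\u094C\u094F\u0956\u0957]"
BDX = "[\u0902\u0901\u0903]"  # Anusvara (B), Candrabindu (D), Visarga (X)
H = "\u094D"                   # Halant
N = "\u093C"                   # Nukta

CN = f"{C}{N}?"  # consonant, treated as coterminous with consonant + Nukta

AKSHAR = re.compile(
    # Vowel sequence: V[B|D|X]
    f"{V}{BDX}?"
    # Consonant sequence: up to three (CH) groups, a final consonant, then
    # either a Halant or an optional Matra followed by an optional B/D/X.
    f"|(?:{CN}{H}){{0,3}}{CN}(?:{H}|{M}?{BDX}?)"
)


def is_akshar(text: str) -> bool:
    """Return True if the whole string is one akshar under this sketch."""
    return AKSHAR.fullmatch(text) is not None
```

The `{0,3}` bound mirrors the "up to 4 consonants" limit of rule 3; as noted above, the WLE rules in Section 7 do not impose that limit.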


6 Variants

There are no characters or character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity
Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21: Visually confusables, in Appendix A: Visually confusable characters/sequences, lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). In theory this implies a regenerative tendency: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C), where a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8828]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
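The effect of the "not followed by a Nukta" condition can be shown with a small sketch. It assumes labels are plain Python strings and applies the Variant 1 to Variant 2 mapping of Table 16 at every permitted position; the function name `add_nukta_variants` is hypothetical, and a real LGR tool would enumerate all combinations of variants rather than only this one.

```python
NUKTA = "\u093C"
# Variant 1 code points from Table 16 (AA, O, vowel sign AA, vowel sign O).
NUKTA_VARIANT_BASES = ("\u0906", "\u0913", "\u093E", "\u094B")


def add_nukta_variants(label: str) -> str:
    """Map each Table 16 base to base + Nukta, honouring the context rule.

    The mapping applies only when the base is NOT already followed by a
    Nukta, which keeps the substitution from producing Nukta + Nukta.
    """
    out = []
    for i, ch in enumerate(label):
        out.append(ch)
        # label[i + 1:i + 2] is "" at the end of the label, never an error.
        if ch in NUKTA_VARIANT_BASES and label[i + 1:i + 2] != NUKTA:
            out.append(NUKTA)
    return "".join(out)
```

Because of the context check, applying the function to its own output changes nothing, which is the non-regenerative behaviour the rule is meant to guarantee.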

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E respectively in the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence or absence of a Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A,
- 0906 093C 0901 is added to Set B,
- 0906 093C 0902 is added to Set C, and
- 093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when (follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
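The extension of Sets A to D can be reproduced mechanically. The sketch below is an illustration only: it represents each sequence as a tuple of hexadecimal code point strings, and the names `BASE_SETS`, `with_nukta` and `extend` are hypothetical. It does not model the context rules, only the added set members.

```python
NUKTA = "093C"
# Code points whose Nukta form is a variant of the bare form (Table 16 rows
# that occur in Sets A to D).
NUKTA_BASES = {"0906", "093E"}

# Original variant sets A to D as listed in Section 6.1.2.
BASE_SETS = {
    "A": [("093E", "0901"), ("0949", "0902")],
    "B": [("0906", "0901"), ("0911", "0902")],
    "C": [("0906", "0902"), ("0974",)],
    "D": [("093B",), ("093E", "0902")],
}


def with_nukta(seq):
    """Insert a Nukta after every 0906 / 093E in the sequence."""
    out = []
    for cp in seq:
        out.append(cp)
        if cp in NUKTA_BASES:
            out.append(NUKTA)
    return tuple(out)


def extend(members):
    """Return the set members plus their Nukta-bearing counterparts."""
    extended = list(members)
    for seq in members:
        candidate = with_nukta(seq)
        if candidate != seq and candidate not in extended:
            extended.append(candidate)
    return extended
```

Calling `extend` on each entry of `BASE_SETS` yields three members per set, matching the mappings in the concluding table above.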

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ U+0901 | ॅं U+0945 U+0902
ाँ U+093E U+0901 | ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by a Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)
Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
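A minimal sketch of the context rule above, assuming the consonant class is the Consonant category of Table 6 and that labels are plain Python strings. The name `candrabindu_variant` is hypothetical, and only the first pair of Table 18 is handled; the other pairs would need their own mappings.

```python
import re

# Consonant class built from the Consonant rows of Table 6 (assumption).
C = "[\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939\u097B\u097C\u097E\u097F]"
NUKTA = "\u093C"
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"

# Two fixed-width lookbehinds: preceded by C, or by C followed by Nukta.
_CONTEXT = re.compile(
    f"(?:(?<={C})|(?<={C}{NUKTA})){CANDRABINDU}"
)


def candrabindu_variant(label: str) -> str:
    """Replace U+0901 by U+0945 U+0902 only where the context rule holds."""
    return _CONTEXT.sub(CANDRA_E_ANUSVARA, label)
```

A Candrabindu that follows a Vowel or a Matra is left untouched, since a Matra (U+0945) could not validly appear in that position.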

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
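The first-come, first-served blocking described here can be sketched as follows. This is a deliberately simplified illustration: it uses only the Table 17 pairs (which need no variant context), ignores the Table 16 and Table 18 variants and their context rules, and the names `index_label` and `Registry` are hypothetical rather than part of any LGR tool.

```python
# Variant 1 -> Variant 2 mappings from Table 17.
TABLE_17 = {
    "\u0973": "\u0905\u0902",
    "\u093A": "\u0902",
    "\u0974": "\u0906\u0902",
    "\u093B": "\u093E\u0902",
    "\u090E": "\u0910",
    "\u0946": "\u0947",
    "\u0975": "\u0914",
    "\u094F": "\u094C",
}


def index_label(label: str) -> str:
    """Fold every Variant 1 code point to Variant 2, giving a shared key."""
    return "".join(TABLE_17.get(ch, ch) for ch in label)


class Registry:
    """Toy registry: the first label of a variant set wins, others block."""

    def __init__(self):
        self._taken = {}

    def apply(self, label: str) -> str:
        key = index_label(label)
        if key in self._taken:
            return "blocked"
        self._taken[key] = label
        return "allocated"
```

Two labels that differ only by members of a Table 17 pair share the same key, so whichever is applied for second is reported as blocked, with no preference between them.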

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B
ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables, in Appendix B: Cross-script Confusables, lists them.

7 WholeLabelEvaluationRules(WLE)ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinSection

32whenwritteninDevanagariScriptTheruleshavebeendraftedinsuchawaythattheycanbeeasilytranslatedintotheLGRspecification

BelowarethesymbolsusedintheWLErules foreachoftheCategoryasmentionedin

theTable6Codepointrepertoire

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant / Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)

3. M must be preceded by C or CN (see footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V CanNOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N
16 where CN is a C followed by an N
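The seven rules above can be sketched as a small checker. This is only an illustration under stated assumptions, not the LGR XML itself: the category sets are approximated by contiguous Unicode ranges rather than taken from Table 6, the sequence symbol S is not modelled, and the names `category` and `violated_rules` are hypothetical.

```python
H, N, B, D, X = "\u094D", "\u093C", "\u0902", "\u0901", "\u0903"

# Assumption: contiguous block ranges stand in for the Table 6 categories.
VOWELS = {chr(c) for c in range(0x0905, 0x0915)}      # U+0905..U+0914
CONSONANTS = {chr(c) for c in range(0x0915, 0x093A)}  # U+0915..U+0939
MATRAS = {chr(c) for c in range(0x093E, 0x094D)}      # U+093E..U+094C

# Sets C1, V1 and M1 as listed under rule 1.
C1 = {chr(c) for c in (0x0915, 0x0916, 0x0917, 0x091A, 0x091B,
                       0x091C, 0x0921, 0x0922, 0x092B)}
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}
NUKTA_BASES = C1 | V1 | M1

SIGNS = {H: "H", N: "N", B: "B", D: "D", X: "X"}


def category(ch):
    """Map one code point to its WLE symbol, or None if unknown."""
    if ch in SIGNS:
        return SIGNS[ch]
    if ch in CONSONANTS:
        return "C"
    if ch in VOWELS:
        return "V"
    if ch in MATRAS:
        return "M"
    return None


def violated_rules(label):
    """Return the rule numbers (1-7) broken by the label, in order."""
    broken = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else None
        prev_cat = category(prev) if prev else None
        # "CN" is a consonant followed by a Nukta (footnotes 15 and 16).
        prev_is_cn = (prev_cat == "N" and i >= 2
                      and category(label[i - 2]) == "C")
        after_c_or_cn = prev_cat == "C" or prev_is_cn
        if cat == "N" and prev not in NUKTA_BASES:
            broken.append(1)
        elif cat == "H" and not after_c_or_cn:
            broken.append(2)
        elif cat == "M" and not after_c_or_cn:
            broken.append(3)
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            broken.append({"X": 4, "B": 5, "D": 6}[cat])
        elif cat == "V" and prev_cat == "H":
            broken.append(7)
    return broken
```

For instance, the "Mango pickle" label discussed under "Case of V preceded by H" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930) is flagged under rule 7 by this sketch, while the Nukta example U+092B U+093C U+093F U+0930 U+094B U+091C U+093C passes.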

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7: Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash-Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements by the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their Language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, "Devanagari Eyelash Ra", http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, "How to read and write Kashmiri in Devanagari", http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, "Devanāgarī Alphabet and its Romanization", httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2): 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982
2. IS 10315: 7-bit coded character set for information interchange, 1985
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001
5. ISO 2375: Procedure for registration of escape sequences, 2003
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
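The "all constituents must have a counterpart" condition can be sketched as follows. This is a simplified illustration working per code point only (the proposal also pairs whole sequences), the mapping is a small hand-picked subset of the Devanagari-Gurmukhi pairs from Table 19, and the function name `cross_script_label` is hypothetical.

```python
# Illustrative subset of Devanagari -> Gurmukhi pairs listed in Table 19.
DEVA_TO_GURU = {
    "\u091F": "\u0A1F",  # DEVANAGARI TTA  -> GURMUKHI TTA
    "\u0920": "\u0A20",  # DEVANAGARI TTHA -> GURMUKHI TTHA
    "\u092E": "\u0A38",  # DEVANAGARI MA   -> GURMUKHI SA
    "\u093F": "\u0A3F",  # vowel sign I
    "\u0940": "\u0A40",  # vowel sign II
}


def cross_script_label(label, table=DEVA_TO_GURU):
    """Return the other-script label, or None if any code point lacks a pair.

    A single unmapped code point is enough to give the label a visual
    distinction, so no cross-script label exists in that case.
    """
    mapped = []
    for ch in label:
        if ch not in table:
            return None
        mapped.append(table[ch])
    return "".join(mapped)
```

With this subset, a label made only of U+091F, U+0940 and U+092E maps fully to Gurmukhi, whereas adding U+0915 (which has no entry in the subset) makes the function return `None`.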

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (see footnote 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.

U+0908 | U+0940
U+0909 | U+0941
U+090A | U+0942
U+090B | U+0943
U+090F | U+0947
U+0910 | U+0948
U+0913 | U+094B
U+0914 | U+094C
U+0973 | U+093A
U+0974 | U+093B
ऎ U+090E, ऄ U+0904 | U+0946
U+0912 | U+094A
ऍ U+090D, ॲ U+0972 | U+0945
U+0960 | U+0944
U+0911 | U+0949
U+0975 | U+094F
U+0976 | U+0956
U+0977 | U+0957

Table 5: Vowels with corresponding Matras

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).

3.3.4 The Anusvara (U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal (see footnote 8):

सन्त vs संत /sənt/ "saint" (U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924)

चम्पा vs चंपा /tʃəmpa/ "A flower belonging to the genus Plumeria family" (U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E)

3.3.5 Nasalization: Candrabindu (U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख /ãkh/ "eye" (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (U+093C) (see footnote 9)

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is pre-dominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ /firoz/ (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ /bədh/ (U+092C U+0922 U+093C)

The Web Publication "DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION" [109] by the Central Hindi Directorate, Ministry of HRD, Government of India clearly states such a use of Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय /cay/ "tea" (U+091A U+093C U+093E U+092F)

छ़ल /chal/ "wash - Imperative" (U+091B U+093C U+0932)

पॊज़ /póz/ "fact" (U+092A U+094A U+091C U+093C)

Normally a Nukta is appended to a Consonant. However, the Santali language uses Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a. आ (U+0906)
b. ओ (U+0913)
c. ा (U+093E)
d. ो (U+094B)

8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding "Varga" through a Halant.

9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by Centre for Development of Advanced Computing's [C-DAC] Graphics Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on-board the NBGP for specific languages. Only Omniglot references have been provided as they are available online.
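The per-language Nukta usage described in this subsection can be collected into a small lookup. This is a sketch only: the mapping is one reading of the paragraphs above (the first two lists are attributed to Hindi on the strength of reference [109]), and the names `NUKTA_BASES_BY_LANGUAGE`, `languages_using` and `all_nukta_bases` are hypothetical.

```python
# Base code points that may carry a Nukta (U+093C), as read from the text.
NUKTA_BASES_BY_LANGUAGE = {
    "Hindi":    [0x0915, 0x0916, 0x0917, 0x091C, 0x092B, 0x0921, 0x0922],
    "Bodo":     [0x0921],
    "Maithili": [0x0915, 0x091C, 0x0921, 0x0922],
    "Sindhi":   [0x0916, 0x0917, 0x091C, 0x092B, 0x0921, 0x0922],
    "Kashmiri": [0x091A, 0x091B, 0x091C],
    "Santali":  [0x0906, 0x0913, 0x093E, 0x094B],  # vowels and vowel signs
}


def languages_using(base):
    """Languages in the table above that adjoin a Nukta to `base`."""
    return sorted(lang for lang, bases in NUKTA_BASES_BY_LANGUAGE.items()
                  if base in bases)


def all_nukta_bases():
    """Union of every base code point listed for any language."""
    union = set()
    for bases in NUKTA_BASES_BY_LANGUAGE.values():
        union.update(bases)
    return sorted(union)
```

The union computed by `all_nukta_bases()` contains thirteen code points: nine consonants, two vowels and two vowel signs.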

3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example दुःख /duhkh/ "sorrow, unhappiness" (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In case of LGR, the Avagraha is not part of the repertoire as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + Halant + ष (sha), gets rendered with a visible Halant on क, followed by a full ष, when formed by क (ka) + Halant + Zero Width Non-joiner + ष (sha). In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that constituting consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + Halant + ष (sha), does not show the half form of ka joining with sha. However, using ZWJ, the constituting consonants' shapes are preserved in the visual depiction, i.e. the half form of क followed by ष, formed by क (ka) + Halant + Zero Width Joiner + ष (sha).

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.
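The three encodings of the ksha example can be written out as code point sequences. This is a minimal sketch for illustration; the variable and function names are hypothetical, and how each string is drawn depends on the font and rendering engine.

```python
KA, HALANT, SSA = "\u0915", "\u094D", "\u0937"
ZWNJ, ZWJ = "\u200C", "\u200D"

conjunct = KA + HALANT + SSA                  # default conjunct form
explicit_halant = KA + HALANT + ZWNJ + SSA    # visible Halant, no conjunct
half_form = KA + HALANT + ZWJ + SSA           # half form of KA kept visible


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def without_joiners(text):
    """Drop ZWNJ and ZWJ, leaving the underlying letter sequence."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")
```

All three strings reduce to the same letters once the joiners are removed, which illustrates why the joiners affect only the visual shape.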

4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync with all other Brahmi derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code-points in the code-point repertoire, across the board for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are pre-requisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case, up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of the domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + Nukta (U+093C), can be used instead.
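The relationship between U+0958 and its decomposed form can be observed with Unicode normalization. The sketch below uses Python's standard `unicodedata` module; U+0958 is a composition exclusion in the Unicode data, so normalizing to NFC yields the two-code-point sequence rather than the precomposed letter. The helper name `nfc_code_points` is hypothetical.

```python
import unicodedata


def nfc_code_points(text):
    """Normalize to NFC and list the resulting code points as U+XXXX."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# DEVANAGARI LETTER QA does not survive NFC as a single code point.
qa_in_nfc = nfc_code_points("\u0958")

# The decomposed spelling is already in NFC and is left unchanged.
ka_nukta_in_nfc = nfc_code_points("\u0915\u093C")
```

Both inputs end up as the same sequence, KA followed by NUKTA, which is the form the text says can be used in a domain name.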

IDNA also imposes restrictions on invisible characters, Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D), in the form of CONTEXTJ rules. These are required in certain cases where a typical visual shape of an akshar is desired.

In domain names, due to absence of space " " or tab "-", there will be cases where inability to use ZWNJ can pose some issues where two words need to be joined together, where the previous word needs to end in an Explicit Halant and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if two words देश् (deš, nation) and विदेश (videš, foreign land) are juxtaposed to each other, the resultant word, i.e. "देश्विदेश" (see footnote 10), is not the appropriate way of rendering it. The appropriate rendering would show देश् with its visible Halant immediately followed by विदेश, without a conjunct between them, which can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire, if necessary.

However, there may not be much of an impact of exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes (see footnote 11) may not be able to be made; however, there will not be any impact on the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code-block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
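The three successive filters can be sketched as a function that reports which layer first rejects a code point. This is an illustration under assumptions: the two exclusion sets contain only the examples named in the text (the real IDNA and MSR tables are far larger), script membership is approximated by the Unicode character name prefix, and `rejecting_filter` is a hypothetical name.

```python
import unicodedata

# Illustrative exclusions taken from the two examples in the text only.
IDNA_EXCLUDED = {0x0958}  # DEVANAGARI LETTER QA
MSR_EXCLUDED = {0x093D}   # DEVANAGARI SIGN AVAGRAHA


def rejecting_filter(code_point, script="DEVANAGARI"):
    """Return the first filter that rejects the code point, or None."""
    # Assumption: the character name prefix stands in for "encoded in
    # Unicode as part of the given script".
    name = unicodedata.name(chr(code_point), "")
    if not name.startswith(script + " "):
        return "Unicode"
    if code_point in IDNA_EXCLUDED:
        return "IDNA"
    if code_point in MSR_EXCLUDED:
        return "MSR"
    return None
```

A code point such as U+0915 passes all three layers in this sketch, while U+0958 stops at the IDNA layer and U+093D at the MSR layer.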

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different, and hence it changes the fundamental word.

11 Case of the two renderings of क्ष: the first is composed with क + Halant + ष while the latter is with क + Halant + ZWJ + ष. The pronunciation of both the conjuncts is the same.

4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970) etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms (U+0944) and (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments concerned the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].

5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention (see footnote 12):

• All characters that are included in the [MSR]: Yellow background

• PVALID in IDNA 2008 but excluded from the MSR for various reasons: Pinkish background

• Not PVALID in IDNA 2008: White background

12 This document needs to be printed in color for this to be read correctly.

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

18

52 CodePointRepertoire

Foreachofthecodepointslanguagereferenceshavebeengiveninthelastcolumntitled

Reference For the entire coverage of Devanagari code points references of HindiMarathi Sanskrit Sindhi andKashmiri havebeen givenThoughonly five representativelanguages have been chosen for referencing they together cover all the code points

requiredforallthelanguagesthatNBGPhasconsideredasgiveninSection32

SrNo

UnicodeCodePoint

Glyph CharacterName Category

Examplelanguagesusingthecodepoint(Notexhaustive

list)

LanguagewithlowestEGIDSscaleusingthecodepoint

Reference

1 0901 DEVANAGARISIGNCANDRABINDU Candrabindu

BodoHindiKashmiriKonkaniMaithiliMarathiNepaliSantaliand

Sanskrit

1HindiNepali

[0][101][102][103][105][108][110][111][112]

[113]

2 0902 DEVANAGARISIGNANUSVARA

Anusvara(Bindu)

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

3 0903 ः DEVANAGARISIGNVISARGA Visarga

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

4 0905 अ DEVANAGARILETTERA Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

5 0906 आ DEVANAGARILETTERAA Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

6 0907 इ DEVANAGARILETTERI Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

7 0908 ई DEVANAGARILETTERII Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

19

8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0][101][102][103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 (Hindi) | [0][101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 (Kashmiri) | [0][105][108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 (Hindi) | [0][100][101][102][108][112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 (Kashmiri) | [0][105][108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]

20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]

30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]

40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 (Nepali) | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]

50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][104][105][108][113]
53 | 093A | ◌ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
54 | 093B | ◌ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
55 | 093C | ◌़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 (Hindi) | [0][101][105][108][110][109][111]
56 | 093E | ◌ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
57 | 093F | ◌ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
58 | 0940 | ◌ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
59 | 0941 | ◌ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
60 | 0942 | ◌ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
61 | 0943 | ◌ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0][101][102][103]

62 | 0945 | ◌ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 (Hindi) | [0][100][101][108]
63 | 0946 | ◌ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 (Kashmiri) | [0][105][108]
64 | 0947 | ◌े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][105][108][113]
65 | 0948 | ◌ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][113]
66 | 0949 | ◌ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 (Hindi) | [0][100][108]
67 | 094A | ◌ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 (Kashmiri) | [0][105][108]
68 | 094B | ◌ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][105][108][113]
69 | 094C | ◌ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][105][108][113]
70 | 094D | ◌् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0][101][102][103][105][108][113]
71 | 094F | ◌ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 (Kashmiri) | [0][105][108]
72 | 0956 | ◌ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
73 | 0957 | ◌ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 (Kashmiri) | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 (Konkani, Marathi) | [9][100][102][108][112]

75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 (Kashmiri) | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 (Sindhi) | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 (Sindhi) | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 (Sindhi) | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 (Sindhi) | [8][104]

Table 6: Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, so as to allow the “Eyelash Reph”13 construct.

Sr. No. | Unicode Code Points | Sequence Character Names | Example languages using the code point (not exhaustive list) | Reference
1 | 0931 094D 092F | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7: Sequences

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by normal Ra, U+0930), the term “Reph” is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ◌ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of forming their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta

5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence Group

Table 8: Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | U+0905 U+0903

Table 9
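The vowel sequence V[B|D|X] can be checked mechanically. The following is a minimal sketch only: the function name `is_vowel_sequence` is hypothetical, and the vowel set is an assumed Hindi-oriented subset of Table 6 rather than the full repertoire.

```python
import re

# Assumed Hindi-oriented subset of the Table 6 vowels (illustrative only).
V = "\u0905-\u090B\u090D\u090F\u0910\u0911\u0913\u0914"
B = "\u0902"  # Anusvara
D = "\u0901"  # Candrabindu
X = "\u0903"  # Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile("[" + V + "][" + B + D + X + "]?")


def is_vowel_sequence(text):
    """Return True if text is a single vowel sequence of the form V[B|D|X]."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

Under these assumptions, a vowel followed by both an Anusvara and a Visarga is rejected, which reflects the restriction to a single B, D or X.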


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | U+0915 U+0903
Consonant + Halant | C[H] | क् (k, pure consonant) | U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant14:

*3(CH)C

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In the case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X.

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
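The consonant-sequence forms 1 to 3B can be folded into one pattern for experimentation. The sketch below is an illustrative consolidation, not a rule stated in this document: it assumes that every consonant may carry an optional Nukta (per the "coterminous" note under form 1), that at most three CH pairs precede the final consonant, and that the character sets are a Hindi-oriented subset of Table 6. The name `is_consonant_sequence` is hypothetical.

```python
import re

# Assumed subsets of Table 6 (illustrative; Kashmiri and Sindhi additions omitted).
C = "\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939"  # consonants
M = "\u093E-\u0943\u0945\u0947-\u0949\u094B\u094C"         # matras
B, D, X = "\u0902", "\u0901", "\u0903"
N, H = "\u093C", "\u094D"

CN = "[" + C + "]" + N + "?"  # a consonant with an optional Nukta

# Up to three (CH) pairs, then a final consonant, then either a Halant
# or an optional Matra followed by an optional B, D or X.
CONSONANT_SEQUENCE = re.compile(
    "(?:" + CN + H + "){0,3}" + CN
    + "(?:" + H + "|[" + M + "]?[" + B + D + X + "]?)"
)


def is_consonant_sequence(text):
    """Return True if text matches the consolidated consonant-sequence sketch."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The limit of three CH pairs mirrors the Hindi description only; as noted above, the WLE rules in Section 7 place no limit on the number of consonants joined by a Halant.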

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.

6 Variants

There are no characters or character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. Table 21: Visually confusables in Appendix A: Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants themselves in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning, which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar to a majority of the Devanagari user base. This being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
◌ा U+093E | ◌ा़ U+093E U+093C
◌ो U+094B | ◌ो़ U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the result contains a new instance of आ (U+0906). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
◌ा (U+093E) → ◌ा़ (U+093E U+093C)
◌ो (U+094B) → ◌ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8828]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
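The effect of the "not followed by Nukta" condition can be illustrated with a small sketch. The helper names `forward_sites` and `add_nukta` are hypothetical, and only the Variant 1 to Variant 2 direction is modelled here.

```python
NUKTA = "\u093C"
# Variant 1 code points from Table 16.
VARIANT1 = {"\u0906", "\u0913", "\u093E", "\u094B"}


def forward_sites(label):
    """Indices where the Variant 1 -> Variant 2 mapping applies.

    The mapping is defined only when the Variant 1 character is not
    already followed by a Nukta.
    """
    return [
        i for i, ch in enumerate(label)
        if ch in VARIANT1 and label[i + 1:i + 2] != NUKTA
    ]


def add_nukta(label, i):
    """Apply the mapping at index i by inserting a Nukta after that character."""
    return label[:i + 1] + NUKTA + label[i + 1:]
```

Applying the mapping once to U+0906 yields U+0906 U+093C, after which `forward_sites` reports no further sites, so the regenerative substitution that would produce two consecutive Nuktas cannot occur.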

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ◌ा | 093E 093C | ◌ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the left-hand-side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence or absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,
0906 093C 0901 is added to Set B,
0906 093C 0902 is added to Set C, and
093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when (follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
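The extended sets A to D can be held as plain data so that symmetry and transitivity follow from set membership. This is a sketch only: the names `VARIANT_SETS` and `variants_of` are hypothetical, and the variant context rules for Sets C and D are deliberately not modelled.

```python
# Extended variant sets from Section 6.1.2, as tuples of code points.
VARIANT_SETS = {
    "A": [(0x093E, 0x0901), (0x093E, 0x093C, 0x0901), (0x0949, 0x0902)],
    "B": [(0x0906, 0x0901), (0x0906, 0x093C, 0x0901), (0x0911, 0x0902)],
    "C": [(0x0906, 0x0902), (0x0906, 0x093C, 0x0902), (0x0974,)],
    "D": [(0x093B,), (0x093E, 0x0902), (0x093E, 0x093C, 0x0902)],
}


def variants_of(sequence):
    """Return the other members of the variant set containing sequence."""
    sequence = tuple(sequence)
    for members in VARIANT_SETS.values():
        if sequence in members:
            return [m for m in members if m != sequence]
    return []
```

Because each sequence belongs to exactly one set in this representation, any two members of a set are variants of each other, which is the symmetric and transitive behaviour the extension is meant to guarantee.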

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
◌ऺ U+093A | ◌ं U+0902
ॴ U+0974 | आं U+0906 U+0902
◌ऻ U+093B | ◌ां U+093E U+0902
ऎ U+090E | ऐ U+0910
◌ॆ U+0946 | ◌े U+0947
ॵ U+0975 | औ U+0914
◌ॏ U+094F | ◌ौ U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ◌ॅ (U+0945) and ◌ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1 | Variant 2
◌ँ U+0901 | ◌ॅं U+0945 U+0902
◌ाँ U+093E U+0901 | ◌ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. U+0945. While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for U+0945, which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
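The context condition of the rule above can be sketched as a predicate over a label. The names `is_consonant` and `candrabindu_sites` are hypothetical; the consonant ranges are read off Table 6, and the Eyelash Reph sequences of Table 7 are not handled.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093C"


def is_consonant(ch):
    """True for the consonant code points listed in Table 6."""
    cp = ord(ch)
    return (
        0x0915 <= cp <= 0x0928
        or 0x092A <= cp <= 0x0930
        or cp in (0x0932, 0x0933)
        or 0x0935 <= cp <= 0x0939
        or cp in (0x097B, 0x097C, 0x097E, 0x097F)
    )


def candrabindu_sites(label):
    """Indices of U+0901 that may map to U+0945 U+0902.

    The mapping exists only when the Candrabindu is preceded by a
    consonant, or by a consonant followed by a Nukta.
    """
    sites = []
    for i, ch in enumerate(label):
        if ch != CANDRABINDU or i == 0:
            continue
        prev = label[i - 1]
        after_c = is_consonant(prev)
        after_cn = prev == NUKTA and i >= 2 and is_consonant(label[i - 2])
        if after_c or after_cn:
            sites.append(i)
    return sites
```

A Candrabindu that follows a vowel or a Matra yields no site, so the substitution never places U+0945 where a Matra is not allowed.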

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
◌ं U+0902 | ◌ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
◌ऺ U+093A | ◌ਂ U+0A02
◌़ U+093C | ◌਼ U+0A3C
◌ि U+093F | ◌ਿ U+0A3F
◌ी U+0940 | ◌ੀ U+0A40
◌ॅ U+0945 | ◌ੱ U+0A71
◌ॆ U+0946 | ◌ੇ U+0A47
◌ॆ U+0946 | ◌ੋ U+0A4B
◌े U+0947 | ◌ੇ U+0A47
◌े U+0947 | ◌ੋ U+0A4B
◌ै U+0948 | ◌ੈ U+0A48
◌ॖ U+0956 | ◌ੁ U+0A41
◌ॗ U+0957 | ◌ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
◌ि U+093F | ◌ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables in Appendix B: Cross-script Confusables lists them.

7 WholeLabelEvaluationRules(WLE)ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinSection

32whenwritteninDevanagariScriptTheruleshavebeendraftedinsuchawaythattheycanbeeasilytranslatedintotheLGRspecification

BelowarethesymbolsusedintheWLErules foreachoftheCategoryasmentionedin

theTable6Codepointrepertoire

C rarr Consonant

M rarr Matra

V rarr Vowel

B rarr Anusvara(Bindu)

D rarr Candrabindu

X rarr Visarga

H rarr HalantVirama

N rarr Nukta

S rarr EyelashReph(C2HC3)whereC2is0931(ऱ-DEVANAGARILETTERRRA)His094D( - DEVANAGARISIGNVIRAMA)C3iseither-092F(य - DEVANAGARILETTERYA)or0939(ह - DEVANAGARILETTERHA)

BelowarethespecificWLErules

1 NmustbeprecededonlybyamemberofC1V1orM1

ThesetC1consistsoftheseconsonants

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

45

a क (U+0915)

b ख (U+0916)

c ग (U+0917)

d च (U+091A)

e छ (U+091B)

f ज (U+091C)

g ड (U+0921)

h ढ (U+0922)

i फ (U+092B)

The set V1 consists of these vowels:

a आ (U+0906) (Required in Santali language)

b ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a ा (U+093E) (Required in Santali language)

b ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN¹⁵

3. M must be preceded by C or CN¹⁶

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in "Case of V preceded by H")

¹⁵ where CN is a C followed by an N
¹⁶ where CN is a C followed by an N


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.


If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
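The seven rules above can be sketched as a small checker. This is only an illustration and not the normative LGR XML: the function names are hypothetical, the category ranges are an approximation of Table 6 by contiguous Unicode ranges (an assumption), and variant handling and the sequence restrictions on U+0931 are left out.

```python
# Sets C1, V1 and M1 as listed under rule 1.
C1 = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B}
V1 = {0x0906, 0x0913}
M1 = {0x093E, 0x094B}
SIGNS = {0x0902: "B", 0x0901: "D", 0x0903: "X", 0x094D: "H", 0x093C: "N"}


def category(cp):
    """Map a code point to a WLE symbol (approximate ranges, an assumption)."""
    if cp in SIGNS:
        return SIGNS[cp]
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0905 <= cp <= 0x0914:
        return "V"
    if 0x093E <= cp <= 0x094C:
        return "M"
    return None  # outside this sketch's repertoire; not checked here


def wle_violations(label):
    """Return (index, rule number) pairs for each of rules 1-7 that is broken."""
    cps = [ord(ch) for ch in label]
    cats = [category(cp) for cp in cps]
    broken = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i > 0 else None
        prev_cp = cps[i - 1] if i > 0 else None
        prev2 = cats[i - 2] if i > 1 else None
        # "C or CN": a consonant, optionally followed by a Nukta.
        after_c_or_cn = prev == "C" or (prev == "N" and prev2 == "C")
        if cat == "N" and prev_cp not in (C1 | V1 | M1):
            broken.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            broken.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            broken.append((i, 3))
        elif cat in ("X", "B", "D") and prev not in ("V", "C", "N", "M"):
            broken.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and prev == "H":
            broken.append((i, 7))
    return broken


# The "mango pickle" example: a vowel directly after a Halant breaks rule 7.
print(wle_violations("\u0906\u092E\u094D\u0905\u091A\u093E\u0930"))
```

Each rule only looks at the one or two code points before the current position, which is why a single left-to-right pass is enough for this sketch.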


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, "How to read and write Kashmiri in Devanagari", http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate, Ministry of HRD, Govt. of India, "Devanāgarī Alphabet and its Romanization", httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156, 2007.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.

2. IS 10315: 7-bit coded character set for information interchange, 1985.

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.

5. ISO 2375: Procedure for registration of escape sequences, 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
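The all-or-nothing condition in the preceding paragraph can be illustrated with a short sketch. The lookup table below holds only a few rows copied from Table 22, the function name is hypothetical, and matching is done per code point rather than per akshara, which is a simplifying assumption.

```python
# A few (Devanagari code point, target script) -> confusable rows from Table 22.
CONFUSABLES = {
    (0x0909, "Bengali"): 0x0993,
    (0x0918, "Bengali"): 0x0998,
    (0x0922, "Gurmukhi"): 0x0A22,
    (0x0924, "Gurmukhi"): 0x0A1C,
}


def cross_script_lookalike(label, script):
    """Return the look-alike label in `script`, or None if any character
    lacks a confusable there (one distinct character breaks the confusion)."""
    out = []
    for ch in label:
        other = CONFUSABLES.get((ord(ch), script))
        if other is None:
            return None
        out.append(chr(other))
    return "".join(out)


# Both characters have Bengali confusables, so a look-alike label exists.
print(cross_script_lookalike("\u0909\u0918", "Bengali"))
# Adding KA (U+0915), which has no entry, removes the confusability.
print(cross_script_lookalike("\u0909\u0918\u0915", "Bengali"))
```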

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 ¹⁷ | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

¹⁷ The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


U+0976

U+0956

U+0977

U+0957

Table 5 Vowels with corresponding Matras

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).

3.3.4 The Anusvara (ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant, the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal⁸:

सन्त vs संत /sənt/ "saint" (U+0938 U+0928 U+094D U+0924 vs U+0938 U+0902 U+0924)

चम्पा vs चंपा /tʃəmpa/ "a flower belonging to the genus Plumeria family" (U+091A U+092E U+094D U+092A U+093E vs U+091A U+0902 U+092A U+093E)
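The substitution described above (nasal consonant + Halant + consonant of the same varga becomes Anusvara + consonant) can be sketched as follows. The varga table and the function name are illustrative assumptions; this shows the spelling relationship only and is not part of the LGR, which does not define these two spellings as variants here.

```python
# Each varga nasal maps to the four stops of its own varga.
VARGAS = {
    0x0919: range(0x0915, 0x0919),  # NGA: KA..GHA
    0x091E: range(0x091A, 0x091E),  # NYA: CA..JHA
    0x0923: range(0x091F, 0x0923),  # NNA: TTA..DDHA
    0x0928: range(0x0924, 0x0928),  # NA:  TA..DHA
    0x092E: range(0x092A, 0x092E),  # MA:  PA..BHA
}
HALANT, ANUSVARA = 0x094D, 0x0902


def to_anusvara(word):
    """Rewrite nasal + Halant + same-varga consonant as Anusvara + consonant."""
    cps = [ord(ch) for ch in word]
    out = []
    i = 0
    while i < len(cps):
        if (i + 2 < len(cps)
                and cps[i] in VARGAS
                and cps[i + 1] == HALANT
                and cps[i + 2] in VARGAS[cps[i]]):
            out.append(ANUSVARA)
            i += 2  # skip nasal and Halant; the consonant is copied next
        else:
            out.append(cps[i])
            i += 1
    return "".join(chr(cp) for cp in out)


# U+0938 U+0928 U+094D U+0924 becomes U+0938 U+0902 U+0924.
print(to_anusvara("\u0938\u0928\u094D\u0924"))
```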

3.3.5 Nasalization: Candrabindu (ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख /ãkh/ "eye" (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.

3.3.6 Nukta (़ - U+093C)⁹

The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ /firoz/ (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ /bədh/ (U+092C U+0922 U+093C)

The Web Publication "DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION" [109] by the Central Hindi Directorate, Ministry of HRD, Government of India, clearly states such a use of the Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय /cay/ "tea" (U+091A U+093C U+093E U+092F)

छ़ल /chal/ "wash" (imperative) (U+091B U+093C U+0932)

पॊज़ /póz/ "fact" (U+092A U+094A U+091C U+093C)

⁸ A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding "Varga" through a Halant.

⁹ The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided, as they are available online.

Normally, a Nukta is appended to a Consonant. However, the Santali language uses the Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a आ (U+0906)

b ओ (U+0913)

c ा (U+093E)

d ो (U+094B)


3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example दुःख /duhkh/ "sorrow, unhappiness" (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire, as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष /ksha/, which gets formed by क /ka/ + ् (halant) + ष /sha/, gets rendered with a visible Halant (क् followed by ष) when formed by क /ka/ + ् (halant) + Zero Width Non-joiner + ष /sha/. In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character, which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that the constituent consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष /ksha/, which gets formed by क /ka/ + ् (halant) + ष /sha/, does not show the half form of /ka/ joining with /sha/. However, using ZWJ, the constituent consonants' shapes are preserved in the visual depiction, i.e. the half form of क followed by ष, formed by क /ka/ + ् (halant) + Zero Width Joiner + ष /sha/.

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.
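Because ZWNJ and ZWJ are invisible, the three /ksha/ spellings discussed in this section are easiest to tell apart as code point sequences. The following is a minimal sketch; the variable and function names are hypothetical, and how each sequence is drawn depends on the font and rendering engine.

```python
KA, HALANT, SSA = "\u0915", "\u094D", "\u0937"
ZWNJ, ZWJ = "\u200C", "\u200D"

conjunct = KA + HALANT + SSA                 # default conjunct form
explicit_halant = KA + HALANT + ZWNJ + SSA   # visible Halant, no conjunct
half_form = KA + HALANT + ZWJ + SSA          # half form of KA kept visible


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def strip_joiners(text):
    """Remove ZWNJ and ZWJ, leaving only the visible characters."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


for name, seq in [("conjunct", conjunct),
                  ("explicit halant", explicit_halant),
                  ("half form", half_form)]:
    print(f"{name}: {code_points(seq)}")
```

Once the joiners are removed, all three spellings collapse to the same sequence, which is the only one of the three that can be written without the joiner characters.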


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync with all other Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code points in the code point repertoire, across the board for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only or for archival purposes will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
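The relationship between U+0958 and its decomposed form can be checked with Python's standard `unicodedata` module. U+0958 has a canonical decomposition and is excluded from recomposition, so normalizing to NFC (the form IDNA labels are expected to be in) yields the two-code-point sequence rather than the precomposed letter. The helper name below is hypothetical.

```python
import unicodedata

QA = "\u0958"  # DEVANAGARI LETTER QA (precomposed)


def to_nfc_code_points(text):
    """Normalize to NFC and list the resulting code points."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


print(unicodedata.decomposition(QA))   # canonical decomposition mapping
print(to_nfc_code_points(QA))          # KA followed by NUKTA
```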

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where a typical visual shape of an akshar is desired.

In domain names, due to the absence of space (" ") or hyphen ("-"), there will be cases where the inability to use ZWNJ can pose some issues: two words need to be joined together, where the previous word needs to end in an Explicit Halant and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if two words देश् (/deš/, "nation") and विदेश (/videš/, "foreign land") are juxtaposed to each other, the resultant word, i.e. "देश्विदेश"¹⁰, is not the appropriate way of rendering it. Appropriate rendering of the same would be "देश् विदेश", which can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire, if necessary.

However, there may not be much of an impact of the exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes¹¹ may not be able to be made; however, there will not be any impact on the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
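The three successive filters can be pictured as a short pipeline. The sketch below is purely illustrative: the exclusion sets are hand-written and contain only the examples named in this section (they are not the real IDNA or MSR tables), the function name is hypothetical, and "encoded in Unicode" is approximated by asking Python's `unicodedata` whether the code point has a character name.

```python
import unicodedata

# Hand-written examples from the text; NOT the real IDNA or MSR data.
IDNA_EXCLUDED = {0x0958}                 # DEVANAGARI LETTER QA
MSR_EXCLUDED = {0x093D, 0x200C, 0x200D}  # Avagraha, ZWNJ, ZWJ


def first_excluding_filter(cp):
    """Return the name of the first filter that rejects `cp`, or None
    if the code point passes all three filters in this sketch."""
    try:
        unicodedata.name(chr(cp))
    except ValueError:
        return "Unicode"  # no character name: treated as not encoded
    if cp in IDNA_EXCLUDED:
        return "IDNA"
    if cp in MSR_EXCLUDED:
        return "MSR"
    return None


for cp in (0x0958, 0x093D, 0x0915):
    print(f"U+{cp:04X}: {first_excluding_filter(cp)}")
```

The order of the checks mirrors the text: a character rejected by an earlier layer is never considered by a later one.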

¹⁰ In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different, and hence it changes the fundamental word.

¹¹ Case of क्ष and the rendering that keeps the half form of क before ष: the first is composed with क + ् + ष, while the latter is composed with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.


4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms ॄ (U+0944) and ॣ (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments demanded attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each of the individual comments received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change is not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention¹²:

All characters that are included in the [MSR] - Yellow background

PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background

Not PVALID in IDNA2008 - White background

¹² This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled Reference. For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1: Hindi, Nepali | [0][101][102][103][105][108][110][111][112][113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]


8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1: Hindi | [0][101][102][103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1: Hindi | [0][101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4: Kashmiri | [0][105][108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1: Hindi | [0][100][101][102][108][112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4: Kashmiri | [0][105][108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]


20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]


30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]


40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0] [102] [103] [110] [112] [113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]


50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0] [101] [105] [108] [110] [109] [111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0] [101] [102] [103]


62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0] [100] [101] [108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0] [105] [108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0] [100] [108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0] [105] [108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0] [105] [108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11] [105] [108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9] [100] [102] [108] [112]


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11] [105] [108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8] [104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8] [104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8] [104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8] [104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph" (see footnote 13) construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106] [107]

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term "Reph" is used here.


2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106] [107]

Table 7 Sequences
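The effect of admitting U+0931 only through the sequences of Table 7 can be illustrated with a minimal Python sketch. The function name `rra_usage_is_valid` is hypothetical and not part of the proposal or of any LGR tool; the sketch only checks that each U+0931 opens one of the two listed sequences and does not apply any other rule.

```python
# Hypothetical helper (not from the proposal): U+0931 (RRA) is accepted only
# when it opens one of the two sequences listed in Table 7.
EYELASH_REPH_SEQUENCES = ("\u0931\u094D\u092F", "\u0931\u094D\u0939")
RRA = "\u0931"


def rra_usage_is_valid(label: str) -> bool:
    """Return True if every U+0931 in `label` starts a Table 7 sequence."""
    i = 0
    while i < len(label):
        if label[i] == RRA:
            # str.startswith accepts a tuple of prefixes and a start offset.
            if not label.startswith(EYELASH_REPH_SEQUENCES, i):
                return False
            i += 3  # skip the whole three-code-point sequence
        else:
            i += 1
    return True
```

A label such as U+0926 U+0931 U+094D U+092F U+093E passes this check, while a stand-alone U+0931 followed by a Matra does not.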

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section, there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:


- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, instead of their descriptive names, the variable names are used. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second one with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen -
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | अ ं (U+0905 U+0902)
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ ँ (U+0905 U+0901)
Vowel + Visarga | V[X] | अः (aḥ) | अ ः (U+0905 U+0903)

Table 9
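The rule V[B|D|X] maps directly onto a regular expression. The following is a minimal sketch, assuming the vowel class is the set of code points tagged "Vowel" in Table 6; the names `VOWELS` and `is_vowel_sequence` are illustrative and do not come from the proposal.

```python
import re

# Code points categorised as Vowel in Table 6, written as regex ranges.
VOWELS = "\u0905-\u090B\u090D-\u0914\u0972-\u0977"

# One vowel, optionally followed by exactly one of D (U+0901), B (U+0902)
# or X (U+0903).
VOWEL_SEQUENCE = re.compile(f"[{VOWELS}][\u0901\u0902\u0903]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if `text` is a single vowel sequence V[B|D|X]."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

Because only one optional sign is allowed, a vowel followed by both an Anusvara and a Visarga is rejected, which mirrors the restriction stated above.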


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क ़ (U+0915 U+093C)

Table 10

2. A consonant optionally followed by a dependent vowel sign (Matra [M]), Anusvara [B], Candrabindu [D], Visarga [X] or Halant [H]:

C[M|B|D|X|H]

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क ि (U+0915 U+093F)
Consonant + Anusvara | C[B] | कं (kaṁ) | क ं (U+0915 U+0902)
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क ँ (U+0915 U+0901)
Consonant + Visarga | C[X] | कः (kaḥ) | क ः (U+0915 U+0903)
Consonant + Halant | C[H] | क् (k, pure consonant) | क ् (U+0915 U+094D)

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क ी ं (U+0915 U+0940 U+0902)
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क ा ँ (U+0915 U+093E U+0901)
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क ी ः (U+0915 U+0940 U+0903)

Table 12
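Cases 1, 2 and 2A above can be combined into one pattern. The sketch below is only an illustration under stated assumptions: the consonant and Matra classes are taken from the categories of Table 6, the Nukta is allowed after any consonant (as in the Hindi rule C[N], not the narrower set used by the WLE rules), and the names are hypothetical.

```python
import re

# Character classes from Table 6, written as regex character-class content.
C = "\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939\u097B\u097C\u097E\u097F"
M = "\u093A\u093B\u093E-\u0943\u0945-\u094C\u094F\u0956\u0957"
N, H, BDX = "\u093C", "\u094D", "\u0901\u0902\u0903"

# C[N], then nothing, or H, or one of B/D/X, or M with an optional B/D/X.
SIMPLE_CONSONANT_SEQUENCE = re.compile(
    f"[{C}]{N}?(?:{H}|[{BDX}]|[{M}][{BDX}]?)?"
)


def is_simple_consonant_sequence(text: str) -> bool:
    """Return True for C[N], C[N][M|B|D|X|H] and C[N]M[D|B|X]."""
    return SIMPLE_CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The alternation keeps Halant and the B/D/X signs mutually exclusive, so a Halant followed by an Anusvara does not match.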

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14):

*3(CH)C

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X.

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
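The conjunct forms (cases 3, 3A and 3B) extend the single-consonant pattern with up to three leading "consonant + Halant" groups. This is a sketch under the same assumptions as the Hindi rules above: character classes come from Table 6 and all names are illustrative. It keeps the Hindi limit of four joined consonants, which the WLE rules of Section 7 deliberately do not enforce.

```python
import re

C = "\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939\u097B\u097C\u097E\u097F"
M = "\u093A\u093B\u093E-\u0943\u0945-\u094C\u094F\u0956\u0957"
N, H, BDX = "\u093C", "\u094D", "\u0901\u0902\u0903"

# Up to three (C[N]H) groups, a final C[N], then either a Halant or an
# optional Matra followed by an optional B/D/X.
CONSONANT_AKSHAR = re.compile(
    f"(?:[{C}]{N}?{H}){{0,3}}[{C}]{N}?(?:{H}|[{M}]?[{BDX}]?)"
)


def is_consonant_akshar(text: str) -> bool:
    """Return True if `text` is one consonant-initial akshar (Hindi rules)."""
    return CONSONANT_AKSHAR.fullmatch(text) is not None
```

With this pattern the Table 13 example (four consonants joined by three Halants) matches, while a chain of five joined consonants does not.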

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is the brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:


Variant 1 | Variant 2
आ (U+0906) | आ़ (U+0906 U+093C)
ओ (U+0913) | ओ़ (U+0913 U+093C)
ा (U+093E) | ा़ (U+093E U+093C)
ो (U+094B) | ो़ (U+094B U+093C)

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the leading U+0906 of the sequence U+0906 U+093C. By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C), where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 set character is not followed by a Nukta (U+093C) character. Thus, the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8828]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
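The effect of the "not followed by Nukta" condition can be seen in a small sketch that enumerates the Set 1 variants of a label in both directions. This is an illustration only, not the algorithm of any LGR tool; `SET1_BASES` and `nukta_variants` are hypothetical names, and no other rule of this document is applied.

```python
from itertools import product

NUKTA = "\u093C"
# Variant 1 column of Table 16.
SET1_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variants(label: str) -> set:
    """Return the label together with all of its Table 16 variants.

    A base (with or without one Nukta) takes part in the mapping only when
    the code point after it is not a further Nukta, which is the context
    condition described above.
    """
    options, i = [], 0
    while i < len(label):
        ch = label[i]
        has_nukta = label[i + 1:i + 2] == NUKTA
        end = i + 2 if has_nukta else i + 1
        if ch in SET1_BASES and label[end:end + 1] != NUKTA:
            options.append((ch, ch + NUKTA))  # both forms are variants
            i = end
        else:
            options.append((ch,))  # copied unchanged
            i += 1
    return {"".join(choice) for choice in product(*options)}
```

Because the substitution consumes the base and its Nukta as one unit, the result never contains a doubled Nukta, and a label that already contains U+0906 U+093C U+093C yields no variant other than itself.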

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)
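Symmetry and transitivity of the extended sets follow mechanically if every member of a set is mapped to every other member. The sketch below shows this with plain Python data; the structure `VARIANT_SETS` and the helper names are assumptions made for illustration, and the contextual rules of Sets C and D are not modelled.

```python
from itertools import permutations

# Extended variant sets from the table above, as space-separated code points.
VARIANT_SETS = {
    "A": ["093E 0901", "093E 093C 0901", "0949 0902"],
    "B": ["0906 0901", "0906 093C 0901", "0911 0902"],
    "C": ["0906 0902", "0906 093C 0902", "0974"],
    "D": ["093B", "093E 0902", "093E 093C 0902"],
}


def to_text(code_points: str) -> str:
    """Convert a string such as '093E 0901' into the characters it names."""
    return "".join(chr(int(cp, 16)) for cp in code_points.split())


def all_mappings(variant_sets: dict) -> set:
    """Return every ordered (source, target) pair within each set."""
    return {
        (to_text(source), to_text(target))
        for members in variant_sets.values()
        for source, target in permutations(members, 2)
    }
```

Each three-member set contributes six ordered pairs, so the four sets give 24 mappings, and for every pair the reverse pair is present as well.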

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ (U+0973) | अं (U+0905 U+0902)
ऺ (U+093A) | ं (U+0902)
ॴ (U+0974) | आं (U+0906 U+0902)
ऻ (U+093B) | ां (U+093E U+0902)
ऎ (U+090E) | ऐ (U+0910)
ॆ (U+0946) | े (U+0947)
ॵ (U+0975) | औ (U+0914)
ॏ (U+094F) | ौ (U+094C)

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari, there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ (U+0901) | ॅं (U+0945 U+0902)
ाँ (U+093E U+0901) | ॉं (U+0949 U+0902)
अँ (U+0905 U+0901) | ॲं (U+0972 U+0902)
एँ (U+090F U+0901) | ऍं (U+090D U+0902)
आँ (U+0906 U+0901) | ऑं (U+0911 U+0902)

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both the pairs preceded by a Devanagari Letter Ka (क - U+0915) are shown here:

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should each be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
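The context condition on the first pair of Table 18 can be sketched as follows. The helper names are hypothetical, the consonant test is built from the Consonant rows of Table 6, and only the mapping U+0901 -> U+0945 U+0902 is modelled; the other pairs of Table 18 are separate mappings and are left untouched here.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    """Consonants of Table 6 (U+0929 and U+0934 are excluded code points)."""
    cp = ord(ch)
    in_main_block = 0x0915 <= cp <= 0x0939 and cp not in (0x0929, 0x0934)
    return in_main_block or cp in (0x097B, 0x097C, 0x097E, 0x097F)


def preceded_by_c_or_cn(label: str, i: int) -> bool:
    """True if position i follows a Consonant, or a Consonant plus Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def candrabindu_variant(label: str) -> str:
    """Replace each Candrabindu that satisfies the context rule."""
    return "".join(
        CANDRA_E_ANUSVARA
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i)
        else ch
        for i, ch in enumerate(label)
    )
```

A Candrabindu after a vowel or after a Matra is left as it is, because a Matra such as U+0945 could not validly appear in that position.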

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
ं (U+0902) | ਂ (U+0A02)
इ (U+0907) | ਙ (U+0A19)
उ (U+0909) | ਤ (U+0A24)
ग (U+0917) | ਗ (U+0A17)
घ (U+0918) | ਬ (U+0A2C)
ट (U+091F) | ਟ (U+0A1F)
ठ (U+0920) | ਠ (U+0A20)
ढ (U+0922) | ਫ (U+0A2B)
प (U+092A) | ਧ (U+0A27)
भ (U+092D) | ਮ (U+0A2E)
म (U+092E) | ਸ (U+0A38)
व (U+0935) | ਕ (U+0A15)
ह (U+0939) | ਵ (U+0A35)
ऺ (U+093A) | ਂ (U+0A02)
़ (U+093C) | ਼ (U+0A3C)
ि (U+093F) | ਿ (U+0A3F)
ी (U+0940) | ੀ (U+0A40)
ॅ (U+0945) | ੱ (U+0A71)
ॆ (U+0946) | ੇ (U+0A47)
ॆ (U+0946) | ੋ (U+0A4B)
े (U+0947) | ੇ (U+0A47)
े (U+0947) | ੋ (U+0A4B)
ै (U+0948) | ੈ (U+0A48)
ॖ (U+0956) | ੁ (U+0A41)
ॗ (U+0957) | ੂ (U+0A42)
प्टि (U+092A U+094D U+091F U+093F) | ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940) | ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947) | ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946) | ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924) | ਜ (U+0A1C)

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म (U+092E) | ম (U+09AE)
ि (U+093F) | ি (U+09BF)

Table 20 Proposed Cross-script Devanagari-Bengali Variants
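A whole-label cross-script check can be sketched by mapping a label code point by code point and reporting a counterpart only when every code point has one. The example uses a small subset of the one-to-one rows of Table 19; the names `DEVA_TO_GURU` and `gurmukhi_lookalike` are hypothetical, and the one-to-many rows and the multi-code-point rows of the table are deliberately left out.

```python
# Subset of the one-to-one rows of Table 19 (Devanagari -> Gurmukhi).
DEVA_TO_GURU = {
    "\u091F": "\u0A1F",
    "\u0920": "\u0A20",
    "\u092E": "\u0A38",
    "\u0935": "\u0A15",
    "\u0939": "\u0A35",
    "\u093F": "\u0A3F",
    "\u0940": "\u0A40",
}


def gurmukhi_lookalike(label: str):
    """Return the Gurmukhi whole-label counterpart, or None if there is none.

    A counterpart exists only when every code point of the label has an
    entry in the mapping; a partially mappable label is not a variant.
    """
    if not label or any(ch not in DEVA_TO_GURU for ch in label):
        return None
    return "".join(DEVA_TO_GURU[ch] for ch in label)
```

A label that mixes mappable and unmappable code points returns None, which reflects the idea that a cross-script variant concerns the entire label.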


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant/Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:


a क (U+0915)

b ख (U+0916)

c ग (U+0917)

d च (U+091A)

e छ (U+091B)

f ज (U+091C)

g ड (U+0921)

h ढ (U+0922)

i फ (U+092B)

The set V1 consists of these vowels:

a आ (U+0906) (Required in Santali language)

b ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a ा (U+093E) (Required in Santali language)

b ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)

3. M must be preceded by C or CN (see footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N
16 where CN is a C followed by an N
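Rules 1 to 7 can be expressed as a single pass over the label that looks at each code point and its predecessor. The following is a minimal sketch, not the accompanying XML ruleset: the category sets are rebuilt from Table 6, U+0931 is treated as an ordinary consonant without checking that it appears inside a Table 7 sequence, an empty label is rejected by choice, and all names (`category`, `wle_valid`, `NUKTA_BASES`) are hypothetical.

```python
V = (set(range(0x0905, 0x0915)) - {0x090C}) | set(range(0x0972, 0x0978))
C = (set(range(0x0915, 0x093A)) - {0x0929, 0x0934}) | {0x097B, 0x097C, 0x097E, 0x097F}
M = ({0x093A, 0x093B, 0x094F, 0x0956, 0x0957}
     | set(range(0x093E, 0x0944)) | set(range(0x0945, 0x094D)))
SIGNS = {0x0901: "D", 0x0902: "B", 0x0903: "X", 0x093C: "N", 0x094D: "H"}
# Rule 1: the sets C1, V1 and M1.
NUKTA_BASES = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921,
               0x0922, 0x092B, 0x0906, 0x0913, 0x093E, 0x094B}


def category(ch: str):
    """Return the one-letter category of a code point, or None if unknown."""
    cp = ord(ch)
    if cp in SIGNS:
        return SIGNS[cp]
    if cp in V:
        return "V"
    if cp in C:
        return "C"
    if cp in M:
        return "M"
    return None


def wle_valid(label: str) -> bool:
    """Check rules 1 to 7 for a label given as a string of code points."""
    cats = [category(ch) for ch in label]
    if not cats or None in cats:
        return False
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i >= 1 else None
        c_or_cn = prev == "C" or (prev == "N" and i >= 2 and cats[i - 2] == "C")
        if cat == "N" and (i == 0 or ord(label[i - 1]) not in NUKTA_BASES):
            return False  # rule 1
        if cat in ("H", "M") and not c_or_cn:
            return False  # rules 2 and 3
        if cat in ("X", "B", "D") and prev not in ("V", "C", "N", "M"):
            return False  # rules 4, 5 and 6
        if cat == "V" and prev == "H":
            return False  # rule 7
    return True
```

For example, the sequence U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is rejected by rule 7, while the same sequence without U+094D is accepted.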


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or the end of the label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it gets permitted only within the specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether each of these can follow any valid akshar-ending character. Only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | httpvishvakannadacom | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed). London: Croom Helm, 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar. 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: DK Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.
2. IS 10315: 7-bit coded character set for information interchange. 1985.
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.
5. ISO 2375: Procedure for registration of escape sequences. 2003.
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क (U+0915) | क़ (U+0915 U+093C)
ख (U+0916) | ख़ (U+0916 U+093C)
ग (U+0917) | ग़ (U+0917 U+093C)
च (U+091A) | च़ (U+091A U+093C)
छ (U+091B) | छ़ (U+091B U+093C)
ज (U+091C) | ज़ (U+091C U+093C)
ड (U+0921) | ड़ (U+0921 U+093C)
ढ (U+0922) | ढ़ (U+0922 U+093C)
फ (U+092B) | फ़ (U+092B U+093C)

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
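The "all constituents must have a confusable" condition can be sketched as follows. This is a minimal illustration under stated assumptions: the mapping is a small hand-picked excerpt of Table 22 (one target per script where the table lists several), the function name `cross_script_confusable` is hypothetical, and the check works on single code points rather than full aksharas.

```python
# Devanagari code point -> {script: confusable code point}.
# Excerpt of Table 22 only; chosen for illustration.
CONFUSABLES = {
    0x0903: {"Bengali": 0x0983, "Gujarati": 0x0A83},
    0x0909: {"Bengali": 0x0993},
    0x0918: {"Bengali": 0x0998},
    0x0920: {"Gurmukhi": 0x0A28},
    0x0921: {"Gurmukhi": 0x0A21},
    0x0922: {"Gurmukhi": 0x0A22},
}


def cross_script_confusable(label, script):
    """Return the confusable label in `script`, or None if any character lacks one."""
    out = []
    for ch in label:
        target = CONFUSABLES.get(ord(ch), {}).get(script)
        if target is None:
            # One distinguishing character is enough to break the confusion.
            return None
        out.append(chr(target))
    return "".join(out)


# Both characters have a Gurmukhi counterpart in the excerpt.
print(cross_script_confusable("\u0920\u0921", "Gurmukhi") is not None)  # True
# U+0915 has no entry, so the whole label is treated as non-confusable.
print(cross_script_confusable("\u0920\u0915", "Gurmukhi"))  # None
```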

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 | Bengali (see footnote 17)
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


क (U+0915), ख (U+0916), ग (U+0917), ज (U+091C) and फ (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.

फ़िरोज़ firoz (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under ड (U+0921) and ढ (U+0922) to indicate flapped sounds, e.g.

बढ़ bədh (U+092C U+0922 U+093C)

The web publication DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION [109] by the Central Hindi Directorate, Ministry of HRD, Government of India clearly states such a use of Nukta in Hindi.

In Bodo, the Nukta is adjoined to ड (U+0921) [110]. In Maithili, it is adjoined to क (U+0915), ज (U+091C), ड (U+0921) and ढ (U+0922) [111]. In Sindhi, it is adjoined to ख (U+0916), ग (U+0917), ज (U+091C), फ (U+092B), ड (U+0921) and ढ (U+0922) [104].

In Kashmiri, it can also be adjoined to च (U+091A), छ (U+091B) and ज (U+091C) [108] to indicate the laterally released affricates:

च़ाय cay "tea" (U+091A U+093C U+093E U+092F)

छ़ल chal "wash" (imperative) (U+091B U+093C U+0932)

पॊज़ póz "fact" (U+092A U+094A U+091C U+093C)

Normally, a Nukta is appended to a Consonant. However, the Santali language uses Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:

a आ (U+0906)
b ओ (U+0913)
c ा (U+093E)
d ो (U+094B)


3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example दुःख duhkh "sorrow, unhappiness" (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire, as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष ksha, which gets formed by क ka + ् (halant) + ष sha, gets rendered as क्‌ष when formed by क ka + ् (halant) + Zero Width Non-joiner + ष sha. In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character, which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that the constituent consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष ksha, which gets formed by क ka + ् (halant) + ष sha, does not show the half form of ka joining with sha. However, using ZWJ, the constituent consonants' shapes are preserved in the visual depiction, i.e. क्‍ष, formed by क ka + ् (halant) + Zero Width Joiner + ष sha.

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.
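Because ZWNJ and ZWJ are invisible, the three ksha sequences discussed above are easiest to tell apart by their code points. The following sketch simply builds them and prints the code point sequences; the helper name `code_points` and the dictionary labels are illustrative, and how each sequence is actually rendered depends on the font and shaping engine.

```python
KA, HALANT, SSA = "\u0915", "\u094D", "\u0937"
ZWNJ, ZWJ = "\u200C", "\u200D"


def code_points(s):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


sequences = {
    "default conjunct": KA + HALANT + SSA,
    "explicit halant (ZWNJ)": KA + HALANT + ZWNJ + SSA,
    "half form (ZWJ)": KA + HALANT + ZWJ + SSA,
}

for name, seq in sequences.items():
    print(f"{name}: {code_points(seq)}")
```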


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync across all the Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code points in the code point repertoire across the board for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only or for archival purposes will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are pre-requisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
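The relationship between U+0958 and its decomposed form can be observed with Unicode normalization. The sketch below uses Python's standard `unicodedata` module; presenting NFC normalization as the way to obtain the decomposed form is an explanatory choice made here, not a procedure stated in this proposal.

```python
import unicodedata

qa = "\u0958"  # DEVANAGARI LETTER QA (precomposed)

# U+0958 is excluded from recomposition, so NFC yields Ka + Nukta.
normalized = unicodedata.normalize("NFC", qa)
print([f"U+{ord(ch):04X}" for ch in normalized])

# The decomposed form is already stable under NFC.
ka_nukta = "\u0915\u093C"
print(unicodedata.normalize("NFC", ka_nukta) == ka_nukta)
```

A string that changes under NFC, as U+0958 does, is not stable under normalization, which is consistent with the character being unavailable for domain names while the two-code-point form remains usable.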

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where a typical visual shape of an akshar is desired.

In domain names, due to the absence of space " " or tab "-", there will be cases where the inability to use ZWNJ can pose some issues: two words need to be joined together, where the previous word needs to end in an Explicit Halant and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (deš, "nation") and विदेश (videš, "foreign land") are juxtaposed to each other, the resultant word, i.e. "देश्विदेश" (see footnote 10), is not the appropriate way of rendering it. The appropriate rendering of the same would be "देश्‌विदेश", which can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, there may not be much of an impact of exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes (see footnote 11) may not be able to be made; however, there will not be any impact on the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
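The three successive filters can be pictured as a short pipeline. The sketch below is illustrative only: the sets `IDNA_DISALLOWED` and `MSR_EXCLUDED` contain just the two examples named in the text rather than the real IDNA or MSR tables, and `filter_stage` is a hypothetical name. Passing all three filters only makes a code point a candidate; the NBGP principles in the following sections narrow the set further.

```python
import unicodedata

# Illustrative one-element sets taken from the examples in the text.
IDNA_DISALLOWED = {0x0958}  # DEVANAGARI LETTER QA
MSR_EXCLUDED = {0x093D}     # DEVANAGARI SIGN AVAGRAHA


def in_devanagari_block(cp):
    """True if cp is an assigned code point in the Devanagari block U+0900..U+097F."""
    return 0x0900 <= cp <= 0x097F and unicodedata.name(chr(cp), "") != ""


def filter_stage(cp):
    """Return the first filter that rejects cp, or 'candidate' if none does."""
    if not in_devanagari_block(cp):
        return "Unicode script block"
    if cp in IDNA_DISALLOWED:
        return "IDNA"
    if cp in MSR_EXCLUDED:
        return "MSR"
    return "candidate"


for cp in (0x0915, 0x0958, 0x093D, 0x0A15):
    print(f"U+{cp:04X}: {filter_stage(cp)}")
```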

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, in that case one can argue that the pronunciation of the two words, i.e. देश् and देश, is different and hence it changes the fundamental word.

11 Case of क्ष and क्‍ष: the first is composed with क + ् + ष, while the latter is composed with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.


4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970) etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms ॄ (U+0944) and ॣ (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments called for attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided for each individual comment received whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention (see footnote 12):

All characters that are included in the [MSR] - Yellow background

PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background

Not PVALID in IDNA2008 - White background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references for Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

SrNo

UnicodeCodePoint

Glyph CharacterName Category

Examplelanguagesusingthecodepoint(Notexhaustive

list)

LanguagewithlowestEGIDSscaleusingthecodepoint

Reference

1 0901 DEVANAGARISIGNCANDRABINDU Candrabindu

BodoHindiKashmiriKonkaniMaithiliMarathiNepaliSantaliand

Sanskrit

1HindiNepali

[0][101][102][103][105][108][110][111][112]

[113]

2 0902 DEVANAGARISIGNANUSVARA

Anusvara(Bindu)

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

3 0903 ः DEVANAGARISIGNVISARGA Visarga

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

4 0905 अ DEVANAGARILETTERA Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

5 0906 आ DEVANAGARILETTERAA Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

6 0907 इ DEVANAGARILETTERI Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

7 0908 ई DEVANAGARILETTERII Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

19

8 0909 उ DEVANAGARILETTERU Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

9 090A ऊ DEVANAGARILETTERUU Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

10 090B ऋ DEVANAGARI

LETTERVOCALICR

Vowel HindiMarathiSanskrit 1Hindi [0][101][102]

[103]

11 090D ऍ DEVANAGARILETTERCANDRAE Vowel Hindi 1Hindi [0][101]

12 090E ऎ DEVANAGARILETTERSHORTE Vowel Kashmiri 4Kashmiri [0][105][108]

13 090F ए DEVANAGARILETTERE Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

14 0910 ऐ DEVANAGARILETTERAI Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

15 0911 ऑ DEVANAGARI

LETTERCANDRAO

Vowel HindiKonkaniMarathiKashmiri 1Hindi

[0][100][101][102][108]

[112]

16 0912 ऒ DEVANAGARILETTERSHORTO Vowel Kashmiri 4Kashmiri [0][105][108]

17 0913 ओ DEVANAGARILETTERO Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

18 0914 औ DEVANAGARILETTERAU Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

19 0915 क DEVANAGARILETTERKA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

20

20 0916 ख DEVANAGARILETTERKHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

21 0917 ग DEVANAGARILETTERGA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

22 0918 घ DEVANAGARILETTERGHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

23 0919 ङ DEVANAGARILETTERNGA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

24 091A च DEVANAGARILETTERCA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

25 091B छ DEVANAGARILETTERCHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

26 091C ज DEVANAGARILETTERJA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

27 091D झ DEVANAGARILETTERJHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

28 091E ञ DEVANAGARILETTERNYA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

29 091F ट DEVANAGARILETTERTTA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

21

30 0920 ठ DEVANAGARILETTERTTHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

31 0921 ड DEVANAGARILETTERDDA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

32 0922 ढ DEVANAGARILETTERDDHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

33 0923 ण DEVANAGARILETTERNNA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

34 0924 त DEVANAGARILETTERTA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

35 0925 थ DEVANAGARILETTERTHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

36 0926 द DEVANAGARILETTERDA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

37 0927 ध DEVANAGARILETTERDHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

38 0928 न DEVANAGARILETTERNA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

39 092A प DEVANAGARILETTERPA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]


40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]


50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]


62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the “Eyelash Reph” construct (see footnote 13).

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7 Sequences

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term “Reph” is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of forming their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodating additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators, along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol    Function
|         Alternative
[]        Optional
*         Variable Repetition
()        Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari, when used to write Hindi, are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | अ ं U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ ँ U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | अ ः U+0905 U+0903

Table 9
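The vowel sequence V[B|D|X] can be expressed as a short regular expression. The following is a minimal sketch only: the `VOWELS` string is an illustrative subset of the Vowel category (the full set is in Table 6), and the names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical, not part of the LGR.

```python
import re

# Illustrative subset of the Vowel (V) category; an assumption, not the full Table 6 set.
VOWELS = "\u0905\u0906\u0907\u0908\u0909\u090A\u090F\u0910\u0913\u0914"
ANUSVARA = "\u0902"     # B
CANDRABINDU = "\u0901"  # D
VISARGA = "\u0903"      # X

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{VOWELS}][{ANUSVARA}{CANDRABINDU}{VISARGA}]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if text is a single vowel sequence of the form V[B|D|X]."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

For instance, `is_vowel_sequence("\u0905\u0902")` accepts Vowel + Anusvara, while a Visarga following an Anusvara is rejected because only one of B, D or X may follow the vowel.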


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क ़ U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]

C[M|B|D|X|H]

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क ि U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | क ं U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क ँ U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | क ः U+0915 U+0903
Consonant + Halant | C[H] | क् (k, Pure Consonant) | क ् U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X

(CM)[D|B|X]

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क ी ं U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क ा ँ U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क ी ः U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14): *3(CH)C

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

14 In case of Sanskrit it can join up to 5 consonants.

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.
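The Hindi consonant sequences of Section 5.5.4 can be sketched by first mapping each character to its variable letter and then matching the resulting string. This is an illustrative sketch under stated assumptions: `category` uses coarse Unicode ranges (it does not consult the actual repertoire), a Nukta is allowed directly after any consonant, and the repetition count assumes the Hindi limit of four consonants joined by Halant. All function and variable names are hypothetical.

```python
import re


def category(ch: str) -> str:
    """Map a character to its variable letter; coarse ranges, for illustration only."""
    cp = ord(ch)
    if cp == 0x0901:
        return "D"  # Candrabindu
    if cp == 0x0902:
        return "B"  # Anusvara
    if cp == 0x0903:
        return "X"  # Visarga
    if cp == 0x093C:
        return "N"  # Nukta
    if cp == 0x094D:
        return "H"  # Halant/Virama
    if 0x0905 <= cp <= 0x0914:
        return "V"  # Vowel
    if 0x0915 <= cp <= 0x0939:
        return "C"  # Consonant
    if 0x093E <= cp <= 0x094C:
        return "M"  # Matra
    return "?"


# Up to three "consonant + Halant" units, a final consonant (each optionally
# carrying a Nukta), then at most one of: H, M, M + (B|D|X), or B|D|X.
CONSONANT_AKSHAR = re.compile(r"(?:CN?H){0,3}CN?(?:H|M[BDX]?|[BDX])?")


def is_hindi_consonant_akshar(text: str) -> bool:
    """Return True if text forms one consonant-initial akshar per Section 5.5.4."""
    categories = "".join(category(ch) for ch in text)
    return CONSONANT_AKSHAR.fullmatch(categories) is not None
```

Matching on category letters rather than on code points keeps the pattern close to the notation used in the rules, so a change to the repertoire only touches `category`.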


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 Visually confusables in Appendix A, Visually confusable characters/sequences, lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and are hence being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 Proposed Variants - Set 1 have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2, e.g. in the first pair आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), as seen at the start of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination आ़़ (U+0906 U+093C U+093C), where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and reverse variants using the same condition (see also [RFC 8828]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
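The “not followed by a Nukta” condition can be illustrated with a small sketch. It assumes labels are plain strings of code points; `forward_variant_sites` and `canonical_form` are hypothetical helper names, and the canonical form (dropping the Nukta after the four Santali bases) is only one possible way to compare labels for this variant relationship.

```python
NUKTA = "\u093C"
# The four Variant 1 code points of Table 16.
SANTALI_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def forward_variant_sites(label: str) -> list:
    """Indices where base -> base + Nukta is defined (base not followed by a Nukta)."""
    return [
        i
        for i, ch in enumerate(label)
        if ch in SANTALI_BASES and label[i + 1:i + 2] != NUKTA
    ]


def canonical_form(label: str) -> str:
    """Drop a Nukta that follows a Table 16 base, under the same context condition."""
    out = []
    i = 0
    while i < len(label):
        ch = label[i]
        out.append(ch)
        if (
            ch in SANTALI_BASES
            and label[i + 1:i + 2] == NUKTA
            and label[i + 2:i + 3] != NUKTA
        ):
            i += 1  # skip the Nukta: base + Nukta maps back to the bare base
        i += 1
    return "".join(out)
```

Two labels that differ only by such a Nukta then share one canonical form, while a Nukta after other characters (for example क + Nukta) is left untouched.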

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E, respectively, into the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,
0906 093C 0901 is added to Set B,
0906 093C 0902 is added to Set C, and
093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when(follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
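The extended variant sets A to D can be held in a simple lookup structure. The sketch below is illustrative only: it encodes the final table of this section as tuples of code points, stores the context rule names as plain strings without evaluating them, and uses hypothetical names (`VARIANT_SETS`, `CONTEXT_RULES`, `variants_of`).

```python
# Extended variant sets from Section 6.1.2, as tuples of code points.
VARIANT_SETS = {
    "A": [(0x093E, 0x0901), (0x093E, 0x093C, 0x0901), (0x0949, 0x0902)],
    "B": [(0x0906, 0x0901), (0x0906, 0x093C, 0x0901), (0x0911, 0x0902)],
    "C": [(0x0906, 0x0902), (0x0906, 0x093C, 0x0902), (0x0974,)],
    "D": [(0x093B,), (0x093E, 0x0902), (0x093E, 0x093C, 0x0902)],
}

# Only sets C and D carry a variant contextual rule; the rule is not evaluated here.
CONTEXT_RULES = {
    "C": "followed-by-V-C-or-end",
    "D": "followed-by-V-C-or-end",
}


def variants_of(sequence: tuple):
    """Return (set name, other members of the set, context rule or None)."""
    for name, members in VARIANT_SETS.items():
        if sequence in members:
            others = [m for m in members if m != sequence]
            return name, others, CONTEXT_RULES.get(name)
    return None, [], None
```

Because every member of a set maps to every other member, the lookup is symmetric and transitive by construction, which is the property the extension of the sets is meant to guarantee.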

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ U+0901 | ॅं U+0945 U+0902
ाँ U+093E U+0901 | ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both the pairs preceded by a Devanagari Letter Ka (क - U+0915) are shown here:

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonant, Vowel, Nukta or Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a Consonant or a Consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18 Proposed Variants - Set 3, the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
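The context condition on the Candrabindu mapping can be sketched as a scan for the positions where the variant is defined. This is a hedged illustration: `is_consonant` approximates the Consonant category from the code point ranges visible in the repertoire table (an assumption, not the normative set), and `candrabindu_variant_sites` is a hypothetical helper name.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093C"

# Approximate Consonant category: U+0915..U+0939 minus code points that are
# excluded or only allowed inside sequences, plus the Sindhi letters (assumption).
CONSONANTS = (
    set(range(0x0915, 0x093A)) - {0x0929, 0x0931, 0x0934}
) | {0x097B, 0x097C, 0x097E, 0x097F}


def is_consonant(ch: str) -> bool:
    return ord(ch) in CONSONANTS


def candrabindu_variant_sites(label: str) -> list:
    """Indices of U+0901 preceded by a Consonant, or by a Consonant plus Nukta."""
    sites = []
    for i, ch in enumerate(label):
        if ch != CANDRABINDU or i == 0:
            continue
        prev = label[i - 1]
        if is_consonant(prev):
            sites.append(i)
        elif prev == NUKTA and i >= 2 and is_consonant(label[i - 2]):
            sites.append(i)
    return sites
```

A Candrabindu after a Vowel or a Matra yields no site here; those cases are covered by the separate sequence pairs of Table 18 rather than by the bare U+0901 mapping.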

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B
ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants
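Since a cross-script variant concerns a whole label, a Gurmukhi counterpart exists only when every element of the Devanagari label has an entry in Table 19. The sketch below illustrates that idea with a small, single-code-point subset of the table; the mapping name, the function name and the omission of the multi-code-point sequences are simplifications, and the result is not checked against the Gurmukhi LGR.

```python
from itertools import product

# Illustrative subset of Table 19 (single code points only; sequences omitted).
DEVANAGARI_TO_GURMUKHI = {
    "\u091F": ["\u0A1F"],            # TTA
    "\u0920": ["\u0A20"],            # TTHA
    "\u092E": ["\u0A38"],            # MA -> Gurmukhi SA
    "\u0935": ["\u0A15"],            # VA -> Gurmukhi KA
    "\u093F": ["\u0A3F"],            # vowel sign I
    "\u0940": ["\u0A40"],            # vowel sign II
    "\u0947": ["\u0A47", "\u0A4B"],  # vowel sign E has two counterparts
}


def gurmukhi_variants(label: str) -> list:
    """Return all Gurmukhi whole-label counterparts, or [] if any character has none."""
    if not label:
        return []
    choices = []
    for ch in label:
        targets = DEVANAGARI_TO_GURMUKHI.get(ch)
        if targets is None:
            return []  # one unmapped character rules out a whole-label variant
        choices.append(targets)
    return ["".join(combo) for combo in product(*choices)]
```

A label mixing mapped and unmapped characters therefore has no cross-script variant at all, whereas a code point with two counterparts multiplies the number of candidate labels.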

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 Devanagari Cross-script confusables in Appendix B, Cross-script Confusables, lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:
a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:
a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:
a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)
3. M must be preceded by C or CN (see footnote 16)
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V can NOT be preceded by H (details in “Case of V preceded by H”)

15 where CN is a C followed by an N
16 where CN is a C followed by an N
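The seven WLE rules above are all “preceded by” conditions, so they can be checked in one left-to-right pass. The following is a sketch under stated assumptions, not the normative LGR: `category` assigns variable letters from coarse code point ranges rather than from the actual repertoire, sequences such as the Eyelash Reph are treated as ordinary consonants, and the function names are hypothetical.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")
NUKTA_BASES = C1 | V1 | M1


def category(ch: str) -> str:
    """Coarse category lookup by code point range (illustrative assumption)."""
    cp = ord(ch)
    if cp == 0x0901:
        return "D"
    if cp == 0x0902:
        return "B"
    if cp == 0x0903:
        return "X"
    if cp == 0x093C:
        return "N"
    if cp == 0x094D:
        return "H"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977:
        return "V"
    if 0x0915 <= cp <= 0x0939 or 0x097B <= cp <= 0x097F:
        return "C"
    if cp in (0x093A, 0x093B, 0x094F, 0x0956, 0x0957) or 0x093E <= cp <= 0x094C:
        return "M"
    return "?"


def wle_violations(label: str) -> list:
    """Return (index, rule number) pairs for each violated WLE rule."""
    found = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i >= 1 else ""
        prev_cat = category(prev) if prev else ""
        prev2_cat = category(label[i - 2]) if i >= 2 else ""
        after_c_or_cn = prev_cat == "C" or (prev_cat == "N" and prev2_cat == "C")
        if cat == "N" and prev not in NUKTA_BASES:
            found.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            found.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            found.append((i, 3))
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            found.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and prev_cat == "H":
            found.append((i, 7))
    return found
```

An empty result means no rule was triggered; a label that starts with a Matra, for example, is reported under rule 3 because nothing precedes it.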


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)
• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7 Sequences, it gets permitted only with the specific sequences.
2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case “V preceded by H” needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ “Mango pickle” (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani

Member ShashiPathania PGDofDogriUniversityofJammu

India Dogri

Member ShubhamSaran NIXI India Member Sinnathambi

ShanmugarajahUniversityofColomboSchoolofComputing

SriLanka Tamil

Member SujithKartha Digitalkzcom India MalayalamMember SurajAdhikari MercantileCommunications(and

npccTLD)Nepal Nepali

Member SwarnaPrabhaChainary

GuwahatiUniversity India Bodo

Member UBPavanaja httpvishvakannadacom India KannadaMember UmaMaheshwarG CALTSUnivofHyderabad India TeluguMember UttamShrestha

RanaNPNOG Nepal Nepali

Member VeenaSolomon (freelancer) India MalayalamMember VinayMurarka Consultanthttpsमराभारत India Hindi

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

50

In addition following members externally gave inputs to NBGP for the respectivelanguagesscripts

Name LanguageScriptExpertiseAjitKumar AwadhiBrajLanguageAmarTumyahang LimbuLanguageAmritYonjan TamangLanguageApranaKulkarni HindiMarathiBasilBaa SadriLanguageBasilKiro KhariaLanguageBiswaLimbu LimbuLanguageDevdassManandhar NewarDevendraKumarDevesh BhojpuriLanguageDinbandhuMahto PanchparganiaLanguageDipikaSangmaNarzary BodoLanguageDrKPLekhwani SindhiDrBirendraKumarSoy MundariLanguageDrDineshKumarShrivastav MagahiLanguageDrHarvinderKaur GurmukhiScriptDrLaxmiPrasadKhatiwada NepaliLanguageHariharVaishnav HalbiIndraKumarTamang TamangLanguageJagannathSingh PanchparganiaLanguageNarendraKumarNegi KinnauriLanguagePrateekHarshwal WagdiandDhundhariLanguageRayemOlemDungdung SadriLanguageTejManAngdembe LimbuLanguageUrmilaHarshwal WagdiLanguage

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.
2. IS 10315: 7-bit coded character set for information interchange. 1985.
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.
5. ISO 2375: Procedure for registration of escape sequences. 2003.
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क (U+0915) | क़ (U+0915 U+093C)
ख (U+0916) | ख़ (U+0916 U+093C)
ग (U+0917) | ग़ (U+0917 U+093C)
च (U+091A) | च़ (U+091A U+093C)
छ (U+091B) | छ़ (U+091B U+093C)
ज (U+091C) | ज़ (U+091C U+093C)
ड (U+0921) | ड़ (U+0921 U+093C)
ढ (U+0922) | ढ़ (U+0922 U+093C)
फ (U+092B) | फ़ (U+092B U+093C)

Table 21: Visually confusables
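Every pair in Table 21 differs only by the Nukta (U+093C). A minimal sketch of how such pairs could be flagged for human review follows; `differs_only_by_nukta` is a hypothetical helper and not part of the LGR, and because these pairs are not variants the result is informational only.

```python
NUKTA = "\u093C"  # DEVANAGARI SIGN NUKTA


def differs_only_by_nukta(a: str, b: str) -> bool:
    """True if the labels differ, but are equal once every Nukta is removed."""
    return a != b and a.replace(NUKTA, "") == b.replace(NUKTA, "")
```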

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
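The all-or-nothing criterion above can be sketched as follows. The mapping holds only the Devanagari/Gurmukhi pairs listed in Table 22; the function name `is_fully_confusable` is hypothetical, and working at the level of single code points rather than whole aksharas is a simplifying assumption.

```python
# Devanagari code point -> Gurmukhi confusables, taken from Table 22.
DEVA_TO_GURMUKHI = {
    "\u0920": {"\u0A28", "\u0A30"},
    "\u0921": {"\u0A21", "\u0A24"},
    "\u0922": {"\u0A22"},
    "\u0924": {"\u0A1C"},
    "\u092F": {"\u0A27"},
}


def is_fully_confusable(label: str, table: dict) -> bool:
    """True only if every character of a non-empty label has a confusable in the table."""
    return bool(label) and all(ch in table for ch in label)
```

A single character without a counterpart, such as क (U+0915) in this mapping, is enough to make the whole label visually distinct.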

Devanagari confusable | Other script confusable | From script
ः (U+0903) | ઃ (U+0A83) | Gujarati
ः (U+0903) | ః (U+0C03) | Telugu
ः (U+0903) | ಃ (U+0C83) | Kannada
ः (U+0903) | ഃ (U+0D03) | Malayalam
ः (U+0903) | ඃ (U+0D83) | Sinhala
ः (U+0903) | ঃ (U+0983) (footnote 17) | Bengali
उ (U+0909) | ও (U+0993) | Bengali
घ (U+0918) | ঘ (U+0998) | Bengali
ठ (U+0920) | ਨ (U+0A28) | Gurmukhi
ठ (U+0920) | ਰ (U+0A30) | Gurmukhi
ड (U+0921) | ਡ (U+0A21) | Gurmukhi
ड (U+0921) | ਤ (U+0A24) | Gurmukhi
ढ (U+0922) | ਢ (U+0A22) | Gurmukhi
त (U+0924) | ਜ (U+0A1C) | Gurmukhi
य (U+092F) | ਧ (U+0A27) | Gurmukhi
ॅ (U+0945) | ঁ (U+0981) | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.

3.3.7 Visarga (ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to h, for example दुःख (duhkh: sorrow, unhappiness) (U+0926 U+0941 U+0903 U+0916).

The Avagraha ऽ (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In case of the LGR, the Avagraha is not part of the repertoire as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + ् (halant) + ष (sha), gets rendered as क्‌ष when formed by क (ka) + ् (halant) + Zero Width Non-joiner + ष (sha). In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character which is used in certain cases (mostly after Halant) in which a particular conjunct combination gets rendered such that the constituting consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष (ksha), which gets formed by क (ka) + ् (halant) + ष (sha), does not show the half form of ka joining with sha. However, using ZWJ, the constituting consonants' shapes are preserved in the visual depiction, i.e. क्‍ष, formed by क (ka) + ् (halant) + Zero Width Joiner + ष (sha).

Earlier, the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is now not encouraged.
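Since ZWNJ and ZWJ are invisible, the three spellings of the ka + sha combination discussed in this section are easiest to tell apart by their code points. The sketch below builds the three sequences and lists them; the variable and function names are illustrative only.

```python
KA, HALANT, SSA = "\u0915", "\u094D", "\u0937"
ZWNJ, ZWJ = "\u200C", "\u200D"


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


conjunct = KA + HALANT + SSA                # default conjunct formation
explicit_halant = KA + HALANT + ZWNJ + SSA  # conjunct suppressed, Halant shown
half_form = KA + HALANT + ZWJ + SSA         # half form of ka kept visible

for name, seq in [("conjunct", conjunct),
                  ("explicit halant", explicit_halant),
                  ("half form", half_form)]:
    print(name, code_points(seq))
```

How each sequence is actually drawn depends on the font and rendering engine; the code only shows that the underlying strings differ.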

4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync across all the Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code points in the code point repertoire, across the board, for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are pre-requisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
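The relationship between the precomposed Qa and its decomposed form can be observed with Unicode normalization. The sketch below uses Python's standard `unicodedata` module; it illustrates the decomposition only and is not an IDNA validity check.

```python
import unicodedata

qa = "\u0958"  # DEVANAGARI LETTER QA (precomposed)

# U+0958 is excluded from composition, so even NFC yields Ka + Nukta.
nfc = unicodedata.normalize("NFC", qa)

print([f"U+{ord(ch):04X}" for ch in nfc])
```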

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where a typical visual shape of an akshar is desired.

In domain names, due to absence of space " " or tab "-", there will be cases where inability to use ZWNJ can pose some issues where two words need to be joined together, where the previous word needs to end in an Explicit Halant and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if two words देश् (deš, nation) and विदेश (videš, foreign land) are juxtaposed to each other, the resultant word, i.e. "देश्विदेश" (footnote 10), is not the appropriate way of rendering it. Appropriate rendering of the same would be "देश्‌विदेश", which can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, there may not be much of an impact of exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes (footnote 11) may not be able to be made; however, there will not be any impact on the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code-block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
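The three successive filters can be pictured as repeated set subtraction. The sketch below is a toy model: the candidate set and the two exclusion sets contain only the code points named as examples in this section, not the real IDNA or MSR data, and `apply_filters` is a hypothetical helper.

```python
def apply_filters(candidates, *excluded_sets):
    """Remove each exclusion set in turn and return what survives."""
    remaining = set(candidates)
    for excluded in excluded_sets:
        remaining -= excluded
    return remaining


# Toy data: only the examples mentioned in this section.
encoded_in_unicode = {0x0915, 0x0958, 0x093D}
excluded_by_idna = {0x0958}  # DEVANAGARI LETTER QA
excluded_by_msr = {0x093D}   # DEVANAGARI SIGN AVAGRAHA

eligible = apply_filters(encoded_in_unicode, excluded_by_idna, excluded_by_msr)
print(sorted(f"U+{cp:04X}" for cp in eligible))
```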

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, in that case one can argue that the pronunciation of the two words, i.e. देश् and देश, is different and hence it changes the fundamental word.

11 Case of क्ष and क्‍ष: the first is composed with क + ् + ष while the latter is with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.

4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms ॄ (U+0944) and ॣ (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.
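The exclusion principles in Sections 4.1.2.2 to 4.1.2.5 name concrete code points, so a label can be screened against them directly. The sketch below covers only those named examples; the `EXCLUDED` table and `find_excluded` are illustrative names, and the table is not a complete list of excluded characters.

```python
# Only the code points named as examples in Sections 4.1.2.2 to 4.1.2.5.
EXCLUDED = {
    0x0964: "punctuation", 0x0965: "punctuation",
    0x09FA: "symbol/abbreviation", 0x0970: "symbol/abbreviation",
    0x0960: "rare/obsolete", 0x0961: "rare/obsolete",
    0x0944: "rare/obsolete", 0x0963: "rare/obsolete",
    0x0951: "stress marker", 0x0952: "stress marker",
}


def find_excluded(label: str) -> list:
    """Return (code point, reason) pairs for excluded characters found in the label."""
    return [(f"U+{ord(ch):04X}", EXCLUDED[ord(ch)])
            for ch in label if ord(ch) in EXCLUDED]
```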

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments demanded attention to the normative section of the document. The NBGP at large, and the Devanagari team in specific, went through the comments in detail and decided, for each of the individual comments received, if it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation about why the said change is not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].

5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention (footnote 12):

All characters that are included in the [MSR]: Yellow background

PVALID in IDNA 2008 but excluded from the MSR for various reasons: Pinkish background

Not PVALID in IDNA 2008: White background

12 This document needs to be printed in color for this to be read correctly.

5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled Reference. For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

SrNo

UnicodeCodePoint

Glyph CharacterName Category

Examplelanguagesusingthecodepoint(Notexhaustive

list)

LanguagewithlowestEGIDSscaleusingthecodepoint

Reference

1 0901 DEVANAGARISIGNCANDRABINDU Candrabindu

BodoHindiKashmiriKonkaniMaithiliMarathiNepaliSantaliand

Sanskrit

1HindiNepali

[0][101][102][103][105][108][110][111][112]

[113]

2 0902 DEVANAGARISIGNANUSVARA

Anusvara(Bindu)

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

3 0903 ः DEVANAGARISIGNVISARGA Visarga

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

4 0905 अ DEVANAGARILETTERA Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

5 0906 आ DEVANAGARILETTERAA Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

6 0907 इ DEVANAGARILETTERI Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

7 0908 ई DEVANAGARILETTERII Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

19

8 0909 उ DEVANAGARILETTERU Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

9 090A ऊ DEVANAGARILETTERUU Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

10 090B ऋ DEVANAGARI

LETTERVOCALICR

Vowel HindiMarathiSanskrit 1Hindi [0][101][102]

[103]

11 090D ऍ DEVANAGARILETTERCANDRAE Vowel Hindi 1Hindi [0][101]

12 090E ऎ DEVANAGARILETTERSHORTE Vowel Kashmiri 4Kashmiri [0][105][108]

13 090F ए DEVANAGARILETTERE Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

14 0910 ऐ DEVANAGARILETTERAI Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

15 0911 ऑ DEVANAGARI

LETTERCANDRAO

Vowel HindiKonkaniMarathiKashmiri 1Hindi

[0][100][101][102][108]

[112]

16 0912 ऒ DEVANAGARILETTERSHORTO Vowel Kashmiri 4Kashmiri [0][105][108]

17 0913 ओ DEVANAGARILETTERO Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

18 0914 औ DEVANAGARILETTERAU Vowel

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

19 0915 क DEVANAGARILETTERKA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

20

20 0916 ख DEVANAGARILETTERKHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

21 0917 ग DEVANAGARILETTERGA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

22 0918 घ DEVANAGARILETTERGHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

23 0919 ङ DEVANAGARILETTERNGA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

24 091A च DEVANAGARILETTERCA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

25 091B छ DEVANAGARILETTERCHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

26 091C ज DEVANAGARILETTERJA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

27 091D झ DEVANAGARILETTERJHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

28 091E ञ DEVANAGARILETTERNYA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

29 091F ट DEVANAGARILETTERTTA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

21

30 0920 ठ DEVANAGARILETTERTTHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

31 0921 ड DEVANAGARILETTERDDA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

32 0922 ढ DEVANAGARILETTERDDHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

33 0923 ण DEVANAGARILETTERNNA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

34 0924 त DEVANAGARILETTERTA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

35 0925 थ DEVANAGARILETTERTHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

36 0926 द DEVANAGARILETTERDA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

37 0927 ध DEVANAGARILETTERDHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

38 0928 न DEVANAGARILETTERNA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

39 092A प DEVANAGARILETTERPA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

22

40 092B फ DEVANAGARILETTERPHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

41 092C ब DEVANAGARILETTERBA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

42 092D भ DEVANAGARILETTERBHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

43 092E म DEVANAGARILETTERMA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

44 092F य DEVANAGARILETTERYA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

45 0930 र DEVANAGARILETTERRA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

46 0932 ल DEVANAGARILETTERLA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

47 0933 ळ DEVANAGARILETTERLLA Consonant

BodoKonkaniMarathiNepali

Sanskrit1Nepali

[0][102][103][110][112]

[113]

48 0935 व DEVANAGARILETTERVA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

49 0936 श DEVANAGARILETTERSHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

23

50 0937 ष DEVANAGARILETTERSSA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104]

[113]

51 0938 स DEVANAGARILETTERSA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

52 0939 ह DEVANAGARILETTERHA Consonant

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][104][105][108]

[113]

53 093A DEVANAGARI VOWEL SIGN OE Matra Kashmiri 4Kashmiri [11][105][108]

54 093B ऻ DEVANAGARI VOWEL SIGN OOE Matra Kashmiri 4Kashmiri [11][105][108]

55 093C DEVANAGARISIGNNUKTA Nukta

BodoHindiKashmiriMaithiliSantaliSindhi

1Hindi[0][101][105][108][110][109][111]

56 093E ा DEVANAGARIVOWELSIGNAA Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

57 093F ि DEVANAGARIVOWELSIGNI Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

58 0940 ी DEVANAGARIVOWELSIGNII Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

59 0941 DEVANAGARIVOWELSIGNU Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

60 0942 DEVANAGARIVOWELSIGNUU Matra

Mostofthelanguagesgivenin

section32

1HindiNepali

[0][101][102][103][113]

61 0943

DEVANAGARIVOWELSIGNVOCALICR

Matra HindiMarathiSanskrit 1Hindi [0][101][102]

[103]

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

24

62 0945 DEVANAGARI VOWEL SIGN CANDRA E (= candra) Matra Hindi, Konkani, Marathi, Sanskrit, Kashmiri 1 Hindi [0][100][101][108]
63 0946 DEVANAGARI VOWEL SIGN SHORT E Matra Kashmiri 4 Kashmiri [0][105][108]
64 0947 DEVANAGARI VOWEL SIGN E Matra Most of the languages given in Section 3.2 1 Hindi, Nepali [0][101][102][103][105][108][113]
65 0948 DEVANAGARI VOWEL SIGN AI Matra Most of the languages given in Section 3.2 1 Hindi, Nepali [0][101][102][103][113]
66 0949 ॉ DEVANAGARI VOWEL SIGN CANDRA O Matra Hindi, Konkani, Marathi, Kashmiri 1 Hindi [0][100][108]
67 094A ॊ DEVANAGARI VOWEL SIGN SHORT O Matra Kashmiri 4 Kashmiri [0][105][108]
68 094B ो DEVANAGARI VOWEL SIGN O Matra Most of the languages given in Section 3.2 1 Hindi, Nepali [0][101][102][103][105][108][113]
69 094C ौ DEVANAGARI VOWEL SIGN AU Matra Most of the languages given in Section 3.2 1 Hindi, Nepali [0][101][102][103][105][108][113]
70 094D DEVANAGARI SIGN VIRAMA Halant/Virama Most of the languages given in Section 3.2 1 Hindi, Nepali [0][101][102][103][105][108][113]
71 094F ॏ DEVANAGARI VOWEL SIGN AW Matra Kashmiri 4 Kashmiri [0][105][108]
72 0956 DEVANAGARI VOWEL SIGN UE Matra Kashmiri 4 Kashmiri [11][105][108]
73 0957 DEVANAGARI VOWEL SIGN UUE Matra Kashmiri 4 Kashmiri [11][105][108]
74 0972 ॲ DEVANAGARI LETTER CANDRA A Vowel Konkani, Marathi, Kashmiri 2 Konkani, Marathi [9][100][102][108][112]


75 0973 ॳ DEVANAGARI LETTER OE Vowel Kashmiri 4 Kashmiri [11][105][108]
76 0974 ॴ DEVANAGARI LETTER OOE Vowel Kashmiri 4 Kashmiri [11][105][108]
77 0975 ॵ DEVANAGARI LETTER AW Vowel Kashmiri 4 Kashmiri [11][105][108]
78 0976 ॶ DEVANAGARI LETTER UE Vowel Kashmiri 4 Kashmiri [11][105][108]
79 0977 ॷ DEVANAGARI LETTER UUE Vowel Kashmiri 4 Kashmiri [11][105][108]
80 097B ॻ DEVANAGARI LETTER GGA Consonant Sindhi 2 Sindhi [8][104]
81 097C ॼ DEVANAGARI LETTER JJA Consonant Sindhi 2 Sindhi [8][104]
82 097E ॾ DEVANAGARI LETTER DDDA Consonant Sindhi 2 Sindhi [8][104]
83 097F ॿ DEVANAGARI LETTER BBA Consonant Sindhi 2 Sindhi [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, so that the "Eyelash Reph" construct (see footnote 13) can be included.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by the normal Ra, U+0930), the term "Reph" is used here.


2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7 Sequences
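The conditional inclusion of DEVANAGARI LETTER RRA can be illustrated with a small check. This is only a sketch under stated assumptions: the function name `rra_only_in_sequences` is hypothetical and not part of the LGR, and the check looks at RRA alone without applying any other rule of this document.

```python
RRA = "\u0931"  # DEVANAGARI LETTER RRA

# The two sequences listed in Table 7: RRA + VIRAMA + YA and RRA + VIRAMA + HA.
EYELASH_REPH_SEQUENCES = ("\u0931\u094D\u092F", "\u0931\u094D\u0939")


def rra_only_in_sequences(label: str) -> bool:
    """Return True if every RRA in the label starts one of the Table 7 sequences."""
    position = label.find(RRA)
    while position != -1:
        # str.startswith accepts a tuple of prefixes and a start offset.
        if not label.startswith(EYELASH_REPH_SEQUENCES, position):
            return False
        position = label.find(RRA, position + 1)
    return True
```

A label without any RRA passes trivially, while a stand-alone RRA (for example RRA followed directly by a Matra) is rejected.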

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of forming their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:


- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali, but not in Hindi)

It is worth noting that the rules required to accommodate additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first subsection lists the categories of the characters in the form of variables; in the rules, the variable names are used instead of their descriptive names. The second subsection lists four operators, along with their functions, which are assumed while specifying the rules. The following two subsections describe the two major categories of akshar formations, the first of which begins with a vowel and the second with a consonant. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol  Function
|  Alternative
[ ]  Optional
*  Variable Repetition
( )  Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari, when used to write Hindi, are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ a | U+0905
Vowel + Anusvara | V[B] | अं aṁ | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ aṃ | U+0905 U+0901
Vowel + Visarga | V[X] | अः aḥ | U+0905 U+0903

Table 9
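The vowel sequence V[B|D|X] maps directly onto a regular expression. The sketch below is illustrative only: the `VOWELS` string is an assumed small subset of the vowels in Table 6 rather than the full repertoire, and `is_vowel_sequence` is a hypothetical helper name.

```python
import re

# Assumption: a small illustrative subset of independent vowels, not the full repertoire.
VOWELS = "\u0905\u0906\u0907\u0908\u0909\u090A\u090F\u0910\u0913\u0914"
ANUSVARA = "\u0902"     # B
CANDRABINDU = "\u0901"  # D
VISARGA = "\u0903"      # X

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{VOWELS}][{ANUSVARA}{CANDRABINDU}{VISARGA}]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

Because the optional group matches at most once, a Visarga after an Anusvara (two trailing signs) is rejected, in line with the restriction stated above.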


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क ka | U+0915 <single character>
Consonant + Nukta | C[N] | क़ ḳa | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign (Matra [M]), Anusvara [B], Candrabindu [D], Visarga [X] or Halant [H]:

C[M|B|D|X|H]

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि ki | U+0915 U+093F
Consonant + Anusvara | C[B] | कं kaṁ | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ kaṃ | U+0915 U+0901
Consonant + Visarga | C[X] | कः kaḥ | U+0915 U+0903
Consonant + Halant | C[H] | क् k (pure consonant) | U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं kīṁ | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ kāṃ | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः kīḥ | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14):

*3(CH)C

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य nkrya | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X.

Example

14 In the case of Sanskrit, it can join up to 5 consonants.


Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की kkī | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं kkaṁ | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ kkaṃ | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः kkaḥ | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं kkīṁ | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ kkīṃ | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः kkīḥ | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
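The consonant sequence cases listed in this section (1, 2, 2A, 3, 3A and 3B) can be combined into one regular expression. This is a hedged sketch, not the LGR's own definition: the code point ranges are approximations that also cover a few code points excluded from the repertoire, a Nukta is allowed after any consonant for simplicity, and a final Halant is accepted only after a single consonant because that is the only such case listed above.

```python
import re

# Assumption: approximate ranges, chosen for illustration only.
C = "\u0915-\u0939"        # consonants
N = "\u093C"               # Nukta
M = "\u093E-\u094C"        # matras (stops before the Halant at U+094D)
H = "\u094D"               # Halant
BDX = "\u0902\u0901\u0903" # Anusvara, Candrabindu, Visarga

CONSONANT_SEQUENCE = re.compile(
    # Case 2 with Halant: a single consonant (optionally with Nukta) plus H.
    f"[{C}]{N}?{H}"
    # All other cases: up to three (CH) groups, a final consonant,
    # then an optional Matra and an optional B, D or X.
    f"|(?:[{C}]{N}?{H}){{0,3}}[{C}]{N}?[{M}]?[{BDX}]?"
)


def is_consonant_sequence(text: str) -> bool:
    """Return True if the whole string is one consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The `{0,3}` bound reflects the "up to 4 consonants" limit stated for Hindi. The WLE rules in Section 7 do not impose this limit, so a validator for the actual LGR would not include it.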

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules when one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.


6 Variants

There are no characters or character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and are hence being proposed as variants. A brief description of these variants follows, with the variants themselves listed in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated that the Whole Label Evaluation rules (given in Section 7) be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar for a majority of the Devanagari user base. This being a unique case of homographic similarity, the following variants are being proposed:


Variant 1 | Variant 2
आ (U+0906) | आ़ (U+0906 U+093C)
ओ (U+0913) | ओ़ (U+0913 U+093C)
ा (U+093E) | ा़ (U+093E U+093C)
ो (U+094B) | ो़ (U+094B U+093C)

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus, the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward mapping and its symmetric (reverse) counterpart using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
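The effect of the "not followed by a Nukta" condition can be sketched as a small variant generator. This is an illustration under assumptions, not the LGR's XML definition: `nukta_variants` is a hypothetical helper, it handles only the four pairs of Table 16, and it does not check whether the resulting labels satisfy the WLE rules.

```python
from itertools import product

NUKTA = "\u093C"
# Variant 1 code points from Table 16: AA, O, vowel sign AA, vowel sign O.
NUKTA_VARIANT_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variants(label: str) -> set[str]:
    """Return all labels reachable by toggling the Nukta on Table 16 bases."""
    choices = []
    i = 0
    while i < len(label):
        ch = label[i]
        if ch in NUKTA_VARIANT_BASES:
            has_nukta = label[i + 1:i + 2] == NUKTA
            end = i + 2 if has_nukta else i + 1
            # Context rule: the mapping (in either direction) is defined only
            # when the source is not followed by a (further) Nukta.
            if label[end:end + 1] != NUKTA:
                choices.append((ch, ch + NUKTA))
                i = end
                continue
        choices.append((ch,))
        i += 1
    return {"".join(combo) for combo in product(*choices)} - {label}
```

Because the base with and without Nukta is treated as one toggle, the generator never produces a double Nukta, which is the regeneration problem the context rule is meant to rule out.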

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Direction | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ | blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ | blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E, respectively, into the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A
0906 093C 0901 is added to Set B
0906 093C 0902 is added to Set C, and
093E 093C 0902 is added to Set D

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see Section 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapping with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
U+093A | U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
U+0946 | U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual variant pairs are as follows:

Variant 1 | Variant 2
U+0901 | U+0945 U+0902
U+093E U+0901 | U+0949 U+0902
U+0905 U+0901 | U+0972 U+0902
U+090F U+0901 | U+090D U+0902
U+0906 U+0901 | U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, like a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences (U+0945 U+0902) and (U+0949 U+0902) should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity, and hence the variant possibility.

6.4.1 Variant context rule for the Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a Consonant or a Consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
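The context condition for this pair can be sketched as follows. The helper names are hypothetical, the consonant set is an approximate code point range rather than the exact repertoire, and only the forward mapping (U+0901) -> (U+0945 U+0902) is shown.

```python
# Assumption: approximate consonant range U+0915..U+0939, for illustration only.
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}
NUKTA = "\u093C"
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"


def preceded_by_c_or_cn(label: str, index: int) -> bool:
    """True if label[index] is preceded by a Consonant or by Consonant + Nukta."""
    if index >= 1 and label[index - 1] in CONSONANTS:
        return True
    return (
        index >= 2
        and label[index - 1] == NUKTA
        and label[index - 2] in CONSONANTS
    )


def candrabindu_variant(label: str) -> str:
    """Replace each Candrabindu that satisfies the context rule."""
    parts = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            parts.append(CANDRA_E_ANUSVARA)
        else:
            parts.append(ch)
    return "".join(parts)
```

A Candrabindu that follows a Vowel or a Matra is left untouched, since a Matra such as U+0945 could not validly appear in that position.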

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the blocked disposition.

There is no preference among these variants. Whichever label containing either of these variants is chosen first, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under the NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under the NBGP.

The NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under the NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are deemed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

The NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:


a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in the Santali language)
b. ओ (U+0913) (Required in the Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in the Santali language)
b. ो (U+094B) (Required in the Santali language)

2. H must be preceded by C or CN (see footnote 15)

3. M must be preceded by C or CN (see footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V cannot be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N
16 where CN is a C followed by an N
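The seven WLE rules above can be sketched as a left-to-right check over a label. This is a simplified illustration, not the normative XML: the category ranges are approximations of Table 6 (they include a few excluded code points), the Eyelash Reph sequences and the variant context rules are not handled, and `violated_rules` is a hypothetical helper name.

```python
def _chars(*ranges):
    """Build a set of characters from inclusive code point ranges."""
    return {chr(cp) for lo, hi in ranges for cp in range(lo, hi + 1)}


# Assumption: approximate category membership, for illustration only.
CATEGORY = {}
for name, chars in {
    "V": _chars((0x0905, 0x0914), (0x0972, 0x0977)),
    "C": _chars((0x0915, 0x0939), (0x097B, 0x097C), (0x097E, 0x097F)),
    "M": _chars((0x093A, 0x093B), (0x093E, 0x094C), (0x094F, 0x094F), (0x0956, 0x0957)),
    "D": {"\u0901"}, "B": {"\u0902"}, "X": {"\u0903"},
    "N": {"\u093C"}, "H": {"\u094D"},
}.items():
    CATEGORY.update(dict.fromkeys(chars, name))

# Rule 1: members of C1, V1 and M1 as listed above.
NUKTA_BASES = set(
    "\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B"  # C1
    "\u0906\u0913"                                            # V1
    "\u093E\u094B"                                            # M1
)
RULE_FOR_SIGN = {"X": 4, "B": 5, "D": 6}


def violated_rules(label: str) -> list[int]:
    """Return the numbers of the WLE rules violated, in order of occurrence."""
    violations = []
    for i, ch in enumerate(label):
        cat = CATEGORY.get(ch)
        prev_ch = label[i - 1] if i > 0 else ""
        prev = CATEGORY.get(prev_ch)
        # "C or CN": a consonant, or a Nukta that itself follows a consonant.
        c_or_cn = prev == "C" or (
            prev == "N" and i >= 2 and CATEGORY.get(label[i - 2]) == "C"
        )
        if cat == "N" and prev_ch not in NUKTA_BASES:
            violations.append(1)
        elif cat == "H" and not c_or_cn:
            violations.append(2)
        elif cat == "M" and not c_or_cn:
            violations.append(3)
        elif cat in RULE_FOR_SIGN and prev not in {"V", "C", "N", "M"}:
            violations.append(RULE_FOR_SIGN[cat])
        elif cat == "V" and prev == "H":
            violations.append(7)
    return violations
```

An empty result means that none of rules 1 to 7 is violated. A label beginning with a dependent sign is flagged automatically, because nothing precedes its first character.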


Additional rules are used only for variants where a Nukta maps to a null, or that are overlapped:

• Variant is not defined if followed by a Nukta (see Section 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see Section 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it gets permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether both of these can follow any character that validly ends an akshar. It is to be noted that only the case "V preceded by H" needs special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार aːməchaːr "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.


If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi


Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia Foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | httpvishvakannadacom | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

51

[RCF8228]AFreytagldquoGuidanceonDesigningLabelGenerationRulesets(LGRs)SupportingVariantLabelsAugust2017httpstoolsietforghtmlrfc8228(Accessedon1stDec2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_org/indevnagarithesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.
2. IS 10315: 7-bit coded character set for information interchange. 1985.
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.
5. ISO 2375: Procedure for registration of escape sequences. 2003.
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
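The "all constituents must be confusable" condition can be illustrated with a minimal Python sketch. The mapping is transcribed from the code points of Table 22; the function name `has_cross_script_confusable` and the per-code-point (rather than per-akshara) granularity are assumptions made for illustration, not part of the LGR.

```python
# Devanagari code point -> confusable code points in other scripts,
# transcribed from Table 22 (informative appendix, not normative).
# U+0D83 is SINHALA SIGN VISARGA.
CONFUSABLES = {
    0x0903: {0x0A83, 0x0C03, 0x0C83, 0x0D03, 0x0D83, 0x0983},
    0x0909: {0x0993},
    0x0918: {0x0998},
    0x0920: {0x0A28, 0x0A30},
    0x0921: {0x0A21, 0x0A24},
    0x0922: {0x0A22},
    0x0924: {0x0A1C},
    0x092F: {0x0A27},
    0x0945: {0x0981},
}

# Unicode block ranges for two of the target scripts.
GURMUKHI = range(0x0A00, 0x0A80)
BENGALI = range(0x0980, 0x0A00)


def has_cross_script_confusable(label: str, script_block: range) -> bool:
    """True only if every code point of `label` has a confusable in `script_block`.

    Simplification (assumption): the check works on single code points,
    whereas the text speaks of characters/aksharas.
    """
    if not label:
        return False
    return all(
        any(cp in script_block for cp in CONFUSABLES.get(ord(ch), ()))
        for ch in label
    )


if __name__ == "__main__":
    print(has_cross_script_confusable("\u0920\u0921", GURMUKHI))  # both listed
    print(has_cross_script_confusable("\u0920\u0915", GURMUKHI))  # KA is not listed
```

A single code point without a counterpart in the target script is enough to make the whole label non-confusable, which is why `all(...)` is used rather than `any(...)` at the label level.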

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (footnote 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync across all Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selection of code-points in the code-point repertoire, across the board, for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are pre-requisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone being a very special case up the ladder in the protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i The Unicode Standard

Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the possible characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters out of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ़ (U+093C), can be used instead.
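The Qa example can be reproduced with Python's standard `unicodedata` module: U+0958 is excluded from canonical composition in Unicode, so even NFC normalization yields the two-code-point form. The helper name `to_nfc_code_points` is illustrative only.

```python
import unicodedata


def to_nfc_code_points(text: str) -> list[str]:
    """Return the NFC-normalized form of `text` as a list of U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


if __name__ == "__main__":
    # DEVANAGARI LETTER QA normalizes to KA followed by NUKTA.
    print(to_nfc_code_points("\u0958"))
    # A label already written as KA + NUKTA is left unchanged by NFC.
    print(to_nfc_code_points("\u0915\u093C"))
```

Because NFC never produces U+0958, a label typed with the precomposed letter and a label typed as Ka + Nukta end up as the same code point sequence after normalization.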

IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These are required in certain cases where a typical visual shape of an akshar is desired.

In domain names, due to the absence of space " " or tab "-", there will be cases where the inability to use ZWNJ can pose some issues, where two words need to be joined together, where the previous word needs to end in an Explicit Halant and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if two words देश् (deš, "nation") and विदेश (videš, "foreign land") are juxtaposed to each other, the resultant word, i.e. "देश्विदेश" (footnote 10), is not the appropriate way of rendering it. Appropriate rendering of the same would be "देश्‌विदेश", which can be achieved by adding a ZWNJ in between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, there may not be much of an impact of the exclusion of the ZWJ from the MSR, as there are better alternatives already available for depicting the cases for which ZWJ was earlier used. Some specific shapes (footnote 11) may not be able to be made; however, there will not be any impact on the phonetic level.

iii Maximal Starting Repertoire

The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off with admitting only such characters as are part of the code-block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
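The three successive filters can be sketched in Python as a chain of predicates. This is only an illustration under stated assumptions: the IDNA stage is approximated by the IDNA2008 stability check alone (a code point that changes under NFKC plus case folding is disallowed), and the MSR stage uses a tiny sample set containing only the exclusions named in this section. The names `FILTERS` and `first_rejecting_filter` are hypothetical.

```python
import unicodedata

# Sample only: code points this section names as absent from the MSR.
MSR_EXCLUDED_SAMPLE = {0x093D, 0x200C, 0x200D}


def encoded_in_unicode(ch: str) -> bool:
    # General category "Cn" means the code point is unassigned.
    return unicodedata.category(ch) != "Cn"


def stable_under_nfkc_casefold(ch: str) -> bool:
    # Partial approximation of IDNA2008: unstable code points are disallowed.
    folded = unicodedata.normalize("NFKC", ch).casefold()
    return unicodedata.normalize("NFKC", folded) == ch


def in_msr_sample(ch: str) -> bool:
    return ord(ch) not in MSR_EXCLUDED_SAMPLE


FILTERS = [
    ("Unicode", encoded_in_unicode),
    ("IDNA (stability check only)", stable_under_nfkc_casefold),
    ("MSR (sample)", in_msr_sample),
]


def first_rejecting_filter(ch: str):
    """Return the name of the first filter that rejects `ch`, or None if it survives."""
    for name, accepts in FILTERS:
        if not accepts(ch):
            return name
    return None


if __name__ == "__main__":
    for ch in ("\u0915", "\u0958", "\u093D", "\u200C"):
        print(f"U+{ord(ch):04X}", first_rejecting_filter(ch))
```

The order of the list mirrors the order in the text: a code point is only tested against a later filter if every earlier one admitted it.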

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different and hence it changes the fundamental word.

11 Case of क्ष and क्‍ष: the first is composed with क + ् + ष while the latter is with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.


4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms ॄ (U+0944) and ॣ (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.
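As a small illustration of exclusion principles 4.1.2.2 to 4.1.2.5, the Python sketch below flags the code points that those subsections name explicitly. The table `EXCLUDED_EXAMPLES` holds only those named examples, not the full set of excluded characters, and the function name is hypothetical.

```python
# Only the code points named as examples in 4.1.2.2 to 4.1.2.5.
EXCLUDED_EXAMPLES = {
    0x0964: "punctuation (Danda)",
    0x0965: "punctuation (double Danda)",
    0x09FA: "symbol (Isshar)",
    0x0970: "symbol (Abbreviation sign)",
    0x0960: "rare/obsolete (LETTER VOCALIC RR)",
    0x0961: "rare/obsolete (LETTER VOCALIC LL)",
    0x0944: "rare/obsolete (VOWEL SIGN VOCALIC RR)",
    0x0963: "rare/obsolete (VOWEL SIGN VOCALIC LL)",
    0x0951: "stress marker (UDATTA)",
    0x0952: "stress marker (ANUDATTA)",
}


def excluded_code_points(label: str) -> list[tuple[str, str]]:
    """List (code point, reason) for every named excluded code point found in `label`."""
    return [
        (f"U+{ord(ch):04X}", EXCLUDED_EXAMPLES[ord(ch)])
        for ch in label
        if ord(ch) in EXCLUDED_EXAMPLES
    ]


if __name__ == "__main__":
    # KA followed by a Danda: the Danda is reported with its exclusion reason.
    print(excluded_code_points("\u0915\u0964"))
```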

4.2 Methodology to incorporate the feedback received through Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments concerned the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2 Devanagari Code Page from [MSR]

Color convention (footnote 12):

All characters that are included in the [MSR] - Yellow background

PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background

Not PVALID in IDNA2008 - White background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [110], [111], [112], [113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]

8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0], [101], [102], [103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0], [101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0], [105], [108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0], [100], [101], [102], [108], [112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0], [105], [108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]

20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]

30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]

40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0], [102], [103], [110], [112], [113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]

50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0], [101], [105], [108], [110], [109], [111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0], [101], [102], [103]

62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0], [100], [101], [108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0], [105], [108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0], [100], [108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0], [105], [108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0], [105], [108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9], [100], [102], [108], [112]

75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8], [104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8], [104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8], [104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8], [104]

Table 6 Code point repertoire

Apart from the above individual code-points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph" (footnote 13) construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106], [107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106], [107]

Table 7 Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term "Reph" is used here.
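The conditional inclusion of RRA can be illustrated with a short Python check: U+0931 is accepted only where it starts one of the two sequences of Table 7. This is a sketch of the idea only; the names `ALLOWED_RRA_SEQUENCES` and `rra_usage_ok` are hypothetical and the check is not the LGR's actual rule encoding.

```python
RRA = "\u0931"

# The two sequences of Table 7: RRA + VIRAMA + YA, and RRA + VIRAMA + HA.
ALLOWED_RRA_SEQUENCES = ("\u0931\u094D\u092F", "\u0931\u094D\u0939")


def rra_usage_ok(label: str) -> bool:
    """True if every U+0931 in `label` begins one of the allowed sequences."""
    return all(
        label.startswith(ALLOWED_RRA_SEQUENCES, i)
        for i, ch in enumerate(label)
        if ch == RRA
    )


if __name__ == "__main__":
    print(rra_usage_ok("\u0915\u0931\u094D\u092F\u093E"))  # RRA inside an allowed sequence
    print(rra_usage_ok("\u0915\u0931"))                    # bare RRA
```

Labels without any RRA pass trivially, since the generator inside `all(...)` is then empty.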

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7 the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, instead of their descriptive names, the variable names are used. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen -
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol Function

| Alternative

[ ] Optional

* Variable Repetition

( ) Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ a | U+0905
Vowel + Anusvara | V[B] | अं aṁ | अ ं U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ aṃ | अ ँ U+0905 U+0901
Vowel + Visarga | V[X] | अः aḥ | अ ः U+0905 U+0903

Table 9
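The vowel sequence V[B|D|X] maps directly onto a regular expression. The Python sketch below assumes that V covers every code point categorized as "Vowel" in Table 6 (which includes letters used only by languages other than Hindi); the pattern and function names are illustrative.

```python
import re

# Character classes built from the "Category" column of Table 6 (assumption:
# all repertoire vowels are used, not only those needed for Hindi).
V = "[\u0905-\u090B\u090D-\u0914\u0972-\u0977]"  # Vowel
B = "\u0902"  # Anusvara (Bindu)
D = "\u0901"  # Candrabindu
X = "\u0903"  # Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"{V}[{B}{D}{X}]?")


def is_vowel_sequence(text: str) -> bool:
    """True if `text` as a whole matches the vowel sequence V[B|D|X]."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("\u0905", "\u0905\u0902", "\u0905\u0902\u0903", "\u0902"):
        code_points = " ".join(f"U+{ord(c):04X}" for c in sample)
        print(code_points, is_vowel_sequence(sample))
```

The optional character class admits at most one sign after the vowel, which also rules out a Visarga following a Candrabindu or Anusvara.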


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples

Sequence Description   Sequence   Example   Constituting characters
Consonant   C   क (ka)   U+0915 <single character>
Consonant + Nukta   C[N]   क़ (ḳa)   U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D] or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples

Sequence Description   Sequence   Example   Constituting characters
Consonant + Matra   C[M]   कि (ki)   U+0915 U+093F
Consonant + Anusvara   C[B]   कं (kaṁ)   U+0915 U+0902
Consonant + Candrabindu   C[D]   कँ (kaṃ)   U+0915 U+0901
Consonant + Visarga   C[X]   कः (kaḥ)   U+0915 U+0903
Consonant + Halant   C[H]   क् (k, Pure Consonant)   U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example

Sequence Description   Sequence   Example   Constituting characters
Consonant + Matra + Anusvara   CM[B]   कीं (kīṁ)   U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu   CM[D]   काँ (kāṃ)   U+0915 U+093E U+0901
Consonant + Matra + Visarga   CM[X]   कीः (kīḥ)   U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14): *3(CH)C

Example

Sequence Description   Sequence   Example   Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant   CHCHCHC   न्क्र्य (nkrya)   U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X.

Example

Sequence Description   Sequence   Example   Constituting characters
Consonant + Halant + Consonant + Matra   CHC[M]   क्की (kkī)   U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara   CHC[B]   क्कं (kkaṁ)   U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu   CHC[D]   क्कँ (kkaṃ)   U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga   CHC[X]   क्कः (kkaḥ)   U+0915 U+094D U+0915 U+0903

Table 14

14 In the case of Sanskrit, it can join up to 5 consonants.

3B. *3(CH)CM may be followed by a B, D or X.

Example

Sequence Description   Sequence   Example   Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara   CHCM[B]   क्कीं (kkīṁ)   U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu   CHCM[D]   क्कीँ (kkīṃ)   U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga   CHCM[X]   क्कीः (kkīḥ)   U+0915 U+094D U+0915 U+0940 U+0903

Table 15
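The consonant-sequence cases 1 to 3B can be folded into one pattern, sketched below. The character classes are simplified code point ranges assumed for illustration only, and treating C as "consonant with optional Nukta" follows the coterminous note under case 1. The names are hypothetical.

```python
import re

# Simplified, hypothetical classes (contiguous ranges, not the LGR repertoire).
C = "[\u0915-\u0939]"      # consonants KA..HA
N = "\u093C"               # Nukta
H = "\u094D"               # Halant
M = "[\u093E-\u094C]"      # common matras
BDX = "[\u0901-\u0903]"    # Candrabindu, Anusvara, Visarga

CN = f"{C}{N}?"            # consonant, optionally with Nukta

# Up to three (C H) groups, a final consonant, then either a Halant
# or an optional Matra followed by an optional B, D or X.
CONSONANT_SEQUENCE = re.compile(f"(?:{CN}{H}){{0,3}}{CN}(?:{H}|{M}?{BDX}?)")


def is_consonant_sequence(text: str) -> bool:
    """Return True if the whole string is one consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

With this sketch the CHCHCHC example from Table 13 is accepted, while five consonants joined by Halant are rejected, which mirrors the Hindi limit of four rather than the unrestricted WLE rules of Section 7.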

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21: Visually confusables in Appendix A: Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is the brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning, which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated that the Whole Label Evaluation rules (given in Section 7) be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user-base. Being a unique case of homographic similarity, the following variants are being proposed.


Variant 1   Variant 2
आ U+0906   आ़ U+0906 U+093C
ओ U+0913   ओ़ U+0913 U+093C
ा U+093E   ा़ U+093E U+093C
ो U+094B   ो़ U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 have a typical characteristic, which is that, within a variant pair, Variant 1 is a subset of Variant 2; e.g. in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants as given below:

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 set character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)
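The effect of the "not followed by a Nukta" condition can be sketched as follows. The function names are hypothetical and the code is not taken from the accompanying XML file. It only shows why the condition stops the regenerative substitution: once a Nukta has been added after a base character, that base no longer qualifies for the mapping.

```python
NUKTA = "\u093C"
# Variant 1 characters from Table 16: AA, O, and the matras AA and O.
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_positions(label: str) -> list:
    """Indices where the Variant 1 -> Variant 2 mapping is defined,
    i.e. a Table 16 base character that is not followed by a Nukta."""
    positions = []
    for i, ch in enumerate(label):
        followed_by_nukta = i + 1 < len(label) and label[i + 1] == NUKTA
        if ch in NUKTA_BASES and not followed_by_nukta:
            positions.append(i)
    return positions


def add_nukta_at(label: str, index: int) -> str:
    """Apply the mapping at one position by inserting a Nukta after it."""
    return label[: index + 1] + NUKTA + label[index + 1 :]
```

Applying `add_nukta_at` to U+0906 yields U+0906 U+093C, for which `nukta_variant_positions` returns an empty list, so the invalid sequence U+0906 U+093C U+093C is never generated.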

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set   Mapping
Variant Set A   093E 0901 <--> 0949 0902
Variant Set B   0906 0901 <--> 0911 0902
Variant Set C   0906 0902 <--> 0974
Variant Set D   093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source   Glyph   Target   Glyph   Type   Variant Context
0906   आ   0906 093C   आ़   ↔ blocked   not followed-by-N
093E   ा   093E 093C   ा़   ↔ blocked   not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the left-hand side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set   Mapping   Variant Contextual Rule
Variant Set A   093E 0901 <--> 093E 093C 0901 <--> 0949 0902   -
Variant Set B   0906 0901 <--> 0906 093C 0901 <--> 0911 0902   -
Variant Set C   0906 0902 <--> 0906 093C 0902 <--> 0974   when (followed-by-V-C-or-end)
Variant Set D   093B <--> 093E 0902 <--> 093E 093C 0902   when (followed-by-V-C-or-end)
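The context rule for Sets C and D is described as the intersection of what may follow each side of the mapping. A minimal sketch of that reasoning is given below. The follower sets are transcribed from the prose above, and the dictionary and function names are assumptions made for this example.

```python
# What may follow the final code point of each side, as stated in the text.
# "V" = any Vowel, "C" = any Consonant, "end" = end of label.
FOLLOWERS = {
    "0902": {"V", "C", "end"},
    "0974": {"V", "C", "end", "0901", "0902", "0903"},
    "093B": {"V", "C", "end", "0901", "0902", "0903"},
}


def shared_context(left: str, right: str) -> set:
    """Contexts in which both sides of a variant mapping are valid."""
    return FOLLOWERS[left] & FOLLOWERS[right]
```

Here `shared_context("0902", "0974")` gives the three contexts named by followed-by-V-C-or-end, and the same result holds for 0902 against 093B in Set D.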

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when (follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1   Variant 2
ॳ U+0973   अं U+0905 U+0902
ऺ U+093A   ं U+0902
ॴ U+0974   आं U+0906 U+0902
ऻ U+093B   ां U+093E U+0902
ऎ U+090E   ऐ U+0910
ॆ U+0946   े U+0947
ॵ U+0975   औ U+0914
ॏ U+094F   ौ U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mapping will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1   Variant 2
U+0901   U+0945 U+0902
U+093E U+0901   U+0949 U+0902
U+0905 U+0901   U+0972 U+0902
U+090F U+0901   U+090D U+0902
U+0906 U+0901   U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by a Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by either a Consonant, Vowel, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a Consonant or a Consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below:

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
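The context condition on the Candrabindu mapping can be sketched as below. This is an illustration under assumptions: the consonant test uses a simplified code point range, the function names are hypothetical, and the sketch replaces every eligible Candrabindu at once rather than enumerating all variant labels.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    # Simplified assumption: consonants are the range KA..HA.
    return "\u0915" <= ch <= "\u0939"


def preceded_by_c_or_cn(label: str, i: int) -> bool:
    """True if position i follows a Consonant, or a Consonant plus Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def candrabindu_variant(label: str) -> str:
    """Replace each Candrabindu that satisfies the context rule
    with the sequence U+0945 U+0902; leave all others untouched."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

A Candrabindu after a Vowel or a Matra is left unchanged by this mapping, because a Matra such as U+0945 could not validly appear in that position.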

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari   Gurmukhi
U+0902   U+0A02
इ U+0907   ਙ U+0A19
उ U+0909   ਤ U+0A24
ग U+0917   ਗ U+0A17
घ U+0918   ਬ U+0A2C
ट U+091F   ਟ U+0A1F
ठ U+0920   ਠ U+0A20
ढ U+0922   ਫ U+0A2B
प U+092A   ਧ U+0A27
भ U+092D   ਮ U+0A2E
म U+092E   ਸ U+0A38
व U+0935   ਕ U+0A15
ह U+0939   ਵ U+0A35
U+093A   U+0A02
U+093C   U+0A3C
ि U+093F   ਿ U+0A3F
ी U+0940   ੀ U+0A40
U+0945   U+0A71
U+0946   U+0A47
U+0946   U+0A4B
U+0947   U+0A47
U+0947   U+0A4B
U+0948   U+0A48
U+0956   U+0A41
U+0957   U+0A42
प्टि U+092A U+094D U+091F U+093F   ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940   ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947   ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946   ਏ U+0A0F
त्त U+0924 U+094D U+0924   ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari   Bengali
म U+092E   ম U+09AE
ि U+093F   ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables in Appendix B: Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules, for each of the categories as mentioned in Table 6: Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)

3. M must be preceded by C or CN (see footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in Case of V preceded by H)

15 where CN is a C followed by an N
16 where CN is a C followed by an N
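The seven rules above can be sketched as a single pass over a label. This is a simplified illustration and not the XML rule set that accompanies this document: the category ranges are assumptions covering only the common code points, and the function names are hypothetical. The sketch returns the numbers of the rules that a label violates.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}
NUKTA_HOSTS = C1 | V1 | M1
VCNM = {"V", "C", "N", "M"}


def category(ch: str) -> str:
    """Map a code point to its WLE symbol (simplified ranges, assumed)."""
    special = {"\u0902": "B", "\u0901": "D", "\u0903": "X",
               "\u094D": "H", "\u093C": "N"}
    if ch in special:
        return special[ch]
    if "\u0915" <= ch <= "\u0939":
        return "C"
    if "\u0904" <= ch <= "\u0914":
        return "V"
    if "\u093E" <= ch <= "\u094C":
        return "M"
    return "?"


def wle_violations(label: str) -> list:
    """Return the rule numbers (1 to 7) violated, in label order."""
    cats = [category(ch) for ch in label]
    found = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i > 0 else None
        prev_ch = label[i - 1] if i > 0 else ""
        # "C or CN": a consonant, or a Nukta that itself follows a consonant.
        c_or_cn = prev == "C" or (prev == "N" and i >= 2 and cats[i - 2] == "C")
        if cat == "N" and prev_ch not in NUKTA_HOSTS:
            found.append(1)
        elif cat == "H" and not c_or_cn:
            found.append(2)
        elif cat == "M" and not c_or_cn:
            found.append(3)
        elif cat == "X" and prev not in VCNM:
            found.append(4)
        elif cat == "B" and prev not in VCNM:
            found.append(5)
        elif cat == "D" and prev not in VCNM:
            found.append(6)
        elif cat == "V" and prev == "H":
            found.append(7)
    return found
```

As a usage example, the multi-word label U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 discussed under "Case of V preceded by H" is flagged under rule 7 by this sketch, while U+0915 U+093C passes because क belongs to C1.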

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7: Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार (aːməchaːr, "Mango pickle") (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position   Name   Organization   Country   Language Expertise
Co-Chair   Ajay Data   Data Xgen Technologies   India   Hindi, English
Co-Chair   Mahesh D Kulkarni   C-DAC   India   Marathi, Hindi, English
Co-Chair   Udaya Narayana Singh   Visva-Bharati, Santiniketan, West Bengal   India   Bengali, Maithili, Hindi, English
Member   Abhijit Dutta   Wikimedia   India   Bengali, Hindi
Member   Akshat S Joshi (Editor)   C-DAC   India   Hindi, Marathi, English
Member   Anivar A Aravind   Indic Project   India   Malayalam
Member   Anupam Agrawal   Tata Consultancy Service   India   Hindi, Bengali
Member   Arvind Bhandari   Gujarat University   India   Gujarati
Member   Ashish Modi   Data Xgen Technologies   India   Hindi
Member   Atiur Rahman Khan   C-DAC   India   Bangla
Member   Bal Krishna Bal   Kathmandu University   Nepal   Nepali
Member   Balaram Prasain   Tribhuvan University   Nepal   Nepali
Member   BASANTA KUMAR PANDA   Regional Institute of Education (NCERT)   India   Odia
Member   Bhim Dhoj Shrestha   Consultant   Nepal   Nepali, Newar
Member   Chitrita Chatterjee   Internet and Mobile Association of India (IAMAI)   India   Multiple languages represented by members of IAMAI
Member   DEBAJIT SHARMA   Anundoram Borooah Institute of Language, Art and Culture   India   Assamese
Member   Dev Dass Manandhar   Consultant   Nepal   Nepali, Newar
Member   Dhanalakshmi K T   Northern Trust   India   Kannada
Member   Ganesh Murmu   Ranchi University   India   Santali
Member   Gangadhar Panday   Babul Films Society   India   Telugu
Member   Ghanashyam Nepal   Benares Hindu University & University of North Bengal   India   Nepali
Member   Girish Chandra Mishra   Language Technology Centre, Ravenshaw University   India   Odia
Member   Gurpreet Singh Lehal   Punjabi University, Patiala   India   Panjabi
Member   Harish Chowdhary   NIXI   India   Hindi
Member   Hempal Shrestha   Nepal Entrepreneurs Hub (NEHUB)   Nepal   Nepali, Newar
Member   Jay Paudyal   Consultant   India   Hindi
Member   Jijo Pappachan   DN Domains   India   Malayalam
Member   K C Tikayatray   Odia Bhasa Pratisthan   India   Odia
Member   Kalyan Vasudeo Kale   Formerly affiliated with University of Pune   India   Marathi
Member   Kuldeep Patnaik   Visualizethysoul   India   Odia
Member   Mukesh Saini   Essel Group   India   Hindi
Member   N Deiva Sundaram   NDS Lingsoft Solutions Pvt Ltd   India   Tamil
Member   Neha Gupta   C-DAC   India   Hindi, English
Member   Nirajan Parajuli   NREN   Nepal   Nepali
Member   Nishit Jain   C-DAC   India   Hindi, English
Member   Pawan Chitrakar   Gapsco   Nepal   Nepali
Member   Prabhakar Pandey   C-DAC   India   Hindi
Member   Prasad P K   A-one Publishers   India   Malayalam
Member   Prateek Pathak   ISOC Mumbai   India   Devanagari
Member   Raiomond Doctor   NLP Consultant   India   English, Hindi, Marathi, Gujarati
Member   Rajib Chakraborty   Society for Natural Language Technology Research   India   Bangla (Bengali)
Member   Rajiv Kumar   NIXI   India
Member   S Maniam   International Forum IT for Tamil   Singapore   Tamil
Member   Santhosh Thottingal   Wikimedia foundation   India   Malayalam, Sourashtra, Tamil
Member   Saroja Bhate   University of Pune   India   Sanskrit
Member   Shambhu Kumar Singh   National Translation Mission, Mysore   India   Maithili
Member   Shanmugam R   C-DAC   India   Tamil
Member   Shantaram S Warde Walawalikar   Independent Researcher   India   Konkani
Member   Shashi Pathania   PGD of Dogri, University of Jammu   India   Dogri
Member   Shubham Saran   NIXI   India
Member   Sinnathambi Shanmugarajah   University of Colombo School of Computing   Sri Lanka   Tamil
Member   Sujith Kartha   Digitalkzcom   India   Malayalam
Member   Suraj Adhikari   Mercantile Communications (and np ccTLD)   Nepal   Nepali
Member   Swarna Prabha Chainary   Guwahati University   India   Bodo
Member   U B Pavanaja   httpvishvakannadacom   India   Kannada
Member   Uma Maheshwar G   CALTS, Univ of Hyderabad   India   Telugu
Member   Uttam Shrestha Rana   NPNOG   Nepal   Nepali
Member   Veena Solomon   (freelancer)   India   Malayalam
Member   Vinay Murarka   Consultant, httpsमराभारत   India   Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name   Language/Script Expertise
Ajit Kumar   Awadhi, Braj Language
Amar Tumyahang   Limbu Language
Amrit Yonjan   Tamang Language
Aprana Kulkarni   Hindi, Marathi
Basil Baa   Sadri Language
Basil Kiro   Kharia Language
Biswa Limbu   Limbu Language
Devdass Manandhar   Newar
Devendra Kumar Devesh   Bhojpuri Language
Dinbandhu Mahto   Panchpargania Language
Dipika Sangma Narzary   Bodo Language
Dr. K P Lekhwani   Sindhi
Dr. Birendra Kumar Soy   Mundari Language
Dr. Dinesh Kumar Shrivastav   Magahi Language
Dr. Harvinder Kaur   Gurmukhi Script
Dr. Laxmi Prasad Khatiwada   Nepali Language
Harihar Vaishnav   Halbi
Indra Kumar Tamang   Tamang Language
Jagannath Singh   Panchpargania Language
Narendra Kumar Negi   Kinnauri Language
Prateek Harshwal   Wagdi and Dhundhari Language
Rayem Olem Dungdung   Sadri Language
Tej Man Angdembe   Limbu Language
Urmila Harshwal   Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC 8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.
2. IS 10315: 7-bit coded character set for information interchange, 1985.
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.
5. ISO 2375: Procedure for registration of escape sequences, 2003.
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

56

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in a given context (e.g. in a browser bar, as part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing) they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even a single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
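The all-or-nothing condition above can be sketched in a few lines of Python. This is only an illustration: the lookup table holds a handful of Devanagari to Gurmukhi pairs taken from Table 22, the function name is hypothetical, and the check works per code point rather than per akshara, which is a simplifying assumption.

```python
# Hypothetical lookup: a few Devanagari -> Gurmukhi confusables from Table 22.
GURMUKHI_CONFUSABLES = {
    "\u0920": {"\u0A28", "\u0A30"},  # U+0920 -> U+0A28, U+0A30
    "\u0921": {"\u0A21", "\u0A24"},  # U+0921 -> U+0A21, U+0A24
    "\u0922": {"\u0A22"},            # U+0922 -> U+0A22
    "\u0924": {"\u0A1C"},            # U+0924 -> U+0A1C
    "\u092F": {"\u0A27"},            # U+092F -> U+0A27
}


def has_cross_script_confusable(label, table=GURMUKHI_CONFUSABLES):
    """True only if every code point of the label has a confusable in the table.

    A single code point without a counterpart makes the whole label
    visually distinct, so the function returns False in that case.
    """
    return len(label) > 0 and all(ch in table for ch in label)


print(has_cross_script_confusable("\u0920\u0921"))  # every code point is listed
print(has_cross_script_confusable("\u0920\u0915"))  # U+0915 is not listed
```

A real implementation would presumably segment the label into aksharas first; the per-code-point loop is kept here only to make the condition visible.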

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see footnote 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
◌ॅ U+0945 | ◌ঁ U+0981 | Bengali

Table 22: Devanagari cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


4.1.2.1 External Limits on Scope

The code point repertoire for the root zone is a very special case, high up the ladder in the protocol hierarchies: the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by the protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Out of all the characters needed by the given script, a character that is not encoded in Unicode cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode, being the character-encoding standard that aims at the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa क़ (U+0958) is not allowed to be part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, क (U+0915) + ◌़ (U+093C), can be used instead.
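The relationship between the precomposed Qa and its decomposed form can be observed with Python's standard `unicodedata` module. This is a small sketch, not part of the LGR: the helper name `to_idna_friendly` is hypothetical, and it relies only on the fact that U+0958 is excluded from composition, so NFC normalization yields the two-code-point sequence.

```python
import unicodedata

QA_PRECOMPOSED = "\u0958"        # DEVANAGARI LETTER QA
QA_DECOMPOSED = "\u0915\u093C"   # KA followed by NUKTA


def to_idna_friendly(label):
    """Normalize a label to NFC, the form IDNA2008 expects.

    U+0958 is a composition exclusion, so NFC leaves it decomposed
    as U+0915 U+093C.
    """
    return unicodedata.normalize("NFC", label)


normalized = to_idna_friendly(QA_PRECOMPOSED)
print([f"U+{ord(ch):04X}" for ch in normalized])
```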

IDNA also imposes restrictions, in the form of CONTEXTJ rules, on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D). These characters are required in certain cases where a typical visual shape of an akshar is desired.

In domain names, due to the absence of separators such as space (" ") or "-", the inability to use ZWNJ can pose issues where two words need to be joined together, the previous word ends in an explicit Halant and the next word begins with a consonant. In that case a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (deš, nation) and विदेश (videš, foreign land) are juxtaposed, the resultant word, i.e. "देश्विदेश" (see footnote 10), is not the appropriate way of rendering it. The appropriate rendering would be "देश्‌विदेश", which can be achieved by adding a ZWNJ between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, the exclusion of the ZWJ from the MSR may not have much of an impact, as better alternatives are already available for depicting the cases for which ZWJ was earlier used. Some specific shapes (see footnote 11) may not be possible; however, there will not be any impact at the phonetic level.

iii. Maximal Starting Repertoire

The Root Zone LGR is a repertoire of the characters that are going to be used for the creation of root zone TLDs, which in turn are an even more specialized case of domain names. The Root Zone LGR Procedure therefore introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even though allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
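The three successive filters can be pictured as nested sets. The sketch below is a toy model: the sets contain only the code points used as examples in this section, not the real Unicode, IDNA2008 or MSR data, and the function name is hypothetical.

```python
# Toy sets: only the example code points from this section, not real data.
UNICODE_ENCODED = {0x0915, 0x093C, 0x0958, 0x093D}
IDNA_ALLOWED = UNICODE_ENCODED - {0x0958}   # Qa is excluded by IDNA
MSR_ALLOWED = IDNA_ALLOWED - {0x093D}       # Avagraha is excluded by the MSR

FILTERS = [
    ("Unicode", UNICODE_ENCODED),
    ("IDNA", IDNA_ALLOWED),
    ("MSR", MSR_ALLOWED),
]


def first_failing_filter(code_point):
    """Return the name of the first filter that rejects the code point, or None."""
    for name, allowed in FILTERS:
        if code_point not in allowed:
            return name
    return None


for cp in (0x0915, 0x0958, 0x093D):
    print(f"U+{cp:04X}", first_failing_filter(cp))
```

Because each set is derived from the previous one, a code point that survives the last filter has necessarily passed the earlier ones, which mirrors the narrowing described in the summary.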

10 In this particular case it is possible to get the required display by dropping the explicit Halant at the end of the word; however, one can then argue that the pronunciation of the two words, i.e. देश् and देश, is different and hence it changes the fundamental word.

11 Case of क्ष and क्‍ष: the first is composed with क + ◌् + ष, while the latter is composed with क + ◌् + ZWJ + ष. The pronunciation of both conjuncts is the same.


4.1.2.2 No Punctuation Marks

TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters, like Isshar (U+09FA), Abbreviation sign (U+0970) etc., will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, such as DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms (U+0944) and (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were editorial in nature; some required attention to the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed with a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of the public comments can be accessed online at [114].


5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention (see footnote 12):

- All characters that are included in the [MSR]: yellow background
- PVALID in IDNA2008 but excluded from the MSR for various reasons: pinkish background
- Not PVALID in IDNA2008: white background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references are given in the last column, titled Reference. For the entire coverage of Devanagari code points, references for Hindi, Marathi, Sanskrit, Sindhi and Kashmiri are given. Though only five representative languages have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ◌ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 (Hindi, Nepali) | [0] [101] [102] [103] [105] [108] [110] [111] [112] [113]
2 | 0902 | ◌ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]


8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0] [101] [102] [103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 (Hindi) | [0] [101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 (Kashmiri) | [0] [105] [108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 (Hindi) | [0] [100] [101] [102] [108] [112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 (Kashmiri) | [0] [105] [108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]


20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]


30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]


40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 (Nepali) | [0] [102] [103] [110] [112] [113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]


50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [104] [105] [108] [113]
53 | 093A | ◌ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 (Kashmiri) | [11] [105] [108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 (Kashmiri) | [11] [105] [108]
55 | 093C | ◌़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 (Hindi) | [0] [101] [105] [108] [110] [109] [111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
59 | 0941 | ◌ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
60 | 0942 | ◌ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
61 | 0943 | ◌ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0] [101] [102] [103]


62 | 0945 | ◌ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 (Hindi) | [0] [100] [101] [108]
63 | 0946 | ◌ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 (Kashmiri) | [0] [105] [108]
64 | 0947 | ◌े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [105] [108] [113]
65 | 0948 | ◌ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 (Hindi) | [0] [100] [108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 (Kashmiri) | [0] [105] [108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [105] [108] [113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [105] [108] [113]
70 | 094D | ◌् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 (Hindi, Nepali) | [0] [101] [102] [103] [105] [108] [113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 (Kashmiri) | [0] [105] [108]
72 | 0956 | ◌ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 (Kashmiri) | [11] [105] [108]
73 | 0957 | ◌ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 (Kashmiri) | [11] [105] [108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 (Konkani, Marathi) | [9] [100] [102] [108] [112]


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 (Kashmiri) | [11] [105] [108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 (Kashmiri) | [11] [105] [108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 (Kashmiri) | [11] [105] [108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 (Kashmiri) | [11] [105] [108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 (Kashmiri) | [11] [105] [108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 (Sindhi) | [8] [104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 (Sindhi) | [8] [104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 (Sindhi) | [8] [104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 (Sindhi) | [8] [104]

Table 6: Code point repertoire
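As an illustration only, the 83 code points of Table 6 can be written as a compact set of ranges and used for a simple membership check. The range list below is transcribed from the table, the names `REPERTOIRE_RANGES` and `outside_repertoire` are hypothetical, and the normative data remains the XML file accompanying the proposal.

```python
# Code point ranges transcribed from Table 6 (inclusive on both ends).
REPERTOIRE_RANGES = [
    (0x0901, 0x0903), (0x0905, 0x090B), (0x090D, 0x0928), (0x092A, 0x0930),
    (0x0932, 0x0933), (0x0935, 0x093C), (0x093E, 0x0943), (0x0945, 0x094D),
    (0x094F, 0x094F), (0x0956, 0x0957), (0x0972, 0x0977), (0x097B, 0x097C),
    (0x097E, 0x097F),
]

REPERTOIRE = {cp for lo, hi in REPERTOIRE_RANGES for cp in range(lo, hi + 1)}


def outside_repertoire(label):
    """List the code points of a label that are not in the Table 6 repertoire."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) not in REPERTOIRE]


print(len(REPERTOIRE))                      # number of code points in the set
print(outside_repertoire("\u0915\u093D"))   # KA followed by AVAGRAHA
```

Repertoire membership is only the first check; the sequences of Table 7 and the Whole Label Evaluation rules of Section 7 constrain labels further.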

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable the conditional inclusion of DEVANAGARI LETTER RRA in the repertoire, in order to enable the "Eyelash Reph" construct (see footnote 13).

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106] [107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106] [107]

Table 7: Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct formed by this sequence is a special form of Reph (which is otherwise formed by the normal Ra, U+0930), the term "Reph" is used here.
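The conditional inclusion of U+0931 can be sketched with a negative lookahead: the letter is acceptable only when it starts one of the two sequences of Table 7. The pattern and function names below are hypothetical and the check covers nothing beyond this one condition.

```python
import re

# U+0931 (RRA) that is NOT followed by VIRAMA + (YA or HA).
RRA_OUTSIDE_SEQUENCE = re.compile("\u0931(?!\u094D[\u092F\u0939])")


def rra_usage_ok(label):
    """True if every U+0931 in the label begins a sequence listed in Table 7."""
    return RRA_OUTSIDE_SEQUENCE.search(label) is None


print(rra_usage_ok("\u0931\u094D\u092F"))  # RRA + VIRAMA + YA
print(rra_usage_ok("\u0931\u094D\u0915"))  # RRA + VIRAMA + KA
```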

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ◌ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of forming their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the Hindi language when written in the Devanagari script. These rules need slight additions for the different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodating additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

Section 7 gives the Whole Label Evaluation (WLE) rules, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first subsection lists the categories of the characters in the form of variables; in the rules, the variable names are used instead of the descriptive names. The second subsection lists four operators, along with their functions, which are assumed while specifying the rules. The following two subsections describe the two major categories of akshar formations, the first of which begins with a vowel and the second with a consonant. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable repetition
( )       Sequence group

Table 8: Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may optionally be followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | अ + ◌ं: U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ + ◌ँ: U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | अ + ः: U+0905 U+0903

Table 9
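The rule V[B|D|X] translates directly into a regular expression. The sketch below assumes, for illustration, that V covers the independent vowels listed in Table 6 (U+0905..U+090B, U+090D..U+0914, U+0972..U+0977); the names are hypothetical and this is not the normative WLE rule.

```python
import re

# Assumption: V = independent vowels of the Table 6 repertoire.
V = "[\u0905-\u090B\u090D-\u0914\u0972-\u0977]"
B_D_X = "[\u0902\u0901\u0903]"  # Anusvara, Candrabindu, Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"{V}{B_D_X}?")


def is_vowel_sequence(text):
    """True if the whole string is a single vowel sequence V[B|D|X]."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None


for sample in ("\u0905", "\u0905\u0902", "\u0905\u0902\u0903"):
    print([f"U+{ord(c):04X}" for c in sample], is_vowel_sequence(sample))
```

The last sample, a vowel followed by both Anusvara and Visarga, is rejected because at most one of B, D or X may follow the vowel.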


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may optionally be followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the consonant along with the Nukta sign wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क + ◌़: U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], or Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क + ि: U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | क + ◌ं: U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क + ◌ँ: U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | क + ः: U+0915 U+0903
Consonant + Halant | C[H] | क् (k, pure consonant) | क + ◌्: U+0915 U+094D

Table 11


2A. A CM sequence can optionally be followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क + ी + ◌ं: U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क + ा + ◌ँ: U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क + ी + ः: U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14):

*3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

14 In the case of Sanskrit, it can join up to 5 consonants.
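The consonant-sequence rules 1 to 3B can be combined into one regular expression. The sketch below is one possible reading, stated as an assumption: a consonant may carry a Nukta anywhere it occurs, a trailing Halant is allowed only on a single consonant (rule 2), and a conjunct joins at most four consonants. The character classes are taken from the U+0915..U+0939 consonants and the common Matras of Table 6; all names are hypothetical and this is not the normative WLE rule set of Section 7.

```python
import re

# Assumed character classes, drawn from Table 6 (illustrative, not normative).
C = "[\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939]"  # consonants
N = "\u093C"                                                 # Nukta
H = "\u094D"                                                 # Halant
M = "[\u093E-\u0943\u0945-\u094C]"                           # Matras
BDX = "[\u0901-\u0903]"                                      # Candrabindu, Anusvara, Visarga

K = f"{C}{N}?"  # a consonant, optionally with Nukta (rule 1)

# Either C[N]H, or *3(CH)C optionally followed by M, M+B/D/X, or B/D/X.
CONSONANT_SEQUENCE = re.compile(
    f"(?:{K}{H}|(?:{K}{H}){{0,3}}{K}(?:{M}{BDX}?|{BDX})?)"
)


def is_consonant_sequence(text):
    """True if the whole string is one consonant sequence under the reading above."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None


samples = {
    "CHCHCHC": "\u0928\u094D\u0915\u094D\u0930\u094D\u092F",
    "CHCM[X]": "\u0915\u094D\u0915\u0940\u0903",
    "C B X (invalid)": "\u0915\u0902\u0903",
}
for name, text in samples.items():
    print(name, is_consonant_sequence(text))
```

As noted above, the WLE rules of Section 7 do not limit the number of consonants joined by a Halant, so the `{0,3}` bound reflects only the Hindi description given in this section.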

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules once one takes into account digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed as variants, as there is another panel (String Similarity Assessment Panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formation. They can cause confusion even to a careful observer and are hence being proposed as variants. A brief description of these variants follows, with the variants themselves listed in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for the positioning of the Nukta character (U+093C), which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated opening up the Whole Label Evaluation rules (given in Section 6.1) for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar for a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)
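The effect of the context rule can be sketched as follows. The code enumerates, for a given label, every label reachable by the forward (Variant 1 to Variant 2) mappings, applying the "not followed by Nukta" condition. It is a simplified illustration with hypothetical names, not how an LGR tool processes the XML.

```python
from itertools import product

NUKTA = "\u093C"
VARIANT_1 = {"\u0906", "\u0913", "\u093E", "\u094B"}  # AA, O, sign AA, sign O


def nukta_variants(label):
    """Return the label plus all labels obtained by adding a Nukta after
    eligible code points. A code point already followed by a Nukta is left
    alone, which is exactly what prevents a double Nukta.
    """
    options = []
    for i, ch in enumerate(label):
        followed_by_nukta = label[i + 1:i + 2] == NUKTA
        if ch in VARIANT_1 and not followed_by_nukta:
            options.append((ch, ch + NUKTA))
        else:
            options.append((ch,))
    return {"".join(choice) for choice in product(*options)}


print(sorted(nukta_variants("\u0906")))        # AA alone
print(sorted(nukta_variants("\u0906\u093C")))  # AA already followed by Nukta
```

Without the condition, the second call would also produce U+0906 U+093C U+093C, the invalid combination the rule is meant to exclude.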

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence or absence of Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences leads to a variant where 0901 is expanded to 0945 0902.

2. Another variant set that overlaps with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences that include 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
U+093A | U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
U+0946 | U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point contexts, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (only a discussion, not proposed as variants)

Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, many users tend to ignore the phonetic effect of its presence or absence when it comes at the end. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. In the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual variant pairs are as follows:

Variant 1 | Variant 2
U+0901 | U+0945 U+0902
U+093E U+0901 | U+0949 U+0902
U+0905 U+0901 | U+0972 U+0902
U+090F U+0901 | U+090D U+0902
U+0906 U+0901 | U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क, U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, U+0945 U+0902 and U+0949 U+0902 should each be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow this convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for the Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonant, Vowel, Nukta or Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a Consonant or a Consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the character preceding (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as the mapping (U+0901) -> (U+0945 U+0902).
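The rule above can be sketched in Python as follows. This is an illustration only, not the LGR's machine-readable definition: the helper names are hypothetical, and the consonant test assumes the range U+0915 to U+0939, which is broader than the actual repertoire. The sketch replaces every Candrabindu whose context satisfies the rule and leaves the others untouched.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    """Assumption for this sketch: consonants are U+0915..U+0939."""
    return "\u0915" <= ch <= "\u0939"


def preceded_by_c_or_cn(label: str, i: int) -> bool:
    """True if position i follows a Consonant or a Consonant + Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def candrabindu_variant(label: str) -> str:
    """Map U+0901 to U+0945 U+0902 wherever the variant context rule holds."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

A Candrabindu that follows a Vowel or a Matra is left as it is, because a Matra such as U+0945 could not validly appear in that position.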

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen first, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the case where a label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed as cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 lists the cases proposed as cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are considered equivalents of each other, semantically or otherwise. They are grouped only on the basis of possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant/Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ, DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य, DEVANAGARI LETTER YA) or 0939 (ह, DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)

b. ख (U+0916)

c. ग (U+0917)

d. च (U+091A)

e. छ (U+091B)

f. ज (U+091C)

g. ड (U+0921)

h. ढ (U+0922)

i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (required in the Santali language)

b. ओ (U+0913) (required in the Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (required in the Santali language)

b. ो (U+094B) (required in the Santali language)

2. H must be preceded by C or CN, where CN is a C followed by an N.

3. M must be preceded by C or CN, where CN is a C followed by an N.

4. X must be preceded by one of V, C, N or M.

5. B must be preceded by one of V, C, N or M.

6. D must be preceded by one of V, C, N or M.

7. V cannot be preceded by H (details in "Case of V preceded by H").

Additional rules are used only for variants where a Nukta maps to null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1).

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2).
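Rules 1 to 7 above can be checked mechanically. The following Python sketch is a simplified illustration rather than the accompanying XML LGR: the function names are hypothetical, and the category lookup uses assumed code point ranges instead of the normative repertoire table (for instance, it does not model the Eyelash Reph sequences or per-code-point exclusions).

```python
NUKTA, HALANT = 0x093C, 0x094D
C1 = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B}
V1 = {0x0906, 0x0913}
M1 = {0x093E, 0x094B}
SIGNS = {0x0901: "D", 0x0902: "B", 0x0903: "X", NUKTA: "N", HALANT: "H"}


def category(cp: int) -> str:
    """Approximate category lookup (assumed ranges, not the normative table)."""
    if cp in SIGNS:
        return SIGNS[cp]
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0975:
        return "V"
    if cp in (0x093A, 0x093B, 0x094F) or 0x093E <= cp <= 0x094C:
        return "M"
    raise ValueError(f"U+{cp:04X} is outside this sketch's repertoire")


def wle_violations(label: str) -> list[tuple[int, int]]:
    """Return (index, rule number) for every violation of rules 1-7."""
    cps = [ord(ch) for ch in label]
    cats = [category(cp) for cp in cps]
    found = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i > 0 else None
        # "CN" means a Consonant followed by a Nukta.
        after_c_or_cn = prev == "C" or (prev == "N" and i >= 2 and cats[i - 2] == "C")
        if cat == "N" and (i == 0 or cps[i - 1] not in C1 | V1 | M1):
            found.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            found.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            found.append((i, 3))
        elif cat in "XBD" and prev not in ("V", "C", "N", "M"):
            found.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and prev == "H":
            found.append((i, 7))
    return found
```

An empty result means that no rule was violated under these assumptions. A dependent sign at the start of a label is reported against its own rule, since it has no preceding code point.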

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic community; hence it is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | Basanta Kumar Panda | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | Debajit Sharma | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC 8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 11, http://www.unicode.org/versions/Unicode11.0.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are grouped only on the basis of possible visual confusability.

At first they may not look exactly the same; however, in a given context, e.g. in a browser bar as part of a domain name, or as a single word where there is no surrounding text from the same script for comparison, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all of its constituent characters/aksharas have an equivalent confusable in the other script. If there is even a single character/akshara which does not have an equivalent visual confusable in the other script, it provides a visual distinction and hence yields a non-confusable string.
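The all-or-nothing condition in the last paragraph can be sketched as a small Python check. This is only an illustration: the function name is hypothetical, the mapping holds just the Devanagari-Gurmukhi code point pairs listed in Table 22, and the check works on single code points rather than on full aksharas.

```python
# Devanagari -> Gurmukhi confusables, code points as listed in Table 22.
DEVA_TO_GURU = {
    "\u0920": {"\u0A28", "\u0A30"},
    "\u0921": {"\u0A21", "\u0A24"},
    "\u0922": {"\u0A22"},
    "\u0924": {"\u0A1C"},
    "\u092F": {"\u0A27"},
}


def has_cross_script_confusable(label: str, table=DEVA_TO_GURU) -> bool:
    """True only if every code point of the label has a confusable in the table."""
    return bool(label) and all(ch in table for ch in label)
```

A single code point without a counterpart in the other script is enough to make the whole label distinguishable, so the function returns False for it.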

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (see footnote 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that the two look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (deš, nation) and विदेश (videš, foreign land) are juxtaposed, the resultant word, i.e. "देश्विदेश" (see footnote 10), is not the appropriate way of rendering it. The appropriate rendering would be "देश्‌विदेश", which can be achieved by adding a ZWNJ between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the then NBGP may consider adding it to the Devanagari repertoire if necessary.

However, the exclusion of the ZWJ from the MSR may not have much impact, as better alternatives are already available for depicting the cases for which ZWJ was earlier used. Some specific shapes (see footnote 11) may not be possible to produce; however, there will not be any impact at the phonetic level.

iii Maximal Starting Repertoire

The Root Zone LGR being a repertoire of the characters which are going to be used for the creation of root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha ऽ (U+093D), even though allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.

10 In this particular case, though it is possible to get the required display by dropping the explicit Halant at the end of the word, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different and hence it changes the fundamental word.

11 Case of क्ष and क्‍ष: the first is composed with क + ् + ष, while the latter is composed with क + ् + ZWJ + ष. The pronunciation of both the conjuncts is the same.

4.1.2.2 No Punctuation Marks

TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters, like Isshar (U+09FA), Abbreviation sign (U+0970), etc., will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, such as DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms (U+0944) and (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.

4.2 Methodology to incorporate the feedback received through the Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were editorial in nature. Some comments concerned the normative section of the document. The NBGP at large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed with a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of the public comments can be accessed online at [114].

5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention (footnote 12):

All characters that are included in the [MSR] - Yellow background

PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background

Not PVALID in IDNA2008 - White background

12 This document needs to be printed in color for this to be read correctly.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled Reference. For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (Not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0][101][102][103][105][108][110][111][112][113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]


8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0][101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0][105][108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][101][102][108][112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0][105][108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]


20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]


30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]


40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]


50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]


62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6: Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of DEVANAGARI LETTER RRA in the repertoire, so as to enable inclusion of the “Eyelash Reph” (footnote 13) construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7: Sequences

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term “Reph” is used here.
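The conditional inclusion described above can be illustrated with a small check. This is only a sketch under the assumption that U+0931 is admissible solely as the first code point of one of the two Table 7 sequences; the function name `rra_usage_is_valid` is hypothetical and is not part of the LGR or of any LGR tool.

```python
RRA, VIRAMA, YA, HA = "\u0931", "\u094D", "\u092F", "\u0939"

# The two sequences of Table 7 (Eyelash Reph followed by YA or HA).
EYELASH_REPH_SEQUENCES = (RRA + VIRAMA + YA, RRA + VIRAMA + HA)


def rra_usage_is_valid(label: str) -> bool:
    """Return True if every U+0931 in the label starts a Table 7 sequence."""
    for i, ch in enumerate(label):
        # str.startswith accepts a tuple of prefixes and a start offset.
        if ch == RRA and not label.startswith(EYELASH_REPH_SEQUENCES, i):
            return False
    return True
```

A label without any U+0931 passes trivially; the check says nothing about the other Whole Label Evaluation rules.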

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen -
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol | Function
| | Alternative
[ ] | Optional
* | Variable Repetition
( ) | Sequence Group

Table 8: Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | U+0905 U+0903

Table 9
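The vowel sequence V[B|D|X] maps directly onto a regular expression. The sketch below assumes that V covers every code point of category Vowel in Table 6 (not only those used in Hindi); the names `VOWEL_SEQUENCE` and `is_vowel_sequence` are illustrative only.

```python
import re

# Category "Vowel" code points from Table 6 (assumption: all of them).
V = "\u0905-\u090B\u090D-\u0914\u0972-\u0977"

# V followed by at most one of Anusvara (B), Candrabindu (D), Visarga (X).
VOWEL_SEQUENCE = re.compile(f"[{V}][\u0902\u0901\u0903]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if the whole text is a single V[B|D|X] akshar."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

Using `fullmatch` rather than `match` is what enforces the "restricted to one" condition: a second B, D or X leaves unmatched input and the check fails.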


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | U+0915 U+0903
Consonant + Halant | C[H] | क् (k, pure consonant) | U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (footnote 14): *3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

14 In case of Sanskrit, it can join up to 5 consonants.

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.
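The Hindi akshar patterns of Sections 5.5.3 and 5.5.4 can be sketched by first mapping each code point to its category letter and then matching the resulting string against one combined pattern. This is an illustration only: the category sets are assumed to be those of Table 6, the combined regular expression is one reading of rules 1 to 3B, and it is not the Whole Label Evaluation rule set of Section 7 (which, for instance, does not cap the number of consonants joined by Halant). The names `CATEGORY`, `AKSHAR` and `is_hindi_akshar` are hypothetical.

```python
import re


def _chars(*spans):
    """Expand (first, last) code point spans into a set of characters."""
    return {chr(cp) for first, last in spans for cp in range(first, last + 1)}


# Code point categories, taken from Table 6 (assumption of this sketch).
CATEGORY = {}
for letter, spans in {
    "C": [(0x0915, 0x0928), (0x092A, 0x0930), (0x0932, 0x0933),
          (0x0935, 0x0939), (0x097B, 0x097C), (0x097E, 0x097F)],
    "V": [(0x0905, 0x090B), (0x090D, 0x0914), (0x0972, 0x0977)],
    "M": [(0x093A, 0x093B), (0x093E, 0x0943), (0x0945, 0x094C),
          (0x094F, 0x094F), (0x0956, 0x0957)],
    "B": [(0x0902, 0x0902)], "D": [(0x0901, 0x0901)],
    "X": [(0x0903, 0x0903)], "H": [(0x094D, 0x094D)],
    "N": [(0x093C, 0x093C)],
}.items():
    for ch in _chars(*spans):
        CATEGORY[ch] = letter

# V[B|D|X]  |  C[N]H  |  up to three (C[N]H) groups, then C[N][M][B|D|X]
AKSHAR = re.compile(r"V[BDX]?|CN?H|(?:CN?H){0,3}CN?M?[BDX]?")


def is_hindi_akshar(text: str) -> bool:
    """Return True if text forms exactly one akshar under the rules above."""
    try:
        categories = "".join(CATEGORY[ch] for ch in text)
    except KeyError:  # code point outside the Table 6 repertoire
        return False
    return AKSHAR.fullmatch(categories) is not None
```

Working on category strings keeps the pattern readable and mirrors the variable notation of Section 5.5.1.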


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21: Visually confusables in Appendix A: Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and are hence being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in the Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:


Variant 1 | Variant 2
आ (U+0906) | आ़ (U+0906 U+093C)
ओ (U+0913) | ओ़ (U+0913 U+093C)
ा (U+093E) | ा़ (U+093E U+093C)
ो (U+094B) | ो़ (U+094B U+093C)

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2, e.g. in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the leading code point of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
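The effect of the "not followed by Nukta" condition can be seen in a small enumeration of variant labels. The sketch below is an assumption-laden illustration: it treats a Table 16 base character together with an optional following Nukta as one unit, so the forward mapping can never fire on a base that already carries a Nukta. The function name `nukta_variants` is hypothetical and the sketch ignores all other variant sets.

```python
from itertools import product

NUKTA = "\u093C"
# Variant 1 column of Table 16: AA, O, vowel sign AA, vowel sign O.
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variants(label: str) -> set:
    """All labels reachable through the Table 16 mappings (original included)."""
    choices = []
    i = 0
    while i < len(label):
        ch = label[i]
        if ch in NUKTA_BASES:
            # Consume a following Nukta with its base, so that the result
            # never contains two Nuktas in a row.
            if label[i + 1:i + 2] == NUKTA:
                i += 1
            choices.append((ch, ch + NUKTA))
        else:
            choices.append((ch,))
        i += 1
    return {"".join(parts) for parts in product(*choices)}
```

For a label with k eligible positions this yields 2^k candidate labels, none of which contains U+093C U+093C unless the input already did.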

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the left-hand-side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when(follows-only-C-or-N)” (see Section 6.4.1), none of the sequences leads to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapping with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ (U+0973) | अं (U+0905 U+0902)
ऺ (U+093A) | ं (U+0902)
ॴ (U+0974) | आं (U+0906 U+0902)
ऻ (U+093B) | ां (U+093E U+0902)
ऎ (U+090E) | ऐ (U+0910)
ॆ (U+0946) | े (U+0947)
ॵ (U+0975) | औ (U+0914)
ॏ (U+094F) | ौ (U+094C)

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mapping will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ (U+0901) | ॅं (U+0945 U+0902)
ाँ (U+093E U+0901) | ॉं (U+0949 U+0902)
अँ (U+0905 U+0901) | ॲं (U+0972 U+0902)
एँ (U+090F U+0901) | ऍं (U+090D U+0902)
आँ (U+0906 U+0901) | ऑं (U+0911 U+0902)

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by a Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
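The context rule above can be sketched as a positional test on each Candrabindu. The following is a minimal illustration, assuming the Consonant category is the set of consonant code points in Table 6; the names `candrabindu_variant_positions` and `candra_e_anusvara_variant` are hypothetical, and the sketch covers only the first pair of Table 18.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"

# Consonants of Table 6 (assumption of this sketch).
CONSONANTS = {
    chr(cp)
    for first, last in [(0x0915, 0x0928), (0x092A, 0x0930), (0x0932, 0x0933),
                        (0x0935, 0x0939), (0x097B, 0x097C), (0x097E, 0x097F)]
    for cp in range(first, last + 1)
}


def candrabindu_variant_positions(label: str) -> list:
    """Indices of U+0901 preceded by C, or by C followed by Nukta."""
    hits = []
    for i, ch in enumerate(label):
        if ch != CANDRABINDU or i == 0:
            continue
        prev = label[i - 1]
        after_c = prev in CONSONANTS
        after_cn = prev == NUKTA and i >= 2 and label[i - 2] in CONSONANTS
        if after_c or after_cn:
            hits.append(i)
    return hits


def candra_e_anusvara_variant(label: str) -> str:
    """Replace every eligible U+0901 with U+0945 U+0902."""
    hits = set(candrabindu_variant_positions(label))
    return "".join(
        CANDRA_E_ANUSVARA if i in hits else ch for i, ch in enumerate(label)
    )
```

A Candrabindu after a Vowel or a Matra is left untouched, which is the behavior the rule is meant to guarantee.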

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
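The "first chosen wins, the other is blocked" disposition can be sketched with an index label: two labels that reduce to the same index are treated as variants of each other. The sketch below is a simplification that uses only the context-free pairs of Table 17, ignores the overlapping sets of Section 6.1.2 and does not check label validity; `TO_INDEX`, `index_label` and `FirstComeRegistry` are hypothetical names.

```python
# Variant 2 -> Variant 1 for the pairs of Table 17.
TO_INDEX = {
    "\u0905\u0902": "\u0973",
    "\u0906\u0902": "\u0974",
    "\u093E\u0902": "\u093B",
    "\u0902": "\u093A",
    "\u0910": "\u090E",
    "\u0947": "\u0946",
    "\u0914": "\u0975",
    "\u094C": "\u094F",
}
# Try two-code-point sequences before single code points.
_KEYS = sorted(TO_INDEX, key=len, reverse=True)


def index_label(label: str) -> str:
    """Map every Variant 2 occurrence to its Variant 1 counterpart."""
    out, i = [], 0
    while i < len(label):
        for key in _KEYS:
            if label.startswith(key, i):
                out.append(TO_INDEX[key])
                i += len(key)
                break
        else:
            out.append(label[i])
            i += 1
    return "".join(out)


class FirstComeRegistry:
    """Allocates a label unless it or one of its variants is already taken."""

    def __init__(self):
        self._allocated = {}

    def apply(self, label: str) -> bool:
        key = index_label(label)
        if key in self._allocated:
            return False  # blocked
        self._allocated[key] = label
        return True
```

Because neither member of a pair is preferred, the registry stores whichever label arrives first and blocks everything else that shares its index.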

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B
ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each Category mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is U+0931 (ऱ - DEVANAGARI LETTER RRA), H is U+094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either U+092F (य - DEVANAGARI LETTER YA) or U+0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:
a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:
a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:
a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)
3. M must be preceded by C or CN (see footnote 16)
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V CANNOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N
16 where CN is a C followed by an N
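The seven rules above can be expressed as a single pass that looks at the character preceding each code point. The following is only a sketch under stated assumptions, not the normative XML rules. The function name `wle_violations` is hypothetical, the category sets are transcribed from Table 6, and U+0931 (RRA) is treated as a consonant purely so that the Eyelash Reph sequences satisfy rule 2. Repertoire membership and the RRA sequence restriction are not checked here.

```python
from typing import List

CANDRABINDU, ANUSVARA, VISARGA = "\u0901", "\u0902", "\u0903"
NUKTA, HALANT = "\u093C", "\u094D"

# Categories per Table 6 (assumption: RRA U+0931 is kept among consonants).
CONSONANTS = (
    {chr(cp) for cp in range(0x0915, 0x093A)} - {"\u0929", "\u0934"}
) | set("\u097B\u097C\u097E\u097F")
VOWELS = (
    {chr(cp) for cp in range(0x0905, 0x0915)} - {"\u090C"}
) | {chr(cp) for cp in range(0x0972, 0x0978)}
MATRAS = set(
    "\u093A\u093B\u093E\u093F\u0940\u0941\u0942\u0943\u0945\u0946"
    "\u0947\u0948\u0949\u094A\u094B\u094C\u094F\u0956\u0957"
)

# Sets named in rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")

# Rules 4, 5 and 6 share one context requirement.
SIGN_RULES = {VISARGA: 4, ANUSVARA: 5, CANDRABINDU: 6}


def wle_violations(label: str) -> List[int]:
    """Return the numbers of the WLE rules broken by the label, in order."""
    broken: List[int] = []
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        # "C or CN": a consonant, or a consonant followed by a Nukta.
        c_or_cn = prev in CONSONANTS or (prev == NUKTA and prev2 in CONSONANTS)
        if ch == NUKTA:
            if prev not in (C1 | V1 | M1):
                broken.append(1)
        elif ch == HALANT:
            if not c_or_cn:
                broken.append(2)
        elif ch in MATRAS:
            if not c_or_cn:
                broken.append(3)
        elif ch in SIGN_RULES:
            if not (prev in VOWELS or prev in CONSONANTS
                    or prev == NUKTA or prev in MATRAS):
                broken.append(SIGN_RULES[ch])
        elif ch in VOWELS:
            if prev == HALANT:
                broken.append(7)
    return broken
```

A label with an empty result passes these context checks. The multi-word example from the "Case of V preceded by H" discussion (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930) is reported under rule 7, since its vowel U+0905 directly follows a Halant.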

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)
• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it gets permitted only with the specific sequences.
2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether each of these can follow any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements by the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni, and Dr Ajay Data

Following is the full list of NBGP members with their Language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | DataXgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | DataXgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | Basanta Kumar Panda | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | Debajit Sharma | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982
2. IS 10315: 7-bit coded character set for information interchange, 1985
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001
5. ISO 2375: Procedure for registration of escape sequences, 2003
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001
7. IDN POLICY, http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables
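Since every pair in Table 21 differs only by a Nukta after one of nine consonants, such pairs can be detected by comparing labels after dropping that Nukta. The sketch below is illustrative and non-normative. The names `nukta_skeleton` and `visually_confusable` are hypothetical, and because these pairs are not variants the result is advisory only.

```python
import re

# The nine base consonants listed in Table 21.
TABLE_21_BASES = "\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B"

# A Table 21 base consonant immediately followed by NUKTA (U+093C).
_NUKTA_AFTER_BASE = re.compile("([" + TABLE_21_BASES + "])\u093C")


def nukta_skeleton(label: str) -> str:
    """Drop the Nukta wherever it follows a Table 21 base consonant."""
    return _NUKTA_AFTER_BASE.sub(r"\1", label)


def visually_confusable(a: str, b: str) -> bool:
    """True if two distinct labels differ only by Table 21 Nukta pairs."""
    return a != b and nukta_skeleton(a) == nukta_skeleton(b)
```

A Nukta after any other character, such as the Santali vowels of set V1, is left untouched, so the check stays within what Table 21 actually lists.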


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
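The all-or-nothing condition above can be sketched as a lookup that fails as soon as one character has no counterpart. This is a simplified illustration, not the panel's procedure. `cross_script_variant` and `DEVANAGARI_TO_GURMUKHI` are hypothetical names, the table holds only a few single-code-point pairs taken from Table 19, and the sketch works per code point rather than per akshara, so multi-code-point sequences are not handled.

```python
from typing import Dict, Optional

# A small subset of the single-code-point pairs from Table 19 (assumption:
# a real check would load the full variant set from the LGR).
DEVANAGARI_TO_GURMUKHI: Dict[str, str] = {
    "\u0917": "\u0A17",  # GA
    "\u091F": "\u0A1F",  # TTA
    "\u0920": "\u0A20",  # TTHA
    "\u092E": "\u0A38",  # Devanagari MA resembles Gurmukhi SA
    "\u093F": "\u0A3F",  # VOWEL SIGN I
    "\u0940": "\u0A40",  # VOWEL SIGN II
}


def cross_script_variant(
    label: str, table: Dict[str, str] = DEVANAGARI_TO_GURMUKHI
) -> Optional[str]:
    """Return the look-alike label in the other script, or None.

    One character without a counterpart is enough to make the whole
    label visually distinct, so the function gives up immediately.
    """
    mapped = []
    for ch in label:
        if ch not in table:
            return None
        mapped.append(table[ch])
    return "".join(mapped)
```

For example, a label built only from U+091F, U+0940 and U+092E maps completely and yields a Gurmukhi look-alike, while adding U+0915, which has no entry in this subset, makes the function return `None`.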

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see footnote 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda (U+0964) and double Danda (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar (U+09FA), Abbreviation sign (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, especially like DEVANAGARI LETTER VOCALIC RR ॠ (U+0960) and DEVANAGARI LETTER VOCALIC LL ॡ (U+0961), as well as their Matra forms (U+0944) and (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2 Methodology to incorporate the feedback received through Public Comment process

The Devanagari script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The Devanagari LGR received various comments during the public comment process. Most of the comments received were of an editorial nature. Some comments called for attention to the normative section of the document. The NBGP at-large, and the Devanagari team in particular, went through the comments in detail and decided, for each individual comment received, whether it needed a change in the overall LGR recommendation.

Wherever the Devanagari team decided that a change was necessary, the change was made. The rest of the comments were addressed by a detailed explanation of why the said change was not necessary. An elaborate document with all such explanations was shared with the NBGP at-large. On overall agreement of the entire NBGP, the Devanagari LGR was finalized. The analysis of public comments can be accessed online at [114].

5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention (see footnote 12):

All characters that are included in the [MSR] - Yellow background
PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background
Not PVALID in IDNA2008 - White background

12 This document needs to be printed in color for this to be read correctly.

5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (Not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0][101][102][103][105][108][110][111][112][113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0][101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0][105][108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][101][102][108][112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0][105][108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]
62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant / Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]
75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6: Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph" construct (see footnote 13).

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7: Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term "Reph" is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section, there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari in terms of:

- Character addition/deletion (e.g., the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g., the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, instead of their descriptive names, the variable names are used. The second section lists four operators, along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of the akshar formations, the first of which begins with the vowels and the second one with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta


5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | अ ं (U+0905 U+0902)
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ ँ (U+0905 U+0901)
Vowel + Visarga | V[X] | अः (aḥ) | अ ः (U+0905 U+0903)

Table 9
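The vowel-sequence rule can be sketched as a regular expression. This is a minimal illustration only: the character class for V below is an assumed subset of independent vowels chosen for the example, not the full repertoire of Table 6, and the names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical.

```python
import re

# Illustrative character classes (assumptions, not the full repertoire).
V = "\u0905-\u090B\u090F\u0910\u0913\u0914"  # a subset of independent vowels
B = "\u0902"  # Anusvara
D = "\u0901"  # Candrabindu
X = "\u0903"  # Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{V}][{B}{D}{X}]?")


def is_vowel_sequence(text):
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

Under these assumptions, `is_vowel_sequence("\u0905\u0902")` accepts Vowel + Anusvara, while a vowel followed by both an Anusvara and a Visarga is rejected because only one of B, D or X may follow.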


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क ़ (U+0915 U+093C)

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D] or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क ि (U+0915 U+093F)
Consonant + Anusvara | C[B] | कं (kaṁ) | क ं (U+0915 U+0902)
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क ँ (U+0915 U+0901)
Consonant + Visarga | C[X] | कः (kaḥ) | क ः (U+0915 U+0903)
Consonant + Halant | C[H] | क् (k, pure consonant) | क ् (U+0915 U+094D)

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क ी ं (U+0915 U+0940 U+0902)
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क ा ँ (U+0915 U+093E U+0901)
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क ी ः (U+0915 U+0940 U+0903)

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14): *3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets:

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

14 In case of Sanskrit it can join up to 5 consonants.

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
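The consonant-sequence cases 1 to 3B can be combined into one pattern, sketched below. This is an illustration under stated assumptions: the C and M character classes are assumed subsets rather than the full repertoire, a final Halant is allowed after a conjunct as well as after a single consonant, and the names `CONSONANT_SEQUENCE` and `is_consonant_sequence` are hypothetical.

```python
import re

# Illustrative character classes (assumptions, not the full repertoire).
C = "\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939"  # consonants
M = "\u093E-\u0943\u0947\u0948\u094B\u094C"                # a subset of matras
N, H = "\u093C", "\u094D"                                  # Nukta, Halant
B, D, X = "\u0902", "\u0901", "\u0903"                     # Anusvara, Candrabindu, Visarga

# Up to three "consonant (+ Nukta) + Halant" units, then a final consonant
# (+ Nukta), then either a Halant or an optional Matra and an optional
# single B, D or X.
CONSONANT_SEQUENCE = re.compile(
    f"(?:[{C}]{N}?{H}){{0,3}}[{C}]{N}?(?:{H}|[{M}]?[{B}{D}{X}]?)"
)


def is_consonant_sequence(text):
    """Return True if the whole string is a single consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

With these classes, the four-consonant example of Table 13 is accepted and a five-consonant conjunct is rejected, which mirrors the Hindi limit of four consonants rather than the unrestricted WLE rules of Section 7.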

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String Similarity Assessment Panel) entrusted to deal with such cases. Table 21: Visually confusables in Appendix A: Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and are hence being proposed as variants. Following is the brief description of these variants, followed by variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2, e.g., in the first pair आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e., if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of the substituted sequence आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination आ़़ (U+0906 U+093C U+093C), where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the symmetric (reverse) variant mappings using the same condition (see also [RFC 8228]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
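The effect of the "not followed by a Nukta" condition can be sketched as follows. This is a minimal illustration, not the LGR XML definition: the function names `nukta_variant_positions` and `add_nukta_at` are hypothetical, and only the Variant 1 to Variant 2 direction is shown.

```python
NUKTA = "\u093C"
# Variant 1 code points of the Santali Nukta pairs: AA, O, matra AA, matra O.
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_positions(label):
    """Indexes where the Variant 1 -> Variant 2 mapping is defined.

    The mapping applies only when the base character is NOT already
    followed by a Nukta; this is what stops the substitution from
    being applied again to its own output.
    """
    positions = []
    for i, ch in enumerate(label):
        followed_by_nukta = i + 1 < len(label) and label[i + 1] == NUKTA
        if ch in NUKTA_BASES and not followed_by_nukta:
            positions.append(i)
    return positions


def add_nukta_at(label, i):
    """Apply the Variant 1 -> Variant 2 mapping at index i."""
    return label[: i + 1] + NUKTA + label[i + 1 :]
```

Applying `add_nukta_at` at an allowed position yields a label in which that position is no longer reported by `nukta_variant_positions`, so a double Nukta is never generated.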

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E, respectively, into the sequences of Variant Sets A, B, C and D that contain them (the left-hand side in Sets A, B and C; the right-hand side in Set D), the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A
- 0906 093C 0901 is added to Set B
- 0906 093C 0902 is added to Set C, and
- 093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when (follows-only-C-or-N)” (see 6.4.1), none of the sequences leads to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mapping will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, coming at the end, many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ U+0901 | ॅं U+0945 U+0902
ाँ U+093E U+0901 | ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of the dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both the pairs preceded by a Devanagari Letter Ka (क - U+0915) are shown here:

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e., ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
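The context condition on this pair can be sketched in code. This is an illustration under stated assumptions: consonants are approximated by the range U+0915 to U+0939 rather than the actual repertoire, the sketch swaps one occurrence at a time instead of enumerating full variant labels, and the names `is_consonant`, `preceded_by_c_or_cn` and `candrabindu_variants` are hypothetical.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch):
    # Assumption: approximate the consonant set by a code point range.
    return "\u0915" <= ch <= "\u0939"


def preceded_by_c_or_cn(label, i):
    """True if index i directly follows a Consonant or Consonant + Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def candrabindu_variants(label):
    """Labels obtained by swapping one occurrence of either side of the pair.

    Both directions use the same context condition, so the mapping
    stays symmetric.
    """
    results = []
    for i in range(len(label)):
        if not preceded_by_c_or_cn(label, i):
            continue
        if label.startswith(CANDRA_E_ANUSVARA, i):
            results.append(label[:i] + CANDRABINDU + label[i + 2 :])
        elif label[i] == CANDRABINDU:
            results.append(label[:i] + CANDRA_E_ANUSVARA + label[i + 1 :])
    return results
```

A Candrabindu after a vowel, as in U+0905 U+0901, produces no variant here, because a Matra cannot follow a vowel.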

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B
ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables in Appendix B: Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules, for each of the categories as mentioned in Table 6: Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)

3. M must be preceded by C or CN (see footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V cannot be preceded by H (details in “Case of V preceded by H”)

15 where CN is a C followed by an N
16 where CN is a C followed by an N
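Rules 1 to 7 lend themselves to a simple left-to-right check of each code point against its predecessor. The sketch below is an illustration only: the category ranges are assumptions that approximate the repertoire of Table 6, sequences such as the Eyelash Reph are not modelled, and the names `category` and `wle_violations` are hypothetical rather than part of any LGR tool.

```python
N, H, B, D, X = "\u093C", "\u094D", "\u0902", "\u0901", "\u0903"
MARKS = {N: "N", H: "H", B: "B", D: "D", X: "X"}

# Sets named in rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}


def category(ch):
    """Classify a code point; the ranges are illustrative approximations."""
    if ch in MARKS:
        return MARKS[ch]
    if "\u0915" <= ch <= "\u0939":
        return "C"
    if "\u0905" <= ch <= "\u0914":
        return "V"
    if "\u093E" <= ch <= "\u094C":
        return "M"
    return None


def wle_violations(label):
    """Return (index, rule number) pairs for every violated rule."""
    violations = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else ""
        prev_cat = category(prev) if prev else None
        # "C or CN": a consonant, or a Nukta that itself follows a consonant.
        after_c_or_cn = prev_cat == "C" or (
            prev_cat == "N" and i >= 2 and category(label[i - 2]) == "C"
        )
        if cat == "N" and not (prev in C1 or prev in V1 or prev in M1):
            violations.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            violations.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            violations.append((i, 3))
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            violations.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and prev_cat == "H":
            violations.append((i, 7))
    return violations
```

An empty result means no rule was violated under these assumptions; a label with a vowel directly after a Halant is reported under rule 7.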


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

- Variant is not defined if followed by a Nukta (see 6.1.1)

- Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7: Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case “V preceded by H” needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g., आम्अचार /aːməchaːr/ “Mango pickle” (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi


Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi/Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, “Maximal Starting Repertoire - MSR-4: Overview and Rationale”, 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, “Representing Label Generation Rulesets Using XML”, RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)


[RFC8228] A. Freytag, “Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels”, RFC 8228, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 11, http://www.unicode.org/versions/Unicode11.0.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, “Variant Issues Report”, ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct. 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct. 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov. 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov. 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec. 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_orgin/devnagari/thesysmbols.html (Accessed on 12th Dec. 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec. 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec. 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb. 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London. 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2): 139-156. 2007.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.

2. IS 10315: 7-bit coded character set for information interchange. 1985.

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.

5. ISO 2375: Procedure for registration of escape sequences. 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables
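
Every pair in Table 21 consists of a base consonant and the same consonant followed by Nukta (U+093C). The relationship can be sketched as follows, assuming labels are handled as plain Python strings of code points. The names `TABLE_21_BASES`, `table21_pairs` and `is_table21_pair` are illustrative only and are not part of the LGR or its XML file.

```python
NUKTA = "\u093c"  # DEVANAGARI SIGN NUKTA

# Base consonants from the "Confusable 1" column of Table 21.
TABLE_21_BASES = [chr(cp) for cp in (
    0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B)]


def table21_pairs():
    """Return the (base, base + Nukta) pairs listed in Table 21."""
    return [(base, base + NUKTA) for base in TABLE_21_BASES]


def is_table21_pair(a, b):
    """True if a and b form one of the listed pairs, in either order."""
    pairs = set(table21_pairs())
    return (a, b) in pairs or (b, a) in pairs
```

Since these pairs are informative only, such a check would be of use to a reviewer rather than to variant generation.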

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in a given context (e.g. in a browser bar as part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see note 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
◌ॅ U+0945 | ◌ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
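
The all-or-nothing condition stated before Table 22 can be sketched as below. This is a simplified illustration: it works per code point rather than per akshara, it holds only a few rows of Table 22, and the names `CONFUSABLES` and `may_have_cross_script_variant` are hypothetical.

```python
# Excerpt of Table 22: Devanagari code point -> {script: confusable code points}.
# Only a handful of rows are included here, for illustration.
CONFUSABLES = {
    0x0903: {"Gujarati": {0x0A83}, "Bengali": {0x0983}},
    0x0909: {"Bengali": {0x0993}},
    0x0918: {"Bengali": {0x0998}},
    0x0922: {"Gurmukhi": {0x0A22}},
}


def may_have_cross_script_variant(label, script):
    """True only if every code point of label has a confusable in script.

    A single code point without a counterpart in the other script is
    treated as enough to make the whole label visually distinct.
    """
    if not label:
        return False
    return all(script in CONFUSABLES.get(ord(ch), {}) for ch in label)
```

For example, a label made of U+0909 and U+0918 qualifies for Bengali, while replacing either code point with U+0915 makes the label distinct.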

5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 4

Figure 2: Devanagari Code Page from [MSR]

Color convention (see note 12):

- All characters that are included in the [MSR]: Yellow background
- PVALID in IDNA2008 but excluded from the MSR for various reasons: Pinkish background
- Not PVALID in IDNA2008: White background

12 This document needs to be printed in color for this to be read correctly.

5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ◌ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [110], [111], [112], [113]
2 | 0902 | ◌ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]

8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0], [101], [102], [103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0], [101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0], [105], [108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0], [100], [101], [102], [108], [112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0], [105], [108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]

20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]

30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]

40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0], [102], [103], [110], [112], [113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]

50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
53 | 093A | ◌ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
55 | 093C | ◌़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0], [101], [105], [108], [110], [109], [111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
59 | 0941 | ◌ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
60 | 0942 | ◌ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
61 | 0943 | ◌ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0], [101], [102], [103]

62 | 0945 | ◌ॅ | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0], [100], [101], [108]
63 | 0946 | ◌ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0], [105], [108]
64 | 0947 | ◌े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
65 | 0948 | ◌ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0], [100], [108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0], [105], [108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
70 | 094D | ◌् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0], [105], [108]
72 | 0956 | ◌ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
73 | 0957 | ◌ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9], [100], [102], [108], [112]

75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8], [104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8], [104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8], [104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8], [104]

Table 6 Code point repertoire
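
As an illustration only, the 83 individual code points of Table 6 can be written as a set and used for a first-pass repertoire check. This is a sketch, not the normative XML. The names `REPERTOIRE` and `outside_repertoire` are assumptions introduced here.

```python
def _span(first, last):
    """Inclusive range of code points as a set."""
    return set(range(first, last + 1))


# Individual code points of Table 6 (sequences are not covered here).
REPERTOIRE = (
    _span(0x0901, 0x0903) | _span(0x0905, 0x090B) | _span(0x090D, 0x0928)
    | _span(0x092A, 0x0930) | {0x0932, 0x0933} | _span(0x0935, 0x0939)
    | _span(0x093A, 0x093C) | _span(0x093E, 0x0943) | _span(0x0945, 0x094D)
    | {0x094F, 0x0956, 0x0957} | _span(0x0972, 0x0977)
    | {0x097B, 0x097C, 0x097E, 0x097F}
)


def outside_repertoire(label):
    """Return the code points of label, as 'U+XXXX', not listed in Table 6."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) not in REPERTOIRE]
```

A label passing this check is not necessarily valid, since membership in the repertoire says nothing about the order in which code points may appear.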

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph" (see note 13) construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106], [107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106], [107]

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by normal Ra, U+0930), the term "Reph" is used here.

Table 7 Sequences
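
The conditional inclusion of RRA can be sketched as a check that U+0931 occurs only at the start of one of the two sequences of Table 7. The function name `rra_usage_ok` is hypothetical, and the sketch ignores every other rule of the LGR.

```python
RRA, VIRAMA, YA, HA = "\u0931", "\u094d", "\u092f", "\u0939"

# The two sequences of Table 7 that admit RRA into a label.
ALLOWED_SEQUENCES = (RRA + VIRAMA + YA, RRA + VIRAMA + HA)


def rra_usage_ok(label):
    """True if every U+0931 in label starts one of the Table 7 sequences."""
    position = label.find(RRA)
    while position != -1:
        # str.startswith accepts a tuple of prefixes and a start offset.
        if not label.startswith(ALLOWED_SEQUENCES, position):
            return False
        position = label.find(RRA, position + 1)
    return True
```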

5.3 Code points not included

The following code points have not been included in the repertoire.

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ◌ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for the different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

Section 7 gives the Whole Label Evaluation (WLE) rules, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first subsection lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second subsection lists four operators, along with their functions, which are assumed while specifying the rules. The following two subsections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
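
To make the variables concrete, the sketch below assigns each code point of Table 6 to one of the variable letters and turns a label into a string of those letters. It assumes the categories exactly as printed in Table 6, so it covers the whole repertoire and not only Hindi. Dash and Digit are left out because those characters are not in the repertoire. `CATEGORY` and `to_pattern` are illustrative names.

```python
def _span(first, last):
    """Inclusive range of code points."""
    return range(first, last + 1)


# Variable letter -> code point spans, following the Category column of Table 6.
_CATEGORY_SPANS = {
    "D": [_span(0x0901, 0x0901)],
    "B": [_span(0x0902, 0x0902)],
    "X": [_span(0x0903, 0x0903)],
    "V": [_span(0x0905, 0x090B), _span(0x090D, 0x0914), _span(0x0972, 0x0977)],
    "C": [_span(0x0915, 0x0928), _span(0x092A, 0x0930), _span(0x0932, 0x0933),
          _span(0x0935, 0x0939), _span(0x097B, 0x097C), _span(0x097E, 0x097F)],
    "N": [_span(0x093C, 0x093C)],
    "M": [_span(0x093A, 0x093B), _span(0x093E, 0x0943), _span(0x0945, 0x094C),
          _span(0x094F, 0x094F), _span(0x0956, 0x0957)],
    "H": [_span(0x094D, 0x094D)],
}

CATEGORY = {cp: letter
            for letter, spans in _CATEGORY_SPANS.items()
            for span in spans
            for cp in span}


def to_pattern(label):
    """Map each code point of label to its variable letter ('?' if unknown)."""
    return "".join(CATEGORY.get(ord(ch), "?") for ch in label)
```

With this mapping, the code points U+0915 U+094D U+0915 U+0940 become the pattern string CHCM.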

5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | U+0905 U+0903

Table 9
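
The vowel sequence V[B|D|X] translates directly into a regular expression. The sketch below assumes a label has already been reduced to a string of variable letters (one letter per code point). The helper name `is_vowel_sequence` is illustrative.

```python
import re

# V optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(r"V[BDX]?")


def is_vowel_sequence(pattern):
    """True if the variable-letter string is a valid Hindi vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(pattern) is not None
```

The four rows of Table 9 correspond to the strings V, VB, VD and VX, all of which match. A string such as VBX is rejected because only one sign may follow the vowel.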

5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | U+0915 U+0903
Consonant + Halant | C[H] | क् (k, Pure Consonant) | U+0915 U+094D

Table 11

2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see note 14):

*3(CH)C

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X.

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.
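
The consonant sequence rules 1 to 3B above can be combined into a single regular expression over variable letters. This is a sketch under stated assumptions: a Nukta is allowed after any consonant of a cluster (reading C as "C with optional N"), and the limit of four consonants follows rule 3 for Hindi, even though the WLE rules of Section 7 are said not to impose it. `CONSONANT_AKSHAR` and `is_consonant_akshar` are hypothetical names.

```python
import re

# Up to three "consonant + Halant" prefixes, then a final consonant,
# then either a closing Halant or an optional Matra plus optional B/D/X.
CONSONANT_AKSHAR = re.compile(r"(?:CN?H){0,3}CN?(?:H|M?[BDX]?)")


def is_consonant_akshar(pattern):
    """True if the variable-letter string fits the Hindi consonant sequence."""
    return CONSONANT_AKSHAR.fullmatch(pattern) is not None
```

The sequences shown in Tables 10 to 15 (for example C, CN, CM, CH, CMB, CHCHCHC, CHCM and CHCMX) all match, whereas a cluster of five consonants or two consecutive Matras does not.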

6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants themselves in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in the Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed.

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. E.g. in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C), where a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the symmetric (reverse) variants using the same condition (see also [RFC8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
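
The effect of the "not followed by Nukta" condition can be illustrated with a small variant enumerator. It is a simplified sketch, not the LGR tool's algorithm: it only knows the four pairs of Table 16, it applies the same condition in both directions, and it does not check label validity. `BASES` and `nukta_variants` are assumed names.

```python
from itertools import product

NUKTA = "\u093c"
# Variant 1 column of Table 16: AA, O, vowel sign AA, vowel sign O.
BASES = {"\u0906", "\u0913", "\u093e", "\u094b"}


def nukta_variants(label):
    """Enumerate labels reachable by adding or removing Nukta after BASES."""
    options = []
    i = 0
    while i < len(label):
        ch = label[i]
        if ch in BASES and label[i + 1:i + 2] == NUKTA:
            # Variant 2 sequence: may lose its Nukta unless another Nukta follows.
            if label[i + 2:i + 3] == NUKTA:
                options.append((ch + NUKTA,))
            else:
                options.append((ch + NUKTA, ch))
            i += 2
            continue
        # A bare base here is never followed by Nukta, so the mapping applies.
        options.append((ch, ch + NUKTA) if ch in BASES else (ch,))
        i += 1
    return {"".join(choice) for choice in product(*options)} - {label}
```

With this condition, U+0906 and U+0906 U+093C are variants of each other, while the invalid U+0906 U+093C U+093C produces no variant at all.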

612 OverlappedvariantanalysisinvolvingNukta

ConsideringthefollowingvariantsetsABCDEachofthemcontains0906or093E

Set Mapping Variant Set A 093E 0901 lt--gt 0949 0902 Variant Set B 0906 0901 lt--gt 0911 0902 Variant Set C 0906 0902 lt--gt 0974 Variant Set D 093B lt--gt 093E 0902

Overlappingvariantsetsinvolving0906and093EplusNukta

Source Glyph Target Glyph Type Variant Context 0906 आ 0906 093C आ harr blocked not followed-by-N

093E ा 093E 093C ा harr blocked not followed-by-N

Whensubstitutingthevariantfromtheoverlappingsetsfor0906or093ErespectivelyintothelefthandsidesequencesofVariantSetsABCandEtheresultisavalidsequencethatdisplayswithaNuktaAsthepresenceabsenceofNuktaisthebasisforseveralvariantsetsthereforethesevariantareextendedtoensurethesymmetryandtransitivity

093E093C0901isaddedtoSetA

0906093C0901isaddedtoSetB

0906093C0902isaddedtosetCand

093E093C0902isaddedtoSetD

ForsetCtheimplicitcodepointcontextof0902and0974arenotequal0902canbefollowedbyVowelsandConsonantsonlybut0974canalsobefollowedby090109020903(Bothcanbeat

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

36

theendofthelabel)Thereforethevariantmappingshouldreceiveacontextrulewhen(followed-by-V-C-or-end)Thismatchestheintersectionbetweenthesecontexts

LikewiseforsetD0902canbefollowedbyVowelsandConsonantsonlybut093Bcanalsobefollowedby090109020903(Bothcanbeattheendofthelabel)Thereforethevariantmappingshouldreceiveacontextrulewhen(followed-by-V-C-or-end)

The resulting variant sets and variant context rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when (follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapping with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
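The context condition for Variant Set C can be illustrated with a short Python sketch. It is not the normative LGR XML: the names `SET_C`, `followed_by_v_c_or_end` and `single_substitutions` are hypothetical, and the vowel and consonant ranges only approximate the repertoire.

```python
# Illustrative only: approximate category sets, not the normative repertoire.
VOWELS = {chr(cp) for cp in range(0x0905, 0x0915)} | {chr(cp) for cp in range(0x0972, 0x0976)}
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}  # includes RRA (U+0931)

# Members of Variant Set C after the Nukta extension.
SET_C = ["\u0906\u0902", "\u0906\u093C\u0902", "\u0974"]


def followed_by_v_c_or_end(label, end):
    """True if the position right after a match is end-of-label, a Vowel or a Consonant."""
    return end == len(label) or label[end] in VOWELS or label[end] in CONSONANTS


def single_substitutions(label):
    """Labels reachable by swapping one Set C member for another where the context holds."""
    results = set()
    for src in SET_C:
        start = label.find(src)
        while start != -1:
            end = start + len(src)
            if followed_by_v_c_or_end(label, end):
                results.update(label[:start] + tgt + label[end:] for tgt in SET_C if tgt != src)
            start = label.find(src, start + 1)
    return results
```

Under these assumptions, U+0974 followed by a consonant yields both other members of Set C, whereas U+0974 followed by U+0901 yields no variant because the context condition fails.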

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence, they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
U+093A | U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
U+0946 | U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
U+0901 | U+0945 U+0902
U+093E U+0901 | U+0949 U+0902
U+0905 U+0901 | U+0972 U+0902
U+090F U+0901 | U+090D U+0902
U+0906 U+0901 | U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, U+0945 U+0902 and U+0949 U+0902 should each be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below:

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
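A minimal Python sketch of this context rule follows. It is an illustration, not the normative LGR definition: the function names are hypothetical, the consonant range is an approximation of the repertoire, and every eligible Candrabindu is replaced at once rather than enumerating all combinations.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"
# Assumption: the consonant block U+0915..U+0939 approximates category C.
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}


def preceded_by_c_or_cn(label, i):
    """True if position i follows a Consonant, or a Consonant followed by a Nukta."""
    if i >= 1 and label[i - 1] in CONSONANTS:
        return True
    return i >= 2 and label[i - 1] == NUKTA and label[i - 2] in CONSONANTS


def expand_candrabindu(label):
    """Replace each U+0901 by U+0945 U+0902 where the variant context rule holds."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

With these assumptions, a Candrabindu after क (U+0915) is expanded, while a Candrabindu after the vowel अ (U+0905) is left untouched, because a Matra cannot follow a Vowel.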

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (Required in Santali language)
   b. ओ (U+0913) (Required in Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (Required in Santali language)
   b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)

3. M must be preceded by C or CN (see footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V CANNOT be preceded by H (details in "Case of V preceded by H")

Footnotes 15 and 16: CN is a C followed by an N.
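The seven rules above can be read as checks on the character immediately preceding each code point. The following Python sketch illustrates that reading; it is not the normative LGR XML. The function name `first_violation` is hypothetical, and the category sets C, V and M are approximations of the repertoire (the Eyelash Reph sequences and the Kashmiri-specific code points are left out).

```python
# Approximate category sets for illustration; the normative lists are in the repertoire table.
C = {chr(cp) for cp in range(0x0915, 0x093A)} - {"\u0929", "\u0931", "\u0934"}
V = {chr(cp) for cp in range(0x0905, 0x0915)} - {"\u090C"}
M = {chr(cp) for cp in range(0x093E, 0x094D)}
D, B, X, N, H = "\u0901", "\u0902", "\u0903", "\u093C", "\u094D"

# Sets named in rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}


def first_violation(label):
    """Return the number of the first WLE rule (1-7) that the label violates, or None."""
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else None
        prev2 = label[i - 2] if i >= 2 else None
        after_c_or_cn = prev in C or (prev == N and prev2 in C)
        after_v_c_n_m = prev in V or prev in C or prev == N or prev in M
        if ch == N and prev not in (C1 | V1 | M1):
            return 1
        if ch == H and not after_c_or_cn:
            return 2
        if ch in M and not after_c_or_cn:
            return 3
        if ch == X and not after_v_c_n_m:
            return 4
        if ch == B and not after_v_c_n_m:
            return 5
        if ch == D and not after_v_c_n_m:
            return 6
        if ch in V and prev == H:
            return 7
    return None
```

Under these assumptions, a label such as U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is flagged under rule 7, because the vowel U+0905 directly follows a Halant. As noted in the document's first footnote, a real LGR tool may report a different rule for the same invalid label.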


Additional rules are used only for variants where a Nukta maps to a null, or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7 (Sequences), it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs' Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ. of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_orgin/devnagarithesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Diringer, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchinson, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966.]

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm, 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.

2. IS 10315: 7-bit coded character set for information interchange, 1985.

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.

5. ISO 2375: Procedure for registration of escape sequences, 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in a given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
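The "all constituents must map" condition can be sketched in Python as follows. The sketch is illustrative only: it uses a handful of single-code-point pairs taken from Table 19, ignores the multi-code-point akshara entries, and the names `DEVA_TO_GURU` and `to_gurmukhi` are hypothetical.

```python
# A few Devanagari -> Gurmukhi pairs from Table 19 (single code points only).
DEVA_TO_GURU = {
    "\u0917": "\u0A17",  # GA
    "\u091F": "\u0A1F",  # TTA
    "\u0920": "\u0A20",  # TTHA
    "\u093F": "\u0A3F",  # VOWEL SIGN I
    "\u0940": "\u0A40",  # VOWEL SIGN II
}


def to_gurmukhi(label):
    """Return the cross-script counterpart, or None if any code point has no mapping."""
    out = []
    for ch in label:
        if ch not in DEVA_TO_GURU:
            return None  # one unmapped character is enough to make the label distinct
        out.append(DEVA_TO_GURU[ch])
    return "".join(out)
```

A label made up entirely of mapped code points yields a Gurmukhi counterpart; a label containing, say, क (U+0915), which has no entry in this sample mapping, yields `None`.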

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (see footnote 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

Footnote 17: The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1 | 0901 | ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0] [101] [102] [103] [105] [108] [110] [111] [112] [113]
2 | 0902 | ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
3 | 0903 | ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
4 | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
5 | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
6 | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
7 | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]


8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0] [101] [102] [103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0] [101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0] [105] [108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0] [100] [101] [102] [108] [112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0] [105] [108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]


20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]


30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]


40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1: Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]


50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4: Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4: Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1: Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1: Hindi | [0][101][102][103]


62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1: Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4: Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1: Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4: Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant / Virama | Most of the languages given in section 3.2 | 1: Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4: Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4: Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4: Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2: Konkani, Marathi | [9][100][102][108][112]


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4: Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4: Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4: Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4: Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4: Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2: Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2: Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2: Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2: Sindhi | [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the “Eyelash Reph” construct (see footnote 13).

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra U+0930), the term “Reph” is used here.


2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7 Sequences
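
The conditional inclusion of U+0931 described above can be illustrated with a small check. This is only a sketch: the function name `rra_usage_ok` and the constant names are hypothetical, and the check covers nothing beyond the two sequences of Table 7.

```python
RRA = "\u0931"  # DEVANAGARI LETTER RRA

# The two sequences of Table 7: RRA + VIRAMA + YA and RRA + VIRAMA + HA.
ALLOWED_SEQUENCES = ("\u0931\u094D\u092F", "\u0931\u094D\u0939")


def rra_usage_ok(label: str) -> bool:
    """Return True if every U+0931 in the label starts a Table 7 sequence."""
    return all(
        label.startswith(ALLOWED_SEQUENCES, i)
        for i, ch in enumerate(label)
        if ch == RRA
    )
```

A label that contains no U+0931 passes trivially, while a U+0931 followed by anything other than Virama plus YA or HA is rejected.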

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari Script. These rules need slight additions for different languages written in Devanagari, in terms of:


- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta


5.5.2 Operators used

Symbol | Function
| | Alternative
[] | Optional
* | Variable Repetition
() | Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | अ (U+0905)
Vowel + Anusvara | V[B] | अं (aṁ) | अ + ं (U+0905 U+0902)
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ + ँ (U+0905 U+0901)
Vowel + Visarga | V[X] | अः (aḥ) | अ + ः (U+0905 U+0903)

Table 9
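
The vowel sequence V[B|D|X] can be expressed as a regular expression. The sketch below is illustrative only: the vowel range is an assumption (U+0905 to U+0914 without the excluded U+090C, plus U+0972 to U+0977), and the names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical.

```python
import re

# Assumed vowel set: U+0905..U+0914 minus the excluded U+090C,
# plus the additional vowels U+0972..U+0977.
VOWELS = "\u0905-\u090B\u090D-\u0914\u0972-\u0977"
# B (Anusvara), D (Candrabindu), X (Visarga)
MODIFIERS = "\u0902\u0901\u0903"

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{VOWELS}][{MODIFIERS}]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

The four examples of Table 9 match this pattern, while a vowel followed by two modifiers does not.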


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क + ़ (U+0915 U+093C)

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क + ि (U+0915 U+093F)
Consonant + Anusvara | C[B] | कं (kaṁ) | क + ं (U+0915 U+0902)
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क + ँ (U+0915 U+0901)
Consonant + Visarga | C[X] | कः (kaḥ) | क + ः (U+0915 U+0903)
Consonant + Halant | C[H] | क् (k, Pure Consonant) | क + ् (U+0915 U+094D)

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क + ी + ं (U+0915 U+0940 U+0902)
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क + ा + ँ (U+0915 U+093E U+0901)
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क + ी + ः (U+0915 U+0940 U+0903)

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14): *3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | न + ् + क + ् + र + ् + य (U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F)

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In the case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:


Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
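
The consonant sequence rules of this section can be consolidated into one pattern. The sketch below is one possible reading and not part of the proposal: up to three Consonant + Halant units, then a consonant (each optionally with a Nukta), then optionally a final Halant, a Matra with an optional B/D/X, or a bare B/D/X. The character ranges and the name `is_consonant_sequence` are assumptions made for illustration.

```python
import re

# Assumed character classes (whole-repertoire ranges, not Hindi-only subsets).
C = "[\u0915-\u0928\u092A-\u0930\u0932\u0933\u0935-\u0939]"  # consonants
M = "[\u093A\u093B\u093E-\u0943\u0945-\u094C\u094F\u0956\u0957]"  # matras
N = "\u093C"  # Nukta
H = "\u094D"  # Halant / Virama
BDX = "[\u0902\u0901\u0903]"  # Anusvara, Candrabindu, Visarga

CN = f"{C}{N}?"  # a consonant, optionally carrying a Nukta

# Up to 3 (CH) units, a final consonant, then an optional tail.
CONSONANT_SEQUENCE = re.compile(
    f"(?:{CN}{H}){{0,3}}{CN}(?:{H}|{M}{BDX}?|{BDX})?"
)


def is_consonant_sequence(text: str) -> bool:
    """Return True if the whole string is one consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The examples of Tables 10 to 15 match, whereas five consonants joined by Halant do not, in line with the limit of four stated for Hindi.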

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters


and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity
Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21: Visually confusables in Appendix A: Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning, which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:


Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 Proposed Variants - Set 1 have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of the substituted sequence (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below:

Rule: As per Table 16 Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 set character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric


definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
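
The effect of the "not followed by Nukta" context can be seen in a small sketch. It assumes a simplified model in which the Variant 1 to Variant 2 mapping of Table 16 is applied at every eligible position at once; the function name `all_nukta_variant` is hypothetical.

```python
NUKTA = "\u093C"
# Variant 1 code points of Table 16: AA, O, vowel sign AA, vowel sign O.
BASES = ("\u0906", "\u0913", "\u093E", "\u094B")


def all_nukta_variant(label: str) -> str:
    """Apply the Variant 1 -> Variant 2 mapping wherever its context holds.

    The mapping applies only when the base character is NOT already
    followed by a Nukta, which is the variant context rule of 6.1.1.
    """
    out = []
    for i, ch in enumerate(label):
        out.append(ch)
        if ch in BASES and label[i + 1:i + 2] != NUKTA:
            out.append(NUKTA)
    return "".join(out)
```

Because of the context check, applying the function to its own output changes nothing, so the sequence U+093C U+093C is never generated.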

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A,
- 0906 093C 0901 is added to Set B,
- 0906 093C 0902 is added to Set C, and
- 093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at


the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when (follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902


ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, coming at the end, many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ U+0901 | ॅं U+0945 U+0902
ाँ U+093E U+0901 | ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by a Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally the case of (U+0945 U+0902) should be rendered as ॅं and the case of (U+0949


U+0902) should be rendered as ॉं, where the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below:

Rule: As per Table 18 Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
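
The context condition on the Candrabindu mapping can be sketched as follows. This is an illustration only: the consonant range in `is_consonant` is an assumption about the repertoire, the function names are hypothetical, and the other pairs of Table 18 are deliberately not handled.

```python
NUKTA = "\u093C"
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"


def is_consonant(ch: str) -> bool:
    """Assumed consonant test: U+0915..U+0939 minus excluded/sequence-only
    code points, plus the Sindhi letters U+097B, U+097C, U+097E, U+097F."""
    if not ch:
        return False
    cp = ord(ch)
    in_main_block = 0x0915 <= cp <= 0x0939 and cp not in (0x0929, 0x0931, 0x0934)
    return in_main_block or cp in (0x097B, 0x097C, 0x097E, 0x097F)


def candrabindu_variant(label: str) -> str:
    """Replace U+0901 by U+0945 U+0902 only where the context rule holds,
    i.e. when it is preceded by a Consonant or by Consonant + Nukta."""
    out = []
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        in_context = is_consonant(prev) or (prev == NUKTA and is_consonant(prev2))
        out.append(CANDRA_E_ANUSVARA if ch == CANDRABINDU and in_context else ch)
    return "".join(out)
```

A Candrabindu that follows a Vowel or a Matra is left untouched, since a Matra such as U+0945 could not validly appear in that position.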

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.


There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
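
The "first chosen wins, the other is blocked" disposition can be illustrated with a toy registry. This is a hedged sketch rather than how any registry is implemented: it uses only the pairs of Table 17, ignores all context rules and the variants of Table 16 and Table 18, and the names `index_key` and `Registry` are hypothetical.

```python
# Table 17 pairs, keyed by the single code point of Variant 1.
TABLE_17 = {
    "\u0973": "\u0905\u0902",
    "\u093A": "\u0902",
    "\u0974": "\u0906\u0902",
    "\u093B": "\u093E\u0902",
    "\u090E": "\u0910",
    "\u0946": "\u0947",
    "\u0975": "\u0914",
    "\u094F": "\u094C",
}


def index_key(label: str) -> str:
    """Map every Variant 1 code point to its Variant 2 form, so that all
    labels in the same variant set share one key."""
    return "".join(TABLE_17.get(ch, ch) for ch in label)


class Registry:
    """Toy first-come-first-served allocation with blocked variants."""

    def __init__(self) -> None:
        self._allocated = {}

    def request(self, label: str) -> str:
        key = index_key(label)
        if key in self._allocated:
            return "blocked"
        self._allocated[key] = label
        return "allocated"
```

In this model, once a label containing U+094C is allocated, the otherwise identical label containing U+094F is reported as blocked, and vice versa.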

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38


व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B


ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants
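
As an illustration of what a whole-label cross-script variant means in practice, the sketch below maps a Devanagari label to a Gurmukhi look-alike when every code point has a counterpart. It is a simplification under stated assumptions: only some single-code-point, one-to-one rows of Table 19 are used (multi-code-point sequences and the one-to-many rows for U+0946 and U+0947 are left out), and the names `DEVA_TO_GURU` and `gurmukhi_lookalike` are hypothetical.

```python
# Subset of Table 19: single code point, one-to-one rows only.
DEVA_TO_GURU = {
    "\u0902": "\u0A02", "\u0907": "\u0A19", "\u0909": "\u0A24",
    "\u0917": "\u0A17", "\u0918": "\u0A2C", "\u091F": "\u0A1F",
    "\u0920": "\u0A20", "\u0922": "\u0A2B", "\u092A": "\u0A27",
    "\u092D": "\u0A2E", "\u092E": "\u0A38", "\u0935": "\u0A15",
    "\u0939": "\u0A35", "\u093C": "\u0A3C", "\u093F": "\u0A3F",
    "\u0940": "\u0A40", "\u0945": "\u0A71", "\u0948": "\u0A48",
    "\u0956": "\u0A41", "\u0957": "\u0A42",
}


def gurmukhi_lookalike(label: str):
    """Return the Gurmukhi label that the whole Devanagari label resembles,
    or None if any code point has no counterpart in the subset above."""
    try:
        return "".join(DEVA_TO_GURU[ch] for ch in label)
    except KeyError:
        return None
```

A label such as U+091F U+0940 has a whole-label Gurmukhi counterpart, whereas any label containing a code point outside the table (for example U+0915) has none.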


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables in Appendix B: Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:


a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)
3. M must be preceded by C or CN (see footnote 16)
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V can NOT be preceded by H (details in Case of V preceded by H)

15 where CN is a C followed by an N
16 where CN is a C followed by an N
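
The seven rules above lend themselves to a simple left-to-right check. The following sketch is not the normative XML rule set: the category ranges are assumptions about the repertoire, the Eyelash Reph sequences and the repertoire membership test are left out, and the function name `violated_rules` is hypothetical.

```python
CATEGORY = {}


def _add(tag, code_points):
    for cp in code_points:
        CATEGORY[chr(cp)] = tag


# Assumed category membership, following the repertoire table.
_add("C", [*range(0x0915, 0x0929), *range(0x092A, 0x0931), 0x0932, 0x0933,
           *range(0x0935, 0x093A), 0x097B, 0x097C, 0x097E, 0x097F])
_add("V", [*range(0x0905, 0x090C), *range(0x090D, 0x0915), *range(0x0972, 0x0978)])
_add("M", [0x093A, 0x093B, *range(0x093E, 0x0944), *range(0x0945, 0x094D),
           0x094F, 0x0956, 0x0957])
_add("B", [0x0902])
_add("D", [0x0901])
_add("X", [0x0903])
_add("H", [0x094D])
_add("N", [0x093C])

# Sets C1, V1 and M1 of rule 1 combined.
NUKTA_BASES = {chr(cp) for cp in (0x0915, 0x0916, 0x0917, 0x091A, 0x091B,
                                  0x091C, 0x0921, 0x0922, 0x092B,
                                  0x0906, 0x0913, 0x093E, 0x094B)}


def violated_rules(label: str) -> list:
    """Return the sorted WLE rule numbers (1-7) that the label violates."""
    broken = set()
    for i, ch in enumerate(label):
        tag = CATEGORY.get(ch)
        prev = label[i - 1] if i >= 1 else ""
        prev_tag = CATEGORY.get(prev)
        # "C or CN": previous is a C, or an N that itself follows a C.
        after_c_or_cn = prev_tag == "C" or (
            prev_tag == "N" and i >= 2 and CATEGORY.get(label[i - 2]) == "C")
        if tag == "N" and prev not in NUKTA_BASES:
            broken.add(1)
        elif tag == "H" and not after_c_or_cn:
            broken.add(2)
        elif tag == "M" and not after_c_or_cn:
            broken.add(3)
        elif tag in ("X", "B", "D") and prev_tag not in ("V", "C", "N", "M"):
            broken.add({"X": 4, "B": 5, "D": 6}[tag])
        elif tag == "V" and prev_tag == "H":
            broken.add(7)
    return sorted(broken)
```

For instance, the label U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is reported as violating rule 7, while the same label without the Halant passes all seven rules.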


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)
• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7 Sequences, it gets permitted only with the specific sequences.
2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case “V preceded by H” needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ “Mango pickle” (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.


If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | DataXgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | DataXgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

49

Member JijoPappachan DNDomains India MalayalamMember KCTikayatray OdiaBhasaPratisthan India OdiaMember KalyanVasudeo

KaleFormerlyaffiliatedwithUniversityofPune

India Marathi

Member KuldeepPatnaik Visualizethysoul India OdiaMember MukeshSaini EsselGroup India HindiMember NDeivaSundaram NDSLingsoftSolutionsPvtLtd India TamilMember NehaGupta C-DAC India HindiEnglishMember NirajanParajuli NREN Nepal NepaliMember NishitJain C-DAC India HindiEnglishMember PawanChitrakar Gapsco Nepal NepaliMember PrabhakarPandey C-DAC India HindiMember PrasadPK A-onePublishers India MalayalamMember PrateekPathak ISOCMumbai India DevanagariMember RaiomondDoctor NLPConsultant India EnglishHindi

MarathiGujaratiMember RajibChakraborty SocietyforNaturalLanguage

TechnologyResearchIndia Bangla(Bengali)

Member RajivKumar NIXI India Member SManiam InternationalForumITforTamil Singapore TamilMember SanthoshThottingal Wikimediafoundation India Malayalam

SourashtraTamilMember SarojaBhate UniversityofPune India SanskritMember ShambhuKumar

SinghNationalTranslationMissionMysore

India Maithili

Member ShanmugamR C-DAC India TamilMember ShantaramSWarde

WalawalikarIndependentResearcher India Konkani

Member ShashiPathania PGDofDogriUniversityofJammu

India Dogri

Member ShubhamSaran NIXI India Member Sinnathambi

ShanmugarajahUniversityofColomboSchoolofComputing

SriLanka Tamil

Member SujithKartha Digitalkzcom India MalayalamMember SurajAdhikari MercantileCommunications(and

npccTLD)Nepal Nepali

Member SwarnaPrabhaChainary

GuwahatiUniversity India Bodo

Member UBPavanaja httpvishvakannadacom India KannadaMember UmaMaheshwarG CALTSUnivofHyderabad India TeluguMember UttamShrestha

RanaNPNOG Nepal Nepali

Member VeenaSolomon (freelancer) India MalayalamMember VinayMurarka Consultanthttpsमराभारत India Hindi


In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K. P. Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 11, http://www.unicode.org/versions/Unicode11.0.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate, Ministry of HRD, Govt. of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_orgin/devnagarithesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)



11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क (U+0915) | क़ (U+0915 U+093C)
ख (U+0916) | ख़ (U+0916 U+093C)
ग (U+0917) | ग़ (U+0917 U+093C)
च (U+091A) | च़ (U+091A U+093C)
छ (U+091B) | छ़ (U+091B U+093C)
ज (U+091C) | ज़ (U+091C U+093C)
ड (U+0921) | ड़ (U+0921 U+093C)
ढ (U+0922) | ढ़ (U+0922 U+093C)
फ (U+092B) | फ़ (U+092B U+093C)

Table 21 Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing) they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
ः (U+0903) | ઃ (U+0A83) | Gujarati
ः (U+0903) | ః (U+0C03) | Telugu
ः (U+0903) | ಃ (U+0C83) | Kannada
ः (U+0903) | ഃ (U+0D03) | Malayalam
ः (U+0903) | ඃ (U+0D83) | Sinhala
ः (U+0903) | ঃ (U+0983) ¹⁷ | Bengali
उ (U+0909) | ও (U+0993) | Bengali
घ (U+0918) | ঘ (U+0998) | Bengali
ठ (U+0920) | ਨ (U+0A28) | Gurmukhi
ठ (U+0920) | ਰ (U+0A30) | Gurmukhi
ड (U+0921) | ਡ (U+0A21) | Gurmukhi
ड (U+0921) | ਤ (U+0A24) | Gurmukhi
ढ (U+0922) | ਢ (U+0A22) | Gurmukhi
त (U+0924) | ਜ (U+0A1C) | Gurmukhi
य (U+092F) | ਧ (U+0A27) | Gurmukhi
ॅ (U+0945) | ঁ (U+0981) | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
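The "all constituents must have a confusable" condition described above can be sketched in Python as follows. This is only an illustration: the mapping is transcribed from the Gurmukhi rows of Table 22, the function name is hypothetical, and the check works per code point rather than per akshara, which is a simplification.

```python
# Devanagari -> Gurmukhi confusables, transcribed from the Gurmukhi rows of Table 22.
DEVA_TO_GURMUKHI = {
    "\u0920": {"\u0A28", "\u0A30"},
    "\u0921": {"\u0A21", "\u0A24"},
    "\u0922": {"\u0A22"},
    "\u0924": {"\u0A1C"},
    "\u092F": {"\u0A27"},
}


def has_cross_script_confusable(label, table=DEVA_TO_GURMUKHI):
    """True only if every code point of the label has a confusable in the table."""
    return bool(label) and all(ch in table for ch in label)


# U+0924 U+092F: both code points are listed, so the whole label is confusable.
print(has_cross_script_confusable("\u0924\u092F"))
# U+0924 U+0915: U+0915 has no listed counterpart, so the label is visually distinct.
print(has_cross_script_confusable("\u0924\u0915"))
```

A single unlisted character is enough to make the function return False, mirroring the statement that one distinguishing character/akshara yields a non-confusable string.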


8 | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
9 | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
10 | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]
11 | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0][101]
12 | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0][105][108]
13 | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
14 | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
15 | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][101][102][108][112]
16 | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0][105][108]
17 | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
18 | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
19 | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]

30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]

50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]
62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]

75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph" ¹³ construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7 Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term "Reph" is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen -
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta

5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ a | U+0905
Vowel + Anusvara | V[B] | अं aṁ | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ aṃ | U+0905 U+0901
Vowel + Visarga | V[X] | अः aḥ | U+0905 U+0903

Table 9

5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क ka | U+0915 <single character>
Consonant + Nukta | C[N] | क़ ḳa | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि ki | U+0915 U+093F
Consonant + Anusvara | C[B] | कं kaṁ | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ kaṃ | U+0915 U+0901
Consonant + Visarga | C[X] | कः kaḥ | U+0915 U+0903
Consonant + Halant | C[H] | क् k (Pure Consonant) | U+0915 U+094D

Table 11

2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं kīṁ | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ kāṃ | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः kīḥ | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant ¹⁴:

3*(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य nkrya | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In the case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की kkī | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं kkaṁ | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ kkaṃ | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः kkaḥ | U+0915 U+094D U+0915 U+0903

Table 14

3B. 3*(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं kkīṁ | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ kkīṃ | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः kkīḥ | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.
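The vowel and consonant sequences of Sections 5.5.3 and 5.5.4 can be sketched as a regular expression over category letters. The following Python is a rough illustration, not the normative WLE rules of Section 7: the code point ranges assigned to each variable are simplifying assumptions, the function names are hypothetical, and the limit of four joined consonants follows the Hindi description above (the WLE rules themselves impose no such limit).

```python
import re

# Assumed, simplified category ranges (illustrative; not the normative repertoire).
CATEGORIES = {
    "V": (0x0905, 0x0914),  # independent vowels
    "C": (0x0915, 0x0939),  # consonants
    "M": (0x093E, 0x094C),  # matras
    "B": (0x0902, 0x0902),  # Anusvara
    "D": (0x0901, 0x0901),  # Candrabindu
    "X": (0x0903, 0x0903),  # Visarga
    "H": (0x094D, 0x094D),  # Halant
    "N": (0x093C, 0x093C),  # Nukta
}

# Vowel akshar:     V[B|D|X]
# Consonant akshar: up to 3 (C[N]H) groups, then C[N], then either a final
#                   Halant (not followed by a consonant) or [M][B|D|X].
AKSHARS = re.compile(r"(?:V[BDX]?|(?:CN?H){0,3}CN?(?:H(?!C)|M?[BDX]?))+")


def category(ch):
    """Map a character to its variable letter, or None if it is uncategorised."""
    cp = ord(ch)
    for name, (lo, hi) in CATEGORIES.items():
        if lo <= cp <= hi:
            return name
    return None


def is_valid_hindi_label(label):
    """Check that the label is a sequence of well-formed akshars."""
    cats = [category(ch) for ch in label]
    if not cats or None in cats:
        return False
    return AKSHARS.fullmatch("".join(cats)) is not None


print(is_valid_hindi_label("\u0915\u094D\u0915\u0940"))  # C H C M
print(is_valid_hindi_label("\u0905\u093E"))              # V followed by M
```

Working on category letters rather than raw code points keeps the pattern close to the notation of Table 8, so each rule in the text can be compared with its counterpart in the expression.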


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is the brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in the Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ (U+0906) | आ़ (U+0906 U+093C)
ओ (U+0913) | ओ़ (U+0913 U+093C)
ा (U+093E) | ा़ (U+093E U+093C)
ो (U+094B) | ो़ (U+094B U+093C)

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). In theory this implies a regenerative tendency: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the result again contains an instance of आ (U+0906), namely the first code point of the sequence U+0906 U+093C. By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)
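
The effect of this context rule can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `nukta_variants` and the plain-string representation of labels are hypothetical, only the forward (Variant 1 to Variant 2) direction is modelled, and the normative definition remains the one in the accompanying XML file. The sketch enumerates the labels reachable by optionally applying the four mappings, skipping every position where the base character is already followed by a Nukta.

```python
from itertools import product

NUKTA = "\u093C"
# Variant 1 code points from Table 16 (AA, O and their vowel signs).
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variants(label: str) -> set[str]:
    """Return every label obtained by optionally inserting a Nukta after each
    Table 16 base character that is NOT already followed by a Nukta."""
    choices = []
    for i, ch in enumerate(label):
        followed_by_nukta = label[i + 1 : i + 2] == NUKTA
        if ch in NUKTA_BASES and not followed_by_nukta:
            # Context satisfied: keep the character or map it to Variant 2.
            choices.append((ch, ch + NUKTA))
        else:
            # Either not a Table 16 base, or the context rule blocks the mapping.
            choices.append((ch,))
    return {"".join(parts) for parts in product(*choices)}
```

Because a base character that already carries a Nukta offers no second choice, the sketch never produces a sequence with two consecutive Nuktas, which is the regenerative case the rule is meant to exclude.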

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the left-hand side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence or absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,
0906 093C 0901 is added to Set B,
0906 093C 0902 is added to Set C, and
093E 093C 0902 is added to Set D.

For set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)
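
The intersection argument used for Set C can be checked mechanically. The following Python fragment is a minimal sketch: the dictionary `FOLLOWERS` and the function `shared_context` are hypothetical names, and the follower sets simply restate what the text above says about 0902 and 0974 rather than being derived from the LGR itself.

```python
# What may follow each side of the Set C mapping, as described in the text.
# "V" and "C" stand for the Vowel and Consonant classes, "end" for end of label.
FOLLOWERS = {
    "0902": {"V", "C", "end"},
    "0974": {"V", "C", "0901", "0902", "0903", "end"},
}


def shared_context(a: str, b: str) -> set[str]:
    """Return the followers allowed after both code points."""
    return FOLLOWERS[a] & FOLLOWERS[b]
```

Calling `shared_context("0902", "0974")` leaves only Vowels, Consonants and end of label, which is the condition named followed-by-V-C-or-end.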

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set that overlaps with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ (U+0973) | अं (U+0905 U+0902)
ऺ (U+093A) | ं (U+0902)
ॴ (U+0974) | आं (U+0906 U+0902)
ऻ (U+093B) | ां (U+093E U+0902)
ऎ (U+090E) | ऐ (U+0910)
ॆ (U+0946) | े (U+0947)
ॵ (U+0975) | औ (U+0914)
ॏ (U+094F) | ौ (U+094C)

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ (U+0901) | ॅं (U+0945 U+0902)
ाँ (U+093E U+0901) | ॉं (U+0949 U+0902)
अँ (U+0905 U+0901) | ॲं (U+0972 U+0902)
एँ (U+090F U+0901) | ऍं (U+090D U+0902)
आँ (U+0906 U+0901) | ऑं (U+0911 U+0902)

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)
Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow this convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a Consonant or a Consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below.

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
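
A minimal Python sketch of this context rule is given below. The names `follows_c_or_cn` and `candrabindu_variant` are hypothetical, and the consonant class is an assumption made for illustration (the block U+0915..U+0939); the normative class membership is the one in the code point repertoire. The sketch replaces a Candrabindu by U+0945 U+0902 only where the Candrabindu is preceded by a Consonant or by a Consonant followed by a Nukta.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"
# Assumption for illustration only: consonants are U+0915..U+0939.
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}


def follows_c_or_cn(label: str, i: int) -> bool:
    """True if position i is preceded by C, or by C followed by a Nukta."""
    prev = label[i - 1] if i >= 1 else ""
    prev2 = label[i - 2] if i >= 2 else ""
    return prev in CONSONANTS or (prev == NUKTA and prev2 in CONSONANTS)


def candrabindu_variant(label: str) -> str:
    """Map each Candrabindu that satisfies the context to U+0945 U+0902."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and follows_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)  # context not met: no variant at this position
    return "".join(out)
```

A Candrabindu that follows a Vowel or a Matra is left untouched, since a Matra such as U+0945 could not validly appear in that position.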

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the blocked disposition.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
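
The first-come behaviour of a blocked disposition can be sketched as follows. This is an illustration only: the class `Registry`, its method `apply` and the caller-supplied `variants_of` function are hypothetical and do not describe any actual registry or LGR tool.

```python
class Registry:
    """Toy model of blocked variants: the first label applied for is
    allocated, and every variant label of it becomes unavailable."""

    def __init__(self, variants_of):
        self._variants_of = variants_of  # callable: label -> set of variant labels
        self._allocated = set()
        self._blocked = set()

    def apply(self, label: str) -> str:
        if label in self._allocated:
            return "already allocated"
        if label in self._blocked:
            return "blocked"
        self._allocated.add(label)
        # Block all variants of the newly allocated label (but not the label itself).
        self._blocked |= self._variants_of(label) - {label}
        return "allocated"
```

For example, with a `variants_of` function that swaps ऎ (U+090E) and ऐ (U+0910) from Table 17, whichever of the two resulting labels is applied for first is allocated and the other is reported as blocked.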

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are deemed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
ं (U+0902) | ਂ (U+0A02)
इ (U+0907) | ਙ (U+0A19)
उ (U+0909) | ਤ (U+0A24)
ग (U+0917) | ਗ (U+0A17)
घ (U+0918) | ਬ (U+0A2C)
ट (U+091F) | ਟ (U+0A1F)
ठ (U+0920) | ਠ (U+0A20)
ढ (U+0922) | ਫ (U+0A2B)
प (U+092A) | ਧ (U+0A27)
भ (U+092D) | ਮ (U+0A2E)
म (U+092E) | ਸ (U+0A38)
व (U+0935) | ਕ (U+0A15)
ह (U+0939) | ਵ (U+0A35)
ऺ (U+093A) | ਂ (U+0A02)
़ (U+093C) | ਼ (U+0A3C)
ि (U+093F) | ਿ (U+0A3F)
ी (U+0940) | ੀ (U+0A40)
ॅ (U+0945) | ੱ (U+0A71)
ॆ (U+0946) | ੇ (U+0A47)
ॆ (U+0946) | ੋ (U+0A4B)
े (U+0947) | ੇ (U+0A47)
े (U+0947) | ੋ (U+0A4B)
ै (U+0948) | ੈ (U+0A48)
ॖ (U+0956) | ੁ (U+0A41)
ॗ (U+0957) | ੂ (U+0A42)
प्टि (U+092A U+094D U+091F U+093F) | ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940) | ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947) | ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946) | ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924) | ਜ (U+0A1C)

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म (U+092E) | ম (U+09AE)
ि (U+093F) | ি (U+09BF)

Table 20: Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables, in Appendix B: Cross-script Confusables, lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the Categories mentioned in Table 6: Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:


a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (where CN is a C followed by an N).
3. M must be preceded by C or CN (where CN is a C followed by an N).
4. X must be preceded by either of V, C, N or M.
5. B must be preceded by either of V, C, N or M.
6. D must be preceded by either of V, C, N or M.
7. V CANNOT be preceded by H (details in "Case of V preceded by H").
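
Rules 1 to 7 can be sketched as a simple left-to-right check in Python. This is an illustration, not the LGR's machine-readable definition: the function name `check_wle` is hypothetical, and the C, V and M class memberships below are assumptions chosen for brevity (the normative membership is Table 6: Code point repertoire). Sequences from Table 7, including the Eyelash Reph, are not modelled.

```python
# Illustrative character classes (assumptions); Table 6 is normative.
C = frozenset(chr(cp) for cp in range(0x0915, 0x093A)) - {"\u0929", "\u0931", "\u0934"}
V = frozenset(chr(cp) for cp in [*range(0x0905, 0x0915), *range(0x0972, 0x0976)])
M = frozenset(chr(cp) for cp in [0x093A, 0x093B, *range(0x093E, 0x094D),
                                 0x094F, 0x0956, 0x0957])
D, B, X, N, H = "\u0901", "\u0902", "\u0903", "\u093C", "\u094D"

# Sets named in rule 1.
C1 = frozenset("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = frozenset("\u0906\u0913")
M1 = frozenset("\u093E\u094B")


def check_wle(label: str) -> list[tuple[int, int]]:
    """Return (position, rule number) for every violation of rules 1-7.
    Rule number 0 marks a code point outside the illustrative classes."""
    violations = []
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        after_c_or_cn = prev in C or (prev == N and prev2 in C)
        after_v_c_n_m = prev in V or prev in C or prev == N or prev in M
        if ch == N:
            ok, rule = prev in (C1 | V1 | M1), 1
        elif ch == H:
            ok, rule = after_c_or_cn, 2
        elif ch in M:
            ok, rule = after_c_or_cn, 3
        elif ch in (X, B, D):
            ok, rule = after_v_c_n_m, {X: 4, B: 5, D: 6}[ch]
        elif ch in V:
            ok, rule = prev != H, 7
        elif ch in C:
            ok, rule = True, 0
        else:
            ok, rule = False, 0
        if not ok:
            violations.append((i, rule))
    return violations
```

An empty result means the label passes these seven rules under the stated assumptions. A label with a Vowel directly after a Halant is reported under rule 7.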


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1).
• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2).

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7: Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether either of these can succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार (aːməchaːr, "Mango pickle"; U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | DataXgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | DataXgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | Basanta Kumar Panda | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | Debajit Sharma | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi/Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate, Ministry of HRD, Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क (U+0915) | क़ (U+0915 U+093C)
ख (U+0916) | ख़ (U+0916 U+093C)
ग (U+0917) | ग़ (U+0917 U+093C)
च (U+091A) | च़ (U+091A U+093C)
छ (U+091B) | छ़ (U+091B U+093C)
ज (U+091C) | ज़ (U+091C U+093C)
ड (U+0921) | ड़ (U+0921 U+093C)
ढ (U+0922) | ढ़ (U+0922 U+093C)
फ (U+092B) | फ़ (U+092B U+093C)

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
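
The all-or-nothing condition in the last paragraph can be sketched in Python. The function name `cross_script_counterpart` is hypothetical, and the mapping holds only a few single-code-point Devanagari to Gurmukhi pairs copied from Table 19 for illustration; multi-code-point aksharas and code points with more than one counterpart are deliberately left out of this sketch.

```python
# A few single-code-point Devanagari -> Gurmukhi pairs taken from Table 19.
DEVA_TO_GURU = {
    "\u0917": "\u0A17",  # GA
    "\u091F": "\u0A1F",  # TTA
    "\u0920": "\u0A20",  # TTHA
    "\u093F": "\u0A3F",  # vowel sign I
    "\u0940": "\u0A40",  # vowel sign II
}


def cross_script_counterpart(label: str, table=DEVA_TO_GURU):
    """Return the other-script label if EVERY character has a counterpart;
    return None as soon as one character has none."""
    if not label or any(ch not in table for ch in label):
        return None
    return "".join(table[ch] for ch in label)
```

A single character without a counterpart, such as क (U+0915) in this reduced table, is enough for the function to return `None`, mirroring the visual distinction described above.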

Devanagari confusable | Other script confusable | From script
ः (U+0903) | ઃ (U+0A83) | Gujarati
ः (U+0903) | ః (U+0C03) | Telugu
ः (U+0903) | ಃ (U+0C83) | Kannada
ः (U+0903) | ഃ (U+0D03) | Malayalam
ः (U+0903) | ඃ (U+0D83) | Sinhala
ः (U+0903) | ঃ (U+0983) (see footnote 17) | Bengali
उ (U+0909) | ও (U+0993) | Bengali
घ (U+0918) | ঘ (U+0998) | Bengali
ठ (U+0920) | ਨ (U+0A28) | Gurmukhi
ठ (U+0920) | ਰ (U+0A30) | Gurmukhi
ड (U+0921) | ਡ (U+0A21) | Gurmukhi
ड (U+0921) | ਤ (U+0A24) | Gurmukhi
ढ (U+0922) | ਢ (U+0A22) | Gurmukhi
त (U+0924) | ਜ (U+0A1C) | Gurmukhi
य (U+092F) | ਧ (U+0A27) | Gurmukhi
ॅ (U+0945) | ঁ (U+0981) | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


20 | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
21 | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
22 | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
23 | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
24 | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
25 | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
26 | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
27 | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
28 | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [113]
29 | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]


30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0] [101] [102] [103] [104] [105] [108] [113]


40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]


50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]


62 | 0945 | | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | | DEVANAGARI SIGN VIRAMA | Halant / Virama | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, so that the "Eyelash Reph" construct (see footnote 13) can be included.

Sr. No. | Unicode Code Points | Character Names | Example languages using the code-point (Not exhaustive list) | Reference
1 | 0931 094D 092F | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term "Reph" is used here.

Table 7 Sequences
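
The conditional inclusion of U+0931 can be illustrated with a small check. The following is only a sketch, not part of the proposal or of any LGR tool: the function name `rra_usage_ok` is hypothetical, and the check looks at nothing except whether each U+0931 opens one of the two sequences of Table 7.

```python
RRA, VIRAMA = 0x0931, 0x094D
PERMITTED_FINALS = {0x092F, 0x0939}  # YA and HA, the third code points in Table 7


def rra_usage_ok(label: str) -> bool:
    """Return True if every U+0931 in the label starts a Table 7 sequence."""
    cps = [ord(ch) for ch in label]
    for i, cp in enumerate(cps):
        if cp != RRA:
            continue
        tail = cps[i + 1:i + 3]  # the two code points after RRA, if present
        if len(tail) < 2 or tail[0] != VIRAMA or tail[1] not in PERMITTED_FINALS:
            return False
    return True
```

A label with a bare U+0931, or with U+0931 U+094D followed by any consonant other than YA or HA, is rejected by this sketch.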

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7 the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen -
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta


5.5.2 Operators used

Symbol | Function
| | Alternative
[ ] | Optional
* | Variable Repetition
( ) | Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ a | U+0905
Vowel + Anusvara | V[B] | अं aṁ | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ aṃ | U+0905 U+0901
Vowel + Visarga | V[X] | अः aḥ | U+0905 U+0903

Table 9
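
The rule V[B|D|X] maps directly onto a regular expression. The sketch below is illustrative only: the name `is_vowel_sequence` is hypothetical, and the vowel class is approximated by the Unicode range U+0905 to U+0914, which is an assumption and not the repertoire of Table 6 (for example, it includes the excluded U+090C).

```python
import re

# Assumed character classes for illustration; the authoritative sets are in Table 6.
V = "\u0905-\u0914"                      # independent vowels (approximate range)
B, D, X = "\u0902", "\u0901", "\u0903"   # Anusvara, Candrabindu, Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{V}][{B}{D}{X}]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

Because at most one of B, D or X is accepted, a string such as U+0905 U+0902 U+0903 does not match.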


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क ka | U+0915 <single character>
Consonant + Nukta | C[N] | क़ ḳa | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], Visarga [X] or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि ki | U+0915 U+093F
Consonant + Anusvara | C[B] | कं kaṁ | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ kaṃ | U+0915 U+0901
Consonant + Visarga | C[X] | कः kaḥ | U+0915 U+0903
Consonant + Halant | C[H] | क् k (Pure Consonant) | U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं kīṁ | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ kāṃ | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः kīḥ | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14): 3*(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य nkrya | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets:

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की kkī | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं kkaṁ | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ kkaṃ | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः kkaḥ | U+0915 U+094D U+0915 U+0903

Table 14

14 In the case of Sanskrit, it can join up to 5 consonants.

3B. 3*(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं kkīṁ | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ kkīṃ | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः kkīḥ | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
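
The consonant sequence rules 1 to 3B can be combined into one pattern. This is a sketch under stated assumptions rather than the panel's specification: the name `is_consonant_sequence` is hypothetical, the consonant and matra classes are approximated by the Unicode ranges U+0915 to U+0939 and U+093E to U+094C, and the limit of three leading (CH) groups follows the Hindi rule above, which the WLE rules of Section 7 do not impose.

```python
import re

# Assumed character classes for illustration; the authoritative sets are in Table 6.
C = "\u0915-\u0939"          # consonants KA..HA (approximate range)
M = "\u093E-\u094C"          # dependent vowel signs (approximate range)
N, H = "\u093C", "\u094D"    # Nukta, Halant
BDX = "\u0902\u0901\u0903"   # Anusvara, Candrabindu, Visarga

CN = f"[{C}]{N}?"            # a consonant, optionally carrying a Nukta

# Up to three (C H) groups, a final consonant, then either H or [M][B|D|X].
CONSONANT_SEQUENCE = re.compile(f"(?:{CN}{H}){{0,3}}{CN}(?:{H}|[{M}]?[{BDX}]?)")


def is_consonant_sequence(text: str) -> bool:
    """Return True if the whole string is a single consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The Table 13 example (U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F) matches, while five consonants joined by Halant do not.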

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.


6 Variants

There are no characters or character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. A brief description of these variants follows, with the variants themselves given in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in the Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus, the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8828]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
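
The effect of the "not followed by a Nukta" condition can be seen in a short sketch. The function names below are hypothetical and the code only models the Variant 1 to Variant 2 direction of Table 16; it is not how an LGR tool evaluates the rule.

```python
NUKTA = "\u093C"
# Variant 1 column of Table 16: AA, O, and the matras AA and O.
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_positions(label: str) -> list:
    """Indices where the Variant 1 -> Variant 2 mapping is defined,
    i.e. the base character is not already followed by a Nukta."""
    return [
        i for i, ch in enumerate(label)
        if ch in NUKTA_BASES and label[i + 1:i + 2] != NUKTA
    ]


def apply_nukta_variant(label: str, index: int) -> str:
    """Insert a Nukta after the base at `index`, if the context rule allows it."""
    if index not in nukta_variant_positions(label):
        raise ValueError("variant not defined at this position")
    return label[:index + 1] + NUKTA + label[index + 1:]
```

After one substitution the same position no longer qualifies, so the regenerative case U+0906 U+093C U+093C described above cannot be produced.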

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E respectively in the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence or absence of Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when (follows-only-C-or-N)" (see 6.4.1), none of the sequences leads to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
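
The extension of Sets A to D can be reproduced mechanically. The sketch below is an illustration with hypothetical helper names (`with_nukta`, `extend_set`); it only inserts U+093C after 0906 or 093E and does not model the contextual rules or the remaining Table 16 pairs.

```python
NUKTA = 0x093C
NUKTA_BASES = {0x0906, 0x093E}  # the two Table 16 bases that occur in Sets A-D

# Original variant sets, each member written as a tuple of code points.
SETS = {
    "A": [(0x093E, 0x0901), (0x0949, 0x0902)],
    "B": [(0x0906, 0x0901), (0x0911, 0x0902)],
    "C": [(0x0906, 0x0902), (0x0974,)],
    "D": [(0x093B,), (0x093E, 0x0902)],
}


def with_nukta(seq):
    """Return seq with a Nukta inserted after every 0906 or 093E."""
    out = []
    for cp in seq:
        out.append(cp)
        if cp in NUKTA_BASES:
            out.append(NUKTA)
    return tuple(out)


def extend_set(variant_set):
    """Add the Nukta-bearing counterpart of each member that contains a base."""
    extended = list(variant_set)
    for seq in variant_set:
        candidate = with_nukta(seq)
        if candidate != seq and candidate not in extended:
            extended.append(candidate)
    return extended
```

Running `extend_set` over the four sets yields exactly one added sequence per set, matching the four additions listed above.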

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
U+093A | U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
U+0946 | U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
U+0901 | U+0945 U+0902
U+093E U+0901 | U+0949 U+0902
U+0905 U+0901 | U+0972 U+0902
U+090F U+0901 | U+090D U+0902
U+0906 U+0901 | U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by a Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
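
The context condition on the Candrabindu mapping can be sketched as follows. The function names are hypothetical, and `is_consonant` approximates the Consonant category by the range U+0915 to U+0939 plus the four Sindhi letters, which is an assumption rather than the repertoire of Table 6.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    # Assumption: consonants approximated by U+0915..U+0939 plus the Sindhi letters.
    return "\u0915" <= ch <= "\u0939" or ch in "\u097B\u097C\u097E\u097F"


def candrabindu_variant_defined(label: str, i: int) -> bool:
    """True if the U+0901 at index i is preceded by C, or by C followed by N."""
    if label[i] != CANDRABINDU or i == 0:
        return False
    prev = label[i - 1]
    if is_consonant(prev):
        return True
    return prev == NUKTA and i >= 2 and is_consonant(label[i - 2])


def candra_variant(label: str, i: int):
    """Replace the U+0901 at index i by U+0945 U+0902, or return None
    when the context rule does not allow the mapping."""
    if not candrabindu_variant_defined(label, i):
        return None
    return label[:i] + CANDRA_E_ANUSVARA + label[i + 1:]
```

A Candrabindu that follows a Vowel or a Matra therefore produces no variant, because the resulting Matra U+0945 would have no valid predecessor.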

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
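
The "first chosen, other blocked" behaviour can be modelled with a toy registry. Everything in the sketch below is hypothetical (the `Registry` class, `variant_key`, and the returned strings); it covers only the four single-code-point pairs of Table 17 and ignores sequences and context rules.

```python
# Single-code-point pairs of Table 17, each mapped to an arbitrarily chosen
# representative. No preference between the two members is implied.
CANONICAL = {
    "\u0910": "\u090E",  # AI      <-> SHORT E
    "\u0947": "\u0946",  # sign E  <-> sign SHORT E
    "\u0914": "\u0975",  # AU      <-> AW
    "\u094C": "\u094F",  # sign AU <-> sign AW
}


def variant_key(label: str) -> str:
    """Labels that are variants of each other share the same key."""
    return "".join(CANONICAL.get(ch, ch) for ch in label)


class Registry:
    def __init__(self):
        self._allocated = {}  # variant key -> label that was allocated first

    def apply(self, label: str) -> str:
        key = variant_key(label)
        holder = self._allocated.get(key)
        if holder is None:
            self._allocated[key] = label
            return "allocated"
        return "already allocated" if holder == label else "blocked"
```

Whichever of two variant labels is applied for first is allocated; the later one receives "blocked", regardless of which member of the pair it uses.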

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 Code point repertoire:

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant / Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)

3. M must be preceded by C or CN (see footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in Case of V preceded by H)

15 where CN is a C followed by an N
16 where CN is a C followed by an N


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)
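
Rules 1 to 7 can be checked with a single pass over a label. The following is a minimal sketch and not the accompanying XML LGR: the names `category` and `first_violation` are hypothetical, and categories are approximated from Unicode block ranges rather than taken from Table 6, so excluded code points such as U+0929 are classified here even though the repertoire omits them.

```python
C1 = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B}
V1 = {0x0906, 0x0913}
M1 = {0x093E, 0x094B}
SIGNS = {0x093C: "N", 0x094D: "H", 0x0902: "B", 0x0901: "D", 0x0903: "X"}
RULE_FOR_SIGN = {"X": 4, "B": 5, "D": 6}


def category(cp):
    """Approximate category from block ranges (an assumption, not Table 6)."""
    if cp in SIGNS:
        return SIGNS[cp]
    if 0x0915 <= cp <= 0x0939 or cp in (0x097B, 0x097C, 0x097E, 0x097F):
        return "C"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977:
        return "V"
    if cp in (0x093A, 0x093B, 0x094F, 0x0956, 0x0957) or 0x093E <= cp <= 0x094C:
        return "M"
    return None


def first_violation(label):
    """Return the number (1-7) of the first WLE rule the label breaks,
    0 for a code point this sketch cannot classify, or None if none is broken."""
    cps = [ord(ch) for ch in label]
    cats = [category(cp) for cp in cps]
    for i, cat in enumerate(cats):
        if cat is None:
            return 0
        prev = cats[i - 1] if i >= 1 else None
        prev2 = cats[i - 2] if i >= 2 else None
        after_c_or_cn = prev == "C" or (prev == "N" and prev2 == "C")
        if cat == "N" and (i == 0 or cps[i - 1] not in C1 | V1 | M1):
            return 1
        if cat == "H" and not after_c_or_cn:
            return 2
        if cat == "M" and not after_c_or_cn:
            return 3
        if cat in RULE_FOR_SIGN and prev not in ("V", "C", "N", "M"):
            return RULE_FOR_SIGN[cat]
        if cat == "V" and prev == "H":
            return 7
    return None
```

As footnote 1 of the document notes for real LGR tools, which rule gets reported for an invalid label depends on the implementation; only the valid/invalid outcome should be relied upon.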

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 Sequences, it gets permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise.

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts.

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 11, http://www.unicode.org/versions/Unicode11.0.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)




11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क (U+0915) | क़ (U+0915 U+093C)
ख (U+0916) | ख़ (U+0916 U+093C)
ग (U+0917) | ग़ (U+0917 U+093C)
च (U+091A) | च़ (U+091A U+093C)
छ (U+091B) | छ़ (U+091B U+093C)
ज (U+091C) | ज़ (U+091C U+093C)
ड (U+0921) | ड़ (U+0921 U+093C)
ढ (U+0922) | ढ़ (U+0922 U+093C)
फ (U+092B) | फ़ (U+092B U+093C)

Table 21 Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same. However, in a given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see footnote 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
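The "all constituent characters" condition stated above can be sketched as a lookup over Table 22. The snippet below is illustrative only: the names `TABLE_22` and `confusable_scripts` are hypothetical, the check works per code point rather than per akshara (a simplification), and the Sinhala entry uses U+0D83, the code point of the Sinhala visarga glyph shown in the table.

```python
# Table 22 as a lookup: Devanagari code point -> {other script: confusable code points}
TABLE_22 = {
    "\u0903": {"Gujarati": ["\u0A83"], "Telugu": ["\u0C03"], "Kannada": ["\u0C83"],
               "Malayalam": ["\u0D03"], "Sinhala": ["\u0D83"], "Bengali": ["\u0983"]},
    "\u0909": {"Bengali": ["\u0993"]},
    "\u0918": {"Bengali": ["\u0998"]},
    "\u0920": {"Gurmukhi": ["\u0A28", "\u0A30"]},
    "\u0921": {"Gurmukhi": ["\u0A21", "\u0A24"]},
    "\u0922": {"Gurmukhi": ["\u0A22"]},
    "\u0924": {"Gurmukhi": ["\u0A1C"]},
    "\u092F": {"Gurmukhi": ["\u0A27"]},
    "\u0945": {"Bengali": ["\u0981"]},
}

def confusable_scripts(label: str) -> set:
    """Scripts in which EVERY code point of the label has a listed confusable."""
    if not label:
        return set()
    scripts = None
    for ch in label:
        found = set(TABLE_22.get(ch, {}))
        # Intersect: one code point without a confusable empties the result.
        scripts = found if scripts is None else scripts & found
    return scripts
```

A label such as ठत (U+0920 U+0924) yields only Gurmukhi, while adding any code point outside the table makes the result empty, which mirrors the "even one single character" clause.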


30 | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
31 | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
32 | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
33 | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
34 | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
35 | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
36 | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
37 | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
38 | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
39 | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]


40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]


50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]


62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph" construct (see footnote 13).

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7 Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term "Reph" is used here.
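Because DEVANAGARI LETTER RRA (U+0931) is admitted only through the two sequences of Table 7, a label check can treat any other occurrence of U+0931 as out of repertoire. The following is a minimal sketch of that idea; the function name is hypothetical and the check covers only this one condition, not the full LGR.

```python
import re

RRA = "\u0931"     # DEVANAGARI LETTER RRA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
YA = "\u092F"      # DEVANAGARI LETTER YA
HA = "\u0939"      # DEVANAGARI LETTER HA

# An RRA that is NOT followed by VIRAMA + (YA or HA) lies outside Table 7.
STRAY_RRA = re.compile(f"{RRA}(?!{VIRAMA}[{YA}{HA}])")

def rra_only_in_sequences(label: str) -> bool:
    """Return True if every U+0931 in the label starts a Table 7 sequence."""
    return STRAY_RRA.search(label) is None
```

A negative lookahead keeps the pattern short; an equivalent approach would be to tokenize the label into repertoire elements and reject any leftover U+0931.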

5.3 Code points not included

The following code points have not been included in the repertoire.

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators, along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol | Function
| | Alternative
[ ] | Optional
* | Variable Repetition
( ) | Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | U+0905 U+0903

Table 9
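The vowel sequence V[B|D|X] maps directly onto a regular expression. The sketch below is illustrative: the function name is hypothetical, and the vowel class U+0905..U+0914 is an assumption made for brevity, not the exact vowel set of the repertoire.

```python
import re

V = "[\u0905-\u0914]"  # assumption: independent vowels as one contiguous range
B = "\u0902"           # Anusvara
D = "\u0901"           # Candrabindu
X = "\u0903"           # Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"{V}[{B}{D}{X}]?")

def is_vowel_sequence(text: str) -> bool:
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

With these definitions, each example row of Table 9 matches, while a vowel followed by both an Anusvara and a Visarga does not.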


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | U+0915 U+0903
Consonant + Halant | C[H] | क् (k, pure consonant) | U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14):

*3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets:

3A. The combination may be followed by M, B, D or X.

Example:

14 In the case of Sanskrit, it can join up to 5 consonants.


Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
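The consonant-sequence rules of Section 5.5.4 (items 1 to 3B) can be folded into one pattern. The sketch below is one possible reading, not the LGR's normative WLE rules: the function name is hypothetical, the consonant range U+0915..U+0939 and Matra range U+093E..U+094C are simplifying assumptions, and the limit of three Halant-joined prefixes reflects the Hindi limit of four consonants stated above.

```python
import re

C = "[\u0915-\u0939]"  # assumption: consonants as one contiguous range
M = "[\u093E-\u094C]"  # assumption: Matras as one contiguous range
N = "\u093C"           # Nukta
H = "\u094D"           # Halant
B, D, X = "\u0902", "\u0901", "\u0903"  # Anusvara, Candrabindu, Visarga

# Up to three (C[N]H) prefixes, then C[N], then either a final Halant
# or an optional Matra followed by an optional B, D or X.
CONSONANT_SEQUENCE = re.compile(
    f"(?:{C}{N}?{H}){{0,3}}{C}{N}?(?:{H}|{M}?[{B}{D}{X}]?)"
)

def is_consonant_sequence(text: str) -> bool:
    """Return True if the whole string is a single consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The examples of Tables 10 to 15 match under these assumptions, whereas five consonants joined by Halant, or an Anusvara followed by a Visarga, are rejected.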

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in the Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed.


Variant 1 | Variant 2
आ (U+0906) | आ़ (U+0906 U+093C)
ओ (U+0913) | ओ़ (U+0913 U+093C)
ा (U+093E) | ा़ (U+093E U+093C)
ो (U+094B) | ो़ (U+094B U+093C)

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 set character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the symmetric (reverse) variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
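The effect of the "not followed by Nukta" context can be shown with a small variant enumerator. This is a sketch under stated assumptions: `nukta_variants` and `NUKTA_BASES` are hypothetical names, only the four pairs of Table 16 are modelled, and no other variant sets or WLE rules are applied.

```python
from itertools import product

NUKTA = "\u093C"
# Variant 1 code points from Table 16 (Proposed Variants - Set 1)
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}

def nukta_variants(label: str) -> set:
    """Enumerate labels reachable by adding or removing a Nukta after a Table 16 base."""
    slots = []
    i = 0
    while i < len(label):
        ch = label[i]
        if ch in NUKTA_BASES:
            # If a Nukta already follows, consume it: the base is treated as
            # part of the Variant 2 sequence, so no second Nukta is ever added.
            if label[i + 1:i + 2] == NUKTA:
                i += 1
            slots.append((ch, ch + NUKTA))
        else:
            slots.append((ch,))
        i += 1
    return {"".join(choice) for choice in product(*slots)} - {label}
```

For the label आ this yields only आ़, and for आ़ it yields only आ; a doubled Nukta is never produced, which is the outcome the context rule is meant to guarantee.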

612 OverlappedvariantanalysisinvolvingNukta

ConsideringthefollowingvariantsetsABCDEachofthemcontains0906or093E

Set Mapping Variant Set A 093E 0901 lt--gt 0949 0902 Variant Set B 0906 0901 lt--gt 0911 0902 Variant Set C 0906 0902 lt--gt 0974 Variant Set D 093B lt--gt 093E 0902

Overlappingvariantsetsinvolving0906and093EplusNukta

Source Glyph Target Glyph Type Variant Context 0906 आ 0906 093C आ harr blocked not followed-by-N

093E ा 093E 093C ा harr blocked not followed-by-N

Whensubstitutingthevariantfromtheoverlappingsetsfor0906or093ErespectivelyintothelefthandsidesequencesofVariantSetsABCandEtheresultisavalidsequencethatdisplayswithaNuktaAsthepresenceabsenceofNuktaisthebasisforseveralvariantsetsthereforethesevariantareextendedtoensurethesymmetryandtransitivity

093E093C0901isaddedtoSetA

0906093C0901isaddedtoSetB

0906093C0902isaddedtosetCand

093E093C0902isaddedtoSetD

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when (follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapping with these four sets is 0902 (B: Anusvara) <--> 093A (M: Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
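As an illustration of how the extended sets and the context rule when (followed-by-V-C-or-end) interact, here is a minimal Python sketch. The function names and the `VARIANT_SETS` structure are hypothetical, and the vowel and consonant tests use simplified code point ranges as an assumption; they are not the normative repertoire. The sketch substitutes one occurrence at a time and does not re-validate the result against the WLE rules.

```python
def is_vowel(ch):
    # Assumption: approximate range for independent vowels.
    cp = ord(ch)
    return 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977


def is_consonant(ch):
    # Assumption: approximate range for consonants.
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or cp in (0x097B, 0x097C, 0x097E, 0x097F)


def followed_by_v_c_or_end(label, end):
    """True if position `end` is the end of the label, a vowel or a consonant."""
    return end == len(label) or is_vowel(label[end]) or is_consonant(label[end])


# Each entry: (members of the set, whether the context rule applies).
VARIANT_SETS = {
    "A": (("\u093E\u0901", "\u093E\u093C\u0901", "\u0949\u0902"), False),
    "B": (("\u0906\u0901", "\u0906\u093C\u0901", "\u0911\u0902"), False),
    "C": (("\u0906\u0902", "\u0906\u093C\u0902", "\u0974"), True),
    "D": (("\u093B", "\u093E\u0902", "\u093E\u093C\u0902"), True),
}


def single_substitutions(label):
    """Return labels obtained by replacing one set member with another."""
    results = set()
    for members, needs_context in VARIANT_SETS.values():
        for source in members:
            start = label.find(source)
            while start != -1:
                end = start + len(source)
                if not needs_context or followed_by_v_c_or_end(label, end):
                    for target in members:
                        if target != source:
                            results.add(label[:start] + target + label[end:])
                start = label.find(source, start + 1)
    return results
```

With this rule, 093B followed by an Anusvara yields no Set D variant, which is the case the intersection of contexts is meant to exclude.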

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion; not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
U+0901 | U+0945 U+0902
U+093E U+0901 | U+0949 U+0902
U+0905 U+0901 | U+0972 U+0902
U+090F U+0901 | U+090D U+0902
U+0906 U+0901 | U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by a Consonant, a Vowel, a Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below:

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and, as such, must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
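The rule above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the consonant test is a simplified range (an assumption, not the normative repertoire), and both directions of the mapping deliberately share the same context check, mirroring the symmetry requirement.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch):
    # Assumption: approximate range for consonants.
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or cp in (0x097B, 0x097C, 0x097E, 0x097F)


def preceded_by_c_or_cn(label, index):
    """True if label[index] follows a Consonant, or a Consonant plus Nukta."""
    if index >= 1 and is_consonant(label[index - 1]):
        return True
    return index >= 2 and label[index - 1] == NUKTA and is_consonant(label[index - 2])


def expand_candrabindu(label):
    """Map U+0901 -> U+0945 U+0902 wherever the context rule holds."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)


def contract_candra_e_anusvara(label):
    """Map U+0945 U+0902 -> U+0901 under the same context rule."""
    out = []
    i = 0
    while i < len(label):
        if label.startswith(CANDRA_E_ANUSVARA, i) and preceded_by_c_or_cn(label, i):
            out.append(CANDRABINDU)
            i += len(CANDRA_E_ANUSVARA)
        else:
            out.append(label[i])
            i += 1
    return "".join(out)
```

A Candrabindu after a Vowel or a Matra is left untouched, since a Matra could not legally stand in that position.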

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Table 19 lists the cases that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 lists the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B
ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant/Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:


a क (U+0915)

b ख (U+0916)

c ग (U+0917)

d च (U+091A)

e छ (U+091B)

f ज (U+091C)

g ड (U+0921)

h ढ (U+0922)

i फ (U+092B)

The set V1 consists of these vowels:

a आ (U+0906) (Required in Santali language)

b ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a ा (U+093E) (Required in Santali language)

b ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (where CN is a C followed by an N)

3. M must be preceded by C or CN (where CN is a C followed by an N)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V CANNOT be preceded by H (details in “Case of V preceded by H”)

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)
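WLE rules 1 to 7 above lend themselves to a simple left-to-right check. The following Python sketch is illustrative only: `category` and `check_label` are hypothetical names, the category ranges are simplified assumptions rather than the normative repertoire of Table 6, and the Eyelash Reph sequences and the variant-only rules are not modelled.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}
NUKTA_BASES = C1 | V1 | M1
SIGN_RULES = {"X": 4, "B": 5, "D": 6}


def category(ch):
    """Return the WLE symbol for a code point (simplified, assumed ranges)."""
    cp = ord(ch)
    singles = {0x0901: "D", 0x0902: "B", 0x0903: "X", 0x093C: "N", 0x094D: "H"}
    if cp in singles:
        return singles[cp]
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977:
        return "V"
    if 0x0915 <= cp <= 0x0939 or cp in (0x097B, 0x097C, 0x097E, 0x097F):
        return "C"
    if cp in (0x093A, 0x093B, 0x094F, 0x0956, 0x0957) or 0x093E <= cp <= 0x094C:
        return "M"
    raise ValueError(f"U+{cp:04X} is outside the simplified repertoire")


def check_label(label):
    """Return the numbers of the WLE rules violated, in label order."""
    violations = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else None
        prev_cat = category(prev) if prev is not None else None
        if cat == "N":
            if prev not in NUKTA_BASES:
                violations.append(1)
        elif cat in ("H", "M"):
            after_c = prev_cat == "C"
            after_cn = prev_cat == "N" and i >= 2 and category(label[i - 2]) == "C"
            if not (after_c or after_cn):
                violations.append(2 if cat == "H" else 3)
        elif cat in SIGN_RULES:
            if prev_cat not in ("V", "C", "N", "M"):
                violations.append(SIGN_RULES[cat])
        elif cat == "V" and prev_cat == "H":
            violations.append(7)
    return violations
```

For example, the sequence U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 (a Vowel directly after a Halant) is flagged under rule 7 by this sketch.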

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether both of these can follow any character that may end a valid akshar. It is to be noted that only the case “V preceded by H” needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ “Mango pickle” (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, a space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | DataXgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | DataXgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, “Maximal Starting Repertoire - MSR-4: Overview and Rationale”, 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, “Representing Label Generation Rulesets Using XML”, RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC 8228] A. Freytag, “Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels”, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, “Variant Issues Report”, ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, “South and Central Asia-I - Official Scripts of India”, Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_org/indevnagari/thesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing them, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
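The all-or-nothing condition in the last paragraph can be sketched in Python as below. This is a simplified illustration: `GURMUKHI_CONFUSABLES` and `has_cross_script_confusable` are hypothetical names, the mapping holds only the Devanagari-Gurmukhi single code point pairs from Table 22, and multi-code-point aksharas are not handled.

```python
# Devanagari code point -> Gurmukhi look-alikes (Gurmukhi rows of Table 22).
GURMUKHI_CONFUSABLES = {
    "\u0920": {"\u0A28", "\u0A30"},
    "\u0921": {"\u0A21", "\u0A24"},
    "\u0922": {"\u0A22"},
    "\u0924": {"\u0A1C"},
    "\u092F": {"\u0A27"},
}


def has_cross_script_confusable(label, table=GURMUKHI_CONFUSABLES):
    """True only if every code point of the label has a look-alike in `table`."""
    return bool(label) and all(ch in table for ch in label)
```

A single code point without a counterpart is enough to make the whole label visually distinct, so the function returns False as soon as one is found.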

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see note 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that the two look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


40 | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
41 | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
42 | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
43 | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
44 | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
45 | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
46 | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
47 | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0][102][103][110][112][113]
48 | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
49 | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]

50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]

62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E = candra | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]

75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6: Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, in order to allow the “Eyelash Reph” (see note 13) construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7: Sequences

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term “Reph” is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari Script. These rules need slight additions for different languages written in Devanagari in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, instead of their descriptive names, the variable names are used. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second one with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta


5.5.2 Operators used

Symbol Function

| Alternative

[ ] Optional

* Variable Repetition

( ) Sequence Group

Table 8: Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | अ ं (U+0905 U+0902)
Vowel + Candrabindu | V[D] | अँ (aṃ) | अ ँ (U+0905 U+0901)
Vowel + Visarga | V[X] | अः (aḥ) | अ ः (U+0905 U+0903)

Table 9
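The vowel sequence V[B|D|X] can be sketched as a regular expression. This is only an illustration under stated assumptions: the set `V` below is a small subset of the vowels (the full set is whatever Table 6 categorises as Vowel), and the function name `is_vowel_sequence` is hypothetical.

```python
import re

# Assumption: a small illustrative subset of the Vowel category.
# The authoritative set is defined by Table 6, not by this list.
V = "\u0905\u0906\u0907\u0909\u0913"
B = "\u0902"  # Anusvara
D = "\u0901"  # Candrabindu
X = "\u0903"  # Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{V}][{B}{D}{X}]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if text is exactly one V[B|D|X] sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

For instance, `is_vowel_sequence("\u0905\u0902")` accepts Vowel + Anusvara, while a vowel followed by both an Anusvara and a Visarga is rejected because only one of B, D or X may follow.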


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | क ़ (U+0915 U+093C)

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | क ि (U+0915 U+093F)
Consonant + Anusvara | C[B] | कं (kaṁ) | क ं (U+0915 U+0902)
Consonant + Candrabindu | C[D] | कँ (kaṃ) | क ँ (U+0915 U+0901)
Consonant + Visarga | C[X] | कः (kaḥ) | क ः (U+0915 U+0903)
Consonant + Halant | C[H] | क् (k, Pure Consonant) | क ् (U+0915 U+094D)

Table 11


2A. A CM sequence can be optionally followed by D, B or X

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | क ी ं (U+0915 U+0940 U+0902)
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | क ा ँ (U+0915 U+093E U+0901)
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | क ी ः (U+0915 U+0940 U+0903)

Table 12

3. A sequence of consonants (up to 4) joined by Halant14

*3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

14 In case of Sanskrit, it can join up to 5 consonants.

3B. *3(CH)CM may be followed by a B, D or X

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
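The consonant sequence rules of Section 5.5.4 (items 1 to 3B) can be combined into one pattern over category letters. The following is a minimal sketch, not the LGR's own formalism. It assumes that a Consonant + Nukta pair counts as a consonant wherever C appears, that the limit of four consonants applies (the WLE rules in Section 7 do not impose it), and that `CATEGORY` is a small illustrative subset of Table 6. The names `categories` and `is_consonant_sequence` are hypothetical.

```python
import re

# Assumption: illustrative subset of Table 6, mapping characters to the
# variable names of Section 5.5.1.
CATEGORY = {
    "\u0915": "C", "\u0928": "C", "\u092F": "C", "\u0930": "C",
    "\u093C": "N",
    "\u093E": "M", "\u093F": "M", "\u0940": "M",
    "\u0902": "B", "\u0901": "D", "\u0903": "X", "\u094D": "H",
}

# Item 2 (final Halant on a single consonant): CN?H
# Items 1, 2, 2A, 3, 3A, 3B: up to three (CH) groups, then a consonant,
# optionally a Matra, optionally one of B, D or X.
CONSONANT_SEQUENCE = re.compile(r"CN?H|(?:CN?H){0,3}CN?M?[BDX]?")


def categories(text: str) -> str:
    """Translate a string into its category letters ('?' if unknown)."""
    return "".join(CATEGORY.get(ch, "?") for ch in text)


def is_consonant_sequence(text: str) -> bool:
    """Return True if text is one consonant sequence per Section 5.5.4."""
    return CONSONANT_SEQUENCE.fullmatch(categories(text)) is not None
```

Working on category letters rather than raw code points keeps the pattern readable and close to the notation used in the tables above.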

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String Similarity Assessment Panel) entrusted to deal with such cases. Table 21: Visually confusables in Appendix A: Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is the brief description of these variants, followed by variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ (U+0906) | आ़ (U+0906 U+093C)
ओ (U+0913) | ओ़ (U+0913 U+093C)
ा (U+093E) | ा़ (U+093E U+093C)
ो (U+094B) | ो़ (U+094B U+093C)

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C), where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 set character is not followed by a Nukta (U+093C) character. Thus, the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
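The effect of the "not followed by a Nukta" condition can be illustrated with a small variant generator. This is a sketch under assumptions, not how an LGR tool implements RFC 7940: the function name `nukta_variants` is hypothetical, and only the four pairs of Table 16 are modelled.

```python
from itertools import product

NUKTA = "\u093C"
# Variant 1 column of Table 16.
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variants(label: str) -> set[str]:
    """Return all labels reachable by adding or dropping a Nukta after a
    Table 16 base, applying the same context rule in both directions."""
    slots = []
    i = 0
    while i < len(label):
        ch = label[i]
        # Treat "base + Nukta" as one unit (the Variant 2 sequence).
        unit = ch + NUKTA if label[i + 1:i + 2] == NUKTA else ch
        after = label[i + len(unit):i + len(unit) + 1]
        if ch in NUKTA_BASES and after != NUKTA:
            # Context satisfied: the unit is not followed by a Nukta.
            slots.append([ch, ch + NUKTA])
        else:
            slots.append([unit])
        i += len(unit)
    return {"".join(choice) for choice in product(*slots)} - {label}
```

With the context check in place, a label containing U+0906 U+093C U+093C yields no variants, so the regenerative substitution described above cannot occur.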

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the left-hand side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A
- 0906 093C 0901 is added to Set B
- 0906 093C 0902 is added to Set C, and
- 093E 093C 0902 is added to Set D

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)
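The symmetry and transitivity that the extended sets are meant to guarantee can be checked mechanically. The sketch below encodes the four final sets as tuples of code points and expands each set into pairwise mappings. The helper names (`build_mappings`, `is_symmetric`, `is_transitive`) are hypothetical, and the context rules are deliberately left out of this check.

```python
# Final variant sets A to D from the table above, as code point tuples.
VARIANT_SETS = {
    "A": [(0x093E, 0x0901), (0x093E, 0x093C, 0x0901), (0x0949, 0x0902)],
    "B": [(0x0906, 0x0901), (0x0906, 0x093C, 0x0901), (0x0911, 0x0902)],
    "C": [(0x0906, 0x0902), (0x0906, 0x093C, 0x0902), (0x0974,)],
    "D": [(0x093B,), (0x093E, 0x0902), (0x093E, 0x093C, 0x0902)],
}


def build_mappings(sets):
    """Map every member of a set to all other members of the same set."""
    mappings = {}
    for members in sets.values():
        for src in members:
            mappings.setdefault(src, set()).update(
                m for m in members if m != src
            )
    return mappings


def is_symmetric(mappings) -> bool:
    """Every mapping a -> b has a reverse mapping b -> a."""
    return all(
        src in mappings.get(tgt, set())
        for src, targets in mappings.items()
        for tgt in targets
    )


def is_transitive(mappings) -> bool:
    """If a -> b and b -> c (with c != a), then a -> c."""
    for a, targets in mappings.items():
        for b in targets:
            for c in mappings.get(b, set()):
                if c != a and c not in targets:
                    return False
    return True
```

Expanding each set into all ordered pairs is what makes the result symmetric and transitive by construction; a set defined only by two chained pairwise mappings would fail the transitivity check.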

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when (follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ (U+0973) | अं (U+0905 U+0902)
ऺ (U+093A) | ं (U+0902)
ॴ (U+0974) | आं (U+0906 U+0902)
ऻ (U+093B) | ां (U+093E U+0902)
ऎ (U+090E) | ऐ (U+0910)
ॆ (U+0946) | े (U+0947)
ॵ (U+0975) | औ (U+0914)
ॏ (U+094F) | ौ (U+094C)

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mapping will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. In the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ (U+0901) | ॅं (U+0945 U+0902)
ाँ (U+093E U+0901) | ॉं (U+0949 U+0902)
अँ (U+0905 U+0901) | ॲं (U+0972 U+0902)
एँ (U+090F U+0901) | ऍं (U+090D U+0902)
आँ (U+0906 U+0901) | ऑं (U+0911 U+0902)

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both the pairs preceded by a Devanagari Letter Ka (क - U+0915) are shown here:

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow this convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
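The context condition on the Candrabindu variant can be expressed as a simple look-behind check. The sketch below is illustrative only. `is_consonant` approximates the Consonant category by the Unicode range U+0915 to U+0939, which is an assumption: the actual repertoire in Table 6 excludes some of these code points and adds the Sindhi letters. Both function names are hypothetical.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    """Assumption: approximate the Consonant category by U+0915..U+0939.
    The authoritative list is the Consonant rows of Table 6."""
    return "\u0915" <= ch <= "\u0939"


def candrabindu_variant_allowed(label: str, index: int) -> bool:
    """True if the U+0901 at label[index] may map to U+0945 U+0902,
    i.e. it is preceded by a Consonant or a Consonant followed by a Nukta."""
    if index == 0 or label[index] != CANDRABINDU:
        return False
    prev = label[index - 1]
    if is_consonant(prev):
        return True
    return prev == NUKTA and index >= 2 and is_consonant(label[index - 2])
```

A Candrabindu after a vowel or a Matra, for example अँ or काँ, fails the check, which matches the reason given above: the replacement sequence starts with a Matra and could not validly follow those characters.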

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
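The "first come, other blocked" disposition can be sketched with a toy registry. Everything here is hypothetical: the `Registry` class and its `apply` method are invented for illustration, and only the four one-to-one pairs of Table 17 are folded. The sequence-based variants and their context rules are not modelled.

```python
# Assumption: only the single code point pairs of Table 17 are used,
# each folded onto its Variant 2 member to obtain a comparison key.
FOLD = str.maketrans({
    "\u090E": "\u0910",
    "\u0946": "\u0947",
    "\u0975": "\u0914",
    "\u094F": "\u094C",
})


class Registry:
    """Toy allocation table with blocked variant disposition."""

    def __init__(self) -> None:
        self._allocated: dict[str, str] = {}

    def apply(self, label: str) -> bool:
        """Allocate label unless it, or a variant of it, is already taken."""
        key = label.translate(FOLD)
        if key in self._allocated:
            return False  # blocked by the earlier label
        self._allocated[key] = label
        return True
```

Because neither member of a pair is preferred, the key only serves to compare labels; whichever label arrives first is the one that gets allocated.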

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
ं (U+0902) | ਂ (U+0A02)
इ (U+0907) | ਙ (U+0A19)
उ (U+0909) | ਤ (U+0A24)
ग (U+0917) | ਗ (U+0A17)
घ (U+0918) | ਬ (U+0A2C)
ट (U+091F) | ਟ (U+0A1F)
ठ (U+0920) | ਠ (U+0A20)
ढ (U+0922) | ਫ (U+0A2B)
प (U+092A) | ਧ (U+0A27)
भ (U+092D) | ਮ (U+0A2E)
म (U+092E) | ਸ (U+0A38)
व (U+0935) | ਕ (U+0A15)
ह (U+0939) | ਵ (U+0A35)
ऺ (U+093A) | ਂ (U+0A02)
़ (U+093C) | ਼ (U+0A3C)
ि (U+093F) | ਿ (U+0A3F)
ी (U+0940) | ੀ (U+0A40)
ॅ (U+0945) | ੱ (U+0A71)
ॆ (U+0946) | ੇ (U+0A47)
ॆ (U+0946) | ੋ (U+0A4B)
े (U+0947) | ੇ (U+0A47)
े (U+0947) | ੋ (U+0A4B)
ै (U+0948) | ੈ (U+0A48)
ॖ (U+0956) | ੁ (U+0A41)
ॗ (U+0957) | ੂ (U+0A42)
प्टि (U+092A U+094D U+091F U+093F) | ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940) | ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947) | ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946) | ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924) | ਜ (U+0A1C)

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म (U+092E) | ম (U+09AE)
ि (U+093F) | ি (U+09BF)

Table 20: Proposed Cross-script Devanagari-Bengali Variants
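A whole-label cross-script check can be sketched from the single code point rows of Table 19. The mapping below is a subset chosen for illustration; the multi-code-point rows (for example U+092A U+094D U+091F U+093F) and the one-to-many rows are left out, and the function name `gurmukhi_variant` is hypothetical.

```python
# Subset of Table 19 (single code point, one-to-one rows only).
DEVA_TO_GURU = {
    "\u0917": "\u0A17",
    "\u091F": "\u0A1F",
    "\u0920": "\u0A20",
    "\u093F": "\u0A3F",
    "\u0940": "\u0A40",
}


def gurmukhi_variant(label: str):
    """Return the Gurmukhi whole-label variant, or None if any code
    point of the Devanagari label has no counterpart in the mapping."""
    if not label or any(ch not in DEVA_TO_GURU for ch in label):
        return None
    return "".join(DEVA_TO_GURU[ch] for ch in label)
```

A cross-script variant exists only when the entire label can be mapped, which is why a single unmapped code point makes the function return `None`.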


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables in Appendix B: Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6: Code point repertoire:

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant / Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN15

3. M must be preceded by C or CN16

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in Case of V preceded by H)

15 where CN is a C followed by an N
16 where CN is a C followed by an N
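The seven rules above translate almost directly into a look-behind validator. The following is a sketch under assumptions, not the normative XML of the LGR: `category` approximates Table 6 with Unicode ranges (the real repertoire excludes some of these code points), the restriction of U+0931 to the Table 7 sequences is not modelled, and the function names are hypothetical.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")
NUKTA_HOSTS = C1 | V1 | M1


def category(ch: str) -> str:
    """Assumption: approximate the Table 6 categories by code point range."""
    cp = ord(ch)
    single = {0x0901: "D", 0x0902: "B", 0x0903: "X", 0x093C: "N", 0x094D: "H"}
    if cp in single:
        return single[cp]
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977:
        return "V"
    if 0x0915 <= cp <= 0x0939 or 0x097B <= cp <= 0x097F:
        return "C"
    if 0x093E <= cp <= 0x094C or cp in (0x093A, 0x093B, 0x094F, 0x0956, 0x0957):
        return "M"
    return "?"


def wle_violation(label: str):
    """Return the number of the first WLE rule violated, or None."""
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i else ""
        pcat = category(prev) if prev else ""
        # "C or CN": previous is a consonant, or a Nukta that follows one.
        after_c_or_cn = pcat == "C" or (
            pcat == "N" and i >= 2 and category(label[i - 2]) == "C"
        )
        if cat == "N" and prev not in NUKTA_HOSTS:
            return 1
        if cat == "H" and not after_c_or_cn:
            return 2
        if cat == "M" and not after_c_or_cn:
            return 3
        if cat in "XBD" and pcat not in ("V", "C", "N", "M"):
            return {"X": 4, "B": 5, "D": 6}[cat]
        if cat == "V" and pcat == "H":
            return 7
    return None
```

As footnote 1 of this document cautions, which rule number a tool reports for an invalid label depends on its implementation; the important outcome is that the label is rejected.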


Additional rules are used only for variants where a Nukta maps to a null, or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7: Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.
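The first reason can be illustrated with a check that U+0931 occurs only at the start of one of the two sequences from Table 7. This is a minimal sketch; the function name `rra_usage_ok` is hypothetical, and in the actual LGR the restriction follows from the repertoire listing the sequences rather than from a separate rule.

```python
RRA = "\u0931"
HALANT = "\u094D"
# Third code point of the two permitted sequences: YA and HA (Table 7).
ALLOWED_AFTER_RRA_HALANT = {"\u092F", "\u0939"}


def rra_usage_ok(label: str) -> bool:
    """True if every U+0931 starts U+0931 U+094D U+092F or
    U+0931 U+094D U+0939."""
    for i, ch in enumerate(label):
        if ch != RRA:
            continue
        if label[i + 1:i + 2] != HALANT:
            return False
        if label[i + 2:i + 3] not in ALLOWED_AFTER_RRA_HALANT:
            return False
    return True
```

Once a label passes this check, each permitted sequence ends in a consonant, so the ordinary consonant rules apply to whatever follows.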

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case “V preceded by H” needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ “Mango pickle” (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, “Maximal Starting Repertoire - MSR-4: Overview and Rationale”, 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb. 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov. 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, “Representing Label Generation Rulesets Using XML”, RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec. 2017)

[RFC8228] A. Freytag, “Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels”, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec. 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb. 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb. 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb. 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec. 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec. 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec. 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec. 2017)

[100] Devanāgarī VIP Team, “Variant Issues Report”, ICANN, 3rd Oct. 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct. 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct. 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct. 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct. 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct. 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct. 2017)

[106] Unicode 10.0.0, “South and Central Asia-I - Official Scripts of India”, Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov. 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov. 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec. 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec. 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec. 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec. 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb. 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156, 2007.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.
2. IS 10315: 7-bit coded character set for information interchange, 1985.
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.
5. ISO 2375: Procedure for registration of escape sequences, 2003.
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing) they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
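The all-or-nothing condition in the last paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary holds a small subset of the Devanagari-Gurmukhi pairs from Table 22, the names `DEVA_TO_GURMUKHI_CONFUSABLE` and `fully_confusable` are hypothetical, and the check works per code point rather than per akshara.

```python
# Illustrative subset of Table 22 (Devanagari -> Gurmukhi confusables).
# The table name and function name are hypothetical, not part of the LGR.
DEVA_TO_GURMUKHI_CONFUSABLE = {
    "\u0921": "\u0A21",  # ड -> ਡ
    "\u0922": "\u0A22",  # ढ -> ਢ
    "\u0924": "\u0A1C",  # त -> ਜ
    "\u092F": "\u0A27",  # य -> ਧ
}


def fully_confusable(label, table):
    """Return True only if every code point of the label has a confusable.

    A single code point without a counterpart makes the whole label
    visually distinct, so the function returns False in that case.
    """
    return len(label) > 0 and all(ch in table for ch in label)
```

For example, a label made only of ड and त passes the check, while adding क (U+0915), which has no entry in this subset, makes the label non-confusable.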

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see footnote 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


50 | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][113]
51 | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
52 | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][104][105][108][113]
53 | 093A | ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
54 | 093B | ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
55 | 093C | ़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0][101][105][108][110][109][111]
56 | 093E | ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
57 | 093F | ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
58 | 0940 | ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
59 | 0941 | ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
60 | 0942 | ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
61 | 0943 | ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0][101][102][103]
62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]
75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6 Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the “Eyelash Reph” construct (see footnote 13).

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7 Sequences

13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra, U+0930), the term “Reph” is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari Script. These rules need slight additions for different languages written in Devanagari in terms of:

- Character addition/deletion (e.g. Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7 the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, instead of their descriptive names, the variable names are used. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of the akshar formations, the first of which begins with the vowels and the second one with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen -
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta

5.5.2 Operators used

Symbol | Function
| | Alternative
[ ] | Optional
* | Variable Repetition
( ) | Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ a | U+0905
Vowel + Anusvara | V[B] | अं aṁ | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ aṃ | U+0905 U+0901
Vowel + Visarga | V[X] | अः aḥ | U+0905 U+0903

Table 9
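The vowel sequence V[B|D|X] maps directly onto a regular expression. The following is a minimal sketch, assuming an approximate code point range for V (the real repertoire excludes some code points in that range, such as U+090C); the names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical.

```python
import re

# Character classes used by the sketch. V is an approximate range for the
# independent vowels (an assumption, not the exact LGR repertoire).
V = "[\u0905-\u0914]"  # independent vowels, roughly अ..औ
B = "\u0902"           # Anusvara
D = "\u0901"           # Candrabindu
X = "\u0903"           # Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"{V}(?:{B}|{D}|{X})?")


def is_vowel_sequence(text):
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

Under these assumptions, अ and अं are accepted, while अ followed by both an Anusvara and a Visarga is rejected, because at most one of B, D or X may follow the vowel.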


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क ka | U+0915 <single character>
Consonant + Nukta | C[N] | क़ ḳa | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M] or Anusvara [B] or Candrabindu [D] or Visarga [X] or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि ki | U+0915 U+093F
Consonant + Anusvara | C[B] | कं kaṁ | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ kaṃ | U+0915 U+0901
Consonant + Visarga | C[X] | कः kaḥ | U+0915 U+0903
Consonant + Halant | C[H] | क् k (Pure Consonant) | U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं kīṁ | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ kāṃ | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः kīḥ | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14):

*3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य nkrya | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In case of Sanskrit it can join up to 5 consonants.

Subsets:

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की kkī | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं kkaṁ | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ kkaṃ | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः kkaḥ | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं kkīṁ | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ kkīṃ | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः kkīḥ | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
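Rules 1 to 3B can be combined into one pattern for a consonant-initial akshar. The sketch below is one possible reading, not the normative grammar: it assumes approximate code point ranges for C and M, allows a Nukta after each consonant, and treats a trailing Halant and a trailing Matra/B/D/X as alternatives. The name `is_consonant_sequence` is hypothetical.

```python
import re

# Approximate character classes (assumptions, not the exact LGR repertoire).
C = "[\u0915-\u0939]"    # consonants, roughly क..ह
N = "\u093C"             # Nukta
H = "\u094D"             # Halant / Virama
M = "[\u093E-\u094C]"    # matras, roughly ा..ौ
BDX = "[\u0901-\u0903]"  # Candrabindu, Anusvara, Visarga

CONSONANT_SEQUENCE = re.compile(
    f"(?:{C}{N}?{H}){{0,3}}"  # *3(CH): up to three consonant + Halant pairs
    f"{C}{N}?"                # the final consonant, optionally with Nukta
    f"(?:{H}|{M}?{BDX}?)"     # a Halant, or optional Matra then optional B/D/X
)


def is_consonant_sequence(text):
    """Return True if the whole string is one consonant-initial akshar."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The upper bound of three consonant + Halant pairs reflects the Hindi limit of four joined consonants stated above; a pattern meant to mirror the Section 7 WLE rules would drop that bound.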

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is the brief description of these variants, followed by variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning, which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in the Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user-base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) have a typical characteristic, which is that within a variant pair, Variant 1 is a subset of Variant 2; e.g. in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants as given below:

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 set character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC8228]). Second, this type of variant pair is an “effective null variant”, where the U+093C in one sequence maps to “nothing” in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
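The effect of the "not followed by Nukta" condition can be illustrated with a short Python sketch. It is a simplified model only: the function names are hypothetical, and the sketch applies the Variant 1 → Variant 2 mapping at every eligible position at once instead of enumerating all variant labels as an LGR tool would.

```python
NUKTA = "\u093C"
# Variant 1 code points from Table 16.
NUKTA_VARIANT_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_positions(label):
    """Indexes where Variant 1 -> Variant 2 applies (base not followed by Nukta)."""
    return [
        i for i, ch in enumerate(label)
        if ch in NUKTA_VARIANT_BASES and label[i + 1:i + 2] != NUKTA
    ]


def all_nukta_added(label):
    """Apply the Variant 1 -> Variant 2 mapping at every eligible position."""
    out = []
    for i, ch in enumerate(label):
        out.append(ch)
        if ch in NUKTA_VARIANT_BASES and label[i + 1:i + 2] != NUKTA:
            out.append(NUKTA)
    return "".join(out)
```

Because a base that already carries a Nukta is skipped, applying `all_nukta_added` twice gives the same result as applying it once, so the invalid sequence U+0906 U+093C U+093C is never produced.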

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C, D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when(follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
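The extension of Sets A to D with their Nukta-bearing members can be reproduced mechanically. The sketch below assumes a simple representation (each member is a space-separated string of code points) and only substitutes a leading 0906 or 093E, which is sufficient for these four sets; the names `NUKTA_PAIRS`, `VARIANT_SETS` and `extend_with_nukta` are hypothetical.

```python
# Overlapping Nukta variants: base code point -> base plus Nukta.
NUKTA_PAIRS = {"0906": "0906 093C", "093E": "093E 093C"}

# The four original variant sets from the analysis above.
VARIANT_SETS = {
    "A": ["093E 0901", "0949 0902"],
    "B": ["0906 0901", "0911 0902"],
    "C": ["0906 0902", "0974"],
    "D": ["093B", "093E 0902"],
}


def extend_with_nukta(members):
    """Add the Nukta-bearing counterpart of every member starting with 0906/093E."""
    extended = list(members)
    for seq in members:
        code_points = seq.split()
        if code_points[0] in NUKTA_PAIRS:
            candidate = " ".join([NUKTA_PAIRS[code_points[0]]] + code_points[1:])
            if candidate not in extended:
                extended.append(candidate)
    return extended
```

Running `extend_with_nukta` over each set yields the three-member sets listed in the final table; the context rules for Sets C and D are not modelled here.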

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mapping will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is of a vowel killer, coming at the end, many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound in the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and keeps room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ U+0901 | ॅं U+0945 U+0902
ाँ U+093E U+0901 | ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by a Devanagari Letter Ka (क, U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by either of Consonants, Vowels, Nukta or a Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902) as given below:

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
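The context condition of this rule can be expressed as a small predicate. This is a sketch under stated assumptions: the consonant test uses an approximate code point range plus the four Sindhi letters from the repertoire table, and the function names are hypothetical.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093C"


def is_consonant(ch):
    # Approximate test: क..ह plus the Sindhi letters (an assumption, not
    # the exact category definition used by the LGR).
    return "\u0915" <= ch <= "\u0939" or ch in "\u097B\u097C\u097E\u097F"


def candrabindu_variant_applies(label, i):
    """True if label[i] is a Candrabindu preceded by C or by C + Nukta."""
    if i == 0 or label[i] != CANDRABINDU:
        return False
    prev = label[i - 1]
    if is_consonant(prev):
        return True
    return prev == NUKTA and i >= 2 and is_consonant(label[i - 2])
```

With this predicate, the Candrabindu in कँ is eligible for the mapping to U+0945 U+0902, whereas the Candrabindu in अँ or काँ is not, since it follows a Vowel or a Matra.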

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (Required in Santali language)
   b. ओ (U+0913) (Required in Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (Required in Santali language)
   b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (footnote 15)

3. M must be preceded by C or CN (footnote 16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in "Case of V preceded by H")

Footnotes 15, 16: where CN is a C followed by an N.


Additional rules are used only for variants where a Nukta maps to null or that are overlapped:

• Variant is not defined if followed by a Nukta (see Section 6.4.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see Section 6.1.2)
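
The numbered WLE rules 1 to 7 can be sketched as a simple left-to-right check. This is a minimal illustration under stated assumptions, not the normative XML: the function names category and violated_rules are hypothetical, the category ranges are approximated from Unicode block positions rather than taken from the exact repertoire in Table 6, rules 4 to 6 are read as "V, C, N or M", and the Eyelash Reph sequences are not modeled.

```python
NUKTA, HALANT = "\u093C", "\u094D"
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")
NUKTA_BASES = C1 | V1 | M1  # rule 1: the only characters a Nukta may follow


def category(ch):
    """Approximate category of a code point (assumption: block-based ranges)."""
    cp = ord(ch)
    if ch == NUKTA:
        return "N"
    if ch == HALANT:
        return "H"
    if ch == "\u0902":
        return "B"
    if ch == "\u0901":
        return "D"
    if ch == "\u0903":
        return "X"
    if 0x0915 <= cp <= 0x0939 or 0x097B <= cp <= 0x097F:
        return "C"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977:
        return "V"
    if 0x093E <= cp <= 0x094C or cp in (0x093A, 0x093B, 0x094F, 0x0956, 0x0957):
        return "M"
    return "?"


def violated_rules(label):
    """Return the rule numbers (1-7) violated by the label, in order of occurrence."""
    violations = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else ""
        prev_cat = category(prev) if prev else ""
        # "CN": the preceding Nukta itself follows a consonant
        prev_is_cn = prev_cat == "N" and i >= 2 and category(label[i - 2]) == "C"
        if cat == "N" and prev not in NUKTA_BASES:
            violations.append(1)
        elif cat == "H" and not (prev_cat == "C" or prev_is_cn):
            violations.append(2)
        elif cat == "M" and not (prev_cat == "C" or prev_is_cn):
            violations.append(3)
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            violations.append({"X": 4, "B": 5, "D": 6}[cat])
        elif cat == "V" and prev_cat == "H":
            violations.append(7)
    return violations
```

For instance, the sequence U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is flagged under rule 7 by this sketch, because the vowel U+0905 directly follows a Halant.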

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether each of these can follow any character that may end a valid akshar. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence it is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | DataXgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | DataXgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, "Devanagari Eyelash Ra", http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, "How to read and write Kashmiri in Devanagari", http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate, Ministry of HRD, Govt of India, "Devanāgarī Alphabet and its Romanization", httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, "Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals", https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.
2. IS 10315: 7-bit coded character set for information interchange, 1985.
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.
5. ISO 2375: Procedure for registration of escape sequences, 2003.
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same. However, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
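
The "all constituents must have a counterpart" condition can be sketched as follows. This is an illustration only: the mapping DEVA_TO_GURU holds just a few single-code-point pairs taken from Table 19, and the function name cross_script_counterpart is a hypothetical helper.

```python
# A few Devanagari -> Gurmukhi pairs from Table 19 (illustrative subset only).
DEVA_TO_GURU = {
    "\u091F": "\u0A1F",  # ट -> ਟ
    "\u0920": "\u0A20",  # ठ -> ਠ
    "\u092E": "\u0A38",  # म -> ਸ
    "\u0935": "\u0A15",  # व -> ਕ
    "\u0939": "\u0A35",  # ह -> ਵ
}


def cross_script_counterpart(label):
    """Return the Gurmukhi look-alike label, or None if any character lacks one."""
    if not label or any(ch not in DEVA_TO_GURU for ch in label):
        # A single character without a counterpart makes the label distinguishable.
        return None
    return "".join(DEVA_TO_GURU[ch] for ch in label)
```

In this sketch, a label made only of म and ट has a Gurmukhi counterpart, while adding क (which is absent from the illustrative subset) removes it.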

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (footnote 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

Footnote 17: The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


62 | 0945 | ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0][100][101][108]
63 | 0946 | ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
64 | 0947 | े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
65 | 0948 | ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][113]
66 | 0949 | ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0][100][108]
67 | 094A | ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
68 | 094B | ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
69 | 094C | ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
70 | 094D | ् | DEVANAGARI SIGN VIRAMA | Halant / Virama | Most of the languages given in Section 3.2 | 1 Hindi, Nepali | [0][101][102][103][105][108][113]
71 | 094F | ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0][105][108]
72 | 0956 | ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
73 | 0957 | ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11][105][108]
74 | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9][100][102][108][112]
75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6: Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, for enabling inclusion of the "Eyelash Reph" (footnote 13) construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7: Sequences

Footnote 13: Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by the normal Ra, U+0930), the term "Reph" is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7 the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta


5.5.2 Operators used

Symbol    Function
|         Alternative
[ ]       Optional
*         Variable Repetition
( )       Sequence Group

Table 8: Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | U+0905 U+0903

Table 9
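
The vowel-sequence pattern V[B|D|X] can be sketched as a regular expression over category letters. This is an illustration only: the CATEGORY table covers just the code points used in the examples (the full assignment is the one in Table 6), and the names CATEGORY, categories and is_vowel_sequence are hypothetical helpers.

```python
import re

# Illustrative category assignment for a handful of code points only.
CATEGORY = {
    "\u0905": "V",  # DEVANAGARI LETTER A
    "\u0906": "V",  # DEVANAGARI LETTER AA
    "\u0915": "C",  # DEVANAGARI LETTER KA
    "\u0902": "B",  # DEVANAGARI SIGN ANUSVARA
    "\u0901": "D",  # DEVANAGARI SIGN CANDRABINDU
    "\u0903": "X",  # DEVANAGARI SIGN VISARGA
}

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(r"V[BDX]?")


def categories(text):
    """Map each character to its category letter ('?' if not in the table)."""
    return "".join(CATEGORY.get(ch, "?") for ch in text)


def is_vowel_sequence(text):
    return VOWEL_SEQUENCE.fullmatch(categories(text)) is not None
```

Under this sketch U+0905 U+0902 is accepted, while U+0905 U+0902 U+0903 is rejected because only one of B, D or X may follow the vowel.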


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | U+0915 U+0903
Consonant + Halant | C[H] | क् (k, pure consonant) | U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (footnote 14):

*3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Footnote 14: In the case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
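
The consonant-sequence cases 1, 2, 2A, 3, 3A and 3B can be combined into a single pattern over category letters. This is a sketch under assumptions: the input is already a string of category letters (C, N, H, M, B, D, X), the Nukta is treated as part of its consonant as stated under case 1, a cluster ending in a Halant is accepted, and the name is_consonant_sequence is a hypothetical helper.

```python
import re

# (?:CN?H){0,3}  up to three leading "consonant + Halant" units, i.e. *3(CH)
# CN?            the final consonant, optionally carrying a Nukta
# (?:H|M?[BDX]?) either a Halant (pure consonant), or an optional Matra
#                followed by at most one of Anusvara, Candrabindu, Visarga
CONSONANT_SEQUENCE = re.compile(r"(?:CN?H){0,3}CN?(?:H|M?[BDX]?)")


def is_consonant_sequence(cats):
    """Check a string of category letters against the Hindi consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(cats) is not None
```

With this pattern "CHCHCHC" (four consonants) is accepted and "CHCHCHCHC" (five) is rejected, which mirrors the Hindi limit of four; the WLE rules in Section 7 do not impose that limit.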

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String Similarity Assessment Panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed.

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. E.g. in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C), where a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the corresponding sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence or absence of the Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A,

- 0906 093C 0901 is added to Set B,

- 0906 093C 0902 is added to Set C, and

- 093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end). This matches the intersection of these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)
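The choice of when(followed-by-V-C-or-end) for Set C can be checked as a plain set intersection. The sketch below is illustrative only; it encodes the following-contexts exactly as stated in the text for 0902 and 0974, using the category letters of Section 7 (D for 0901, B for 0902, X for 0903) and a hypothetical marker "end" for the end of the label.

```python
END = "end"  # hypothetical marker for "end of label"

# What may follow each side of the Set C mapping, as stated in the text.
follow_0902 = {"V", "C", END}
follow_0974 = {"V", "C", "D", "B", "X", END}

# The variant mapping is only meaningful where both sides are valid,
# that is, in the intersection of the two contexts.
shared_context = follow_0902 & follow_0974
```

The intersection contains only V, C and end of label, which is the context expressed by the rule name.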

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 technically overlaps with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences leads to a variant where 0901 is expanded to 0945 0902.

2. Another variant set that overlaps with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences that include 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels and Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel or Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
U+093A | U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
U+0946 | U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in a Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, many users tend to ignore the phonetic effect of its presence or absence at the end of a word. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case for making them variant pairs. In the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1 | Variant 2
U+0901 | U+0945 U+0902
U+093E U+0901 | U+0949 U+0902
U+0905 U+0901 | U+0972 U+0902
U+090F U+0901 | U+090D U+0902
U+0906 U+0901 | U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: क + (U+0901) and क + (U+0945 U+0902)

Variant Pair 2: क + (U+093E U+0901) and क + (U+0949 U+0902)

Ideally, the sequences (U+0945 U+0902) and (U+0949 U+0902) should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence to the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by a Consonant, Vowel, Nukta or Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a Consonant or by a Consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or by a Consonant followed by a Nukta.

There is no additional constraint on the character preceding (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as the mapping (U+0901) -> (U+0945 U+0902).
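The rule can be sketched as a small predicate. This is an illustration under stated assumptions, not the normative XML rule: the function names are hypothetical, and the consonant test uses the approximate range U+0915 to U+0939 instead of the exact repertoire of Table 6.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093c"


def is_consonant(ch):
    # Assumption: approximate consonant range, not the exact repertoire.
    return "\u0915" <= ch <= "\u0939"


def candrabindu_variant_defined(label, i):
    """True if the mapping U+0901 -> U+0945 U+0902 is defined at index i.

    The Candrabindu must be preceded by a Consonant, or by a Consonant
    followed by a Nukta.
    """
    if i == 0 or label[i] != CANDRABINDU:
        return False
    prev = label[i - 1]
    if is_consonant(prev):
        return True
    return prev == NUKTA and i >= 2 and is_consonant(label[i - 2])
```

A Candrabindu after a Vowel or a Matra therefore produces no variant, because U+0945 could not validly follow those characters.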

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the disposition "blocked".

There is no preference among these variants. Whichever label containing either of these variants is chosen first, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under the NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under the NBGP.

The NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under the NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are the variants that are proposed as cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed as cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are considered equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

The NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant / Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (Required in Santali language)
   b. ओ (U+0913) (Required in Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (Required in Santali language)
   b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (15)

3. M must be preceded by C or CN (16)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N
16 where CN is a C followed by an N
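The seven rules above can be sketched as a small checker. This is a minimal illustration, not the normative XML ruleset and not the behaviour of any particular LGR tool. The names `category` and `wle_violations` are hypothetical, and the category ranges are an approximation of Table 6 (they ignore excluded code points and do not test repertoire membership).

```python
# Sets C1, V1 and M1 from rule 1.
C1 = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B}
V1 = {0x0906, 0x0913}
M1 = {0x093E, 0x094B}
NUKTA_BASES = C1 | V1 | M1


def category(cp):
    """Approximate category letter for a code point (assumption, see lead-in)."""
    if cp == 0x0901:
        return "D"
    if cp == 0x0902:
        return "B"
    if cp == 0x0903:
        return "X"
    if cp == 0x093C:
        return "N"
    if cp == 0x094D:
        return "H"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977:
        return "V"
    if 0x0915 <= cp <= 0x0939 or 0x097B <= cp <= 0x097F:
        return "C"
    if cp in (0x093A, 0x093B, 0x094F, 0x0956, 0x0957) or 0x093E <= cp <= 0x094C:
        return "M"
    return "?"


def wle_violations(label):
    """Return a list of (index, rule number) pairs for rules 1 to 7."""
    cps = [ord(ch) for ch in label]
    cats = [category(cp) for cp in cps]
    found = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i > 0 else None
        prev_cp = cps[i - 1] if i > 0 else None
        # "C or CN": a consonant, or a consonant followed by a nukta.
        after_c_or_cn = prev == "C" or (
            prev == "N" and i >= 2 and cats[i - 2] == "C"
        )
        if cat == "N" and prev_cp not in NUKTA_BASES:
            found.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            found.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            found.append((i, 3))
        elif cat in ("X", "B", "D") and prev not in ("V", "C", "N", "M"):
            found.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and prev == "H":
            found.append((i, 7))
    return found
```

For example, the label U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is reported under rule 7 at the Vowel that follows the Halant, while U+0915 U+093E U+092E produces no violations.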


Additional rules are used only for variants where a Nukta maps to a null, or that are overlapped:

• The variant is not defined if followed by a Nukta (see 6.1.1).

• The variant is undefined if it is not followed by V or C (including RRA) or the end of the label (see 6.1.2).

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can take, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of the H for full representation of the intended sound. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | httpvishvakannadacom | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)


[RFC 8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)


[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, "Devanagari Eyelash Ra", http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, "How to read and write Kashmiri in Devanagari", http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, "Devanāgarī Alphabet and its Romanization", httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, "Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals", https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)




11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters and character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same. However, in a given context, e.g. in a browser bar as part of a domain name, or as a single word with no surrounding text from the same script to distinguish it, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all of its constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
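The "all constituents must have a counterpart" condition can be sketched in a few lines. This is an illustration only: the function name is hypothetical, the mapping holds just two Devanagari-Bengali pairs taken from Table 22, and the check works per code point, whereas a real analysis would also have to consider whole aksharas.

```python
# Two Devanagari -> Bengali confusable pairs from Table 22 (illustrative subset).
DEVA_TO_BENGALI = {
    "\u0909": "\u0993",  # U+0909 -> U+0993
    "\u0918": "\u0998",  # U+0918 -> U+0998
}


def cross_script_counterpart(label, mapping):
    """Return the whole-label counterpart, or None if any code point has none."""
    if not all(ch in mapping for ch in label):
        return None
    return "".join(mapping[ch] for ch in label)
```

A single code point without a counterpart is enough to make the function return None, which mirrors the statement that one distinguishing character yields a non-confusable string.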

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that the two look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


75 | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
76 | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
77 | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
78 | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
79 | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11][105][108]
80 | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8][104]
81 | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8][104]
82 | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8][104]
83 | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8][104]

Table 6: Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of the DEVANAGARI LETTER RRA in the repertoire, in order to permit the "Eyelash Reph" (13) construct.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference
1 | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106][107]
2 | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106][107]

Table 7: Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by the normal Ra, U+0930), the term "Reph" is used here.

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per the conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per the conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of forming their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodating additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

Section 7 gives the Whole Label Evaluation (WLE) rules, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of characters in the form of variables; in the rules, the variable names are used instead of the descriptive names. The second section lists four operators, along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formation, the first of which begins with vowels and the second with consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash   → Hyphen (-)
Digit  → Indo-Arabic digits [0-9]
C      → Consonant
M      → Matra
V      → Vowel
B      → Anusvara (Bindu)
D      → Candrabindu
X      → Visarga
H      → Halant/Virama
N      → Nukta

5.5.2 Operators used

Symbol  Function
|       Alternative
[]      Optional
*       Variable Repetition
()      Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari, when used to write Hindi, are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may optionally be followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X that can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description   Sequence  Example    Constituting characters
Vowel                  V         अ (a)      U+0905
Vowel + Anusvara       V[B]      अं (aṁ)    U+0905 U+0902
Vowel + Candrabindu    V[D]      अँ (aṃ)    U+0905 U+0901
Vowel + Visarga        V[X]      अः (aḥ)    U+0905 U+0903

Table 9
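The vowel sequence V[B|D|X] can be read as a small regular expression. The following is a minimal sketch, not part of the LGR itself: the vowel range below is an assumed contiguous block used only for illustration, whereas the authoritative membership of each category is the repertoire table of this proposal. The name `is_vowel_sequence` is hypothetical.

```python
import re

# Illustrative stand-ins only; the authoritative categories are in the repertoire table.
V = "\u0905-\u0914"   # assumption: independent vowels treated as one contiguous range
B = "\u0902"          # Anusvara
D = "\u0901"          # Candrabindu
X = "\u0903"          # Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{V}][{B}{D}{X}]?")


def is_vowel_sequence(text: str) -> bool:
    """Return True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

With these assumptions, U+0905 U+0902 is accepted, while U+0905 U+0902 U+0903 is rejected because only one of B, D or X may follow the vowel.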

5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may optionally be followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the consonant sequence after N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description   Sequence  Example    Constituting characters
Consonant              C         क (ka)     U+0915 <single character>
Consonant + Nukta      C[N]      क़ (ḳa)     U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description       Sequence  Example                    Constituting characters
Consonant + Matra          C[M]      कि (ki)                    U+0915 U+093F
Consonant + Anusvara       C[B]      कं (kaṁ)                   U+0915 U+0902
Consonant + Candrabindu    C[D]      कँ (kaṃ)                   U+0915 U+0901
Consonant + Visarga        C[X]      कः (kaḥ)                   U+0915 U+0903
Consonant + Halant         C[H]      क् (k, pure consonant)     U+0915 U+094D

Table 11

2A. A CM sequence can optionally be followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description               Sequence  Example     Constituting characters
Consonant + Matra + Anusvara       CM[B]     कीं (kīṁ)    U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu    CM[D]     काँ (kāṃ)    U+0915 U+093E U+0901
Consonant + Matra + Visarga        CM[X]     कीः (kīḥ)    U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14): *3(CH)C

Example:

Sequence Description: Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant
Sequence:             CHCHCHC
Example:              न्क्र्य (nkrya)
Constituting characters: U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description                         Sequence  Example       Constituting characters
Consonant + Halant + Consonant + Matra        CHC[M]    क्की (kkī)     U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara     CHC[B]    क्कं (kkaṁ)    U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu  CHC[D]    क्कँ (kkaṃ)    U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga      CHC[X]    क्कः (kkaḥ)    U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description                                  Sequence  Example        Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara      CHCM[B]   क्कीं (kkīṁ)    U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu   CHCM[D]   क्कीँ (kkīṃ)    U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga       CHCM[X]   क्कीः (kkīḥ)    U+0915 U+094D U+0915 U+0940 U+0903

Table 15

14 In the case of Sanskrit, up to 5 consonants can be joined.
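Rules 1 to 3B of the consonant sequence can be folded into one pattern, which may make their interaction easier to see. The sketch below is illustrative only: it assumes that contiguous Unicode ranges stand in for the C and M categories (these ranges include code points that this proposal excludes from the repertoire), and the name `is_consonant_sequence` is hypothetical.

```python
import re

# Assumptions: contiguous blocks approximate the categories; the real
# membership is defined by the repertoire table, not by these ranges.
C = "\u0915-\u0939"            # consonants (approximate)
M = "\u093E-\u094C"            # matras (approximate)
N = "\u093C"                   # Nukta
H = "\u094D"                   # Halant
BDX = "\u0901\u0902\u0903"     # Candrabindu, Anusvara, Visarga

CONSONANT_SEQUENCE = re.compile(
    f"(?:[{C}]{N}?{H}){{0,3}}"   # *3(CH): up to three consonant + Halant joins
    f"[{C}]{N}?"                  # final consonant, optionally carrying a Nukta
    f"(?:{H}|[{M}]?[{BDX}]?)"     # either a Halant, or [M] followed by [B|D|X]
)


def is_consonant_sequence(text: str) -> bool:
    """Return True if the whole string is one Hindi consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

Under these assumptions, the four-consonant example of Table 13 matches, while a cluster of five consonants joined by Halant does not, which reflects the Hindi-specific limit rather than the WLE rules of Section 7.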

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules when one takes into account digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.

6 Variants

There are no characters or character sequences in Devanagari that can be created using the characters permitted by the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants into two groups:

Group 1: Confusing due to pure visual similarity
Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve some deviation from the widely accepted norms of Devanagari akshar formation. They can cause confusion even to a careful observer and are hence proposed as variants. A brief description of these variants follows, with the variants themselves listed in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for the positioning of the Nukta character (U+093C), which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated that the Whole Label Evaluation rules (given in Section 7) be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can mistake it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar for a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are proposed:

Variant 1       Variant 2
आ  U+0906       आ़  U+0906 U+093C
ओ  U+0913       ओ़  U+0913 U+093C
ा  U+093E       ा़  U+093E U+093C
ो  U+094B       ो़  U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
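The effect of the "not followed by Nukta" condition can be illustrated with a short sketch. This is not the LGR's XML definition; it is a hypothetical helper, `single_step_variants`, that applies each Table 16 mapping once, in either direction, under the same condition. Labels are assumed to be plain strings of code points.

```python
from typing import Set

NUKTA = "\u093C"
# Variant 1 code points from Table 16.
VARIANT_1 = {"\u0906", "\u0913", "\u093E", "\u094B"}


def single_step_variants(label: str) -> Set[str]:
    """Labels reachable by applying one Table 16 mapping at one position."""
    out: Set[str] = set()
    for i, ch in enumerate(label):
        if ch not in VARIANT_1:
            continue
        if label[i + 1 : i + 2] != NUKTA:
            # Forward mapping: allowed only when no Nukta already follows.
            out.add(label[: i + 1] + NUKTA + label[i + 1 :])
        elif label[i + 2 : i + 3] != NUKTA:
            # Reverse mapping: the sequence (code point + Nukta) loses its
            # Nukta, under the same "not followed by Nukta" condition.
            out.add(label[: i + 1] + label[i + 2 :])
    return out
```

Because the forward mapping is skipped whenever a Nukta already follows, U+0906 U+093C maps back only to U+0906 and never grows into U+0906 U+093C U+093C.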

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E:

Set             Mapping
Variant Set A   093E 0901 <--> 0949 0902
Variant Set B   0906 0901 <--> 0911 0902
Variant Set C   0906 0902 <--> 0974
Variant Set D   093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source  Glyph  Target      Glyph  Type        Variant Context
0906    आ      0906 093C   आ़     ↔ blocked   not followed-by-N
093E    ा      093E 093C   ा़     ↔ blocked   not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the left-hand-side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence or absence of the Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A
- 0906 093C 0901 is added to Set B
- 0906 093C 0902 is added to Set C
- 093E 093C 0902 is added to Set D

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set             Mapping                                          Variant Contextual Rule
Variant Set A   093E 0901 <--> 093E 093C 0901 <--> 0949 0902     -
Variant Set B   0906 0901 <--> 0906 093C 0901 <--> 0911 0902     -
Variant Set C   0906 0902 <--> 0906 093C 0902 <--> 0974          when(followed-by-V-C-or-end)
Variant Set D   093B <--> 093E 0902 <--> 093E 093C 0902          when(followed-by-V-C-or-end)

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are proposed as variants.

Variant 1       Variant 2
ॳ  U+0973       अं  U+0905 U+0902
ऺ  U+093A       ं  U+0902
ॴ  U+0974       आं  U+0906 U+0902
ऻ  U+093B       ां  U+093E U+0902
ऎ  U+090E       ऐ  U+0910
ॆ  U+0946       े  U+0947
ॵ  U+0975       औ  U+0914
ॏ  U+094F       ौ  U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case for making them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape that resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1            Variant 2
U+0901               U+0945 U+0902
U+093E U+0901        U+0949 U+0902
U+0905 U+0901        U+0972 U+0902
U+090F U+0901        U+090D U+0902
U+0906 U+0901        U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)
Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should each be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) requires that, to create a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that this sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by a Consonant, Vowel, Nukta or Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
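The context condition on the first pair of Table 18 can be sketched as follows. This is an illustration under stated assumptions, not the LGR's own encoding: the consonant category is approximated by the block U+0915 to U+0939, only the first pair of Table 18 is handled, and one occurrence is substituted per result. The names `candrabindu_variants` and `_after_c_or_cn` are hypothetical.

```python
from typing import Set

CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"
# Assumption: this block stands in for the consonant category of the repertoire.
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}


def _after_c_or_cn(label: str, i: int) -> bool:
    """True if position i is preceded by C, or by C followed by a Nukta."""
    if i >= 1 and label[i - 1] in CONSONANTS:
        return True
    return i >= 2 and label[i - 1] == NUKTA and label[i - 2] in CONSONANTS


def candrabindu_variants(label: str) -> Set[str]:
    """Apply the U+0901 <-> U+0945 U+0902 mapping once, in either direction."""
    out: Set[str] = set()
    for i in range(len(label)):
        if not _after_c_or_cn(label, i):
            continue  # the mapping is undefined outside this context
        if label[i] == CANDRABINDU:
            out.add(label[:i] + CANDRA_E_ANUSVARA + label[i + 1 :])
        elif label.startswith(CANDRA_E_ANUSVARA, i):
            out.add(label[:i] + CANDRABINDU + label[i + 2 :])
    return out
```

A Candrabindu after a vowel, such as U+0905 U+0901, yields no variant here, because a Matra could not validly follow the vowel; that pair is covered by a separate row of Table 18.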

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the blocked disposition.

There is no preference among these variants. Whichever label containing one of these variants is chosen first, the other equivalent variant labels should be blocked.
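The "first come, others blocked" behavior described above might be modeled as below. This is a hypothetical sketch only: `VariantRegistry` and its status strings are invented for illustration, and the variant generator is supplied by the caller rather than derived from the LGR.

```python
from typing import Callable, Iterable, Set


class VariantRegistry:
    """Toy model: the first label applied for wins; its variants are blocked."""

    def __init__(self, variants_of: Callable[[str], Iterable[str]]) -> None:
        self._variants_of = variants_of   # caller-supplied variant generator
        self._allocated: Set[str] = set()
        self._blocked: Set[str] = set()

    def apply(self, label: str) -> str:
        if label in self._allocated:
            return "taken"
        if label in self._blocked:
            return "blocked"
        self._allocated.add(label)
        # Every other member of the variant set becomes unavailable.
        self._blocked.update(v for v in self._variants_of(label) if v != label)
        return "allocated"
```

No member of a variant set is preferred in this model; the outcome depends only on which label is applied for first.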

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations were taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed as cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed as cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are deemed to be equivalents of each other, semantically or otherwise. They are grouped only on the basis of possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari                          Gurmukhi
U+0902                              U+0A02
इ U+0907                            ਙ U+0A19
उ U+0909                            ਤ U+0A24
ग U+0917                            ਗ U+0A17
घ U+0918                            ਬ U+0A2C
ट U+091F                            ਟ U+0A1F
ठ U+0920                            ਠ U+0A20
ढ U+0922                            ਫ U+0A2B
प U+092A                            ਧ U+0A27
भ U+092D                            ਮ U+0A2E
म U+092E                            ਸ U+0A38
व U+0935                            ਕ U+0A15
ह U+0939                            ਵ U+0A35
U+093A                              U+0A02
U+093C                              U+0A3C
ि U+093F                            ਿ U+0A3F
ी U+0940                            ੀ U+0A40
U+0945                              U+0A71
U+0946                              U+0A47
U+0946                              U+0A4B
U+0947                              U+0A47
U+0947                              U+0A4B
U+0948                              U+0A48
U+0956                              U+0A41
U+0957                              U+0A42
प्टि U+092A U+094D U+091F U+093F     ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940     ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947     ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946     ਏ U+0A0F
त्त U+0924 U+094D U+0924             ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari      Bengali
म U+092E        ম U+09AE
ि U+093F        ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (Required in Santali language)
   b. ओ (U+0913) (Required in Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (Required in Santali language)
   b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)
3. M must be preceded by C or CN (see footnote 16)
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V can NOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N
16 where CN is a C followed by an N
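Rules 1 to 7 are all "what may precede this code point" checks, so they can be sketched as a single left-to-right pass. The following is a minimal illustration, not the machine-readable LGR: the sets C, V and M are assumed contiguous Unicode blocks standing in for the repertoire categories, while C1, V1 and M1 are copied from rule 1. The function name `violations` is hypothetical.

```python
from typing import List, Set


def cps(lo: int, hi: int) -> Set[str]:
    """Characters for the inclusive code point range lo..hi."""
    return {chr(cp) for cp in range(lo, hi + 1)}


# Assumptions: contiguous blocks approximate the repertoire categories.
C, V, M = cps(0x0915, 0x0939), cps(0x0905, 0x0914), cps(0x093E, 0x094C)
B, D, X, H, N = "\u0902", "\u0901", "\u0903", "\u094D", "\u093C"
# Sets C1, V1 and M1 as listed under rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}


def violations(label: str) -> List[int]:
    """Return the numbers of the WLE rules broken, in order of occurrence."""
    found: List[int] = []
    for i, ch in enumerate(label):
        prev = label[i - 1] if i > 0 else ""
        prev2 = label[i - 2] if i > 1 else ""
        after_c_or_cn = prev in C or (prev == N and prev2 in C)
        if ch == N and prev not in (C1 | V1 | M1):
            found.append(1)
        elif ch == H and not after_c_or_cn:
            found.append(2)
        elif ch in M and not after_c_or_cn:
            found.append(3)
        elif ch in (X, B, D) and not (prev in V or prev in C or prev == N or prev in M):
            found.append({X: 4, B: 5, D: 6}[ch])
        elif ch in V and prev == H:
            found.append(7)
    return found
```

For example, under these assumptions the multi-word label discussed later in this section (U+0906 U+092E U+094D U+0905 ...) is flagged under rule 7, since a vowel follows a Halant.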

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)
• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 Sequences, it gets permitted only within those specific sequences.
2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार (aːməchaːr, "Mango pickle") (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic community; hence it is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position  Name                           Organization                                               Country    Language Expertise
Co-Chair  Ajay Data                      Data Xgen Technologies                                     India      Hindi, English
Co-Chair  Mahesh D Kulkarni              C-DAC                                                      India      Marathi, Hindi, English
Co-Chair  Udaya Narayana Singh           Visva-Bharati, Santiniketan, West Bengal                   India      Bengali, Maithili, Hindi, English
Member    Abhijit Dutta                  Wikimedia                                                  India      Bengali, Hindi
Member    Akshat S Joshi (Editor)        C-DAC                                                      India      Hindi, Marathi, English
Member    Anivar A Aravind               Indic Project                                              India      Malayalam
Member    Anupam Agrawal                 Tata Consultancy Service                                   India      Hindi, Bengali
Member    Arvind Bhandari                Gujarat University                                         India      Gujarati
Member    Ashish Modi                    Data Xgen Technologies                                     India      Hindi
Member    Atiur Rahman Khan              C-DAC                                                      India      Bangla
Member    Bal Krishna Bal                Kathmandu University                                       Nepal      Nepali
Member    Balaram Prasain                Tribhuvan University                                       Nepal      Nepali
Member    BASANTA KUMAR PANDA            Regional Institute of Education (NCERT)                    India      Odia
Member    Bhim Dhoj Shrestha             Consultant                                                 Nepal      Nepali, Newar
Member    Chitrita Chatterjee            Internet and Mobile Association of India (IAMAI)           India      Multiple languages represented by members of IAMAI
Member    DEBAJIT SHARMA                 Anundoram Borooah Institute of Language, Art and Culture   India      Assamese
Member    Dev Dass Manandhar             Consultant                                                 Nepal      Nepali, Newar
Member    Dhanalakshmi K T               Northern Trust                                             India      Kannada
Member    Ganesh Murmu                   Ranchi University                                          India      Santali
Member    Gangadhar Panday               Babul Films Society                                        India      Telugu
Member    Ghanashyam Nepal               Benares Hindu University & University of North Bengal      India      Nepali
Member    Girish Chandra Mishra          Language Technology Centre, Ravenshaw University           India      Odia
Member    Gurpreet Singh Lehal           Punjabi University, Patiala                                India      Panjabi
Member    Harish Chowdhary               NIXI                                                       India      Hindi
Member    Hempal Shrestha                Nepal Entrepreneurs Hub (NEHUB)                            Nepal      Nepali, Newar
Member    Jay Paudyal                    Consultant                                                 India      Hindi
Member    Jijo Pappachan                 DN Domains                                                 India      Malayalam
Member    K C Tikayatray                 Odia Bhasa Pratisthan                                      India      Odia
Member    Kalyan Vasudeo Kale            Formerly affiliated with University of Pune                India      Marathi
Member    Kuldeep Patnaik                Visualize thy soul                                         India      Odia
Member    Mukesh Saini                   Essel Group                                                India      Hindi
Member    N Deiva Sundaram               NDS Lingsoft Solutions Pvt Ltd                             India      Tamil
Member    Neha Gupta                     C-DAC                                                      India      Hindi, English
Member    Nirajan Parajuli               NREN                                                       Nepal      Nepali
Member    Nishit Jain                    C-DAC                                                      India      Hindi, English
Member    Pawan Chitrakar                Gapsco                                                     Nepal      Nepali
Member    Prabhakar Pandey               C-DAC                                                      India      Hindi
Member    Prasad P K                     A-one Publishers                                           India      Malayalam
Member    Prateek Pathak                 ISOC Mumbai                                                India      Devanagari
Member    Raiomond Doctor                NLP Consultant                                             India      English, Hindi, Marathi, Gujarati
Member    Rajib Chakraborty              Society for Natural Language Technology Research           India      Bangla (Bengali)
Member    Rajiv Kumar                    NIXI                                                       India
Member    S Maniam                       International Forum IT for Tamil                           Singapore  Tamil
Member    Santhosh Thottingal            Wikimedia foundation                                       India      Malayalam, Sourashtra, Tamil
Member    Saroja Bhate                   University of Pune                                         India      Sanskrit
Member    Shambhu Kumar Singh            National Translation Mission, Mysore                       India      Maithili
Member    Shanmugam R                    C-DAC                                                      India      Tamil
Member    Shantaram S Warde Walawalikar  Independent Researcher                                     India      Konkani
Member    Shashi Pathania                PGD of Dogri, University of Jammu                          India      Dogri
Member    Shubham Saran                  NIXI                                                       India
Member    Sinnathambi Shanmugarajah      University of Colombo School of Computing                  Sri Lanka  Tamil
Member    Sujith Kartha                  Digitalkz.com                                              India      Malayalam
Member    Suraj Adhikari                 Mercantile Communications (and np ccTLD)                   Nepal      Nepali
Member    Swarna Prabha Chainary         Guwahati University                                        India      Bodo
Member    U B Pavanaja                   http://vishvakannada.com                                   India      Kannada
Member    Uma Maheshwar G                CALTS, Univ. of Hyderabad                                  India      Telugu
Member    Uttam Shrestha Rana            NPNOG                                                      Nepal      Nepali
Member    Veena Solomon                  (freelancer)                                               India      Malayalam
Member    Vinay Murarka                  Consultant, https://मराभारत                                India      Hindi

In addition, the following members gave external inputs to the NBGP for the respective languages/scripts:

Name                          Language/Script Expertise
Ajit Kumar                    Awadhi, Braj Language
Amar Tumyahang                Limbu Language
Amrit Yonjan                  Tamang Language
Aprana Kulkarni               Hindi, Marathi
Basil Baa                     Sadri Language
Basil Kiro                    Kharia Language
Biswa Limbu                   Limbu Language
Devdass Manandhar             Newar
Devendra Kumar Devesh         Bhojpuri Language
Dinbandhu Mahto               Panchpargania Language
Dipika Sangma Narzary         Bodo Language
Dr. K P Lekhwani              Sindhi
Dr. Birendra Kumar Soy        Mundari Language
Dr. Dinesh Kumar Shrivastav   Magahi Language
Dr. Harvinder Kaur            Gurmukhi Script
Dr. Laxmi Prasad Khatiwada    Nepali Language
Harihar Vaishnav              Halbi
Indra Kumar Tamang            Tamang Language
Jagannath Singh               Panchpargania Language
Narendra Kumar Negi           Kinnauri Language
Prateek Harshwal              Wagdi and Dhundhari Language
Rayem Olem Dungdung           Sadri Language
Tej Man Angdembe              Limbu Language
Urmila Harshwal               Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb. 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov. 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec. 2017)

[RFC 8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec. 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb. 2018)

[GIST] Graphics and Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb. 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb. 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec. 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec. 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec. 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec. 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct. 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct. 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct. 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct. 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct. 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct. 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct. 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov. 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov. 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec. 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec. 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec. 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec. 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb. 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).


11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156, 2007.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: DK Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.

2. IS 10315: 7-bit coded character set for information interchange. 1985.


3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.

5. ISO 2375: Procedure for registration of escape sequences. 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same. However, in the given context (e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
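The all-or-nothing condition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `GURMUKHI_CONFUSABLES` and the function name are hypothetical, the dictionary holds just a handful of the Gurmukhi rows of Table 22, and the check works on individual code points rather than on whole aksharas.

```python
# Hypothetical excerpt of Table 22 (Devanagari -> Gurmukhi rows only).
GURMUKHI_CONFUSABLES = {
    "\u0921": {"\u0A21", "\u0A24"},  # U+0921 -> U+0A21 or U+0A24
    "\u0922": {"\u0A22"},            # U+0922 -> U+0A22
    "\u0924": {"\u0A1C"},            # U+0924 -> U+0A1C
    "\u092F": {"\u0A27"},            # U+092F -> U+0A27
}


def may_have_cross_script_counterpart(label, table=GURMUKHI_CONFUSABLES):
    """Return True only if every code point of the label has a confusable
    counterpart in the other script; one distinct code point is enough
    to make the whole label visually distinguishable."""
    return bool(label) and all(ch in table for ch in label)
```

A label made only of U+0921 and U+0922 passes the check, while adding a single code point outside the table (for example U+0915) makes it fail, which mirrors the "even one single character" condition in the text.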

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see note 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

Note 17: The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


2 | 0931 094D 0939 | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106], [107]

Table 7: Sequences

5.3 Code points not included

The following code points have not been included in the repertoire:

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1 | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2 | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3 | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4 | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5 | U+0944 | ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6 | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7 | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari

All the languages written in Brahmi-derived scripts follow a particular way of formation of their words, known as akshar. In the next section there are detailed akshar formation rules as applicable to representation of the Hindi language when written in the Devanagari Script. These rules need slight additions for different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators, along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol | Function
| | Alternative
[ ] | Optional
* | Variable Repetition
( ) | Sequence Group

Table 8: Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | U+0905 U+0903

Table 9
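The pattern V[B|D|X] translates almost directly into a regular expression. The following Python sketch is illustrative only: the vowel range U+0905 to U+0914 is an assumed approximation of the category V and does not reproduce the normative repertoire, and the function name is hypothetical.

```python
import re

# Illustrative character classes (assumed ranges, not the normative repertoire).
V = "[\u0905-\u0914]"  # independent vowels
B = "\u0902"           # Anusvara
D = "\u0901"           # Candrabindu
X = "\u0903"           # Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"{V}(?:{B}|{D}|{X})?")


def is_vowel_sequence(text):
    """Return True if text is a single vowel sequence of the form V[B|D|X]."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

With these assumptions, U+0905 alone and U+0905 U+0902 are accepted, while U+0905 U+0902 U+0903 is rejected because at most one of B, D or X may follow the vowel.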


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | U+0915 U+0903
Consonant + Halant | C[H] | क् (k, pure consonant) | U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (in the case of Sanskrit, up to 5 consonants can be joined):

*3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets:

3A. The combination may be followed by M, B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15
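The consonant-sequence rules (1, 2, 2A, 3, 3A and 3B) can be folded into one regular expression. The sketch below is a hedged illustration, not the LGR's own encoding: the code point ranges for C and M are assumed approximations, the names are hypothetical, and allowing an optional Nukta after every consonant of a conjunct is an assumption drawn from the remark that a consonant is treated as coterminous with consonant plus Nukta.

```python
import re

# Illustrative character classes (assumed ranges, not the normative repertoire).
C = "[\u0915-\u0939]"           # consonants
N = "\u093C"                    # Nukta
H = "\u094D"                    # Halant
M = "[\u093E-\u094C]"           # matras (dependent vowel signs)
BDX = "[\u0901\u0902\u0903]"    # Candrabindu, Anusvara, Visarga

CONSONANT_SEQUENCE = re.compile(
    f"(?:{C}{N}?{H}){{0,3}}"    # *3(CH): up to three consonant + Halant pairs
    f"{C}{N}?"                  # the final (or only) consonant, with optional Nukta
    f"(?:{H}|{M}?{BDX}?)"       # a closing Halant, or optional M then optional B/D/X
)


def is_consonant_sequence(text):
    """Return True if text is one consonant-initial akshar under the rules above."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

Under these assumptions the four-consonant example of Table 13 is accepted and a five-consonant cluster is rejected. That limit applies to this Hindi description only, since the WLE rules of Section 7 place no bound on the number of consonants joined by Halant.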

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.

6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21: Visually confusables in Appendix A: Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in the Section 6.1) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user-base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the reverse variant mappings using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
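The effect of the "not followed by Nukta" condition can be shown with a small enumeration of variant labels. The Python sketch below is a simplified illustration under assumptions: it covers only the four pairs of Table 16, the function name is hypothetical, and it treats each base (with its optional Nukta) as one unit, so a substituted Nukta form is never expanded a second time.

```python
from itertools import product

NUKTA = "\u093C"
# Variant 1 column of Table 16.
BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variants(label):
    """Enumerate variant labels produced by the Table 16 mappings.

    The mapping (base <-> base + Nukta) is applied only when the unit
    is not followed by a further Nukta, mirroring the context rule."""
    units = []
    i = 0
    while i < len(label):
        ch = label[i]
        if ch in BASES:
            # A base followed by one Nukta is consumed as a single unit.
            width = 2 if label[i + 1:i + 2] == NUKTA else 1
            followed_by_nukta = label[i + width:i + width + 1] == NUKTA
            if followed_by_nukta:
                units.append((label[i:i + width],))  # context fails: no variant
            else:
                units.append((ch, ch + NUKTA))       # both forms are variants
            i += width
        else:
            units.append((ch,))
            i += 1
    return {"".join(choice) for choice in product(*units)} - {label}
```

For the single code point U+0906 the only variant is U+0906 U+093C, and the reverse holds as well. The invalid sequence U+0906 U+093C U+093C yields no variants at all, which is the transitivity property discussed above.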

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the left-hand-side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

- 093E 093C 0901 is added to Set A
- 0906 093C 0901 is added to Set B
- 0906 093C 0902 is added to Set C, and
- 093E 093C 0902 is added to Set D

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when (follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
U+0901 | U+0945 U+0902
U+093E U+0901 | U+0949 U+0902
U+0905 U+0901 | U+0972 U+0902
U+090F U+0901 | U+090D U+0902
U+0906 U+0901 | U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both the pairs preceded by a Devanagari Letter Ka (क - U+0915) are shown here:

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequence U+0945 U+0902 and the sequence U+0949 U+0902 should each be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by either of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below.

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
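The context condition for this pair ("preceded by a Consonant or a Consonant followed by a Nukta") maps naturally onto regular-expression lookbehinds. The following Python sketch is illustrative only: the consonant range is an assumed approximation of category C, the names are hypothetical, and only the forward mapping U+0901 -> U+0945 U+0902 of the first pair in Table 18 is shown.

```python
import re

CONSONANT = "[\u0915-\u0939]"   # assumed approximation of category C
NUKTA = "\u093C"
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"

# A Candrabindu is eligible only directly after C, or after C followed by N.
ELIGIBLE_CANDRABINDU = re.compile(
    f"(?:(?<={CONSONANT})|(?<={CONSONANT}{NUKTA})){CANDRABINDU}"
)


def candra_anusvara_variant(label):
    """Replace every eligible Candrabindu with U+0945 U+0902."""
    return ELIGIBLE_CANDRABINDU.sub(CANDRA_E_ANUSVARA, label)
```

A Candrabindu after a vowel or after a Matra is left untouched, because substituting it would place the Matra U+0945 in a position where no Matra is allowed.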

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B
ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables in Appendix B: Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6: Code point repertoire:

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (where CN is a C followed by an N)

3. M must be preceded by C or CN (where CN is a C followed by an N)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in Case of V preceded by H)
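The seven rules are all "what may precede this code point" checks, so they can be sketched as a single pass over adjacent code points. The Python below is a hedged illustration rather than the LGR's XML encoding: the `category` ranges are assumed approximations of Table 6 (the normative repertoire is not reproduced here), the function names are hypothetical, and the Eyelash Reph sequences and variant context rules are not modelled.

```python
# Sets named in rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}


def category(ch):
    """Classify a code point into C, V, M, B, D, X, H or N (assumed ranges)."""
    cp = ord(ch)
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0975:
        return "V"
    if 0x093E <= cp <= 0x094C or cp in (0x093A, 0x093B, 0x094F):
        return "M"
    return {0x0902: "B", 0x0901: "D", 0x0903: "X",
            0x094D: "H", 0x093C: "N"}.get(cp)


def wle_violations(label):
    """Return the numbers of the WLE rules (1 to 7) that the label violates."""
    found = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else ""
        prev_cat = category(prev) if prev else None
        # "C or CN": a consonant, or a Nukta that itself follows a consonant.
        prev_is_c_or_cn = prev_cat == "C" or (
            prev_cat == "N" and i > 1 and category(label[i - 2]) == "C")
        if cat == "N" and prev not in C1 | V1 | M1:
            found.append(1)
        elif cat == "H" and not prev_is_c_or_cn:
            found.append(2)
        elif cat == "M" and not prev_is_c_or_cn:
            found.append(3)
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            found.append({"X": 4, "B": 5, "D": 6}[cat])
        elif cat == "V" and prev_cat == "H":
            found.append(7)
    return found
```

Under these assumptions, the sequence U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is flagged under rule 7 only, because the vowel U+0905 directly follows a Halant.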

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of permissible sequences in Table 7: Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार (aːməchaːr, "Mango pickle") (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community, hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements by the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, https://मराभारत | India | Hindi


In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)


[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 11, http://www.unicode.org/versions/Unicode11.0.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)


[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).


11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.
2. IS 10315: 7-bit coded character set for information interchange, 1985.


3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.
5. ISO 2375: Procedure for registration of escape sequences, 2003.
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same. However, in the given context (e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see footnote 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

Footnote 17: The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
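The all-or-nothing condition stated in this appendix (a label has a cross-script confusable counterpart only if every constituent character has one) can be sketched as follows. This is a minimal illustration, not part of the LGR: the table holds only the Devanagari-Gurmukhi rows of Table 22, it works per code point rather than per akshara, and the names `DEVA_TO_GURMUKHI_CONFUSABLE` and `is_fully_confusable` are hypothetical.

```python
# Illustrative subset of Table 22 (Devanagari -> Gurmukhi rows only).
# Names and the per-code-point granularity are assumptions for this sketch.
DEVA_TO_GURMUKHI_CONFUSABLE = {
    "\u0920": {"\u0A28", "\u0A30"},  # U+0920 -> U+0A28, U+0A30
    "\u0921": {"\u0A21", "\u0A24"},  # U+0921 -> U+0A21, U+0A24
    "\u0922": {"\u0A22"},            # U+0922 -> U+0A22
    "\u0924": {"\u0A1C"},            # U+0924 -> U+0A1C
    "\u092F": {"\u0A27"},            # U+092F -> U+0A27
}


def is_fully_confusable(label: str, table: dict) -> bool:
    """True only if every code point of the label has a confusable in the table."""
    return len(label) > 0 and all(ch in table for ch in label)
```

A single code point without a counterpart (for example U+0915) is enough to make the whole label non-confusable under this rule.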


- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi)

It is worth noting that the rules required for accommodation of additional languages in the Devanagari ruleset, apart from those required for Hindi, are never in conflict with one another.

In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of their descriptive names. The second section lists four operators along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formations, the first of which begins with the vowels and the second with the consonants. These rules are based on an Indian Standard (IS 13194:1991), popularly known as Indian Script Code for Information Interchange [ISCII].

5.5.1 Variables involved

Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta


5.5.2 Operators used

Symbol | Function
| | Alternative
[ ] | Optional
* | Variable Repetition
( ) | Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore: V[B|D|X]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ (a) | U+0905
Vowel + Anusvara | V[B] | अं (aṁ) | U+0905 U+0902
Vowel + Candrabindu | V[D] | अँ (aṃ) | U+0905 U+0901
Vowel + Visarga | V[X] | अः (aḥ) | U+0905 U+0903

Table 9
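The vowel sequence V[B|D|X] translates directly into a regular expression. The sketch below is illustrative only: the vowel range U+0905 to U+0914 is an assumed approximation of category V (the actual repertoire is defined elsewhere in the LGR), and the function name is hypothetical.

```python
import re

# Assumed approximation of the categories for this sketch:
#   V = U+0905..U+0914, B = U+0902, D = U+0901, X = U+0903
VOWEL_SEQUENCE = re.compile(r"[\u0905-\u0914][\u0901\u0902\u0903]?")


def is_vowel_sequence(text: str) -> bool:
    """Check text against V[B|D|X]: one vowel, then at most one of B, D or X."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

For instance, U+0905 U+0902 is accepted, while U+0905 U+0902 U+0903 is rejected because only one of B, D or X may follow the vowel.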


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क (ka) | U+0915 <single character>
Consonant + Nukta | C[N] | क़ (ḳa) | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]:

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि (ki) | U+0915 U+093F
Consonant + Anusvara | C[B] | कं (kaṁ) | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ (kaṃ) | U+0915 U+0901
Consonant + Visarga | C[X] | कः (kaḥ) | U+0915 U+0903
Consonant + Halant | C[H] | क् (k, Pure Consonant) | U+0915 U+094D

Table 11


2A. A CM sequence can be optionally followed by D, B or X:

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं (kīṁ) | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ (kāṃ) | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः (kīḥ) | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (in the case of Sanskrit, it can join up to 5 consonants):

*3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य (nkrya) | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X.

Example:


Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की (kkī) | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं (kkaṁ) | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ (kkaṃ) | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः (kkaḥ) | U+0915 U+094D U+0915 U+0903

Table 14

3B. *3(CH)CM may be followed by a B, D or X.

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं (kkīṁ) | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ (kkīṃ) | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः (kkīḥ) | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.
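The consonant-sequence rules of Section 5.5.4 (a single consonant with optional Nukta, an optional Matra, at most one of B, D or X, a final Halant, and conjuncts of up to four consonants) can be combined into one pattern. The following is a sketch under stated assumptions: the code point ranges only approximate categories C and M, and the names are hypothetical. It covers the Hindi akshar rules of this section, not the WLE rules of Section 7.

```python
import re

# Assumed approximations of the categories (not the LGR repertoire itself).
C = "[\u0915-\u0939]"      # consonants
N = "\u093C"               # Nukta
M = "[\u093E-\u094C]"      # matras (dependent vowel signs)
H = "\u094D"               # Halant / Virama
BDX = "[\u0901-\u0903]"    # Candrabindu, Anusvara, Visarga
CN = f"{C}{N}?"            # a consonant together with its optional Nukta

CONSONANT_SEQUENCE = re.compile(
    # Rules 1, 2, 2A: C[N], then a final Halant or [M][B|D|X]
    f"(?:{CN}(?:{H}|{M}?{BDX}?))"
    # Rules 3, 3A, 3B: one to three (CH) groups, a consonant, then [M][B|D|X]
    f"|(?:(?:{CN}{H}){{1,3}}{CN}{M}?{BDX}?)"
)


def is_consonant_sequence(text: str) -> bool:
    """Check text against the Hindi consonant-sequence rules of Section 5.5.4."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The upper bound of three (CH) groups reflects the "up to 4 consonants" limit given for Hindi; as noted above, the WLE rules in Section 7 do not impose this limit.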


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed.


Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | ा़ U+093E U+093C
ो U+094B | ो़ U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the base of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) where a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the reverse (symmetric) mappings using the same condition (see also [RFC8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
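The effect of the "not followed by a Nukta" condition can be illustrated with a short sketch. It is not the LGR's XML rule; the function names are hypothetical, and only the forward (Variant 1 to Variant 2) direction of Table 16 is shown.

```python
NUKTA = "\u093C"
# Variant 1 code points from Table 16: U+0906, U+0913, U+093E, U+094B
SANTALI_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def forward_variant_positions(label: str) -> list:
    """Indices where Variant 1 -> Variant 2 applies: the base is not followed by a Nukta."""
    return [
        i for i, ch in enumerate(label)
        if ch in SANTALI_BASES and label[i + 1:i + 2] != NUKTA
    ]


def apply_forward(label: str, index: int) -> str:
    """Insert a Nukta after the base at the given index."""
    return label[:index + 1] + NUKTA + label[index + 1:]
```

After the substitution is applied once at a position, that position no longer satisfies the context, so the regenerative substitution described above cannot occur.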

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the left-hand-side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when (follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
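The extension of Sets A to D described in Section 6.1.2 can be reproduced mechanically: for each member that contains 0906 or 093E not already followed by a Nukta, the Nukta-bearing sequence is added to the same set. The sketch below is only an illustration of that bookkeeping; the data layout and function names are assumptions, and the context rules are not modelled.

```python
NUKTA = 0x093C
BASES = {0x0906, 0x093E}  # code points that have a Nukta variant in this analysis

ORIGINAL_SETS = {
    "A": [(0x093E, 0x0901), (0x0949, 0x0902)],
    "B": [(0x0906, 0x0901), (0x0911, 0x0902)],
    "C": [(0x0906, 0x0902), (0x0974,)],
    "D": [(0x093B,), (0x093E, 0x0902)],
}


def with_nukta(seq: tuple) -> tuple:
    """Insert a Nukta after each 0906/093E that is not already followed by one."""
    out = []
    for i, cp in enumerate(seq):
        out.append(cp)
        following = seq[i + 1] if i + 1 < len(seq) else None
        if cp in BASES and following != NUKTA:
            out.append(NUKTA)
    return tuple(out)


def extend(sets: dict) -> dict:
    """Add the Nukta-bearing sequence of every member to its own variant set."""
    extended = {}
    for name, members in sets.items():
        result = list(members)
        for seq in members:
            variant = with_nukta(seq)
            if variant != seq and variant not in result:
                result.append(variant)
        extended[name] = result
    return extended
```

Running `extend(ORIGINAL_SETS)` yields three members per set, matching the four additions listed above.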

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, coming at the end, many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ U+0901 | ॅं U+0945 U+0902
ाँ U+093E U+0901 | ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences (U+0945 U+0902) and (U+0949 U+0902) should each be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
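The context condition of Section 6.4.1 (the mapping applies only when the Candrabindu is preceded by a Consonant, or by a Consonant followed by a Nukta) can be sketched as below. The consonant range U+0915 to U+0939 is an assumed approximation of category C, and the function names are hypothetical; only the forward direction is shown.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    # Assumed approximation of category C for this sketch.
    return "\u0915" <= ch <= "\u0939"


def preceded_by_c_or_cn(label: str, i: int) -> bool:
    """True if position i follows a Consonant, or a Consonant plus Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def to_candra_e_form(label: str) -> str:
    """Replace each Candrabindu by U+0945 U+0902 where the context rule allows it."""
    return "".join(
        CANDRA_E_ANUSVARA
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i)
        else ch
        for i, ch in enumerate(label)
    )
```

A Candrabindu after a Vowel or a Matra is left untouched here; those combinations are covered by the other pairs of Table 18 rather than by this mapping.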

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.


There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari Gurmukhi

U+0902

U+0A02

इU+0907

ਙU+0A19

उU+0909

ਤU+0A24

U+0917

ਗU+0A17

घU+0918

ਬU+0A2C

टU+091F

ਟU+0A1F

ठU+0920

ਠU+0A20

ढU+0922

ਫU+0A2B

U+092A

ਧU+0A27

भU+092D

ਮU+0A2E

मU+092E

ਸU+0A38

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

42

वU+0935

ਕU+0A15

हU+0939

ਵU+0A35

U+093A

U+0A02

U+093C

U+0A3C

िU+093F

ਿU+0A3F

ीU+0940

ੀU+0A40

U+0945

U+0A71

U+0946

U+0A47

U+0946

U+0A4B

U+0947

U+0A47

U+0947

U+0A4B

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

43

U+0948

U+0A48

U+0956

U+0A41

U+0957

U+0A42

ि7ट

U+092AU+094DU+091FU+093F

ਇU+0A07

7ट8U+092AU+094DU+091FU+0940

ਈU+0A08

7टU+092AU+094DU+091FU+0947

ਏU+0A0F

7ट

092A094D091F0946 ਏ

U+0A0F

9U+0924U+094DU+0924

ਜU+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari        Bengali

म    U+092E       ম    U+09AE
◌ि   U+093F       ◌ি   U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B: Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (◌् - DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ◌ा (U+093E) (Required in Santali language)
b. ◌ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15)
3. M must be preceded by C or CN (see footnote 16)
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V can NOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N
16 where CN is a C followed by an N
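The seven rules above can be sketched as a single pass over a label's code points. This is only an illustration: the category sets `C`, `V` and `M` below are an assumption approximated from Unicode ranges (the normative membership is in Table 6, which is not reproduced here), and the function name `wle_violations` is hypothetical.

```python
# Category sets: an ASSUMPTION approximated from Unicode ranges, not Table 6.
C = {chr(cp) for cp in range(0x0915, 0x093A)}  # consonants KA..HA
V = {chr(cp) for cp in range(0x0905, 0x0915)}  # independent vowels A..AU
M = {chr(cp) for cp in range(0x093E, 0x094D)}  # matras AA..AU
B, D, X, H, N = "\u0902", "\u0901", "\u0903", "\u094D", "\u093C"

# Sets named in rule 1, copied from the lists in the text.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")


def wle_violations(label):
    """Return the sorted numbers of the WLE rules (1-7) that the label breaks."""
    broken = set()
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        # "C or CN": a consonant, or a Nukta that itself follows a consonant.
        after_c_or_cn = prev in C or (prev == N and prev2 in C)
        after_v_c_n_m = prev in V or prev in C or prev == N or prev in M
        if ch == N and prev not in (C1 | V1 | M1):
            broken.add(1)
        elif ch == H and not after_c_or_cn:
            broken.add(2)
        elif ch in M and not after_c_or_cn:
            broken.add(3)
        elif ch == X and not after_v_c_n_m:
            broken.add(4)
        elif ch == B and not after_v_c_n_m:
            broken.add(5)
        elif ch == D and not after_v_c_n_m:
            broken.add(6)
        elif ch in V and prev == H:
            broken.add(7)
    return sorted(broken)
```

Under these assumptions, the sequence U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is flagged under rule 7 only, while the same label without U+094D passes all seven checks.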

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)
• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of a context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार aːməchaːr "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position Name Organization Country Language

ExpertiseCo-Chair AjayData DataXgenTechnologies India HindiEnglishCo-Chair MaheshDKulkarni C-DAC India MarathiHindi

EnglishCo-Chair UdayaNarayana

SinghVisva-BharatiSantiniketanWestBengal

India BengaliMaithiliHindiEnglish

Member AbhijitDutta Wikimedia India BengaliHindiMember AkshatSJoshi

(Editor)C-DAC India HindiMarathi

EnglishMember AnivarAAravind IndicProject India MalayalamMember AnupamAgrawal TataConsultancyService India HindiBengaliMember ArvindBhandari GujaratUniversity India GujaratiMember AshishModi DataXgenTechnologies India HindiMember AtiurRahmanKhan C-DAC India BanglaMember BalKrishnaBal KathmanduUniversity Nepal NepaliMember BalaramPrasain TribhuvanUniversity Nepal NepaliMember BASANTAKUMAR

PANDARegionalInstituteofEducation(NCERT)

India Odia

Member BhimDhojShrestha Consultant Nepal NepaliNewarMember ChitritaChatterjee InternetandMobileAssociationof

India(IAMAI)India Multiplelanguages

representedbymembersofIAMAI

Member DEBAJITSHARMA AnundoramBorooahInstituteofLanguageArtandCulture

India Assamese

Member DevDassManandhar

Consultant Nepal NepaliNewar

Member DhanalakshmiKT NorthernTrust India KannadaMember GaneshMurmu RanchiUniversity India SantaliMember GangadharPanday BabulFilmsSociety India TeluguMember GhanashyamNepal BenaresHinduUniversityamp

UniversityofNorthBengalIndia Nepali

Member GirishChandraMishra

LanguageTechnologyCentreRavenshawUniversity

India Odia

Member GurpreetSinghLehal

PunjabiUniversityPatiala India Panjabi

Member HarishChowdhary NIXI India HindiMember HempalShrestha NepalEntrepreneursHub

(NEHUB)Nepal NepaliNewar

Member JayPaudyal Consultant India Hindi


Member JijoPappachan DNDomains India MalayalamMember KCTikayatray OdiaBhasaPratisthan India OdiaMember KalyanVasudeo

KaleFormerlyaffiliatedwithUniversityofPune

India Marathi

Member KuldeepPatnaik Visualizethysoul India OdiaMember MukeshSaini EsselGroup India HindiMember NDeivaSundaram NDSLingsoftSolutionsPvtLtd India TamilMember NehaGupta C-DAC India HindiEnglishMember NirajanParajuli NREN Nepal NepaliMember NishitJain C-DAC India HindiEnglishMember PawanChitrakar Gapsco Nepal NepaliMember PrabhakarPandey C-DAC India HindiMember PrasadPK A-onePublishers India MalayalamMember PrateekPathak ISOCMumbai India DevanagariMember RaiomondDoctor NLPConsultant India EnglishHindi

MarathiGujaratiMember RajibChakraborty SocietyforNaturalLanguage

TechnologyResearchIndia Bangla(Bengali)

Member RajivKumar NIXI India Member SManiam InternationalForumITforTamil Singapore TamilMember SanthoshThottingal Wikimediafoundation India Malayalam

SourashtraTamilMember SarojaBhate UniversityofPune India SanskritMember ShambhuKumar

SinghNationalTranslationMissionMysore

India Maithili

Member ShanmugamR C-DAC India TamilMember ShantaramSWarde

WalawalikarIndependentResearcher India Konkani

Member ShashiPathania PGDofDogriUniversityofJammu

India Dogri

Member ShubhamSaran NIXI India Member Sinnathambi

ShanmugarajahUniversityofColomboSchoolofComputing

SriLanka Tamil

Member SujithKartha Digitalkzcom India MalayalamMember SurajAdhikari MercantileCommunications(and

npccTLD)Nepal Nepali

Member SwarnaPrabhaChainary

GuwahatiUniversity India Bodo

Member UBPavanaja httpvishvakannadacom India KannadaMember UmaMaheshwarG CALTSUnivofHyderabad India TeluguMember UttamShrestha

RanaNPNOG Nepal Nepali

Member VeenaSolomon (freelancer) India MalayalamMember VinayMurarka Consultanthttpsमराभारत India Hindi


In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name LanguageScriptExpertiseAjitKumar AwadhiBrajLanguageAmarTumyahang LimbuLanguageAmritYonjan TamangLanguageApranaKulkarni HindiMarathiBasilBaa SadriLanguageBasilKiro KhariaLanguageBiswaLimbu LimbuLanguageDevdassManandhar NewarDevendraKumarDevesh BhojpuriLanguageDinbandhuMahto PanchparganiaLanguageDipikaSangmaNarzary BodoLanguageDrKPLekhwani SindhiDrBirendraKumarSoy MundariLanguageDrDineshKumarShrivastav MagahiLanguageDrHarvinderKaur GurmukhiScriptDrLaxmiPrasadKhatiwada NepaliLanguageHariharVaishnav HalbiIndraKumarTamang TamangLanguageJagannathSingh PanchparganiaLanguageNarendraKumarNegi KinnauriLanguagePrateekHarshwal WagdiandDhundhariLanguageRayemOlemDungdung SadriLanguageTejManAngdembe LimbuLanguageUrmilaHarshwal WagdiLanguage

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0]TheUnicodeStandard11httpwwwunicodeorgversionsUnicode110(Accessedon12thDec2017)

[8]TheUnicodeStandard50httpwwwunicodeorgversionsUnicode500(Accessedon12thDec2017)

[9]TheUnicodeStandard51httpwwwunicodeorgversionsUnicode510(Accessedon12thDec2017)

[11]TheUnicodeStandard60httpwwwunicodeorgversionsUnicode600(Accessedon12thDec2017)

[100]DevanāgarīVIPTeamldquoVariantIssuesReportrdquoICANN3rdOct2011httpsarchiveicannorgentopicsnew-gtldsdevanagari-vip-issues-report-03oct11-enpdf(Accessedon10thOct2017)

[101]OmniglotHindihttpswwwomniglotcomwritinghindihtm(Accessedon10thOct2017)

[102]OmniglotMarathihttpswwwomniglotcomwritingmarathihtm(Accessedon10thOct2017)

[103]OmniglotSanskrithttpswwwomniglotcomwritingsanskrithtm(Accessedon10thOct2017)


[104]OmniglotSindhihttpswwwomniglotcomwritingsindhihtm(Accessedon10thOct2017)

[105]OmniglotKashmirihttpswwwomniglotcomwritingkashmirihtm(Accessedon10thOct2017)

[106]Unicode1000SouthandCentralAsia-I-OfficialScriptsofIndiardquoPage456(R5andR5a)httpwwwunicodeorgversionsUnicode1000ch12pdf(Accessedon13thNov2017)

[107]UnicodeIndicGroupDevanagariEyelashRahttpunicodeorg~emulleriwgp8utcdochtml(Accessedon13thNov2017)

[108]MKRainaHowtoreadandwriteKashmiriinDevanagarihttpwwwkoshurorgpdfLet20Us20Learn20Kashmiripdf(Accessedon12thDec2017)

[109]CentralHindiDirectorate-MinistryofHRD-GovtofIndiaDevanāgarīAlphabetanditsRomanizationhttphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml(Accessedon12thDec2017

[110]OmniglotBodohttpswwwomniglotcomwritingbodohtm(Accessedon12thDec2017)

[111]OmniglotMaithilihttpswwwomniglotcomwritingmaithilihtm(Accessedon12thDec2017)

[112]OmniglotKonkanihttpswwwomniglotcomwritingkonkanihtm(Accessedon20thMay2018)

[113]OmniglotNepalihttpswwwomniglotcomwritingnepalihtm(Accessedon20thMay2018)

[114] NBGP Public comment feedback for Devanagari Gujarati Gurmukhi Script LGRProposalshttpsdocsgooglecomdocumentd1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw(Accessedon18thFeb2019)


10 Booksarticlesandwebographiesconsulted

Followingisathematicallysortedsetofdocumentsbooksarticlesandwebographies

consultedinthedraftingofthisreport

101 WRITINGSYSTEMS1 DillingerDTheAlphabetAKeytotheHistoryofMankind3rdEditionin2

VolumesHutchisonLondon1968

102 DEVANĀGARĪ1 AgrawalaVS(1966)TheDevanāgarīscriptInIndianSystemsofWriting(Pp12-

16)DelhiPublicationsDivision

2 AgyeyaSacchindanandHiranandVatsyayan1972BhavantiDelhiRajpalandSons

3 BeamesJohn1872-79AComparativeGrammaroftheModernAryanLanguagesof

India3volsLondonTrubnerandCo[ReprintedbyMunshiramManoharlalNew

Delhi1966]

4 BhatiaTejK1987AHistoryoftheHindiGrammaticalTraditionHindi-Hindustani

GrammarGrammariansHistoryandProblemsLeidenNewYorkEJBrill

5 BrightW(1996)TheDevanāgarīscriptInPDanielsandWBright(eds)The

WorldrsquosWritingSystems(Pp384-390)NewYorkOxfordUniversityPress

6 CardonaGeorge1987SanskritInTheWorldsMajorLanguagesBernardComrie

(ed)LondonCroomHelm448-469

7 DwivediRamAwadh1966ACriticalSurveyofHindiLiteratureDelhiMotilal

Banarsidass

8 FaruqiShamsurRahman2001EarlyUrduLiteraryCultureandHistoryDelhi

OxfordUniversityPress

9 GuruKamtaPrasad1919HindiVyakaranVaranasiNagariPrachariniSabha

(1962edition)

10 KachruYamuna1965ATransformationalTreatmentofHindiVerbalSyntax

LondonUniversityofLondonPhDdissertation(Mimeographed)


11 KachruYamuna1966AnIntroductiontoHindiSyntaxUrbanaUniversityof

IllinoisDepartmentofLinguistics

12 KalyanKaleandAnjaliSoman1986LearningMarathiShriVishakhaPrakashan

Pune

13 McGregorRS(1977)OutlineofHindiGrammar2ndedDelhiOxfordUniversity

Press

14 McGregorRS1972OutlineofHindiGrammarwithExercisesDelhiOxford

UniversityPress

15 McGregorRS1974HindiLiteratureoftheNineteenthandEarlyTwentieth

CenturiesWiesbadenHarrassowitz

16 McGregorRS1984HindiLiteraturefromItsBeginningstotheNineteenth

CenturyWiesbadenHarrassowitz

17 PandeyPK(2007)Phonology-orthographyinterfaceinDevanāgarīforHindi

WrittenLanguageandLiteracy10(2)139-1562007

18 RaiAmrit1984AHouseDividedTheOriginandDevelopmentofHindiHindavi

DelhiOxfordUniversityPress

19 SharadOnkar1969LohiyakeVicarAllahabadLokbharatiPrakashan

20 SinghAK(2007)ProgressofmodificationofBrāhmīalphabetasrevealedbythe

inscriptionsofsixth-eighthcenturiesInPGPatelPPandeyandDRajgor(eds)

TheIndicScriptsPaleographicandLinguisticPerspectives(Pp85-107)New

DelhiDKPrintworld

21 SproatR(2000)AComputationalTheoryofWritingSystemsCambridge

UniversityPress

22 TiwariPanditUdaynarayan1961HindiBhashakaUdgamaurVikas[TheOrigin

andDevelopmentoftheHindiLanguage]PrayagLeaderPress

23 VermaMK1971TheStructureoftheNounPhraseinEnglishandHindiDelhi

MotilalBanarsidass

103 INDICCOMPUTINGSPECIFIC1 IS104018-bitcodeforinformationinterchange1982

2 IS103157-bitcodedcharactersetforinformationinterchange1985


3 IS123267-bitand8-bitcodedcharactersets-Codeextensiontechniques1987

4 ISO15919Informationanddocumentation-TransliterationofDevanāgarīand

relatedIndicscriptsintoLatincharacters2001

5 ISO2375Procedureforregistrationofescapesequences2003

6 ISO88598-bitsingle-bytecodedgraphiccharactersets-Parts1-131998-2001

7 IDNPOLICYhttpmeitygovinwritereaddatafilesIndia-IDN-Policypdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1      Confusable 2

क    U+0915       क़    U+0915 U+093C
ख    U+0916       ख़    U+0916 U+093C
ग    U+0917       ग़    U+0917 U+093C
च    U+091A       च़    U+091A U+093C
छ    U+091B       छ़    U+091B U+093C
ज    U+091C       ज़    U+091C U+093C
ड    U+0921       ड़    U+0921 U+093C
ढ    U+0922       ढ़    U+0922 U+093C
फ    U+092B       फ़    U+092B U+093C

Table 21 Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing) they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
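The all-or-nothing condition in the last paragraph can be illustrated with a short check. This is a minimal sketch that uses only the single-code-point Devanagari-Gurmukhi rows of Table 22; the names `DEVA_TO_GURU` and `is_fully_confusable` are hypothetical and not part of the LGR.

```python
# Devanagari -> Gurmukhi confusables, copied from the Gurmukhi rows of Table 22.
DEVA_TO_GURU = {
    "\u0920": {"\u0A28", "\u0A30"},  # DEVANAGARI TTHA
    "\u0921": {"\u0A21", "\u0A24"},  # DEVANAGARI DDA
    "\u0922": {"\u0A22"},            # DEVANAGARI DDHA
    "\u0924": {"\u0A1C"},            # DEVANAGARI TA
    "\u092F": {"\u0A27"},            # DEVANAGARI YA
}


def is_fully_confusable(label, table=DEVA_TO_GURU):
    """True only if every code point of the label has a confusable in the table."""
    return len(label) > 0 and all(ch in table for ch in label)
```

A label made of U+0920 U+0921 passes the check, whereas U+0920 U+0915 does not, because U+0915 has no entry and therefore provides the visual distinction described above.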

Devanagari confusable   Other script confusable   From script

◌ः   U+0903             ◌ઃ   U+0A83               Gujarati
◌ः   U+0903             ◌ః   U+0C03               Telugu
◌ः   U+0903             ◌ಃ   U+0C83               Kannada
◌ः   U+0903             ◌ഃ   U+0D03               Malayalam
◌ः   U+0903             ◌ඃ   U+0D83               Sinhala
◌ः   U+0903             ◌ঃ   U+0983               Bengali (see footnote 17)
उ    U+0909             ও    U+0993               Bengali
घ    U+0918             ঘ    U+0998               Bengali
ठ    U+0920             ਨ    U+0A28               Gurmukhi
ठ    U+0920             ਰ    U+0A30               Gurmukhi
ड    U+0921             ਡ    U+0A21               Gurmukhi
ड    U+0921             ਤ    U+0A24               Gurmukhi
ढ    U+0922             ਢ    U+0A22               Gurmukhi
त    U+0924             ਜ    U+0A1C               Gurmukhi
य    U+092F             ਧ    U+0A27               Gurmukhi
◌ॅ   U+0945             ◌ঁ   U+0981               Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


5.5.2 Operators used

Symbol   Function

|        Alternative
[ ]      Optional
         Variable Repetition
( )      Sequence Group

Table 8 Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari when used to write Hindi are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in Bengali script.

The vowel sequence in Hindi is therefore V[B|D|X].

Examples

Sequence Description    Sequence   Example    Constituting characters

Vowel                   V          अ  a       U+0905
Vowel + Anusvara        V[B]       अं  aṁ      U+0905 U+0902
Vowel + Candrabindu     V[D]       अँ  aṃ      U+0905 U+0901
Vowel + Visarga         V[X]       अः  aḥ      U+0905 U+0903

Table 9
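The pattern V[B|D|X] maps directly onto a regular expression. The sketch below is illustrative only: the vowel range U+0905 to U+0914 is an assumption standing in for the repertoire's vowel category, and `is_vowel_sequence` is a hypothetical helper name.

```python
import re

V = "[\u0905-\u0914]"  # ASSUMPTION: independent vowels A..AU
B, D, X = "\u0902", "\u0901", "\u0903"  # Anusvara, Candrabindu, Visarga

# V[B|D|X]: one vowel, optionally followed by exactly one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"{V}[{B}{D}{X}]?")


def is_vowel_sequence(text):
    """True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

With this pattern, U+0905 U+0902 is accepted, while U+0905 U+0902 U+0903 is rejected because at most one of B, D or X may follow the vowel.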

5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the Consonant sequence after the N, M and H. Each of these has been discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign, wherever such a case is pertinent.)

Examples

Sequence Description    Sequence   Example    Constituting characters

Consonant               C          क  ka      U+0915 <single character>
Consonant + Nukta       C[N]       क़  ḳa      U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]

C[M|B|D|X|H]

Examples

Sequence Description       Sequence   Example                   Constituting characters

Consonant + Matra          C[M]       कि  ki                    U+0915 U+093F
Consonant + Anusvara       C[B]       कं  kaṁ                   U+0915 U+0902
Consonant + Candrabindu    C[D]       कँ  kaṃ                   U+0915 U+0901
Consonant + Visarga        C[X]       कः  kaḥ                   U+0915 U+0903
Consonant + Halant         C[H]       क्  k (Pure Consonant)    U+0915 U+094D

Table 11

2A. A CM sequence can be optionally followed by D, B or X

(CM)[D|B|X]

Example

Sequence Description               Sequence   Example     Constituting characters

Consonant + Matra + Anusvara       CM[B]      कीं  kīṁ     U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu    CM[D]      काँ  kāṃ     U+0915 U+093E U+0901
Consonant + Matra + Visarga        CM[X]      कीः  kīḥ     U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see footnote 14)

3(CH)C

Example

Sequence Description: Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant
Sequence: CHCHCHC
Example: न्क्र्य nkrya
Constituting characters: U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In the case of Sanskrit, it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X

Example

Sequence Description                             Sequence   Example      Constituting characters

Consonant + Halant + Consonant + Matra           CHC[M]     क्की  kkī     U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara        CHC[B]     क्कं  kkaṁ    U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu     CHC[D]     क्कँ  kkaṃ    U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga         CHC[X]     क्कः  kkaḥ    U+0915 U+094D U+0915 U+0903

Table 14

3B. 3(CH)CM may be followed by a B, D or X

Example

Sequence Description                                     Sequence   Example       Constituting characters

Consonant + Halant + Consonant + Matra + Anusvara        CHCM[B]    क्कीं  kkīṁ    U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu     CHCM[D]    क्कीँ  kkīṃ    U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga         CHCM[X]    क्कीः  kkīḥ    U+0915 U+094D U+0915 U+0940 U+0903

Table 15
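Taken together, cases 1 to 3B of the consonant sequence can be folded into one regular expression. The following is a sketch of one possible reading, not the LGR's normative definition: the character ranges are assumptions standing in for the repertoire categories, the optional Nukta is allowed after every consonant of a cluster, and `is_consonant_sequence` is a hypothetical name.

```python
import re

C = "[\u0915-\u0939]"    # ASSUMPTION: consonants KA..HA
M = "[\u093E-\u094C]"    # ASSUMPTION: matras AA..AU
N = "\u093C"             # Nukta
H = "\u094D"             # Halant
BDX = "[\u0901-\u0903]"  # Candrabindu, Anusvara, Visarga

# Up to three "consonant (+ Nukta) + Halant" links, then the final consonant,
# then either a Halant, or an optional Matra plus an optional B, D or X.
CONSONANT_SEQUENCE = re.compile(
    f"(?:{C}{N}?{H}){{0,3}}{C}{N}?(?:{H}|{M}?{BDX}?)"
)


def is_consonant_sequence(text):
    """True if the whole string is one consonant sequence (at most 4 consonants)."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The limit of three Halant-joined links reflects the "up to 4" consonants stated for Hindi; since the WLE rules in Section 7 impose no such limit, the `{0,3}` bound would be relaxed in a checker that follows the WLE rules only.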

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.

6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A: Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to a possibility of creation of certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1         Variant 2

आ    U+0906       आ़    U+0906 U+093C
ओ    U+0913       ओ़    U+0913 U+093C
◌ा   U+093E       ◌ा़   U+093E U+093C
◌ो   U+094B       ◌ो़   U+094B U+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the leading code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus, the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
◌ा (U+093E) → ◌ा़ (U+093E U+093C)
◌ो (U+094B) → ◌ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the symmetric variants using the same condition (see also [RFC8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to the context rules on the Nukta, the condition is only required for the first reason in this case.
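The effect of the "not followed by a Nukta" condition can be seen in a small enumeration of variant labels. This is a sketch under stated assumptions: it treats each Table 16 base together with any Nukta already following it as one slot, so that a substituted base is never expanded a second time; `nukta_variants` is a hypothetical helper, not part of any LGR tool.

```python
from itertools import product

NUKTA = "\u093C"
BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}  # Variant 1 column of Table 16


def nukta_variants(label):
    """Labels reachable by adding or dropping a Nukta after a Table 16 base."""
    slots = []
    i = 0
    while i < len(label):
        ch = label[i]
        if ch in BASES:
            # A Nukta that already follows the base belongs to the same slot,
            # which is what the context rule "not followed by N" enforces.
            if label[i + 1:i + 2] == NUKTA:
                i += 1
            slots.append((ch, ch + NUKTA))
        else:
            slots.append((ch,))
        i += 1
    return {"".join(choice) for choice in product(*slots)} - {label}
```

For the label U+0906 U+092E this yields only U+0906 U+093C U+092E, and starting from that variant yields only the original label, so no label with two consecutive Nuktas is ever generated.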

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set             Mapping

Variant Set A   093E 0901 <--> 0949 0902
Variant Set B   0906 0901 <--> 0911 0902
Variant Set C   0906 0902 <--> 0974
Variant Set D   093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source   Glyph        Target      Glyph   Type      Variant Context

0906     आ      ↔     0906 093C   आ़      blocked   not followed-by-N
093E     ◌ा     ↔     093E 093C   ◌ा़     blocked   not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,
0906 093C 0901 is added to Set B,
0906 093C 0902 is added to Set C, and
093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set             Mapping                                          Variant Contextual Rule

Variant Set A   093E 0901 <--> 093E 093C 0901 <--> 0949 0902     -
Variant Set B   0906 0901 <--> 0906 093C 0901 <--> 0911 0902     -
Variant Set C   0906 0902 <--> 0906 093C 0902 <--> 0974          when(followed-by-V-C-or-end)
Variant Set D   093B <--> 093E 0902 <--> 093E 093C 0902          when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
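The when(followed-by-V-C-or-end) condition for Set C can be sketched as a look-ahead predicate applied before a mapping is used. The code below is an illustration only: the vowel and consonant ranges are assumptions standing in for the repertoire categories, only the single mapping 0906 0902 to 0974 is modelled, and both function names are hypothetical.

```python
VOWELS = {chr(cp) for cp in range(0x0905, 0x0915)}      # ASSUMPTION: A..AU
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}  # ASSUMPTION: KA..HA


def followed_by_v_c_or_end(label, end):
    """True if position `end` is the end of the label or holds a V or C."""
    nxt = label[end:end + 1]
    return nxt == "" or nxt in VOWELS or nxt in CONSONANTS


def set_c_variants(label):
    """Apply 0906 0902 -> 0974 at each position where the context holds."""
    src, dst = "\u0906\u0902", "\u0974"
    variants = set()
    start = label.find(src)
    while start != -1:
        end = start + len(src)
        if followed_by_v_c_or_end(label, end):
            variants.add(label[:start] + dst + label[end:])
        start = label.find(src, start + 1)
    return variants
```

In this sketch, U+0906 U+0902 followed by U+0915, or at the end of the label, produces a variant with U+0974, while U+0906 U+0902 followed by U+0903 produces none.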

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1         Variant 2

ॳ    U+0973       अं    U+0905 U+0902
◌ऺ   U+093A       ◌ं    U+0902
ॴ    U+0974       आं    U+0906 U+0902
◌ऻ   U+093B       ◌ां   U+093E U+0902
ऎ    U+090E       ऐ     U+0910
◌ॆ   U+0946       ◌े    U+0947
ॵ    U+0975       औ     U+0914
◌ॏ   U+094F       ◌ौ    U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ◌ॅ (U+0945) and ◌ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1                 Variant 2

◌ँ    U+0901              ◌ॅं   U+0945 U+0902
◌ाँ   U+093E U+0901       ◌ॉं   U+0949 U+0902
अँ    U+0905 U+0901       ॲं    U+0972 U+0902
एँ    U+090F U+0901       ऍं    U+090D U+0902
आँ    U+0906 U+0901       ऑं    U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by a Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below:

Rule: As per Table 18 Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
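The context rule of Section 6.4.1 can be illustrated with a short Python sketch. It is not the normative rule from the accompanying XML file: the consonant set is an assumption (the Unicode range U+0915 to U+0939 standing in for the repertoire of Table 6), and the function names are hypothetical.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"
# ASSUMPTION: consonants approximated by the Unicode range KA..HA,
# not the exact repertoire of Table 6.
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}


def preceded_by_c_or_cn(label, i):
    """True if index i is preceded by a Consonant, or by a Consonant plus Nukta."""
    if i >= 1 and label[i - 1] in CONSONANTS:
        return True
    return i >= 2 and label[i - 1] == NUKTA and label[i - 2] in CONSONANTS


def candrabindu_variant(label):
    """Replace each U+0901 by U+0945 U+0902 wherever the context rule allows it."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

Under these assumptions, a Candrabindu after a Vowel or a Matra is left untouched, because the Matra U+0945 could not validly follow such a character.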

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word "most" is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 Devanagari Cross-script confusables in Appendix B Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 Code point repertoire:

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant / Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:


a क (U+0915)

b ख (U+0916)

c ग (U+0917)

d च (U+091A)

e छ (U+091B)

f ज (U+091C)

g ड (U+0921)

h ढ (U+0922)

i फ (U+092B)

The set V1 consists of these vowels:

a आ (U+0906) (Required in Santali language)

b ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a ा (U+093E) (Required in Santali language)

b ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (where CN is a C followed by an N)

3. M must be preceded by C or CN (where CN is a C followed by an N)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V can NOT be preceded by H (details in Case of V preceded by H)

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)
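Rules 1 to 7 above can be read as checks on the character immediately preceding each code point. The sketch below is one possible reading in Python, not the normative rule set. The category sets C, V and M are assumptions built from Unicode ranges rather than from Table 6 (for example, the Kashmiri vowels and matras and the sequence-only status of U+0931 are ignored), and `wle_violations` is a hypothetical name.

```python
# ASSUMPTION: category sets approximated from Unicode ranges, not from Table 6.
C = {chr(cp) for cp in range(0x0915, 0x093A)}  # Consonants KA..HA
V = {chr(cp) for cp in range(0x0905, 0x0915)}  # Vowels A..AU
M = {chr(cp) for cp in range(0x093E, 0x094D)}  # Matras AA..AU
B, D, X, H, N = "\u0902", "\u0901", "\u0903", "\u094D", "\u093C"
# Sets C1, V1 and M1 exactly as listed under rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}


def wle_violations(label):
    """Return (index, rule number) pairs for each of rules 1 to 7 that is broken."""
    found = []
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        after_c_or_cn = prev in C or (prev == N and prev2 in C)
        after_vcnm = prev in V or prev in C or prev == N or prev in M
        if ch == N and prev not in (C1 | V1 | M1):
            found.append((i, 1))
        elif ch == H and not after_c_or_cn:
            found.append((i, 2))
        elif ch in M and not after_c_or_cn:
            found.append((i, 3))
        elif ch == X and not after_vcnm:
            found.append((i, 4))
        elif ch == B and not after_vcnm:
            found.append((i, 5))
        elif ch == D and not after_vcnm:
            found.append((i, 6))
        elif ch in V and prev == H:
            found.append((i, 7))
    return found
```

With these assumed sets, the two-word example discussed under "Case of V preceded by H" is flagged under rule 7 when the Halant is present and passes when it is omitted.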

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As the U+0931 is added as a part of the permissible sequences in Table 7 Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which the U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of the hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | Basanta Kumar Panda | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | Debajit Sharma | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualize thy soul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters / character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same. However, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script to distinguish them, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even a single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
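The "all constituents must have a counterpart" condition can be sketched in Python as follows. The mapping is only a small subset of the single-code-point rows of Table 19, chosen for illustration; multi-code-point aksharas are ignored, and `cross_script_variant` is a hypothetical helper, not part of any LGR tool.

```python
# Subset of the single-code-point rows of Table 19 (Devanagari -> Gurmukhi).
DEVA_TO_GURU = {
    "\u091F": "\u0A1F",  # TTA
    "\u0920": "\u0A20",  # TTHA
    "\u092E": "\u0A38",  # Devanagari MA resembles Gurmukhi SA
    "\u0935": "\u0A15",  # Devanagari VA resembles Gurmukhi KA
    "\u093F": "\u0A3F",  # vowel sign I
    "\u0940": "\u0A40",  # vowel sign II
}


def cross_script_variant(label, table=DEVA_TO_GURU):
    """Return the look-alike label in the other script, or None when any
    code point lacks a counterpart (the label is then distinguishable)."""
    mapped = []
    for ch in label:
        if ch not in table:
            return None
        mapped.append(table[ch])
    return "".join(mapped)
```

A single unmapped code point is enough to return `None`, which mirrors the statement that one distinguishing character makes the whole string non-confusable.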

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (see note 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. There is a possibility of further extension of the consonant sequence after the N, M and H. Each of these is discussed in the following sections.

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क ka | U+0915 <single character>
Consonant + Nukta | C[N] | क़ ḳa | U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign Matra [M], or Anusvara [B], Candrabindu [D], or Visarga [X], or Halant [H]

C[M|B|D|X|H]

Examples:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि ki | U+0915 U+093F
Consonant + Anusvara | C[B] | कं kaṁ | U+0915 U+0902
Consonant + Candrabindu | C[D] | कँ kaṃ | U+0915 U+0901
Consonant + Visarga | C[X] | कः kaḥ | U+0915 U+0903
Consonant + Halant | C[H] | क् k (Pure Consonant) | U+0915 U+094D

Table 11

2A. A CM sequence can be optionally followed by D, B or X

(CM)[D|B|X]

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं kīṁ | U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu | CM[D] | काँ kāṃ | U+0915 U+093E U+0901
Consonant + Matra + Visarga | CM[X] | कीः kīḥ | U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (see note 14): 3(CH)C

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य nkrya | U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

14 In the case of Sanskrit it can join up to 5 consonants.

Subsets

3A. The combination may be followed by M, B, D or X

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की kkī | U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं kkaṁ | U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ kkaṃ | U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः kkaḥ | U+0915 U+094D U+0915 U+0903

Table 14

3B. 3(CH)CM may be followed by a B, D or X

Example:

Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं kkīṁ | U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ kkīṃ | U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः kkīḥ | U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.
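The consonant-sequence patterns 1 to 3B of Section 5.5.4 can be approximated by a single regular expression. The Python sketch below is illustrative only: the character classes are assumptions based on Unicode ranges rather than the repertoire of Table 6, and it enforces the "up to 4 consonants" limit of this section, which the WLE rules of Section 7 do not impose.

```python
import re

# ASSUMPTION: classes approximated by Unicode ranges, not the Table 6 repertoire.
C = "[\u0915-\u0939]"    # Consonant
M = "[\u093E-\u094C]"    # Matra
N = "\u093C"             # Nukta
H = "\u094D"             # Halant
BDX = "[\u0901-\u0903]"  # Candrabindu, Anusvara, Visarga

# C[N] followed by either a single Halant (pattern C[H]), or up to three more
# Halant-joined consonants, an optional Matra and an optional B, D or X.
CONSONANT_SEQUENCE = re.compile(
    f"{C}{N}?(?:{H}|(?:{H}{C}{N}?){{0,3}}{M}?{BDX}?)"
)


def is_consonant_sequence(text):
    """True if the whole text is one consonant sequence per patterns 1 to 3B."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

The four-consonant example of Table 13 is accepted by this pattern, while a five-consonant cluster is rejected.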

6 Variants

There are no characters / character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 Visually confusables in Appendix A Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel / Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated that the Whole Label Evaluation rules (given in Section 7) be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
ा U+093E | U+093E U+093C
ो U+094B | U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 Proposed Variants - Set 1 have a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of the sequence (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below:

Rule: As per Table 16 Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → (U+093E U+093C)

ो (U+094B) → (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
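The effect of the "not followed by a Nukta" condition can be seen in a small Python sketch. The helper names are hypothetical; only the four base code points of Table 16 are taken from the text.

```python
NUKTA = "\u093C"
# Variant 1 code points of Table 16.
SANTALI_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_positions(label):
    """Indices where the Variant 1 -> Variant 2 mapping may apply, i.e. a
    Table 16 base code point that is not already followed by a Nukta."""
    return [
        i
        for i, ch in enumerate(label)
        if ch in SANTALI_BASES and label[i + 1 : i + 2] != NUKTA
    ]


def add_nukta_at(label, i):
    """Build the variant label with a Nukta inserted after index i."""
    return label[: i + 1] + NUKTA + label[i + 1 :]
```

Applying the mapping once removes the position from the list of candidates, so the substitution cannot regenerate and no Nukta-Nukta sequence is produced.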

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E respectively into the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence/absence of Nukta is the basis for several variant sets, these variants are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A

0906 093C 0901 is added to Set B

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
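The contextual rule when(followed-by-V-C-or-end) for Variant Set C can be sketched in Python as below. The Vowel and Consonant sets are assumptions based on Unicode ranges, not the Table 6 repertoire, and both function names are hypothetical.

```python
# ASSUMPTION: Vowels and Consonants approximated by Unicode ranges.
VOWELS = {chr(cp) for cp in range(0x0905, 0x0915)}
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}
# Members of Variant Set C after the Nukta form has been added.
SET_C = ("\u0906\u0902", "\u0906\u093C\u0902", "\u0974")


def followed_by_v_c_or_end(label, end):
    """Context check for a matched sequence that ends just before index end."""
    return end == len(label) or label[end] in (VOWELS | CONSONANTS)


def set_c_variants(label, start):
    """Other members of Variant Set C that may replace the member found at
    index start; an empty list if none is found or the context rule fails."""
    for member in SET_C:
        if label.startswith(member, start):
            if followed_by_v_c_or_end(label, start + len(member)):
                return [m for m in SET_C if m != member]
            return []
    return []
```

For instance, 0974 followed by 0901 yields no variants under this sketch, because 0902 could not validly be followed by 0901 in the corresponding variant sequence.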

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1      Variant 2
ॳ U+0973       अं U+0905 U+0902
U+093A         U+0902
ॴ U+0974       आं U+0906 U+0902
ऻ U+093B       ां U+093E U+0902
ऎ U+090E       ऐ U+0910
U+0946         U+0947
ॵ U+0975       औ U+0914
ॏ U+094F       ौ U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, many users tend to ignore the phonetic effect of its presence/absence when it comes at the end. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1        Variant 2
U+0901           U+0945 U+0902
U+093E U+0901    U+0949 U+0902
U+0905 U+0901    U+0972 U+0902
U+090F U+0901    U+090D U+0902
U+0906 U+0901    U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs preceded by a Devanagari Letter Ka (क - U+0915) are shown here:

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should each be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by either of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below:

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
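A minimal Python sketch of this rule follows. The function names are hypothetical, and the consonant range is an assumed approximation of category C, not the LGR repertoire.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"
CONSONANTS = set(range(0x0915, 0x093A))  # assumed approximation of category C


def preceded_by_c_or_cn(label: str, pos: int) -> bool:
    """True if label[pos] directly follows a consonant or consonant + Nukta."""
    if pos >= 1 and ord(label[pos - 1]) in CONSONANTS:
        return True
    return (
        pos >= 2
        and label[pos - 1] == NUKTA
        and ord(label[pos - 2]) in CONSONANTS
    )


def candra_anusvara_variant(label: str) -> str:
    """Expand every Candrabindu that satisfies the context to U+0945 U+0902."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)


# Ka + Candrabindu has the variant Ka + U+0945 + U+0902.
print(candra_anusvara_variant("\u0915\u0901") == "\u0915\u0945\u0902")
# A + Candrabindu has none, because a Matra cannot follow a vowel.
print(candra_anusvara_variant("\u0905\u0901") == "\u0905\u0901")
```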

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari                             Gurmukhi
U+0902                                 U+0A02
इ U+0907                               ਙ U+0A19
उ U+0909                               ਤ U+0A24
U+0917                                 ਗ U+0A17
घ U+0918                               ਬ U+0A2C
ट U+091F                               ਟ U+0A1F
ठ U+0920                               ਠ U+0A20
ढ U+0922                               ਫ U+0A2B
U+092A                                 ਧ U+0A27
भ U+092D                               ਮ U+0A2E
म U+092E                               ਸ U+0A38
व U+0935                               ਕ U+0A15
ह U+0939                               ਵ U+0A35
U+093A                                 U+0A02
U+093C                                 U+0A3C
ि U+093F                               ਿ U+0A3F
ी U+0940                               ੀ U+0A40
U+0945                                 U+0A71
U+0946                                 U+0A47
U+0946                                 U+0A4B
U+0947                                 U+0A47
U+0947                                 U+0A4B
U+0948                                 U+0A48
U+0956                                 U+0A41
U+0957                                 U+0A42
प्टि U+092A U+094D U+091F U+093F       ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940       ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947       ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946       ਏ U+0A0F
त्त U+0924 U+094D U+0924               ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari    Bengali
म U+092E      ম U+09AE
ि U+093F      ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the Categories as mentioned in Table 6: Code point repertoire:

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant/Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN.

3. M must be preceded by C or CN.

4. X must be preceded by either of V, C, N or M.

5. B must be preceded by either of V, C, N or M.

6. D must be preceded by either of V, C, N or M.

7. V can NOT be preceded by H (details in "Case of V preceded by H").

In rules 2 and 3, CN is a C followed by an N.


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)
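WLE rules 1 to 7 can be sketched as a single pass over a label. The following Python is an illustration only, not the normative XML rules: the C, V and M sets are approximated by Unicode ranges (an assumption, since Table 6 defines the real repertoire), sequences such as the Eyelash Reph are not modelled, and `wle_violations` is a hypothetical name.

```python
B, D, X, H, N = "\u0902", "\u0901", "\u0903", "\u094D", "\u093C"

# Assumed approximations of the categories; the LGR repertoire is narrower.
C = {chr(cp) for cp in range(0x0915, 0x093A)}
V = {chr(cp) for cp in range(0x0905, 0x0915)}
M = {chr(cp) for cp in range(0x093E, 0x094D)}

# Sets named in rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}


def wle_violations(label: str) -> list:
    """Return (index, rule number) pairs for every rule the label breaks."""
    violations = []
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        after_c_or_cn = prev in C or (prev == N and prev2 in C)
        after_v_c_n_m = prev in V or prev in C or prev == N or prev in M
        if ch == N and prev not in (C1 | V1 | M1):
            violations.append((i, 1))
        elif ch == H and not after_c_or_cn:
            violations.append((i, 2))
        elif ch in M and not after_c_or_cn:
            violations.append((i, 3))
        elif ch == X and not after_v_c_n_m:
            violations.append((i, 4))
        elif ch == B and not after_v_c_n_m:
            violations.append((i, 5))
        elif ch == D and not after_v_c_n_m:
            violations.append((i, 6))
        elif ch in V and prev == H:
            violations.append((i, 7))
    return violations


# U+0915 U+0940 U+0902 (C M B) breaks no rule.
print(wle_violations("\u0915\u0940\u0902"))
# A vowel directly after a Halant breaks rule 7.
print(wle_violations("\u0906\u092E\u094D\u0905\u091A\u093E\u0930"))
```

Each rule only looks at the one or two preceding code points, which is why the rules translate directly into "preceded by" context conditions.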

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of permissible sequences in Table 7: Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their Language expertise:

Position    Name    Organization    Country    Language Expertise
Co-Chair    Ajay Data    Data Xgen Technologies    India    Hindi, English
Co-Chair    Mahesh D Kulkarni    C-DAC    India    Marathi, Hindi, English
Co-Chair    Udaya Narayana Singh    Visva-Bharati, Santiniketan, West Bengal    India    Bengali, Maithili, Hindi, English
Member    Abhijit Dutta    Wikimedia    India    Bengali, Hindi
Member    Akshat S Joshi (Editor)    C-DAC    India    Hindi, Marathi, English
Member    Anivar A Aravind    Indic Project    India    Malayalam
Member    Anupam Agrawal    Tata Consultancy Service    India    Hindi, Bengali
Member    Arvind Bhandari    Gujarat University    India    Gujarati
Member    Ashish Modi    Data Xgen Technologies    India    Hindi
Member    Atiur Rahman Khan    C-DAC    India    Bangla
Member    Bal Krishna Bal    Kathmandu University    Nepal    Nepali
Member    Balaram Prasain    Tribhuvan University    Nepal    Nepali
Member    BASANTA KUMAR PANDA    Regional Institute of Education (NCERT)    India    Odia
Member    Bhim Dhoj Shrestha    Consultant    Nepal    Nepali, Newar
Member    Chitrita Chatterjee    Internet and Mobile Association of India (IAMAI)    India    Multiple languages represented by members of IAMAI
Member    DEBAJIT SHARMA    Anundoram Borooah Institute of Language, Art and Culture    India    Assamese
Member    Dev Dass Manandhar    Consultant    Nepal    Nepali, Newar
Member    Dhanalakshmi K T    Northern Trust    India    Kannada
Member    Ganesh Murmu    Ranchi University    India    Santali
Member    Gangadhar Panday    Babul Films Society    India    Telugu
Member    Ghanashyam Nepal    Benares Hindu University & University of North Bengal    India    Nepali
Member    Girish Chandra Mishra    Language Technology Centre, Ravenshaw University    India    Odia
Member    Gurpreet Singh Lehal    Punjabi University, Patiala    India    Panjabi
Member    Harish Chowdhary    NIXI    India    Hindi
Member    Hempal Shrestha    Nepal Entrepreneurs Hub (NEHUB)    Nepal    Nepali, Newar
Member    Jay Paudyal    Consultant    India    Hindi
Member    Jijo Pappachan    DNDomains    India    Malayalam
Member    K C Tikayatray    Odia Bhasa Pratisthan    India    Odia
Member    Kalyan Vasudeo Kale    Formerly affiliated with University of Pune    India    Marathi
Member    Kuldeep Patnaik    Visualizethysoul    India    Odia
Member    Mukesh Saini    Essel Group    India    Hindi
Member    N Deiva Sundaram    NDS Lingsoft Solutions Pvt Ltd    India    Tamil
Member    Neha Gupta    C-DAC    India    Hindi, English
Member    Nirajan Parajuli    NREN    Nepal    Nepali
Member    Nishit Jain    C-DAC    India    Hindi, English
Member    Pawan Chitrakar    Gapsco    Nepal    Nepali
Member    Prabhakar Pandey    C-DAC    India    Hindi
Member    Prasad P K    A-one Publishers    India    Malayalam
Member    Prateek Pathak    ISOC Mumbai    India    Devanagari
Member    Raiomond Doctor    NLP Consultant    India    English, Hindi, Marathi, Gujarati
Member    Rajib Chakraborty    Society for Natural Language Technology Research    India    Bangla (Bengali)
Member    Rajiv Kumar    NIXI    India
Member    S Maniam    International Forum IT for Tamil    Singapore    Tamil
Member    Santhosh Thottingal    Wikimedia foundation    India    Malayalam, Sourashtra, Tamil
Member    Saroja Bhate    University of Pune    India    Sanskrit
Member    Shambhu Kumar Singh    National Translation Mission, Mysore    India    Maithili
Member    Shanmugam R    C-DAC    India    Tamil
Member    Shantaram S Warde Walawalikar    Independent Researcher    India    Konkani
Member    Shashi Pathania    PGD of Dogri, University of Jammu    India    Dogri
Member    Shubham Saran    NIXI    India
Member    Sinnathambi Shanmugarajah    University of Colombo School of Computing    Sri Lanka    Tamil
Member    Sujith Kartha    Digitalkzcom    India    Malayalam
Member    Suraj Adhikari    Mercantile Communications (and np ccTLD)    Nepal    Nepali
Member    Swarna Prabha Chainary    Guwahati University    India    Bodo
Member    U B Pavanaja    http://vishvakannada.com    India    Kannada
Member    Uma Maheshwar G    CALTS, Univ of Hyderabad    India    Telugu
Member    Uttam Shrestha Rana    NPNOG    Nepal    Nepali
Member    Veena Solomon    (freelancer)    India    Malayalam
Member    Vinay Murarka    Consultant httpsमराभारत    India    Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name    Language/Script Expertise
Ajit Kumar    Awadhi, Braj Language
Amar Tumyahang    Limbu Language
Amrit Yonjan    Tamang Language
Aprana Kulkarni    Hindi, Marathi
Basil Baa    Sadri Language
Basil Kiro    Kharia Language
Biswa Limbu    Limbu Language
Devdass Manandhar    Newar
Devendra Kumar Devesh    Bhojpuri Language
Dinbandhu Mahto    Panchpargania Language
Dipika Sangma Narzary    Bodo Language
Dr K P Lekhwani    Sindhi
Dr Birendra Kumar Soy    Mundari Language
Dr Dinesh Kumar Shrivastav    Magahi Language
Dr Harvinder Kaur    Gurmukhi Script
Dr Laxmi Prasad Khatiwada    Nepali Language
Harihar Vaishnav    Halbi
Indra Kumar Tamang    Tamang Language
Jagannath Singh    Panchpargania Language
Narendra Kumar Negi    Kinnauri Language
Prateek Harshwal    Wagdi and Dhundhari Language
Rayem Olem Dungdung    Sadri Language
Tej Man Angdembe    Limbu Language
Urmila Harshwal    Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.

2. IS 10315: 7-bit coded character set for information interchange. 1985.

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.

5. ISO 2375: Procedure for registration of escape sequences. 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1    Confusable 2
क U+0915        क़ U+0915 U+093C
ख U+0916        ख़ U+0916 U+093C
ग U+0917        ग़ U+0917 U+093C
च U+091A        च़ U+091A U+093C
छ U+091B        छ़ U+091B U+093C
ज U+091C        ज़ U+091C U+093C
ड U+0921        ड़ U+0921 U+093C
ढ U+0922        ढ़ U+0922 U+093C
फ U+092B        फ़ U+092B U+093C

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing) they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
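The "all constituents must have a counterpart" condition can be sketched in Python as below. The mapping is a small illustrative subset of the single-code-point pairs in Table 19; multi-code-point aksharas are not handled, and `gurmukhi_lookalike` is a hypothetical name.

```python
from typing import Optional

# Illustrative subset of the Devanagari to Gurmukhi pairs in Table 19.
DEVA_TO_GURU = {
    "\u091F": "\u0A1F",
    "\u0920": "\u0A20",
    "\u092E": "\u0A38",
    "\u093F": "\u0A3F",
    "\u0940": "\u0A40",
}


def gurmukhi_lookalike(label: str) -> Optional[str]:
    """Return the Gurmukhi counterpart label, or None if any character lacks one."""
    mapped = [DEVA_TO_GURU.get(ch) for ch in label]
    if any(m is None for m in mapped):
        return None
    return "".join(mapped)


# Every code point has a counterpart, so a whole-label counterpart exists.
print(gurmukhi_lookalike("\u091F\u0940\u092E") == "\u0A1F\u0A40\u0A38")
# U+0915 has no entry in this subset, so the label stays distinguishable.
print(gurmukhi_lookalike("\u091F\u0915") is None)
```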

Devanagari confusable    Other script confusable    From script
U+0903                   ઃ U+0A83                   Gujarati
U+0903                   U+0C03                     Telugu
U+0903                   U+0C83                     Kannada
U+0903                   ഃ U+0D03                   Malayalam
U+0903                   ඃ U+0D83                   Sinhala
U+0903                   ঃ U+0983 (17)              Bengali
U+0909                   ও U+0993                   Bengali
U+0918                   ঘ U+0998                   Bengali
U+0920                   U+0A28                     Gurmukhi
U+0920                   U+0A30                     Gurmukhi
U+0921                   U+0A21                     Gurmukhi
U+0921                   U+0A24                     Gurmukhi
ढ U+0922                 U+0A22                     Gurmukhi
U+0924                   U+0A1C                     Gurmukhi
U+092F                   U+0A27                     Gurmukhi
U+0945                   U+0981                     Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


2A. A CM sequence can be optionally followed by D, B or X

(CM)[D|B|X]

Example:

Sequence Description               Sequence    Example     Constituting characters
Consonant + Matra + Anusvara       CM[B]       कीं kīṁ     U+0915 U+0940 U+0902
Consonant + Matra + Candrabindu    CM[D]       काँ kāṃ     U+0915 U+093E U+0901
Consonant + Matra + Visarga        CM[X]       कीः kīḥ     U+0915 U+0940 U+0903

Table 12

3. A sequence of consonants (up to 4) joined by Halant (14)

3(CH)C

Example:

Sequence Description: Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant
Sequence: CHCHCHC
Example: न्क्र्य nkrya
Constituting characters: U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets

3A. The combination may be followed by M, B, D or X

Example:

14 In the case of Sanskrit, it can join up to 5 consonants.

Sequence Description                            Sequence    Example       Constituting characters
Consonant + Halant + Consonant + Matra          CHC[M]      क्की kkī      U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara       CHC[B]      क्कं kkaṁ     U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu    CHC[D]      क्कँ kkaṃ     U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga        CHC[X]      क्कः kkaḥ     U+0915 U+094D U+0915 U+0903

Table 14

3B. 3(CH)CM may be followed by a B, D or X

Example:

Sequence Description                                    Sequence    Example        Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara       CHCM[B]     क्कीं kkīṁ     U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu    CHCM[D]     क्कीँ kkīṃ     U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga        CHCM[X]     क्कीः kkīḥ     U+0915 U+094D U+0915 U+0940 U+0903

Table 15
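The consonant-initial patterns in Tables 12 to 15 can be summarised as one regular expression over category letters. The Python sketch below is an illustration under assumptions: categories are approximated by Unicode ranges, Nukta is ignored, and the limit of four consonants follows the akshar description here rather than the WLE rules, which impose no such limit.

```python
import re

B, D, X, H = "\u0902", "\u0901", "\u0903", "\u094D"


def category(ch: str) -> str:
    """Map a code point to its category letter (assumed approximate ranges)."""
    if ch in (B, D, X, H):
        return {B: "B", D: "D", X: "X", H: "H"}[ch]
    cp = ord(ch)
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x093E <= cp <= 0x094C:
        return "M"
    return "?"


# Up to three (CH) pairs, a final C, an optional M, then an optional D, B or X.
AKSHAR = re.compile(r"(?:CH){0,3}CM?[DBX]?")


def is_consonant_akshar(text: str) -> bool:
    """True if the category string of `text` fits the pattern above."""
    cats = "".join(category(ch) for ch in text)
    return AKSHAR.fullmatch(cats) is not None


# CHCM[B] example from Table 15.
print(is_consonant_akshar("\u0915\u094D\u0915\u0940\u0902"))
# CHCHCHC example from Table 13.
print(is_consonant_akshar("\u0928\u094D\u0915\u094D\u0930\u094D\u092F"))
```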

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuations and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:

Variant 1    Variant 2
आ U+0906     आ़ U+0906 U+093C
ओ U+0913     ओ़ U+0913 U+093C
ा U+093E     ा़ U+093E U+093C
ो U+094B     ो़ U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16: Proposed Variants - Set 1 have a typical characteristic, which is that within a variant pair, Variant 1 is a subset of Variant 2; e.g. in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), it introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new case of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination आ़़ (U+0906 U+093C U+093C), where a Nukta will need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below:

Rule: As per Table 16: Proposed Variants - Set 1, the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
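The effect of the "not followed by a Nukta" condition can be seen in a short Python sketch. The function names are hypothetical; only the four base characters of Table 16 are considered.

```python
NUKTA = "\u093C"
TABLE_16_BASES = ("\u0906", "\u0913", "\u093E", "\u094B")


def nukta_variant_positions(label: str) -> list:
    """Indices of Table 16 bases that are not followed by a Nukta."""
    return [
        i
        for i, ch in enumerate(label)
        if ch in TABLE_16_BASES and label[i + 1:i + 2] != NUKTA
    ]


def add_nukta(label: str, pos: int) -> str:
    """Apply the Variant 1 to Variant 2 mapping at index `pos`."""
    return label[:pos + 1] + NUKTA + label[pos + 1:]


label = "\u0906\u092E"
print(nukta_variant_positions(label))        # the base at index 0 qualifies
variant = add_nukta(label, 0)
print(nukta_variant_positions(variant))      # no position qualifies any more
```

Once the Nukta has been inserted, the same base no longer satisfies the context, so the substitution cannot be applied a second time.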

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set              Mapping
Variant Set A    093E 0901 <--> 0949 0902
Variant Set B    0906 0901 <--> 0911 0902
Variant Set C    0906 0902 <--> 0974
Variant Set D    093B <--> 093E 0902

Overlappingvariantsetsinvolving0906and093EplusNukta

Source Glyph Target Glyph Type Variant Context 0906 आ 0906 093C आ harr blocked not followed-by-N

093E ा 093E 093C ा harr blocked not followed-by-N

Whensubstitutingthevariantfromtheoverlappingsetsfor0906or093ErespectivelyintothelefthandsidesequencesofVariantSetsABCandEtheresultisavalidsequencethatdisplayswithaNuktaAsthepresenceabsenceofNuktaisthebasisforseveralvariantsetsthereforethesevariantareextendedtoensurethesymmetryandtransitivity

093E093C0901isaddedtoSetA

0906093C0901isaddedtoSetB

0906093C0902isaddedtosetCand

093E093C0902isaddedtoSetD

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set             Mapping                                          Variant Contextual Rule
Variant Set A   093E 0901 <--> 093E 093C 0901 <--> 0949 0902     -
Variant Set B   0906 0901 <--> 0906 093C 0901 <--> 0911 0902     -
Variant Set C   0906 0902 <--> 0906 093C 0902 <--> 0974          when(followed-by-V-C-or-end)
Variant Set D   093B <--> 093E 0902 <--> 093E 093C 0902          when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapping with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
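The contextual rule for Variant Set C can be sketched in Python as follows. This is an illustration only: the helper names are hypothetical, and the vowel and consonant classes are approximated by Unicode ranges (an assumption made here; the LGR defines the actual V and C sets in its repertoire table).

```python
# Members of the extended Variant Set C (code point sequences as strings).
SET_C = ["\u0906\u0902", "\u0906\u093C\u0902", "\u0974"]


def is_vowel(ch):
    # Assumption: rough range for independent vowels, not the LGR's V set.
    return "\u0904" <= ch <= "\u0914" or "\u0972" <= ch <= "\u0975"


def is_consonant(ch):
    # Assumption: rough range for consonants, not the LGR's C set.
    return "\u0915" <= ch <= "\u0939"


def followed_by_v_c_or_end(label, end):
    """True if position `end` is the end of the label or holds a V or C."""
    if end == len(label):
        return True
    return is_vowel(label[end]) or is_consonant(label[end])


def set_c_variants(label, start):
    """Variants of the Set C member found at `start`, honouring the context."""
    for seq in SET_C:
        if label.startswith(seq, start):
            if followed_by_v_c_or_end(label, start + len(seq)):
                return [other for other in SET_C if other != seq]
            return []
    return []
```

With these assumptions, 0974 followed by 0903 yields no variants, because 0902 could not legally appear in that position, while 0974 at the end of a label or before a consonant maps to both other members of the set.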

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1        Variant 2
ॳ (U+0973)       अं (U+0905 U+0902)
ऺ (U+093A)       ं (U+0902)
ॴ (U+0974)       आं (U+0906 U+0902)
ऻ (U+093B)       ां (U+093E U+0902)
ऎ (U+090E)       ऐ (U+0910)
ॆ (U+0946)       े (U+0947)
ॵ (U+0975)       औ (U+0914)
ॏ (U+094F)       ौ (U+094C)

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, many users tend to ignore the phonetic effect of its presence/absence when it comes at the end. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1                Variant 2
ँ (U+0901)               ॅं (U+0945 U+0902)
ाँ (U+093E U+0901)       ॉं (U+0949 U+0902)
अँ (U+0905 U+0901)       ॲं (U+0972 U+0902)
एँ (U+090F U+0901)       ऍं (U+090D U+0902)
आँ (U+0906 U+0901)       ऑं (U+0911 U+0902)

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, like a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that this sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by a Consonant, Vowel, Nukta or Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18: Proposed Variants - Set 3, the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
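The rule for the first pair of Table 18 can be sketched as below. This is a simplified illustration, not the LGR's XML rule: the function names are hypothetical, the consonant class is approximated by a Unicode range (an assumption), and only the forward direction (U+0901) -> (U+0945 U+0902) is shown.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch):
    # Assumption: rough range for consonants, not the LGR's C set.
    return "\u0915" <= ch <= "\u0939"


def preceded_by_c_or_cn(label, i):
    """True if position i follows a Consonant, or a Consonant plus Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def expand_candrabindu(label):
    """Replace each Candrabindu by U+0945 U+0902 where the context allows it."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

A Candrabindu after a Vowel or a Matra is left untouched by this sketch, since a Matra such as U+0945 could not stand in that position; those positions are covered by the other, sequence-level pairs of Table 18.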

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
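The "first chosen wins, the rest are blocked" disposition can be sketched with a small registry. This is a hypothetical illustration rather than how any registry system is implemented: the class and function names are invented here, and only the four single-code-point pairs of Table 17, which need no variant context, are used to compute a shared key for variant labels.

```python
# Single code point pairs from Table 17 (Variant 1 -> Variant 2).
CANONICAL = {
    "\u090E": "\u0910",
    "\u0946": "\u0947",
    "\u0975": "\u0914",
    "\u094F": "\u094C",
}


def variant_key(label):
    """Map every label in a variant set to one shared key."""
    return "".join(CANONICAL.get(ch, ch) for ch in label)


class Registry:
    """Hypothetical registry: the first applicant gets the label."""

    def __init__(self):
        self._holder = {}

    def apply(self, label):
        key = variant_key(label)
        holder = self._holder.get(key)
        if holder is None:
            self._holder[key] = label
            return "allocated"
        return "already allocated" if holder == label else "blocked"
```

Because there is no preference among the variants, the key itself is never treated as the "real" label; it only groups labels so that a later variant of an allocated label is reported as blocked.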

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed as cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 lists the cases proposed as cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are considered equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari                                  Gurmukhi
ं (U+0902)                                  ਂ (U+0A02)
इ (U+0907)                                  ਙ (U+0A19)
उ (U+0909)                                  ਤ (U+0A24)
ग (U+0917)                                  ਗ (U+0A17)
घ (U+0918)                                  ਬ (U+0A2C)
ट (U+091F)                                  ਟ (U+0A1F)
ठ (U+0920)                                  ਠ (U+0A20)
ढ (U+0922)                                  ਫ (U+0A2B)
प (U+092A)                                  ਧ (U+0A27)
भ (U+092D)                                  ਮ (U+0A2E)
म (U+092E)                                  ਸ (U+0A38)
व (U+0935)                                  ਕ (U+0A15)
ह (U+0939)                                  ਵ (U+0A35)
ऺ (U+093A)                                  ਂ (U+0A02)
़ (U+093C)                                  ਼ (U+0A3C)
ि (U+093F)                                  ਿ (U+0A3F)
ी (U+0940)                                  ੀ (U+0A40)
ॅ (U+0945)                                  ੱ (U+0A71)
ॆ (U+0946)                                  ੇ (U+0A47)
ॆ (U+0946)                                  ੋ (U+0A4B)
े (U+0947)                                  ੇ (U+0A47)
े (U+0947)                                  ੋ (U+0A4B)
ै (U+0948)                                  ੈ (U+0A48)
ॖ (U+0956)                                  ੁ (U+0A41)
ॗ (U+0957)                                  ੂ (U+0A42)
प्टि (U+092A U+094D U+091F U+093F)          ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940)          ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947)          ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946)          ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924)                  ਜ (U+0A1C)

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari        Bengali
म (U+092E)        ম (U+09AE)
ि (U+093F)        ি (U+09BF)

Table 20: Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22: Devanagari Cross-script confusables in Appendix B: Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the Categories mentioned in Table 6: Code point repertoire:

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant/Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:

   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:

   a. आ (U+0906) (Required in Santali language)
   b. ओ (U+0913) (Required in Santali language)

   The set M1 consists of these matras:

   a. ा (U+093E) (Required in Santali language)
   b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (where CN is a C followed by an N).

3. M must be preceded by C or CN (where CN is a C followed by an N).

4. X must be preceded by either of V, C, N or M.

5. B must be preceded by either of V, C, N or M.

6. D must be preceded by either of V, C, N or M.

7. V can NOT be preceded by H (details in "Case of V preceded by H").
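The seven rules can be read as checks on the character that precedes each code point. The following Python sketch illustrates that reading; it is not the LGR's XML rule set. The function names are hypothetical, and the C, V and M classes are approximated by Unicode ranges (an assumption made for brevity; the actual classes are those of the repertoire table), while C1, V1 and M1 are taken from rule 1.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}
N, H = "\u093C", "\u094D"
# Visarga (X), Anusvara (B) and Candrabindu (D) with their rule numbers.
SIGN_RULES = {"\u0903": 4, "\u0902": 5, "\u0901": 6}


def is_c(ch):
    return "\u0915" <= ch <= "\u0939"  # assumption: rough consonant range


def is_v(ch):
    return "\u0904" <= ch <= "\u0914"  # assumption: rough vowel range


def is_m(ch):
    return "\u093E" <= ch <= "\u094C"  # assumption: rough matra range


def first_violation(label):
    """Return the number of the first WLE rule violated, or None."""
    for i, ch in enumerate(label):
        prev = label[i - 1] if i >= 1 else ""
        prev2 = label[i - 2] if i >= 2 else ""
        after_c_or_cn = is_c(prev) or (prev == N and is_c(prev2))
        if ch == N and prev not in (C1 | V1 | M1):
            return 1
        if ch == H and not after_c_or_cn:
            return 2
        if is_m(ch) and not after_c_or_cn:
            return 3
        if ch in SIGN_RULES and not (
            is_v(prev) or is_c(prev) or prev == N or is_m(prev)
        ):
            return SIGN_RULES[ch]
        if is_v(ch) and prev == H:
            return 7
    return None
```

As the proposal's own footnote on test labels observes, which rule a tool reports for an invalid label depends on its implementation; the order of checks in this sketch is one arbitrary choice, and only the valid/invalid outcome should be relied on.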


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or the end of the label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of permissible sequences in Table 7: Sequences, it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार aːməchaːr "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of the hyphen, the space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their Language expertise:

Position Name Organization Country Language

ExpertiseCo-Chair AjayData DataXgenTechnologies India HindiEnglishCo-Chair MaheshDKulkarni C-DAC India MarathiHindi

EnglishCo-Chair UdayaNarayana

SinghVisva-BharatiSantiniketanWestBengal

India BengaliMaithiliHindiEnglish

Member AbhijitDutta Wikimedia India BengaliHindiMember AkshatSJoshi

(Editor)C-DAC India HindiMarathi

EnglishMember AnivarAAravind IndicProject India MalayalamMember AnupamAgrawal TataConsultancyService India HindiBengaliMember ArvindBhandari GujaratUniversity India GujaratiMember AshishModi DataXgenTechnologies India HindiMember AtiurRahmanKhan C-DAC India BanglaMember BalKrishnaBal KathmanduUniversity Nepal NepaliMember BalaramPrasain TribhuvanUniversity Nepal NepaliMember BASANTAKUMAR

PANDARegionalInstituteofEducation(NCERT)

India Odia

Member BhimDhojShrestha Consultant Nepal NepaliNewarMember ChitritaChatterjee InternetandMobileAssociationof

India(IAMAI)India Multiplelanguages

representedbymembersofIAMAI

Member DEBAJITSHARMA AnundoramBorooahInstituteofLanguageArtandCulture

India Assamese

Member DevDassManandhar

Consultant Nepal NepaliNewar

Member DhanalakshmiKT NorthernTrust India KannadaMember GaneshMurmu RanchiUniversity India SantaliMember GangadharPanday BabulFilmsSociety India TeluguMember GhanashyamNepal BenaresHinduUniversityamp

UniversityofNorthBengalIndia Nepali

Member GirishChandraMishra

LanguageTechnologyCentreRavenshawUniversity

India Odia

Member GurpreetSinghLehal

PunjabiUniversityPatiala India Panjabi

Member HarishChowdhary NIXI India HindiMember HempalShrestha NepalEntrepreneursHub

(NEHUB)Nepal NepaliNewar

Member JayPaudyal Consultant India Hindi


Member JijoPappachan DNDomains India MalayalamMember KCTikayatray OdiaBhasaPratisthan India OdiaMember KalyanVasudeo

KaleFormerlyaffiliatedwithUniversityofPune

India Marathi

Member KuldeepPatnaik Visualizethysoul India OdiaMember MukeshSaini EsselGroup India HindiMember NDeivaSundaram NDSLingsoftSolutionsPvtLtd India TamilMember NehaGupta C-DAC India HindiEnglishMember NirajanParajuli NREN Nepal NepaliMember NishitJain C-DAC India HindiEnglishMember PawanChitrakar Gapsco Nepal NepaliMember PrabhakarPandey C-DAC India HindiMember PrasadPK A-onePublishers India MalayalamMember PrateekPathak ISOCMumbai India DevanagariMember RaiomondDoctor NLPConsultant India EnglishHindi

MarathiGujaratiMember RajibChakraborty SocietyforNaturalLanguage

TechnologyResearchIndia Bangla(Bengali)

Member RajivKumar NIXI India Member SManiam InternationalForumITforTamil Singapore TamilMember SanthoshThottingal Wikimediafoundation India Malayalam

SourashtraTamilMember SarojaBhate UniversityofPune India SanskritMember ShambhuKumar

SinghNationalTranslationMissionMysore

India Maithili

Member ShanmugamR C-DAC India TamilMember ShantaramSWarde

WalawalikarIndependentResearcher India Konkani

Member ShashiPathania PGDofDogriUniversityofJammu

India Dogri

Member ShubhamSaran NIXI India Member Sinnathambi

ShanmugarajahUniversityofColomboSchoolofComputing

SriLanka Tamil

Member SujithKartha Digitalkzcom India MalayalamMember SurajAdhikari MercantileCommunications(and

npccTLD)Nepal Nepali

Member SwarnaPrabhaChainary

GuwahatiUniversity India Bodo

Member UBPavanaja httpvishvakannadacom India KannadaMember UmaMaheshwarG CALTSUnivofHyderabad India TeluguMember UttamShrestha

RanaNPNOG Nepal Nepali

Member VeenaSolomon (freelancer) India MalayalamMember VinayMurarka Consultanthttpsमराभारत India Hindi


In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name                          Language/Script Expertise
Ajit Kumar                    Awadhi, Braj Language
Amar Tumyahang                Limbu Language
Amrit Yonjan                  Tamang Language
Aprana Kulkarni               Hindi, Marathi
Basil Baa                     Sadri Language
Basil Kiro                    Kharia Language
Biswa Limbu                   Limbu Language
Devdass Manandhar             Newar
Devendra Kumar Devesh         Bhojpuri Language
Dinbandhu Mahto               Panchpargania Language
Dipika Sangma Narzary         Bodo Language
Dr K P Lekhwani               Sindhi
Dr Birendra Kumar Soy         Mundari Language
Dr Dinesh Kumar Shrivastav    Magahi Language
Dr Harvinder Kaur             Gurmukhi Script
Dr Laxmi Prasad Khatiwada     Nepali Language
Harihar Vaishnav              Halbi
Indra Kumar Tamang            Tamang Language
Jagannath Singh               Panchpargania Language
Narendra Kumar Negi           Kinnauri Language
Prateek Harshwal              Wagdi and Dhundhari Language
Rayem Olem Dungdung           Sadri Language
Tej Man Angdembe              Limbu Language
Urmila Harshwal               Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC 8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1      Confusable 2
क (U+0915)        क़ (U+0915 U+093C)
ख (U+0916)        ख़ (U+0916 U+093C)
ग (U+0917)        ग़ (U+0917 U+093C)
च (U+091A)        च़ (U+091A U+093C)
छ (U+091B)        छ़ (U+091B U+093C)
ज (U+091C)        ज़ (U+091C U+093C)
ड (U+0921)        ड़ (U+0921 U+093C)
ढ (U+0922)        ढ़ (U+0922 U+093C)
फ (U+092B)        फ़ (U+092B U+093C)

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same. However, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
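The "all constituents must have a counterpart" condition can be sketched as below. The function name and the dictionary are hypothetical; the dictionary holds only the single-code-point Devanagari-Gurmukhi pairs of Table 22 and ignores akshara-level sequences, which a full treatment would also need to handle.

```python
from itertools import product

# Devanagari -> Gurmukhi confusables, taken from Table 22.
DEVA_TO_GURMUKHI = {
    "\u0920": ["\u0A28", "\u0A30"],
    "\u0921": ["\u0A21", "\u0A24"],
    "\u0922": ["\u0A22"],
    "\u0924": ["\u0A1C"],
    "\u092F": ["\u0A27"],
}


def cross_script_confusables(label, table=DEVA_TO_GURMUKHI):
    """Return every whole-label confusable, or [] if any character lacks one."""
    if not label:
        return []
    options = []
    for ch in label:
        if ch not in table:
            # One character without a counterpart is enough to make the
            # whole label visually distinct in the other script.
            return []
        options.append(table[ch])
    return ["".join(combo) for combo in product(*options)]
```

A label made up of U+0924 U+092F therefore yields one Gurmukhi counterpart, whereas adding a single character outside the table, such as U+0915, removes all of them.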

Devanagari confusable    Other script confusable    From script
ः (U+0903)               ઃ (U+0A83)                 Gujarati
ः (U+0903)               ః (U+0C03)                 Telugu
ः (U+0903)               ಃ (U+0C83)                 Kannada
ः (U+0903)               ഃ (U+0D03)                 Malayalam
ः (U+0903)               ඃ (U+0D83)                 Sinhala
ः (U+0903)               ঃ (U+0983) [17]            Bengali
उ (U+0909)               ও (U+0993)                 Bengali
घ (U+0918)               ঘ (U+0998)                 Bengali
ठ (U+0920)               ਨ (U+0A28)                 Gurmukhi
ठ (U+0920)               ਰ (U+0A30)                 Gurmukhi
ड (U+0921)               ਡ (U+0A21)                 Gurmukhi
ड (U+0921)               ਤ (U+0A24)                 Gurmukhi
ढ (U+0922)               ਢ (U+0A22)                 Gurmukhi
त (U+0924)               ਜ (U+0A1C)                 Gurmukhi
य (U+092F)               ਧ (U+0A27)                 Gurmukhi
ॅ (U+0945)               ঁ (U+0981)                 Bengali

Table 22: Devanagari Cross-script confusables

[17] The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that the two look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


Sequence Description                                  Sequence   Example        Constituting characters
Consonant + Halant + Consonant + Matra                CHC[M]     क्की kkī       U+0915 U+094D U+0915 U+0940
Consonant + Halant + Consonant + Anusvara             CHC[B]     क्कं kkaṁ      U+0915 U+094D U+0915 U+0902
Consonant + Halant + Consonant + Candrabindu          CHC[D]     क्कँ kkaṃ      U+0915 U+094D U+0915 U+0901
Consonant + Halant + Consonant + Visarga              CHC[X]     क्कः kkaḥ      U+0915 U+094D U+0915 U+0903

Table 14

3B3 (CH)CM may be followed by a B, D or X

Example:

Sequence Description                                  Sequence   Example        Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara     CHCM[B]    क्कीं kkīṁ     U+0915 U+094D U+0915 U+0940 U+0902
Consonant + Halant + Consonant + Matra + Candrabindu  CHCM[D]    क्कीँ kkīṃ     U+0915 U+094D U+0915 U+0940 U+0901
Consonant + Halant + Consonant + Matra + Visarga      CHCM[X]    क्कीः kkīḥ     U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21: Visually confusables in Appendix A: Visually confusable characters/sequences lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formations. These can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for Nukta character (U+093C) positioning which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.

This gives rise to the possibility of creating certain labels that can be deceptively similar to a majority of the Devanagari user base. Being a unique case of homographic similarity, the following variants are being proposed:


Variant 1        Variant 2
आ (U+0906)       आ़ (U+0906 U+093C)
ओ (U+0913)       ओ़ (U+0913 U+093C)
ा (U+093E)       ा़ (U+093E U+093C)
ो (U+094B)       ो़ (U+094B U+093C)

Table 16: Proposed Variants - Set 1

611 VariantcontextruleforSantaliNuktavariants

All of the Nukta variants given in Table 16 Proposed Variants - Set 1 have a typical

characteristicwhichiswithinavariantpairVariant1isasubsetoftheVariant2egin

thefirstpairआ(U+0906)isasubsetofआ(U+0906U+093C)Thisimpliesaregenerativetendency in theory ie if anआ (U+0906) is substituted with आ (U+0906 U+093C) itintroducesanewinstanceofआ(U+0906)asseenhereinboldआ(U+0906U+093C)By

definitionthisnewcaseofआ(U+0906)mayalsoneedtobesubstitutedwithआ(U+0906U+093C) therebycreatingan invalidakshar combination आ (U+0906U+093CU+093C)whereaNuktawillneedtofollowanotherNuktaTopreventthisavariantcontextrulehas

beenaddedtoalltheabovenuktavariantsasgivenbelow

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

ा (U+093E) → ा़ (U+093E U+093C)

ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the reverse (symmetric) variant mappings using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
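The not-followed-by-Nukta condition can be illustrated with a short Python sketch. The names NUKTA_VARIANTS and nukta_variant_positions are hypothetical and are not taken from the accompanying XML file; the sketch only reports where the Variant 1 to Variant 2 mapping of Table 16 would be defined in a label.

```python
NUKTA = "\u093C"

# Variant 1 -> Variant 2 pairs from Table 16 (illustrative encoding only).
NUKTA_VARIANTS = {
    "\u0906": "\u0906\u093C",
    "\u0913": "\u0913\u093C",
    "\u093E": "\u093E\u093C",
    "\u094B": "\u094B\u093C",
}


def nukta_variant_positions(label):
    """Return the indexes at which the Variant 1 -> Variant 2 mapping is defined."""
    positions = []
    for i, ch in enumerate(label):
        # Slicing avoids an index error when ch is the last code point.
        followed_by_nukta = label[i + 1:i + 2] == NUKTA
        if ch in NUKTA_VARIANTS and not followed_by_nukta:
            positions.append(i)
    return positions
```

A code point that already carries a Nukta is skipped, so the substitution can never produce two consecutive Nuktas.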

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence or absence of a Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive the context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)
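A minimal Python sketch of the when(followed-by-V-C-or-end) condition follows. The helper names are hypothetical, and the vowel and consonant ranges are an assumption that approximates the repertoire by Unicode code point ranges rather than reproducing the normative code point table.

```python
def is_vowel(ch):
    # Assumption: independent vowels approximated by two Unicode ranges.
    cp = ord(ch)
    return 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0975


def is_consonant(ch):
    # Assumption: consonants approximated by U+0915..U+0939.
    return 0x0915 <= ord(ch) <= 0x0939


def variant_defined(label, start, source):
    """True if `source` occurs at `start` and is followed by V, C or the label end."""
    if not label.startswith(source, start):
        return False
    end = start + len(source)
    if end >= len(label):
        return True
    return is_vowel(label[end]) or is_consonant(label[end])
```

For example, 0974 followed by a Visarga (0903) is a context that 0906 0902 cannot share, so the mapping is left undefined there.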

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ (U+0973) | अं (U+0905 U+0902)
ऺ (U+093A) | ं (U+0902)
ॴ (U+0974) | आं (U+0906 U+0902)
ऻ (U+093B) | ां (U+093E U+0902)
ऎ (U+090E) | ऐ (U+0910)
ॆ (U+0946) | े (U+0947)
ॵ (U+0975) | औ (U+0914)
ॏ (U+094F) | ौ (U+094C)

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (only a discussion, not proposed as variants)

Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1 | Variant 2
ँ (U+0901) | ॅं (U+0945 U+0902)
ाँ (U+093E U+0901) | ॉं (U+0949 U+0902)
अँ (U+0905 U+0901) | ॲं (U+0972 U+0902)
एँ (U+090F U+0901) | ऍं (U+090D U+0902)
आँ (U+0906 U+0901) | ऑं (U+0911 U+0902)

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow this convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair ँ (U+0901) - ॅं (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence ॅं (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. ॅ (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for ॅ (U+0945), which is a Matra. A Matra can only be preceded by a Consonant or a Consonant followed by a Nukta. To handle this, a variant context rule has been added to ॅं (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair ँ (U+0901) - ॅं (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of ॅं (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence ॅं (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
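The rule can be sketched in Python as a predicate on a position in the label. The function names are hypothetical, and the consonant range is an assumption that approximates the repertoire by the Unicode range U+0915..U+0939 instead of the normative code point table.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093C"


def is_consonant(ch):
    # Assumption: consonants approximated by U+0915..U+0939.
    return 0x0915 <= ord(ch) <= 0x0939


def candrabindu_variant_defined(label, i):
    """True if the Candrabindu at index i may map to U+0945 U+0902."""
    if i == 0 or label[i] != CANDRABINDU:
        return False
    prev = label[i - 1]
    if is_consonant(prev):
        return True
    # "Consonant followed by a Nukta": the Nukta itself must follow a consonant.
    return prev == NUKTA and i >= 2 and is_consonant(label[i - 2])
```

A Candrabindu that follows a Vowel or a Matra keeps no variant, because U+0945 could not validly occupy that position.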

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the blocked disposition.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + ..." cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are the variants that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.


Devanagari | Gurmukhi
ं (U+0902) | ਂ (U+0A02)
इ (U+0907) | ਙ (U+0A19)
उ (U+0909) | ਤ (U+0A24)
ग (U+0917) | ਗ (U+0A17)
घ (U+0918) | ਬ (U+0A2C)
ट (U+091F) | ਟ (U+0A1F)
ठ (U+0920) | ਠ (U+0A20)
ढ (U+0922) | ਫ (U+0A2B)
प (U+092A) | ਧ (U+0A27)
भ (U+092D) | ਮ (U+0A2E)
म (U+092E) | ਸ (U+0A38)
व (U+0935) | ਕ (U+0A15)
ह (U+0939) | ਵ (U+0A35)
ऺ (U+093A) | ਂ (U+0A02)
़ (U+093C) | ਼ (U+0A3C)
ि (U+093F) | ਿ (U+0A3F)
ी (U+0940) | ੀ (U+0A40)
ॅ (U+0945) | ੱ (U+0A71)
ॆ (U+0946) | ੇ (U+0A47)
ॆ (U+0946) | ੋ (U+0A4B)
े (U+0947) | ੇ (U+0A47)
े (U+0947) | ੋ (U+0A4B)
ै (U+0948) | ੈ (U+0A48)
ॖ (U+0956) | ੁ (U+0A41)
ॗ (U+0957) | ੂ (U+0A42)
प्टि (U+092A U+094D U+091F U+093F) | ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940) | ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947) | ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946) | ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924) | ਜ (U+0A1C)

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म (U+092E) | ম (U+09AE)
ि (U+093F) | ি (U+09BF)

Table 20: Proposed Cross-script Devanagari-Bengali Variants


In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant/Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (required in the Santali language)
   b. ओ (U+0913) (required in the Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (required in the Santali language)
   b. ो (U+094B) (required in the Santali language)

2. H must be preceded by C or CN (where CN is a C followed by an N)

3. M must be preceded by C or CN (where CN is a C followed by an N)

4. X must be preceded by either of V, C, N or M

5. B must be preceded by either of V, C, N or M

6. D must be preceded by either of V, C, N or M

7. V CANNOT be preceded by H (details in "Case of V preceded by H")


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)
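WLE rules 1 to 7 above can be sketched as a small Python checker. The category assignments are an assumption: they approximate the code point repertoire with Unicode ranges and do not reproduce the normative table, and the names category and is_valid_label are hypothetical. The sketch also treats an empty label and any unknown code point as invalid.

```python
# Sets named in rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")

# Assumption: matras and signs approximated from the Unicode block layout.
MATRAS = {0x093A, 0x093B, 0x094F, 0x0956, 0x0957} | set(range(0x093E, 0x094D))
SIGNS = {0x0901: "D", 0x0902: "B", 0x0903: "X", 0x093C: "N", 0x094D: "H"}


def category(ch):
    """Map a code point to its WLE symbol, or None if it is unknown."""
    cp = ord(ch)
    if cp in SIGNS:
        return SIGNS[cp]
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0975:
        return "V"
    if cp in MATRAS:
        return "M"
    return None


def is_valid_label(label):
    """Apply WLE rules 1-7 to every code point of the label."""
    if not label:
        return False
    for i, ch in enumerate(label):
        cat = category(ch)
        if cat is None:
            return False
        prev = label[i - 1] if i > 0 else ""
        prev_cat = category(prev) if prev else None
        # "C or CN": a consonant, or a Nukta that itself follows a consonant.
        after_c_or_cn = prev_cat == "C" or (
            prev_cat == "N" and i >= 2 and category(label[i - 2]) == "C"
        )
        if cat == "N" and prev not in (C1 | V1 | M1):  # rule 1
            return False
        if cat in ("H", "M") and not after_c_or_cn:  # rules 2 and 3
            return False
        if cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            return False  # rules 4, 5 and 6
        if cat == "V" and prev_cat == "H":  # rule 7
            return False
    return True
```

Under these assumptions the label आम passes, while a label in which a Vowel directly follows a Halant is rejected by rule 7.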

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of the hyphen, the space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | DataXgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | DataXgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi/Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)


[RFC 8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)


[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, "How to read and write Kashmiri in Devanagari", http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, "Devanāgarī Alphabet and its Romanization", http://hindinideshalaya.nic.in/english/hindi_org/indevnagarithesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क (U+0915) | क़ (U+0915 U+093C)
ख (U+0916) | ख़ (U+0916 U+093C)
ग (U+0917) | ग़ (U+0917 U+093C)
च (U+091A) | च़ (U+091A U+093C)
छ (U+091B) | छ़ (U+091B U+093C)
ज (U+091C) | ज़ (U+091C U+093C)
ड (U+0921) | ड़ (U+0921 U+093C)
ढ (U+0922) | ढ़ (U+0922 U+093C)
फ (U+092B) | फ़ (U+092B U+093C)

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
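The all-or-nothing condition described above can be sketched in Python. The mapping below is only a small subset of the single code point Devanagari-Gurmukhi pairs proposed in Table 19, chosen for illustration; multi-code-point aksharas are not handled, and the names DEVA_TO_GURU and cross_script_variant are hypothetical.

```python
# Illustrative subset of single code point pairs (Devanagari -> Gurmukhi).
DEVA_TO_GURU = {
    "\u0917": "\u0A17",
    "\u091F": "\u0A1F",
    "\u0920": "\u0A20",
    "\u092E": "\u0A38",
    "\u093F": "\u0A3F",
    "\u0940": "\u0A40",
}


def cross_script_variant(label, mapping=DEVA_TO_GURU):
    """Return the other-script label, or None if any code point has no counterpart."""
    converted = []
    for ch in label:
        if ch not in mapping:
            return None  # one distinguishing code point is enough
        converted.append(mapping[ch])
    return "".join(converted)
```

A single code point without a counterpart makes the whole label visually distinguishable, so the function returns None in that case.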

Devanagari confusable | Other script confusable | From script
ः (U+0903) | ઃ (U+0A83) | Gujarati
ः (U+0903) | ః (U+0C03) | Telugu
ः (U+0903) | ಃ (U+0C83) | Kannada
ः (U+0903) | ഃ (U+0D03) | Malayalam
ः (U+0903) | ඃ (U+0D83) | Sinhala
ः (U+0903) | ঃ (U+0983) [17] | Bengali
उ (U+0909) | ও (U+0993) | Bengali
घ (U+0918) | ঘ (U+0998) | Bengali
ठ (U+0920) | ਨ (U+0A28) | Gurmukhi
ठ (U+0920) | ਰ (U+0A30) | Gurmukhi
ड (U+0921) | ਡ (U+0A21) | Gurmukhi
ड (U+0921) | ਤ (U+0A24) | Gurmukhi
ढ (U+0922) | ਢ (U+0A22) | Gurmukhi
त (U+0924) | ਜ (U+0A1C) | Gurmukhi
य (U+092F) | ਧ (U+0A27) | Gurmukhi
ॅ (U+0945) | ঁ (U+0981) | Bengali

Table 22: Devanagari Cross-script confusables

[17] The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.


6 Variants

There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups:

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String similarity assessment panel) entrusted to deal with such cases. Table 21 (Visually confusables) in Appendix A (Visually confusable characters/sequences) lists them.

CaseswhichbelongtoGroup2howeverareproposedtobeconsideredasvariantsThese

cases are not ofmere visual similarity as they involve some deviations from thewidelyaccepted norms of Devanagari akshar formations These can cause confusion even to a

carefulobserverandhencebeingproposedasvariantsFollowingisthebriefdescriptionofthesevariantsfollowedbyvariantsinTable16andTable17

61 VowelVowelsignfollowedbyNukta

The Santali language has a unique requirement for Nukta character (U+093C)positioningwhich is not common in otherDevanagari based languages Santali requirestheNuktacharactertofollowcertainVowelsandMatrasCompleterepresentationoftheseSantali combinationsnecessitated theWholeLabelEvaluationrules (given in theSection

61)tobeopenedupforthesespecificcasesAregularnon-Santaliusermostlycannotevenanticipatethepossibilityofsuchacombinationandcanconfuseitforsomethingelse

Thisgivesrisetoapossibilityofcreationofcertainlabelsthatcanbedeceptivelysimilarto

amajorityoftheDevanagariuser-baseBeingauniquecaseofhomographicsimilaritythefollowingvariantsarebeingproposed

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

34

Variant1 Variant2

आU+0906

आU+0906U+093C

ओU+0913

ओU+0913U+093C

ाU+093E

U+093EU+093C

ोU+094B

U+094BU+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. E.g., in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory, i.e. if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of the substituted sequence. By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

◌ा (U+093E) → ◌ा़ (U+093E U+093C)

◌ो (U+094B) → ◌ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both forward and symmetric variants using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to context rules on the Nukta, the condition is only required for the first reason in this case.
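The "not followed by Nukta" condition can be illustrated with a minimal Python sketch. It is not taken from the accompanying XML file; the function names are hypothetical, and only the four Variant 1 code points of Table 16 are modelled.

```python
NUKTA = "\u093C"
# Variant 1 code points from Table 16; Variant 2 is the same code point plus Nukta.
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_positions(label):
    """Indices where the Variant 1 -> Variant 2 mapping is defined.

    The mapping is defined only when the base is NOT already followed
    by a Nukta, which is the context rule described in the text.
    """
    return [
        i
        for i, ch in enumerate(label)
        if ch in NUKTA_BASES and label[i + 1:i + 2] != NUKTA
    ]


def add_nukta(label, index):
    """Apply the Variant 1 -> Variant 2 mapping at the given index."""
    return label[:index + 1] + NUKTA + label[index + 1:]


label = "\u0906\u092E"                      # U+0906 U+092E
variant = add_nukta(label, 0)               # U+0906 U+093C U+092E
remaining = nukta_variant_positions(variant)
```

After one substitution, `remaining` is empty: the U+0906 inside the new sequence is followed by a Nukta, so the mapping is no longer defined there and no second Nukta can be generated.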

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set            Mapping
Variant Set A  093E 0901 <--> 0949 0902
Variant Set B  0906 0901 <--> 0911 0902
Variant Set C  0906 0902 <--> 0974
Variant Set D  093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source  Glyph  Target     Glyph  Type       Variant Context
0906    आ      0906 093C  आ़     ↔ blocked  not followed-by-N
093E    ◌ा     093E 093C  ◌ा़    ↔ blocked  not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, in the sequences of Variant Sets A, B, C and D that contain them, the result is a valid sequence that displays with a Nukta. As the presence or absence of a Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D: 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore, the variant mapping should receive a context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set            Mapping                                        Variant Contextual Rule
Variant Set A  093E 0901 <--> 093E 093C 0901 <--> 0949 0902   -
Variant Set B  0906 0901 <--> 0906 093C 0901 <--> 0911 0902   -
Variant Set C  0906 0902 <--> 0906 093C 0902 <--> 0974        when(followed-by-V-C-or-end)
Variant Set D  093B <--> 093E 0902 <--> 093E 093C 0902        when(followed-by-V-C-or-end)
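As an illustration of how the followed-by-V-C-or-end condition gates a mapping, the following Python sketch applies it to the extended Variant Set C. The consonant and vowel ranges are simplifying assumptions (the authoritative categories are those of the code point repertoire table), and the function names are hypothetical.

```python
def is_consonant(ch):
    # Assumption: consonants approximated by the block range U+0915..U+0939.
    return "\u0915" <= ch <= "\u0939"


def is_vowel(ch):
    # Assumption: vowels approximated by U+0905..U+0914 and U+0972..U+0975.
    return "\u0905" <= ch <= "\u0914" or "\u0972" <= ch <= "\u0975"


def followed_by_v_c_or_end(label, end):
    """True if the position after the sequence is a Vowel, a Consonant or the label end."""
    if end == len(label):
        return True
    return is_vowel(label[end]) or is_consonant(label[end])


# Extended Variant Set C: 0906 0902 <--> 0906 093C 0902 <--> 0974
SET_C = ["\u0906\u0902", "\u0906\u093C\u0902", "\u0974"]


def set_c_variants(label, start):
    """Variant labels obtained by swapping the Set C member found at `start`.

    Returns an empty list when no member starts there or when the
    context condition does not hold.
    """
    for member in SET_C:
        if label.startswith(member, start):
            end = start + len(member)
            if not followed_by_v_c_or_end(label, end):
                return []
            return [label[:start] + other + label[end:]
                    for other in SET_C if other != member]
    return []
```

For example, `set_c_variants("\u0974\u0915", 0)` yields two variant labels because U+0974 is followed by a consonant, while `set_c_variants("\u0974\u0901", 0)` yields none because a Candrabindu follows.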

Additional Notes:

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1        Variant 2
ॳ U+0973         अं U+0905 U+0902
◌ऺ U+093A        ◌ं U+0902
ॴ U+0974         आं U+0906 U+0902
◌ऻ U+093B        ◌ां U+093E U+0902
ऎ U+090E         ऐ U+0910
◌ॆ U+0946        ◌े U+0947
ॵ U+0975         औ U+0914
◌ॏ U+094F        ◌ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. U+0945 and U+0949, which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1        Variant 2
U+0901           U+0945 U+0902
U+093E U+0901    U+0949 U+0902
U+0905 U+0901    U+0972 U+0902
U+090F U+0901    U+090D U+0902
U+0906 U+0901    U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, like a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by a Consonant, a Vowel, a Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a Consonant or a Consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as the mapping (U+0901) -> (U+0945 U+0902).
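The rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: consonants are approximated by the range U+0915..U+0939, the function names are hypothetical, and the other pairs of Table 18 are not handled.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch):
    # Assumption: consonants approximated by the block range U+0915..U+0939.
    return "\u0915" <= ch <= "\u0939"


def preceded_by_c_or_cn(label, i):
    """True if position i follows a Consonant, or a Consonant plus Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def expand_candrabindu(label):
    """Replace each Candrabindu by U+0945 U+0902 where the context rule holds."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, U+0915 U+0901 maps to U+0915 U+0945 U+0902, whereas the Candrabindu in U+0905 U+0901 is left alone because it follows a Vowel, where a Matra such as U+0945 could not validly occur.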

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
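A first-come, blocked disposition can be pictured with the small Python sketch below. It is a simplification that covers only the Nukta variants of Table 16 and assumes that all labels in one variant set can be reduced to a common key by dropping the Nukta after the Table 16 bases; `variant_key` and `Registry` are hypothetical names, not part of any LGR tool.

```python
NUKTA = "\u093C"
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}  # Variant 1 column of Table 16


def variant_key(label):
    """Drop every Nukta that directly follows a Table 16 base code point."""
    out = []
    for ch in label:
        if ch == NUKTA and out and out[-1] in NUKTA_BASES:
            continue
        out.append(ch)
    return "".join(out)


class Registry:
    """Allocates the first label of a variant set and blocks the others."""

    def __init__(self):
        self._allocated = {}

    def apply(self, label):
        # setdefault stores the label only if no variant was allocated before.
        holder = self._allocated.setdefault(variant_key(label), label)
        return "allocated" if holder == label else "blocked"


registry = Registry()
first = registry.apply("\u0906\u092E")          # U+0906 U+092E
second = registry.apply("\u0906\u093C\u092E")   # U+0906 U+093C U+092E
```

Here `first` is "allocated" and `second` is "blocked"; had the two applications arrived in the opposite order, the outcome would be reversed, reflecting that there is no preference among the variants.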

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Table 19 lists the cases that are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 lists the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari                              Gurmukhi
U+0902                                  U+0A02
इ U+0907                                ਙ U+0A19
उ U+0909                                ਤ U+0A24
ग U+0917                                ਗ U+0A17
घ U+0918                                ਬ U+0A2C
ट U+091F                                ਟ U+0A1F
ठ U+0920                                ਠ U+0A20
ढ U+0922                                ਫ U+0A2B
प U+092A                                ਧ U+0A27
भ U+092D                                ਮ U+0A2E
म U+092E                                ਸ U+0A38
व U+0935                                ਕ U+0A15
ह U+0939                                ਵ U+0A35
U+093A                                  U+0A02
U+093C                                  U+0A3C
◌ि U+093F                               ◌ਿ U+0A3F
◌ी U+0940                               ◌ੀ U+0A40
U+0945                                  U+0A71
U+0946                                  U+0A47
U+0946                                  U+0A4B
U+0947                                  U+0A47
U+0947                                  U+0A4B
U+0948                                  U+0A48
U+0956                                  U+0A41
U+0957                                  U+0A42
प्टि U+092A U+094D U+091F U+093F          ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940          ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947          ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946          ਏ U+0A0F
त्त U+0924 U+094D U+0924                 ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari       Bengali
म U+092E         ম U+09AE
◌ि U+093F        ◌ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant / Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA) and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ◌ा (U+093E) (Required in Santali language)
b. ◌ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN.

3. M must be preceded by C or CN.

4. X must be preceded by either of V, C, N or M.

5. B must be preceded by either of V, C, N or M.

6. D must be preceded by either of V, C, N or M.

7. V can NOT be preceded by H (details in "Case of V preceded by H").

In rules 2 and 3, CN is a C followed by an N.
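The seven rules lend themselves to a simple left-to-right check. The Python sketch below is a hedged illustration only: the sets C1, V1 and M1 are copied from rule 1, but the ranges used for C, V and M are approximations chosen for this example (Table 6 is authoritative), and `category` and `wle_violations` are hypothetical names.

```python
# Sets C1, V1 and M1 are taken from WLE rule 1.
C1 = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B}
V1 = {0x0906, 0x0913}
M1 = {0x093E, 0x094B}
NUKTA_HOSTS = C1 | V1 | M1

SIGNS = {0x0901: "D", 0x0902: "B", 0x0903: "X", 0x093C: "N", 0x094D: "H"}


def category(cp):
    """Assumed, simplified categories; the repertoire table is authoritative."""
    if cp in SIGNS:
        return SIGNS[cp]
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0905 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0975:
        return "V"
    if cp in (0x093A, 0x093B, 0x094F) or 0x093E <= cp <= 0x094C:
        return "M"
    return None


def wle_violations(label):
    """Return (index, rule) pairs; rule 0 marks a code point with no assumed category."""
    cps = [ord(ch) for ch in label]
    cats = [category(cp) for cp in cps]
    found = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i >= 1 else None
        prev_cp = cps[i - 1] if i >= 1 else None
        after_c_or_cn = prev == "C" or (
            prev == "N" and i >= 2 and cats[i - 2] == "C")
        if cat is None:
            found.append((i, 0))
        elif cat == "N" and prev_cp not in NUKTA_HOSTS:
            found.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            found.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            found.append((i, 3))
        elif cat in ("X", "B", "D") and prev not in ("V", "C", "N", "M"):
            found.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and prev == "H":
            found.append((i, 7))
    return found
```

For instance, the label U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is flagged under rule 7 at index 3, because the Vowel U+0905 follows a Halant, while U+092E U+093C is flagged under rule 1 because U+092E is not a member of C1.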

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.1.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position; Name; Organization; Country; Language Expertise
Co-Chair; Ajay Data; Data Xgen Technologies; India; Hindi, English
Co-Chair; Mahesh D Kulkarni; C-DAC; India; Marathi, Hindi, English
Co-Chair; Udaya Narayana Singh; Visva-Bharati, Santiniketan, West Bengal; India; Bengali, Maithili, Hindi, English
Member; Abhijit Dutta; Wikimedia; India; Bengali, Hindi
Member; Akshat S Joshi (Editor); C-DAC; India; Hindi, Marathi, English
Member; Anivar A Aravind; Indic Project; India; Malayalam
Member; Anupam Agrawal; Tata Consultancy Service; India; Hindi, Bengali
Member; Arvind Bhandari; Gujarat University; India; Gujarati
Member; Ashish Modi; Data Xgen Technologies; India; Hindi
Member; Atiur Rahman Khan; C-DAC; India; Bangla
Member; Bal Krishna Bal; Kathmandu University; Nepal; Nepali
Member; Balaram Prasain; Tribhuvan University; Nepal; Nepali
Member; BASANTA KUMAR PANDA; Regional Institute of Education (NCERT); India; Odia
Member; Bhim Dhoj Shrestha; Consultant; Nepal; Nepali, Newar
Member; Chitrita Chatterjee; Internet and Mobile Association of India (IAMAI); India; Multiple languages represented by members of IAMAI
Member; DEBAJIT SHARMA; Anundoram Borooah Institute of Language, Art and Culture; India; Assamese
Member; Dev Dass Manandhar; Consultant; Nepal; Nepali, Newar
Member; Dhanalakshmi K T; Northern Trust; India; Kannada
Member; Ganesh Murmu; Ranchi University; India; Santali
Member; Gangadhar Panday; Babul Films Society; India; Telugu
Member; Ghanashyam Nepal; Benares Hindu University & University of North Bengal; India; Nepali
Member; Girish Chandra Mishra; Language Technology Centre, Ravenshaw University; India; Odia
Member; Gurpreet Singh Lehal; Punjabi University, Patiala; India; Panjabi
Member; Harish Chowdhary; NIXI; India; Hindi
Member; Hempal Shrestha; Nepal Entrepreneurs Hub (NEHUB); Nepal; Nepali, Newar
Member; Jay Paudyal; Consultant; India; Hindi
Member; Jijo Pappachan; DN Domains; India; Malayalam
Member; K C Tikayatray; Odia Bhasa Pratisthan; India; Odia
Member; Kalyan Vasudeo Kale; Formerly affiliated with University of Pune; India; Marathi
Member; Kuldeep Patnaik; Visualizethysoul; India; Odia
Member; Mukesh Saini; Essel Group; India; Hindi
Member; N Deiva Sundaram; NDS Lingsoft Solutions Pvt Ltd; India; Tamil
Member; Neha Gupta; C-DAC; India; Hindi, English
Member; Nirajan Parajuli; NREN; Nepal; Nepali
Member; Nishit Jain; C-DAC; India; Hindi, English
Member; Pawan Chitrakar; Gapsco; Nepal; Nepali
Member; Prabhakar Pandey; C-DAC; India; Hindi
Member; Prasad P K; A-one Publishers; India; Malayalam
Member; Prateek Pathak; ISOC Mumbai; India; Devanagari
Member; Raiomond Doctor; NLP Consultant; India; English, Hindi, Marathi, Gujarati
Member; Rajib Chakraborty; Society for Natural Language Technology Research; India; Bangla (Bengali)
Member; Rajiv Kumar; NIXI; India;
Member; S Maniam; International Forum IT for Tamil; Singapore; Tamil
Member; Santhosh Thottingal; Wikimedia foundation; India; Malayalam, Sourashtra, Tamil
Member; Saroja Bhate; University of Pune; India; Sanskrit
Member; Shambhu Kumar Singh; National Translation Mission, Mysore; India; Maithili
Member; Shanmugam R; C-DAC; India; Tamil
Member; Shantaram S Warde Walawalikar; Independent Researcher; India; Konkani
Member; Shashi Pathania; PGD of Dogri, University of Jammu; India; Dogri
Member; Shubham Saran; NIXI; India;
Member; Sinnathambi Shanmugarajah; University of Colombo School of Computing; Sri Lanka; Tamil
Member; Sujith Kartha; Digitalkzcom; India; Malayalam
Member; Suraj Adhikari; Mercantile Communications (and np ccTLD); Nepal; Nepali
Member; Swarna Prabha Chainary; Guwahati University; India; Bodo
Member; U B Pavanaja; http://vishvakannada.com; India; Kannada
Member; Uma Maheshwar G; CALTS, Univ of Hyderabad; India; Telugu
Member; Uttam Shrestha Rana; NPNOG; Nepal; Nepali
Member; Veena Solomon; (freelancer); India; Malayalam
Member; Vinay Murarka; Consultant httpsमराभारत; India; Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name; Language/Script Expertise
Ajit Kumar; Awadhi, Braj Language
Amar Tumyahang; Limbu Language
Amrit Yonjan; Tamang Language
Aprana Kulkarni; Hindi, Marathi
Basil Baa; Sadri Language
Basil Kiro; Kharia Language
Biswa Limbu; Limbu Language
Devdass Manandhar; Newar
Devendra Kumar Devesh; Bhojpuri Language
Dinbandhu Mahto; Panchpargania Language
Dipika Sangma Narzary; Bodo Language
Dr. K P Lekhwani; Sindhi
Dr. Birendra Kumar Soy; Mundari Language
Dr. Dinesh Kumar Shrivastav; Magahi Language
Dr. Harvinder Kaur; Gurmukhi Script
Dr. Laxmi Prasad Khatiwada; Nepali Language
Harihar Vaishnav; Halbi
Indra Kumar Tamang; Tamang Language
Jagannath Singh; Panchpargania Language
Narendra Kumar Negi; Kinnauri Language
Prateek Harshwal; Wagdi and Dhundhari Language
Rayem Olem Dungdung; Sadri Language
Tej Man Angdembe; Limbu Language
Urmila Harshwal; Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC 8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1 Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1 Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2 Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3 Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]
4 Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.
5 Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6 Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7 Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8 Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9 Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10 Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).
11 Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12 Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13 McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14 McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15 McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16 McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17 Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.
18 Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19 Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20 Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21 Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22 Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23 Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1 IS 10401: 8-bit code for information interchange, 1982.
2 IS 10315: 7-bit coded character set for information interchange, 1985.
3 IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.
4 ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.
5 ISO 2375: Procedure for registration of escape sequences, 2003.
6 ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.
7 IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1     Confusable 2
क U+0915         क़ U+0915 U+093C
ख U+0916         ख़ U+0916 U+093C
ग U+0917         ग़ U+0917 U+093C
च U+091A         च़ U+091A U+093C
छ U+091B         छ़ U+091B U+093C
ज U+091C         ज़ U+091C U+093C
ड U+0921         ड़ U+0921 U+093C
ढ U+0922         ढ़ U+0922 U+093C
फ U+092B         फ़ U+092B U+093C

Table 21 Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same. However, in a given context, e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
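The "all constituents must have a counterpart" condition can be expressed as a short Python sketch. It uses only the single-code-point Devanagari-Gurmukhi pairs of Table 22 and works per code point rather than per akshara, which is a simplification; `CONFUSABLES` and `fully_confusable` are hypothetical names.

```python
# Devanagari -> Gurmukhi confusables copied from Table 22 (Gurmukhi rows only).
CONFUSABLES = {
    "\u0920": {"\u0A28", "\u0A30"},
    "\u0921": {"\u0A21", "\u0A24"},
    "\u0922": {"\u0A22"},
    "\u0924": {"\u0A1C"},
    "\u092F": {"\u0A27"},
}


def fully_confusable(label, table=CONFUSABLES):
    """True only if every code point of the label has a counterpart in the table."""
    return len(label) > 0 and all(ch in table for ch in label)
```

A label such as U+0920 U+0921 passes the check, while U+0920 U+0915 does not, because U+0915 has no listed counterpart and therefore provides the visual distinction described above.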

Devanagari confusable   Other script confusable   From script
U+0903                  ઃ U+0A83                  Gujarati
U+0903                  U+0C03                    Telugu
U+0903                  U+0C83                    Kannada
U+0903                  ഃ U+0D03                  Malayalam
U+0903                  ඃ U+0D83                  Sinhala
U+0903                  ঃ U+0983 [17]             Bengali
U+0909                  ও U+0993                  Bengali
U+0918                  ঘ U+0998                  Bengali
U+0920                  U+0A28                    Gurmukhi
U+0920                  U+0A30                    Gurmukhi
U+0921                  U+0A21                    Gurmukhi
U+0921                  U+0A24                    Gurmukhi
ढ U+0922                U+0A22                    Gurmukhi
U+0924                  U+0A1C                    Gurmukhi
U+092F                  U+0A27                    Gurmukhi
U+0945                  U+0981                    Bengali

Table 22 Devanagari Cross-script confusables

[17] The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.

Page 33: Proposal for a Devanagari Script Root Zone Label ... · Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel 2 3 Background on Script and Principal Languages Using

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

33

6 VariantsTherearenocharacterscharactersequencesinDevanagariwhichcanbecreatedbyusing

thecharacterspermittedasperthe[MSR]andthatlookexactlyalikeHoweverDevanagarihas ample cases of confusingly similar variants TheNBGP categorizes these confusinglysimilarvariantsintwogroups

Group1Confusingduetopurevisualsimilarity

Group2 Confusing due to deviation from normally perceived character

formationsbylargerlinguisticcommunity

AsadvisedbyICANNnocasesbelongingtoGroup1areproposedasthereisanotherpanel

(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesTable21VisuallyconfusablesinAppendixAVisuallyconfusablecharacterssequencesliststhem

CaseswhichbelongtoGroup2howeverareproposedtobeconsideredasvariantsThese

cases are not ofmere visual similarity as they involve some deviations from thewidelyaccepted norms of Devanagari akshar formations These can cause confusion even to a

carefulobserverandhencebeingproposedasvariantsFollowingisthebriefdescriptionofthesevariantsfollowedbyvariantsinTable16andTable17

61 VowelVowelsignfollowedbyNukta

The Santali language has a unique requirement for Nukta character (U+093C)positioningwhich is not common in otherDevanagari based languages Santali requirestheNuktacharactertofollowcertainVowelsandMatrasCompleterepresentationoftheseSantali combinationsnecessitated theWholeLabelEvaluationrules (given in theSection

61)tobeopenedupforthesespecificcasesAregularnon-Santaliusermostlycannotevenanticipatethepossibilityofsuchacombinationandcanconfuseitforsomethingelse

Thisgivesrisetoapossibilityofcreationofcertainlabelsthatcanbedeceptivelysimilarto

amajorityoftheDevanagariuser-baseBeingauniquecaseofhomographicsimilaritythefollowingvariantsarebeingproposed

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

34

Variant1 Variant2

आU+0906

आU+0906U+093C

ओU+0913

ओU+0913U+093C

ाU+093E

U+093EU+093C

ोU+094B

U+094BU+093C

Table 16 Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants

All of the Nukta variants given in Table 16 (Proposed Variants - Set 1) share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. E.g., in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency in theory: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of the sequence (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination (U+0906 U+093C U+093C) in which a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per Table 16 (Proposed Variants - Set 1), the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
ा (U+093E) → ा़ (U+093E U+093C)
ो (U+094B) → ो़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 should be equally constrained, for two reasons. First, a variant is uniquely defined by both the variant mapping and the context condition imposed on it (see [RFC 7940]). In order to maintain a symmetric definition of variants, it is necessary to define both the forward and the reverse variant mappings using the same condition (see also [RFC 8228]). Second, this type of variant pair is an "effective null variant", where the U+093C in one sequence maps to "nothing" in the other. In order to maintain a fully transitive system of variant definitions, it is necessary to prevent a label like U+0906 U+093C U+093C from having a variant U+0906 U+093C. The same condition would ensure this second constraint. However, as sequences of U+093C followed by U+093C are already invalid due to the context rules on the Nukta, the condition is only required for the first reason in this case.
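The context condition above can be illustrated with a minimal Python sketch. It is not the LGR's XML definition or any LGR tool's behaviour; the function name `nukta_variant_sites` and the tuple layout are assumptions made for illustration. It lists the positions in a label where a Table 16 mapping is defined, applying the "not followed by Nukta" condition in both directions.

```python
NUKTA = "\u093C"
# Variant 1 code points from Table 16 (Santali Nukta variants).
NUKTA_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_sites(label):
    """Return (index, source, target) for every place a Table 16 mapping applies."""
    sites = []
    i = 0
    while i < len(label):
        ch = label[i]
        if ch not in NUKTA_BASES:
            i += 1
            continue
        if label[i + 1:i + 2] == NUKTA:
            # Variant 2 -> Variant 1: the sequence itself must not be
            # followed by a further Nukta (same condition, reverse direction).
            if label[i + 2:i + 3] != NUKTA:
                sites.append((i, ch + NUKTA, ch))
            i += 2
        else:
            # Variant 1 -> Variant 2: the base is not followed by a Nukta.
            sites.append((i, ch, ch + NUKTA))
            i += 1
    return sites
```

Because the forward mapping is skipped whenever a Nukta already follows, substituting a variant never produces a second site at the same position, which is the regenerative case the rule is meant to exclude.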

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set              Mapping
Variant Set A    093E 0901 <--> 0949 0902
Variant Set B    0906 0901 <--> 0911 0902
Variant Set C    0906 0902 <--> 0974
Variant Set D    093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source   Glyph   Target       Glyph   Type        Variant Context
0906     आ       0906 093C    आ़      ↔ blocked   not followed-by-N
093E     ा       093E 093C    ा़      ↔ blocked   not followed-by-N

When substituting the variant from the overlapping sets for 0906 or 093E, respectively, into the left-hand-side sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence or absence of a Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,
0906 093C 0901 is added to Set B,
0906 093C 0902 is added to Set C, and
093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 or 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set              Mapping                                          Variant Contextual Rule
Variant Set A    093E 0901 <--> 093E 093C 0901 <--> 0949 0902     -
Variant Set B    0906 0901 <--> 0906 093C 0901 <--> 0911 0902     -
Variant Set C    0906 0902 <--> 0906 093C 0902 <--> 0974          when(followed-by-V-C-or-end)
Variant Set D    093B <--> 093E 0902 <--> 093E 093C 0902          when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see Section 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences that include 0902 as second element have V or M as first element.
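As an illustration of how the extended sets and the when(followed-by-V-C-or-end) rule interact, here is a small Python sketch. The names `VARIANT_SETS`, `is_vowel_or_consonant` and `variants_at` are hypothetical, and the vowel/consonant test uses approximate Unicode ranges as a stand-in for the categories of the actual repertoire (an assumption, not the LGR definition).

```python
# Final variant sets of Section 6.1.2: name -> (members, needs_context).
VARIANT_SETS = {
    "A": (("\u093E\u0901", "\u093E\u093C\u0901", "\u0949\u0902"), False),
    "B": (("\u0906\u0901", "\u0906\u093C\u0901", "\u0911\u0902"), False),
    "C": (("\u0906\u0902", "\u0906\u093C\u0902", "\u0974"), True),
    "D": (("\u093B", "\u093E\u0902", "\u093E\u093C\u0902"), True),
}


def is_vowel_or_consonant(ch):
    # Assumption: approximate ranges for independent vowels and consonants.
    cp = ord(ch)
    return 0x0904 <= cp <= 0x0939 or 0x0972 <= cp <= 0x0977


def variants_at(label, start):
    """Return (set name, variant members) for the set member starting at `start`."""
    for name, (members, needs_context) in VARIANT_SETS.items():
        for member in members:
            if not label.startswith(member, start):
                continue
            end = start + len(member)
            at_end = end == len(label)
            if needs_context and not at_end and not is_vowel_or_consonant(label[end]):
                return name, []  # context rule fails: mapping undefined here
            return name, [m for m in members if m != member]
    return None, []
```

For Sets C and D the variants are returned only when the matched member is followed by a vowel, a consonant, or the end of the label; Sets A and B carry no context rule.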

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels/Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1         Variant 2
ॳ  U+0973         अं  U+0905 U+0902
U+093A            U+0902
ॴ  U+0974         आं  U+0906 U+0902
ऻ  U+093B         ां  U+093E U+0902
ऎ  U+090E         ऐ  U+0910
U+0946            U+0947
ॵ  U+0975         औ  U+0914
ॏ  U+094F         ौ  U+094C

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case for making them variant pairs. In the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1              Variant 2
U+0901                 U+0945 U+0902
U+093E U+0901          U+0949 U+0902
अँ  U+0905 U+0901      ॲं  U+0972 U+0902
एँ  U+090F U+0901      ऍं  U+090D U+0902
आँ  U+0906 U+0901      ऑं  U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both (U+0945 U+0902) and (U+0949 U+0902) should be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence to the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the character preceding (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire, and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
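The rule of Section 6.4.1 can be sketched in Python as follows. This is an illustrative reading of the rule, not the normative XML; the helper names are hypothetical, and the consonant test assumes the Unicode range U+0915 to U+0939 rather than the exact repertoire.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch):
    # Assumption: approximate consonant range, not the LGR repertoire itself.
    return 0x0915 <= ord(ch) <= 0x0939


def preceded_by_c_or_cn(label, i):
    """True if position i follows a Consonant, or a Consonant plus Nukta."""
    prefix = label[:i]
    if prefix.endswith(NUKTA):
        prefix = prefix[:-1]
    return bool(prefix) and is_consonant(prefix[-1])


def candrabindu_variant(label, i):
    """Return the variant of the Candrabindu-type sequence at i, or None."""
    if not preceded_by_c_or_cn(label, i):
        return None  # context condition fails in either direction
    if label.startswith(CANDRA_E_ANUSVARA, i):
        return CANDRABINDU
    if label.startswith(CANDRABINDU, i):
        return CANDRA_E_ANUSVARA
    return None
```

The same context test guards both directions of the mapping, mirroring the symmetry argument of Section 6.1.1.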

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the "blocked" disposition.

There is no preference among these variants. Whichever label containing either of these variants is chosen first, the other equivalent variant labels should be blocked.
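The first-come, first-served effect of the blocked disposition can be sketched as below. This is only an illustration under simplifying assumptions: it uses four single-code-point pairs from Table 17, ignores sequences and context rules, and the names `index_key` and `Registry` are hypothetical rather than part of any registry software.

```python
# Assumed subset of Table 17: single-code-point pairs (Variant 1 -> Variant 2).
PAIRS = {"\u090E": "\u0910", "\u0946": "\u0947", "\u0975": "\u0914", "\u094F": "\u094C"}


def index_key(label):
    """Map every Variant 1 code point to its Variant 2 partner."""
    return "".join(PAIRS.get(ch, ch) for ch in label)


class Registry:
    def __init__(self):
        self._allocated = {}  # index key -> label that was allocated first

    def apply(self, label):
        key = index_key(label)
        if key in self._allocated:
            return "blocked"  # an equivalent variant label already exists
        self._allocated[key] = label
        return "allocated"
```

Two labels that differ only in members of a variant pair share one key, so whichever is applied for first is allocated and the other is blocked, with no preference between them.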

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed as cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 lists the cases proposed as cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are considered equivalents of each other, semantically or otherwise. They are grouped only on the basis of possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari                              Gurmukhi
U+0902                                  U+0A02
इ  U+0907                               ਙ  U+0A19
उ  U+0909                               ਤ  U+0A24
ग  U+0917                               ਗ  U+0A17
घ  U+0918                               ਬ  U+0A2C
ट  U+091F                               ਟ  U+0A1F
ठ  U+0920                               ਠ  U+0A20
ढ  U+0922                               ਫ  U+0A2B
प  U+092A                               ਧ  U+0A27
भ  U+092D                               ਮ  U+0A2E
म  U+092E                               ਸ  U+0A38
व  U+0935                               ਕ  U+0A15
ह  U+0939                               ਵ  U+0A35
U+093A                                  U+0A02
U+093C                                  U+0A3C
ि  U+093F                               ਿ  U+0A3F
ी  U+0940                               ੀ  U+0A40
U+0945                                  U+0A71
U+0946                                  U+0A47
U+0946                                  U+0A4B
U+0947                                  U+0A47
U+0947                                  U+0A4B
U+0948                                  U+0A48
U+0956                                  U+0A41
U+0957                                  U+0A42
प्टि  U+092A U+094D U+091F U+093F       ਇ  U+0A07
प्टी  U+092A U+094D U+091F U+0940       ਈ  U+0A08
प्टे  U+092A U+094D U+091F U+0947       ਏ  U+0A0F
प्टॆ  U+092A U+094D U+091F U+0946       ਏ  U+0A0F
त्त  U+0924 U+094D U+0924               ਜ  U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari        Bengali
म  U+092E         ম  U+09AE
ि  U+093F         ি  U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (Required in Santali language)
   b. ओ (U+0913) (Required in Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (Required in Santali language)
   b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN [15]
3. M must be preceded by C or CN [16]
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V can NOT be preceded by H (details in "Case of V preceded by H")

[15], [16] where CN is a C followed by an N.

Additional rules are used only for variants where a Nukta maps to null or that are overlapped:

• Variant is not defined if followed by a Nukta (see Section 6.1.1)
• Variant is undefined if it is not followed by V or C (including RRA) or the end of the label (see Section 6.1.2)
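Rules 1 to 7 above can be read as checks on the character immediately preceding each code point. The following Python sketch illustrates that reading; it is not the accompanying XML and not an LGR tool. The category ranges in `category` are approximations assumed for the example (the authoritative categories are those of Table 6), the Eyelash Reph sequences are not treated specially, and the function names are hypothetical.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}
SIGNS = {"\u0901": "D", "\u0902": "B", "\u0903": "X", "\u093C": "N", "\u094D": "H"}


def category(ch):
    """Approximate WLE category of a code point (assumed ranges)."""
    if ch in SIGNS:
        return SIGNS[ch]
    cp = ord(ch)
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0904 <= cp <= 0x0914 or 0x0972 <= cp <= 0x0977:
        return "V"
    if cp in (0x093A, 0x093B, 0x094F) or 0x093E <= cp <= 0x094C:
        return "M"
    return None


def wle_violations(label):
    """Return (index, rule number) for every code point that breaks a rule."""
    found = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i else ""
        prev_cat = category(prev) if prev else None
        after_c_or_cn = prev_cat == "C" or (
            prev_cat == "N" and i >= 2 and category(label[i - 2]) == "C")
        if cat == "N":
            ok, rule = prev in C1 | V1 | M1, 1
        elif cat == "H":
            ok, rule = after_c_or_cn, 2
        elif cat == "M":
            ok, rule = after_c_or_cn, 3
        elif cat in ("X", "B", "D"):
            ok, rule = prev_cat in ("V", "C", "N", "M"), {"X": 4, "B": 5, "D": 6}[cat]
        elif cat == "V":
            ok, rule = prev_cat != "H", 7
        else:
            continue  # consonants and unknown code points carry no rule here
        if not ok:
            found.append((i, rule))
    return found
```

An empty result means the label passes these seven checks; as the footnote in the abstract notes, a real tool may attribute an invalid label to a different rule than this sketch does.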

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether each of these can follow any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation, necessitated by the lack of a hyphen, a space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, https://मराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", RFC 8228, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha. (1962 edition)
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2): 139-156.
18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982.
2. IS 10315: 7-bit coded character set for information interchange. 1985.
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.
4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.
5. ISO 2375: Procedure for registration of escape sequences. 2003.
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.
7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1      Confusable 2
क  U+0915         क़  U+0915 U+093C
ख  U+0916         ख़  U+0916 U+093C
ग  U+0917         ग़  U+0917 U+093C
च  U+091A         च़  U+091A U+093C
छ  U+091B         छ़  U+091B U+093C
ज  U+091C         ज़  U+091C U+093C
ड  U+0921         ड़  U+0921 U+093C
ढ  U+0922         ढ़  U+0922 U+093C
फ  U+092B         फ़  U+092B U+093C

Table 21: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are grouped only on the basis of possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence yields a non-confusable string.
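The "all constituents must have an equivalent" condition can be sketched as follows. The mapping below is only the set of single-code-point Gurmukhi rows of Table 22, used here as sample data; the names `CONFUSABLES` and `cross_script_lookalikes` are hypothetical, and a fuller treatment would also need to match multi-code-point aksharas.

```python
from itertools import product

# Sample data: the Devanagari-to-Gurmukhi rows of Table 22.
CONFUSABLES = {
    "\u0920": ["\u0A28", "\u0A30"],
    "\u0921": ["\u0A21", "\u0A24"],
    "\u0922": ["\u0A22"],
    "\u0924": ["\u0A1C"],
    "\u092F": ["\u0A27"],
}


def cross_script_lookalikes(label):
    """Return all look-alike labels, or [] if any code point has no equivalent."""
    if not label:
        return []
    options = []
    for ch in label:
        if ch not in CONFUSABLES:
            return []  # one distinguishing character is enough
        options.append(CONFUSABLES[ch])
    return ["".join(choice) for choice in product(*options)]
```

A single code point without an entry makes the whole label distinguishable, so the function returns an empty list as soon as one is found.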

Devanagari confusable    Other script confusable    From script
U+0903                   ઃ  U+0A83                  Gujarati
U+0903                   U+0C03                     Telugu
U+0903                   U+0C83                     Kannada
U+0903                   ഃ  U+0D03                  Malayalam
U+0903                   ඃ  U+0D83                  Sinhala
U+0903                   ঃ  U+0983 [17]             Bengali
उ  U+0909                ও  U+0993                  Bengali
घ  U+0918                ঘ  U+0998                  Bengali
ठ  U+0920                ਨ  U+0A28                  Gurmukhi
ठ  U+0920                ਰ  U+0A30                  Gurmukhi
ड  U+0921                ਡ  U+0A21                  Gurmukhi
ड  U+0921                ਤ  U+0A24                  Gurmukhi
ढ  U+0922                ਢ  U+0A22                  Gurmukhi
त  U+0924                ਜ  U+0A1C                  Gurmukhi
य  U+092F                ਧ  U+0A27                  Gurmukhi
U+0945                   U+0981                     Bengali

Table 22: Devanagari Cross-script confusables

[17] The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that the two look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.

Page 34: Proposal for a Devanagari Script Root Zone Label ... · Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel 2 3 Background on Script and Principal Languages Using

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

34

Variant1 Variant2

आU+0906

आU+0906U+093C

ओU+0913

ओU+0913U+093C

ाU+093E

U+093EU+093C

ोU+094B

U+094BU+093C

Table 16 Proposed Variants - Set 1

611 VariantcontextruleforSantaliNuktavariants

All of the Nukta variants given in Table 16 Proposed Variants - Set 1 have a typical

characteristicwhichiswithinavariantpairVariant1isasubsetoftheVariant2egin

thefirstpairआ(U+0906)isasubsetofआ(U+0906U+093C)Thisimpliesaregenerativetendency in theory ie if anआ (U+0906) is substituted with आ (U+0906 U+093C) itintroducesanewinstanceofआ(U+0906)asseenhereinboldआ(U+0906U+093C)By

definitionthisnewcaseofआ(U+0906)mayalsoneedtobesubstitutedwithआ(U+0906U+093C) therebycreatingan invalidakshar combination आ (U+0906U+093CU+093C)whereaNuktawillneedtofollowanotherNuktaTopreventthisavariantcontextrulehas

beenaddedtoalltheabovenuktavariantsasgivenbelow

RuleAspertheTable 16 Proposed Variants - Set 1theVariant1toVariant2relationship

existsifandonlyifanyoftheVariant1setcharacterisnotfollowedbyaNukta(U+093C)

characterThusfollowingvariantrelationsareboundbytheabovecondition

आ(U+0906) rarr आ(U+0906U+093C)

ओ(U+0913) rarr ओ(U+0913U+093C)

ा(U+093E) rarr (U+093EU+093C)

ो(U+094B) rarr (U+094BU+093C)

The variant relationship fromVariant 2 toVariant 1 should be equally constrained for

two reasons First a variant is uniquely defined by both the variant mapping and the

context condition imposed on it (See [RFC 7940]) In order to maintain a symmetric

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

35

definitionofvariantsitisnecessarytodefinebothforwardandsymmetricvariantsusing

thesamecondition(Seealso[RFC8828])Secondthistypeofvariantpairisanldquoeffective

nullvariantrdquowheretheU+093CinonesequencemapstoldquonothingrdquointheotherInorderto

maintaina fully transitivesystemofvariantdefinitions it isnecessary topreventa label

likeU+0906U+093CU+093CfromhavingavariantU+0906U+093CThesamecondition

would ensure this second constraint However as sequences of U+093C followed by

U+093C are already invalid due to context rules on the Nukta the condition is only

requiredforthefirstreasoninthiscase

6.1.2 Overlapped variant analysis involving Nukta

Consider the following variant sets A, B, C and D. Each of them contains 0906 or 093E.

Set | Mapping
Variant Set A | 093E 0901 <--> 0949 0902
Variant Set B | 0906 0901 <--> 0911 0902
Variant Set C | 0906 0902 <--> 0974
Variant Set D | 093B <--> 093E 0902

Overlapping variant sets involving 0906 and 093E plus Nukta:

Source | Glyph | Target | Glyph | Type | Variant Context
0906 | आ | 0906 093C | आ़ | ↔ blocked | not followed-by-N
093E | ा | 093E 093C | ा़ | ↔ blocked | not followed-by-N

When the variant from the overlapping sets is substituted for 0906 or 093E, respectively, into the corresponding sequences of Variant Sets A, B, C and D, the result is a valid sequence that displays with a Nukta. As the presence or absence of the Nukta is the basis for several variant sets, these variant sets are extended to ensure symmetry and transitivity:

093E 093C 0901 is added to Set A,

0906 093C 0901 is added to Set B,

0906 093C 0902 is added to Set C, and

093E 093C 0902 is added to Set D.

For Set C, the implicit code point contexts of 0902 and 0974 are not equal: 0902 can be followed by Vowels and Consonants only, but 0974 can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when(followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for Set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902 and 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive the context rule when(followed-by-V-C-or-end).

The resulting variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when(followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when(followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule "when(follows-only-C-or-N)" (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another variant set overlapped with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, the resulting sequences are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.
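The extended sets and the when(followed-by-V-C-or-end) context can be sketched in Python as follows. This is a simplified illustration only: the Vowel and Consonant classes are approximated by the ranges U+0905..U+0914 and U+0915..U+0939 (an assumption; the normative classes come from the repertoire table), the context is applied to every member of Sets C and D, and the names `seq`, `VARIANT_SETS` and `variant_set_matches` are hypothetical.

```python
def seq(code_points):
    """Turn a string such as '093E 0901' into the corresponding characters."""
    return "".join(chr(int(cp, 16)) for cp in code_points.split())


# Set name -> (member sequences, whether followed-by-V-C-or-end is required).
VARIANT_SETS = {
    "A": (["093E 0901", "093E 093C 0901", "0949 0902"], False),
    "B": (["0906 0901", "0906 093C 0901", "0911 0902"], False),
    "C": (["0906 0902", "0906 093C 0902", "0974"], True),
    "D": (["093B", "093E 0902", "093E 093C 0902"], True),
}


def is_vowel_or_consonant(ch):
    # Assumption: simplified stand-in for the V and C classes.
    return "\u0905" <= ch <= "\u0914" or "\u0915" <= ch <= "\u0939"


def followed_by_v_c_or_end(label, end):
    """True if the match ends the label or is followed by a Vowel or Consonant."""
    return end == len(label) or is_vowel_or_consonant(label[end])


def variant_set_matches(label):
    """List (set name, start index, member) for members whose context holds."""
    found = []
    for name, (members, needs_context) in VARIANT_SETS.items():
        for member in map(seq, members):
            start = label.find(member)
            while start != -1:
                end = start + len(member)
                if not needs_context or followed_by_v_c_or_end(label, end):
                    found.append((name, start, member))
                start = label.find(member, start + 1)
    return found
```

For example, 0974 followed by 0901 yields no match for Set C, because 0901 is neither a Vowel nor a Consonant and the sequence does not end the label.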

6.2 Unique Vowels and Vowel Signs required for Kashmiri

Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users, who are not conversant with Kashmiri, can easily confuse them with some of the Vowels / Vowel signs which look similar to the Kashmiri ones. There are also cases where Kashmiri Vowels / Vowel signs can be confused with certain akshar formations. Hence they are being proposed as variants.

Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
ऺ U+093A | ं U+0902
ॴ U+0974 | आं U+0906 U+0902
ऻ U+093B | ां U+093E U+0902
ऎ U+090E | ऐ U+0910
ॆ U+0946 | े U+0947
ॵ U+0975 | औ U+0914
ॏ U+094F | ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ U+0901 | ॅं U+0945 U+0902
ाँ U+093E U+0901 | ॉं U+0949 U+0902
अँ U+0905 U+0901 | ॲं U+0972 U+0902
एँ U+090F U+0901 | ऍं U+090D U+0902
आँ U+0906 U+0901 | ऑं U+0911 U+0902

Table 18 Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, like a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should each be rendered such that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for the Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by a Consonant, a Vowel, a Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
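A minimal Python sketch of this context rule follows. It covers only the first pair of Table 18, approximates the Consonant class by the range U+0915..U+0939 (an assumption, not the normative class), and uses hypothetical function names.

```python
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"
NUKTA = "\u093C"


def is_consonant(ch):
    # Assumption: simplified stand-in for the Consonant class.
    return "\u0915" <= ch <= "\u0939"


def preceded_by_c_or_cn(label, i):
    """True if position i follows a Consonant, or a Consonant plus Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def candrabindu_variant(label):
    """Replace each Candrabindu whose context permits the mapping
    U+0901 -> U+0945 U+0902; other Candrabindus are left unchanged."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and preceded_by_c_or_cn(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

A Candrabindu after an independent vowel, such as U+0905 U+0901, is left unchanged by this sketch; Table 18 handles that case through a separate pair.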

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the blocked disposition.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under the NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under the NBGP.

The NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under the NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are deemed to be equivalents of each other, semantically or otherwise. They are grouped only on the basis of possible visual confusability.

The NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
ं U+0902 | ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
ऺ U+093A | ਂ U+0A02
़ U+093C | ਼ U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
ॅ U+0945 | ੱ U+0A71
ॆ U+0946 | ੇ U+0A47
ॆ U+0946 | ੋ U+0A4B
े U+0947 | ੇ U+0A47
े U+0947 | ੋ U+0A4B
ै U+0948 | ੈ U+0A48
ॖ U+0956 | ੁ U+0A41
ॗ U+0957 | ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant / Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN.

3. M must be preceded by C or CN.

4. X must be preceded by either of V, C, N or M.

5. B must be preceded by either of V, C, N or M.

6. D must be preceded by either of V, C, N or M.

7. V cannot be preceded by H (details in "Case of V preceded by H").

In rules 2 and 3, CN is a C followed by an N.

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1).

• Variant is undefined if it is not followed by V or C (including RRA) or the end of the label (see 6.1.2).
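Rules 1 to 7 above can be sketched as a small Python checker. This is only an illustration under stated assumptions: the category ranges in `category` are simplified stand-ins for the repertoire table, sequences such as the Eyelash Reph are not modelled, and the names `category` and `wle_violations` are hypothetical.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}
SIGNS = {"\u0902": "B", "\u0901": "D", "\u0903": "X", "\u094D": "H", "\u093C": "N"}


def category(ch):
    """Assumption: simplified category ranges, not the normative repertoire."""
    if "\u0915" <= ch <= "\u0939":
        return "C"
    if "\u0905" <= ch <= "\u0914" or "\u0972" <= ch <= "\u0975":
        return "V"
    if "\u093E" <= ch <= "\u094C" or ch in "\u093A\u093B\u094F\u0956\u0957":
        return "M"
    return SIGNS.get(ch)


def wle_violations(label):
    """Return a list of (index, rule number) for each rule that is broken.
    Rule 0 marks a character outside this sketch's categories."""
    errors = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else None
        prev_cat = category(prev) if prev is not None else None
        prev2_cat = category(label[i - 2]) if i > 1 else None
        after_c_or_cn = prev_cat == "C" or (prev_cat == "N" and prev2_cat == "C")
        if cat is None:
            errors.append((i, 0))
        elif cat == "N" and prev not in (C1 | V1 | M1):
            errors.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            errors.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            errors.append((i, 3))
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            errors.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and prev_cat == "H":
            errors.append((i, 7))
    return errors
```

For instance, a Nukta after म (U+092E) is reported under rule 1, because म is not a member of C1.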

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it gets permitted only with those specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualize thy soul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | httpvishvakannadacom | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to the NBGP for the respective languages / scripts:

Name | Language / Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC 7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC 8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, "Devanagari Eyelash Ra", http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, "How to read and write Kashmiri in Devanagari", http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, "Devanāgarī Alphabet and its Romanization", http://hindinideshalaya.nic.in/english/hindi_orgin/devnagarithesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, "Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals", https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


11 Appendix A: Visually confusable characters / sequences

Table 21 below shows characters / character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are grouped only on the basis of possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters / aksharas have an equivalent confusable in the other script. If there is even one single character / akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
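The all-or-nothing condition in the preceding paragraph can be sketched in Python as follows. The mapping is a small excerpt of the Devanagari-Gurmukhi entries of Table 22; the sketch works per code point rather than per akshara, and the names `DEVA_TO_GURMUKHI` and `gurmukhi_lookalikes` are hypothetical.

```python
from itertools import product

# Excerpt of the Devanagari -> Gurmukhi confusables listed in Table 22.
DEVA_TO_GURMUKHI = {
    "\u0920": {"\u0A28", "\u0A30"},
    "\u0921": {"\u0A21", "\u0A24"},
    "\u0922": {"\u0A22"},
    "\u0924": {"\u0A1C"},
    "\u092F": {"\u0A27"},
}


def gurmukhi_lookalikes(label):
    """Return every Gurmukhi string built from confusable counterparts.
    If any character has no counterpart, the label is visually distinct
    and the result is empty."""
    if not label:
        return []
    options = []
    for ch in label:
        if ch not in DEVA_TO_GURMUKHI:
            return []
        options.append(sorted(DEVA_TO_GURMUKHI[ch]))
    return ["".join(combo) for combo in product(*options)]
```

A label such as U+0920 U+0915 returns an empty list, since U+0915 has no entry in this excerpt and therefore provides the visual distinction.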

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 [17] | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

[17] The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.

Page 35: Proposal for a Devanagari Script Root Zone Label ... · Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel 2 3 Background on Script and Principal Languages Using

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

35

definitionofvariantsitisnecessarytodefinebothforwardandsymmetricvariantsusing

thesamecondition(Seealso[RFC8828])Secondthistypeofvariantpairisanldquoeffective

nullvariantrdquowheretheU+093CinonesequencemapstoldquonothingrdquointheotherInorderto

maintaina fully transitivesystemofvariantdefinitions it isnecessary topreventa label

likeU+0906U+093CU+093CfromhavingavariantU+0906U+093CThesamecondition

would ensure this second constraint However as sequences of U+093C followed by

U+093C are already invalid due to context rules on the Nukta the condition is only

requiredforthefirstreasoninthiscase

612 OverlappedvariantanalysisinvolvingNukta

ConsideringthefollowingvariantsetsABCDEachofthemcontains0906or093E

Set Mapping Variant Set A 093E 0901 lt--gt 0949 0902 Variant Set B 0906 0901 lt--gt 0911 0902 Variant Set C 0906 0902 lt--gt 0974 Variant Set D 093B lt--gt 093E 0902

Overlappingvariantsetsinvolving0906and093EplusNukta

Source Glyph Target Glyph Type Variant Context 0906 आ 0906 093C आ harr blocked not followed-by-N

093E ा 093E 093C ा harr blocked not followed-by-N

Whensubstitutingthevariantfromtheoverlappingsetsfor0906or093ErespectivelyintothelefthandsidesequencesofVariantSetsABCandEtheresultisavalidsequencethatdisplayswithaNuktaAsthepresenceabsenceofNuktaisthebasisforseveralvariantsetsthereforethesevariantareextendedtoensurethesymmetryandtransitivity

093E093C0901isaddedtoSetA

0906093C0901isaddedtoSetB

0906093C0902isaddedtosetCand

093E093C0902isaddedtoSetD

ForsetCtheimplicitcodepointcontextof0902and0974arenotequal0902canbefollowedbyVowelsandConsonantsonlybut0974canalsobefollowedby090109020903(Bothcanbeat

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

36

theendofthelabel)Thereforethevariantmappingshouldreceiveacontextrulewhen(followed-by-V-C-or-end)Thismatchestheintersectionbetweenthesecontexts

LikewiseforsetD0902canbefollowedbyVowelsandConsonantsonlybut093Bcanalsobefollowedby090109020903(Bothcanbeattheendofthelabel)Thereforethevariantmappingshouldreceiveacontextrulewhen(followed-by-V-C-or-end)

Theconclusionofthesevariantsetsandvariantcontextualrulesare

Set Mapping Variant Contextual Rule Variant Set A 093E 0901 lt--gt 093E 093C 0901 lt--gt 0949 0902 - Variant Set B 0906 0901 lt--gt 0906 093C 0901 lt--gt 0911 0902 - Variant Set C 0906 0902 lt--gt 0906 093C 0902 lt--gt 0974 when(followed-by-V-C-

or-end Variant Set D 093B lt--gt 093E 0902 lt--gt 093E 093C 0902 when(followed-by-V-C-

or-end)

AdditionalNotes

1 OverlappedvariantsinvolvingCandrabindu0901lt--gt09450902istechnicallyoverlappedwiththefoursetsaboveHoweverbecausethevariantcontextruleldquowhen(follows-only-C-or-N)rdquo(See641)noneofthesequencesleadtoavariantwhere0901isexpandedto09450902

2 Anotheroverlappedvariantsetwiththesefoursetsis0902(BAnusvara)lt--gt093A(MMatra)HowevertheyareallinvalidasaMatracanonlyfollowCorNwhileallsequencesincluding0902assecondelementhaveVorMasfirstelement

62 UniqueVowelsandVowelSignsrequiredforKashmiri

Kashmiriwhenwritten inDevanagari script requiresaunique setofVowels andVowel

signs which only a Kashmiri speaker can understand Themajority of Devanagari users

whoarenotconversantwithKashmiricaneasilyconfusethemwithsomeoftheVowelsVowelsignswhichlooksimilartotheKashmirionesTherearealsocaseswhereaKashmiriVowelVowelsignscanbeconfusedwithcertainaksharformationsHencetheyarebeing

proposedasvariantsVariant1 Variant2

ॳ U+0973

अ U+0905U+0902

U+093A

U+0902

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

37

ॴ U+0974

आ U+0906U+0902

ऻ U+093B

ा U+093EU+0902

ऎ U+090E

ऐ U+0910

U+0946

U+0947

ॵ U+0975

औ U+0914

ॏ U+094F

ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings but the code point sequencesintroducedasmappingwill need tobe listed in the repertoirewithmatching codepointcontextsothatbothsourceandtargetofthevariantmappinghavematchingcontexts(not

precededbyH)

63 HalantinFinalPosition(Onlyadiscussionnotproposedasvariants)

Another case of deceptive similarity to a majority of the Devanagari user base is of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.


6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
U+0901 | U+0945 U+0902
U+093E U+0901 | U+0949 U+0902
U+0905 U+0901 | U+0972 U+0902
U+090F U+0901 | U+090D U+0902
U+0906 U+0901 | U+0911 U+0902

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, like a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences (U+0945 U+0902) and (U+0949 U+0902) should each be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by either of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
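The context condition above can be illustrated with a minimal Python sketch. It is not taken from the accompanying XML file: the function names are hypothetical, and the consonant test assumes that the block U+0915..U+0939 approximates category C, which is a simplification of the actual repertoire.

```python
NUKTA = "\u093C"
CANDRABINDU = "\u0901"
CANDRA_E_ANUSVARA = "\u0945\u0902"


def is_consonant(ch):
    # Assumption: U+0915..U+0939 stands in for category C of the repertoire.
    return "\u0915" <= ch <= "\u0939"


def context_ok(label, i):
    """True if position i is preceded by C, or by C followed by a Nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def candrabindu_variant(label):
    """Replace each U+0901 that satisfies the context with U+0945 U+0902."""
    out = []
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and context_ok(label, i):
            out.append(CANDRA_E_ANUSVARA)
        else:
            out.append(ch)
    return "".join(out)
```

In this sketch a Candrabindu after a Vowel or a Matra is left untouched, which mirrors the "if and only if" wording of the rule; a real LGR tool would generate variants from the XML definition rather than from hand-written code.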

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
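The first-come, blocked behaviour can be sketched as follows. This is only an illustration under stated assumptions: the `Registry` class and its return strings are hypothetical, only four context-free pairs from Table 17 are folded, and the folding direction is an arbitrary index choice rather than a preference between the variants.

```python
# Fold one member of each pair onto the other to obtain a comparison key.
# Pairs used: U+090E/U+0910, U+0946/U+0947, U+0975/U+0914, U+094F/U+094C.
FOLD = str.maketrans({
    "\u090E": "\u0910",
    "\u0946": "\u0947",
    "\u0975": "\u0914",
    "\u094F": "\u094C",
})


class Registry:
    """Hypothetical registry: the first label in a variant set wins."""

    def __init__(self):
        self._allocated = {}  # comparison key -> label actually allocated

    def apply(self, label):
        key = label.translate(FOLD)
        holder = self._allocated.get(key)
        if holder is None:
            self._allocated[key] = label
            return "allocated"
        return "already allocated" if holder == label else "blocked"
```

Whichever of two variant labels is applied for first is allocated; the other one then maps to the same key and is reported as blocked.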

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + …” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant/Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (where CN is a C followed by an N).

3. M must be preceded by C or CN (where CN is a C followed by an N).

4. X must be preceded by either of V, C, N or M.

5. B must be preceded by either of V, C, N or M.

6. D must be preceded by either of V, C, N or M.

7. V can NOT be preceded by H (details in "Case of V preceded by H").
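As an illustration of how rules 1 to 7 could be checked mechanically, here is a minimal Python sketch. It is not the normative XML rule set: the function names are hypothetical, and the code point ranges used for C, V and M are assumed approximations of the repertoire (for example, they do not restrict U+0931 to the Eyelash Reph sequences).

```python
# Assumed approximation of the categories; the normative list is Table 6.
CONSONANTS = set(range(0x0915, 0x093A))                            # C
VOWELS = set(range(0x0905, 0x0915)) | set(range(0x0972, 0x0976))   # V
MATRAS = set(range(0x093E, 0x094D)) | {0x093A, 0x093B, 0x094F,
                                       0x0956, 0x0957}             # M
SINGLES = {0x0902: "B", 0x0901: "D", 0x0903: "X", 0x094D: "H", 0x093C: "N"}

# Sets named in rule 1.
C1 = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B}
V1 = {0x0906, 0x0913}
M1 = {0x093E, 0x094B}
NUKTA_BASES = C1 | V1 | M1


def category(cp):
    """Return the WLE symbol for a code point, or None if unclassified."""
    if cp in CONSONANTS:
        return "C"
    if cp in VOWELS:
        return "V"
    if cp in MATRAS:
        return "M"
    return SINGLES.get(cp)


def violated_rules(label):
    """Return (index, rule_number) pairs for every violation of rules 1-7."""
    cps = [ord(ch) for ch in label]
    cats = [category(cp) for cp in cps]
    problems = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i >= 1 else None
        prev2 = cats[i - 2] if i >= 2 else None
        after_c_or_cn = prev == "C" or (prev == "N" and prev2 == "C")
        after_vcnm = prev in ("V", "C", "N", "M")
        if cat == "N" and (i == 0 or cps[i - 1] not in NUKTA_BASES):
            problems.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            problems.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            problems.append((i, 3))
        elif cat == "X" and not after_vcnm:
            problems.append((i, 4))
        elif cat == "B" and not after_vcnm:
            problems.append((i, 5))
        elif cat == "D" and not after_vcnm:
            problems.append((i, 6))
        elif cat == "V" and prev == "H":
            problems.append((i, 7))
    return problems
```

For instance, under these assumptions the label आम्अचार (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930) is reported as violating rule 7 at the vowel that follows the Halant, while a label ending in a Consonant followed by a Halant passes.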

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it gets permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any of the valid akshar-ending characters. It is to be noted that only the case “V preceded by H” needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ “Mango pickle” (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D. Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D. Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S. Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A. Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | Basanta Kumar Panda | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | Debajit Sharma | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K. T. | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K. C. Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N. Deiva Sundaram | NDS Lingsoft Solutions Pvt. Ltd. | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P. K. | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S. Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R. | C-DAC | India | Tamil
Member | Shantaram S. Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U. B. Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G. | CALTS, Univ. of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, https://मराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K. P. Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, “Maximal Starting Repertoire - MSR-4: Overview and Rationale”, 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, “Representing Label Generation Rulesets Using XML”, RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, “Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels”, August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 11, http://www.unicode.org/versions/Unicode11.0.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, “Variant Issues Report”, ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, “South and Central Asia-I - Official Scripts of India”, Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, httphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World’s Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.

2. IS 10315: 7-bit coded character set for information interchange, 1985.

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.

5. ISO 2375: Procedure for registration of escape sequences, 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
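The "all constituents must have a counterpart" condition can be sketched in a few lines of Python. This is an illustrative assumption, not part of the proposal: the mapping below is a small hand-picked subset of the single code point Devanagari-Gurmukhi pairs from Table 19, the function name is hypothetical, and multi-code-point aksharas and one-to-many pairs are ignored.

```python
# Subset of single code point pairs (Devanagari -> Gurmukhi), for illustration.
DEVA_TO_GURU = {
    "\u091F": "\u0A1F",  # TTA
    "\u0920": "\u0A20",  # TTHA
    "\u092E": "\u0A38",  # Devanagari MA resembles Gurmukhi SA
    "\u0935": "\u0A15",  # Devanagari VA resembles Gurmukhi KA
    "\u0939": "\u0A35",  # Devanagari HA resembles Gurmukhi VA
    "\u093F": "\u0A3F",  # vowel sign I
    "\u0940": "\u0A40",  # vowel sign II
}


def cross_script_counterpart(label):
    """Return the Gurmukhi look-alike label, or None if any code point
    has no counterpart (the label is then visually distinguishable)."""
    out = []
    for ch in label:
        if ch not in DEVA_TO_GURU:
            return None
        out.append(DEVA_TO_GURU[ch])
    return "".join(out)
```

A single code point without a counterpart is enough for the function to return None, which matches the reasoning that one distinguishing character makes the whole string non-confusable.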

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 | Bengali (see note 17)
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

Note 17: The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that the two look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


the end of the label). Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end). This matches the intersection between these contexts.

Likewise, for set D, 0902 can be followed by Vowels and Consonants only, but 093B can also be followed by 0901, 0902, 0903. (Both can be at the end of the label.) Therefore the variant mapping should receive a context rule when (followed-by-V-C-or-end).

The conclusions for these variant sets and variant contextual rules are:

Set | Mapping | Variant Contextual Rule
Variant Set A | 093E 0901 <--> 093E 093C 0901 <--> 0949 0902 | -
Variant Set B | 0906 0901 <--> 0906 093C 0901 <--> 0911 0902 | -
Variant Set C | 0906 0902 <--> 0906 093C 0902 <--> 0974 | when (followed-by-V-C-or-end)
Variant Set D | 093B <--> 093E 0902 <--> 093E 093C 0902 | when (followed-by-V-C-or-end)

Additional Notes

1. Overlapped variants involving Candrabindu: 0901 <--> 0945 0902 is technically overlapped with the four sets above. However, because of the variant context rule “when (follows-only-C-or-N)” (see 6.4.1), none of the sequences lead to a variant where 0901 is expanded to 0945 0902.

2. Another overlapped variant set with these four sets is 0902 (B, Anusvara) <--> 093A (M, Matra). However, they are all invalid, as a Matra can only follow C or N, while all sequences including 0902 as second element have V or M as first element.

62 UniqueVowelsandVowelSignsrequiredforKashmiri

Kashmiriwhenwritten inDevanagari script requiresaunique setofVowels andVowel

signs which only a Kashmiri speaker can understand Themajority of Devanagari users

whoarenotconversantwithKashmiricaneasilyconfusethemwithsomeoftheVowelsVowelsignswhichlooksimilartotheKashmirionesTherearealsocaseswhereaKashmiriVowelVowelsignscanbeconfusedwithcertainaksharformationsHencetheyarebeing

proposedasvariantsVariant1 Variant2

ॳ U+0973

अ U+0905U+0902

U+093A

U+0902

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

37

ॴ U+0974

आ U+0906U+0902

ऻ U+093B

ा U+093EU+0902

ऎ U+090E

ऐ U+0910

U+0946

U+0947

ॵ U+0975

औ U+0914

ॏ U+094F

ौ U+094C

Table 17 Proposed Variants - Set 2

No variant contexts are required for these mappings but the code point sequencesintroducedasmappingwill need tobe listed in the repertoirewithmatching codepointcontextsothatbothsourceandtargetofthevariantmappinghavematchingcontexts(not

precededbyH)

63 HalantinFinalPosition(Onlyadiscussionnotproposedasvariants)

AnothercaseofdeceptivesimilaritytoamajorityoftheDevanagariuserbaseisofaword

ending inHalant (U+094D) vis-agrave-vis the samewordwithout the final Halant As thefunctionofHalant is of a vowel killer coming at the endmanyusers tend to ignore thephonetic effect of its presenceabsence The majority of users would pronounce both

words in the same way thereby creating a perception of (false) equivalence Howeverthere also exist some userswho clearly require the final Halant to achieve the peculiarphonetic effectof a truncated implicit vowel sound in theendTheseusersmakea clear

distinctionbetweenthetwowords(withandwithoutthefinalHalant)Itisforthisreasonthat the final Halant is being accommodated in the Whole Label Evaluation rules forDevanagari

In these cases the presence or absence of final Halant is clearly visible and there is no

apparentcasetomakethemvariantpairsEventuallyinthelightofpracticalexperienceafutureNBGPrevisionmayassessifthesecasesneedtobeconsideredasvariantpairs

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

38

64 VariantsbasedonCandrabinduandCandraVowelSignsfollowedbyAnusvara

This isacaseofpairsofsimilar lookingvariants involvingaCandrabindu(U+0901) InDevanagari there are twoCandravowel signs viz (U+0945) and ॉ (U+0949)whichwhen succeeded by an Anusvara (U+0902) create a shape which resembles aCandrabindu(U+0901)ThisgivesrisetopairswhichgetrenderedexactlyalikeinmanyfontsThoughsome fontscanrender themdifferently thebehavior isnotconsistentandkeepsroomforambiguityTheactualpairsofthevariantsareasfollows

Variant1 Variant2

U+0901

U+0945U+0902

U+093EU+0901

U+0949U+0902

U+0905U+0901

U+0972U+0902

U+090FU+0901

U+090DU+0902

U+0906U+0901

U+0911U+0902 Table 18 Proposed Variants - Set 3

Asthefirsttwosuggestedpairsarecomposedonlyofthedependentsignstheydonotgetproperly joined in the absence of an independent character like Consonant or a Vowel

beforethesameForthesakeofclarityboththepairsprecededbyaDevanagariLetterKa(क - U+0915)areshownhere

VariantPair1क(U+0901)andक (U+0945U+0902)

VariantPair2का (U+0949U+0902)andकॉ (U+0949U+0902)

IdeallythecaseofU+0945U+0902shouldberenderedas and the case of (U+0949

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

39

U+0902)shouldberenderedas where theAnusvara is clearlyseparated fromtheCandrashapeHowevernotallfontsfollowthesameconventionandmanyofthemrender

thetwoshapesexactly likeaCandrabinduThisgivesrise toanambiguityandhencethevariantpossibility

641 VariantcontextruleforCandrabinduandCandra-Anusvaravariantpair

Thevariantpair(U+0901)-(U+0945U+0902)necessitatesthatforcreatingavariantlabeleveryCandrabindu(U+0901)inalabelbereplacedbythesequence (U+0945U+0902)ItshouldbenotedthatthebeginningofthesaidsequenceiswithaMatrasignie(U+0945)WhiletheCandrabindu(U+0901)canbeprecededbyeitherofConsonantsVowelsNuktaoraMatrathesameisnottrueforthe(U+0945)whichisaMatraAMatracanonlybeprecededbyaconsonantoraconsonantfollowedbyaNuktaTohandlethisavariantcontextrulehasbeenaddedto (U+0945U+0902)asgivenbelow

RuleAspertheTable 18 Proposed Variants - Set 3thevariantrelationshipbetweenthepair

(U+0901) - (U+0945 U+0902) exists if and only if Candrabindu (U+0901) is

precededbyaConsonantoraConsonantfollowedbyaNukta

There isnoadditionalconstraintontheprecedingcharacterof (U+0945U+0902)asthe Candrabindu can be preceded by any characterwhich can precede the sequence (U+0945 U+0902) However to express the mapping the sequence U+0945 U+0902 isformally part of the repertoire and as such must have a context rule that matches thecontext rule for U+0945 which happens to be the same context rule as for the variant

mapping For the same symmetry reason as discussed in Section611 above the formaldefinition of the variant mapping (U+0945 U+0902) -gt (U+0901) has the same variantcontextconditiontothatforthemapping(U+0901)-gt(U+0945U+0902)

65 VariantDisposition

AsvariantsmentionedinTable16Table17andTable18areconfusinglysimilaralbeitof

apeculiarnatureitisproposedthattheybeconsideredofblockednature

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

40

There is nopreference among these variantsWhichever label containing either of these

variantsischosenearliertheotherequivalentvariantlabelshouldbeblocked

66 Cross-scriptVariants

Across-scriptvariantalsosometimesreferredtoasWholeLabelvariant isthevariant

casewhereonelabelinonescriptcanbecomposedinsuchawaythatitresemblesanother

entirelabelinadifferentscript

Every individualLGRunderNBGP is supposed toprovidea setof crossscriptvariants it

identifieswithallotherscriptsunderNBGP

NBGP has ensured that not only the individual characters but also most of the akshar

variations are taken into consideration during the Cross-script variant analysis ofDevanagariwithall theother scriptsunderNBGPThiswasachievedby sharinga listof

most of the Devanagari akshar combinationswith all the other script teams (Thewordlsquomostrsquo is used here as it is not practical to cover all the possible ldquoConsonant +Halant +Consonant + helliprdquo cases However for Devanagari all cases of ldquoConsonant + Halant +

Consonantrdquocombinationswereincludedintheanalysis)

The Devanagari script has a major set of possible cross-script variants only with the

GurmukhiscriptCaseslistedinTable19areofthevariantsthatareproposedtobecross-

script variants between Devanagari and Gurmukhi Similarly Table 20 has the casesproposedtobecross-scriptvariantsbetweenDevanagariandBengali

ItistobenotedthatnoneofthecombinationslistedinTable19andTable20aretermedto

be equivalentsof eachother semanticallyorotherwiseTheyareonly groupedbasedonpossiblevisualconfusability

NBGPhasensuredthatDevanagariBengaliandGurmukhiLGRteamsproposeasameset

ofcross-scriptvariantsbymeetingface-to-faceonmanyoccasionsaswellasthroughmailcommunicationsThesamesetofcross-scriptvariants(withDevanagari)issupposedtobefoundintheBengaliandGurmukhiLGRdocuments

Devanagari | Gurmukhi
ं (U+0902) | ਂ (U+0A02)
इ (U+0907) | ਙ (U+0A19)
उ (U+0909) | ਤ (U+0A24)
ग (U+0917) | ਗ (U+0A17)
घ (U+0918) | ਬ (U+0A2C)
ट (U+091F) | ਟ (U+0A1F)
ठ (U+0920) | ਠ (U+0A20)
ढ (U+0922) | ਫ (U+0A2B)
प (U+092A) | ਧ (U+0A27)
भ (U+092D) | ਮ (U+0A2E)
म (U+092E) | ਸ (U+0A38)
व (U+0935) | ਕ (U+0A15)
ह (U+0939) | ਵ (U+0A35)
ऺ (U+093A) | ਂ (U+0A02)
़ (U+093C) | ਼ (U+0A3C)
ि (U+093F) | ਿ (U+0A3F)
ी (U+0940) | ੀ (U+0A40)
ॅ (U+0945) | ੱ (U+0A71)
ॆ (U+0946) | ੇ (U+0A47)
ॆ (U+0946) | ੋ (U+0A4B)
े (U+0947) | ੇ (U+0A47)
े (U+0947) | ੋ (U+0A4B)
ै (U+0948) | ੈ (U+0A48)
ॖ (U+0956) | ੁ (U+0A41)
ॗ (U+0957) | ੂ (U+0A42)
प्टि (U+092A U+094D U+091F U+093F) | ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940) | ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947) | ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946) | ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924) | ਜ (U+0A1C)

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म (U+092E) | ম (U+09AE)
ि (U+093F) | ি (U+09BF)

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B: Cross-script Confusables lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6: Code point repertoire.

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is U+0931 (ऱ - DEVANAGARI LETTER RRA), H is U+094D (् - DEVANAGARI SIGN VIRAMA) and C3 is either U+092F (य - DEVANAGARI LETTER YA) or U+0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (Required in Santali language)
   b. ओ (U+0913) (Required in Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (Required in Santali language)
   b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (where CN is a C followed by an N)
3. M must be preceded by C or CN (where CN is a C followed by an N)
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V can NOT be preceded by H (details in "Case of V preceded by H")
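
Rules 1 to 7 can be sketched as a small Python checker. This is only an illustration under stated assumptions: the `category` function approximates the categories with contiguous Unicode ranges rather than the actual Table 6 repertoire, the Eyelash Reph sequences are not modelled, and the names `category` and `wle_violations` are hypothetical.

```python
# Sets C1, V1 and M1 exactly as listed under rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}

SIGNS = {0x0901: "D", 0x0902: "B", 0x0903: "X", 0x093C: "N", 0x094D: "H"}


def category(ch):
    """Approximate category of a code point (assumed ranges, not Table 6)."""
    cp = ord(ch)
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0905 <= cp <= 0x0914:
        return "V"
    if 0x093E <= cp <= 0x094C:
        return "M"
    return SIGNS.get(cp)  # None for anything outside this sketch


def wle_violations(label):
    """Return (index, rule number) pairs for each violation of rules 1-7."""
    found = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else None
        prev_cat = category(prev) if prev is not None else None
        # "C or CN": previous is a consonant, or a nukta that follows a consonant.
        after_c_or_cn = prev_cat == "C" or (
            prev_cat == "N" and i >= 2 and category(label[i - 2]) == "C"
        )
        if cat == "N" and prev not in (C1 | V1 | M1):
            found.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            found.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            found.append((i, 3))
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            found.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and prev_cat == "H":
            found.append((i, 7))
    return found
```

A call such as `wle_violations("\u0939\u093F\u0928\u094D\u0926\u0940")` returns an empty list for a well-formed label, while a label that starts with a matra is reported under rule 3. A production LGR would express the same conditions as context rules in the XML format of RFC 7940 rather than in code.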

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)
• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7: Sequences, it gets permitted only with the specific sequences.
2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार aːməchaːr "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | DataXgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | DataXgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | httpvishvakannadacom | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters / character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क (U+0915) | क़ (U+0915 U+093C)
ख (U+0916) | ख़ (U+0916 U+093C)
ग (U+0917) | ग़ (U+0917 U+093C)
च (U+091A) | च़ (U+091A U+093C)
छ (U+091B) | छ़ (U+091B U+093C)
ज (U+091C) | ज़ (U+091C U+093C)
ड (U+0921) | ड़ (U+0921 U+093C)
ढ (U+0922) | ढ़ (U+0922 U+093C)
फ (U+092B) | फ़ (U+092B U+093C)

Table 21: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
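
The all-or-nothing condition above can be sketched in Python as follows. The mapping holds only a handful of single-code-point pairs copied from Table 19; multi-code-point sequences are left out, and the names `DEVA_TO_GURU` and `cross_script_variant` are hypothetical, chosen just for this illustration.

```python
# A few single-code-point pairs copied from Table 19 (Devanagari -> Gurmukhi).
# Multi-code-point sequences from the table are left out of this sketch.
DEVA_TO_GURU = {
    "\u0917": "\u0A17",  # GA
    "\u091F": "\u0A1F",  # TTA
    "\u0920": "\u0A20",  # TTHA
    "\u092E": "\u0A38",  # Devanagari MA resembles Gurmukhi SA
    "\u0935": "\u0A15",  # Devanagari VA resembles Gurmukhi KA
    "\u093F": "\u0A3F",  # vowel sign I
    "\u0940": "\u0A40",  # vowel sign II
}


def cross_script_variant(label):
    """Return the Gurmukhi look-alike label, or None if any code point lacks one."""
    mapped = []
    for ch in label:
        counterpart = DEVA_TO_GURU.get(ch)
        if counterpart is None:
            return None  # one distinguishing character is enough
        mapped.append(counterpart)
    return "".join(mapped)
```

The early return mirrors the text: a single code point without a counterpart makes the whole label visually distinct, so no cross-script variant label exists for it.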

Devanagari confusable | Other script confusable | From script
ः (U+0903) | ઃ (U+0A83) | Gujarati
ः (U+0903) | ః (U+0C03) | Telugu
ः (U+0903) | ಃ (U+0C83) | Kannada
ः (U+0903) | ഃ (U+0D03) | Malayalam
ः (U+0903) | ඃ (U+0D83) | Sinhala
ः (U+0903) | ঃ (U+0983) (see note 17) | Bengali
उ (U+0909) | ও (U+0993) | Bengali
घ (U+0918) | ঘ (U+0998) | Bengali
ठ (U+0920) | ਨ (U+0A28) | Gurmukhi
ठ (U+0920) | ਰ (U+0A30) | Gurmukhi
ड (U+0921) | ਡ (U+0A21) | Gurmukhi
ड (U+0921) | ਤ (U+0A24) | Gurmukhi
ढ (U+0922) | ਢ (U+0A22) | Gurmukhi
त (U+0924) | ਜ (U+0A1C) | Gurmukhi
य (U+092F) | ਧ (U+0A27) | Gurmukhi
ॅ (U+0945) | ঁ (U+0981) | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.

ॴ (U+0974) | आं (U+0906 U+0902)
ऻ (U+093B) | ां (U+093E U+0902)
ऎ (U+090E) | ऐ (U+0910)
ॆ (U+0946) | े (U+0947)
ॵ (U+0975) | औ (U+0914)
ॏ (U+094F) | ौ (U+094C)

Table 17: Proposed Variants - Set 2

No variant contexts are required for these mappings, but the code point sequences introduced as mappings will need to be listed in the repertoire with a matching code point context, so that both source and target of the variant mapping have matching contexts (not preceded by H).

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity to a majority of the Devanagari user base is that of a word ending in Halant (U+094D) vis-à-vis the same word without the final Halant. As the function of Halant is that of a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence/absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases the presence or absence of the final Halant is clearly visible and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess if these cases need to be considered as variant pairs.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which when succeeded by an Anusvara (U+0902) create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of variants are as follows:

Variant 1 | Variant 2
ँ (U+0901) | ॅं (U+0945 U+0902)
ाँ (U+093E U+0901) | ॉं (U+0949 U+0902)
अँ (U+0905 U+0901) | ॲं (U+0972 U+0902)
एँ (U+090F U+0901) | ऍं (U+090D U+0902)
आँ (U+0906 U+0901) | ऑं (U+0911 U+0902)

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character like a Consonant or a Vowel before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)
Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, the sequences U+0945 U+0902 and U+0949 U+0902 should be rendered so that the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by either of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
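
The context condition can be illustrated with a short Python sketch. It models only the first pair of Table 18, (U+0901) -> (U+0945 U+0902); the consonant range is an assumption made for the example and is wider than the actual repertoire, and the function names are hypothetical.

```python
CANDRABINDU = "\u0901"
CANDRA_E = "\u0945"
ANUSVARA = "\u0902"
NUKTA = "\u093C"


def is_consonant(ch):
    # Assumed consonant range for this sketch; the LGR repertoire is narrower.
    return 0x0915 <= ord(ch) <= 0x0939


def after_c_or_cn(label, i):
    """True if position i is preceded by a consonant or by consonant + nukta."""
    if i >= 1 and is_consonant(label[i - 1]):
        return True
    return i >= 2 and label[i - 1] == NUKTA and is_consonant(label[i - 2])


def candra_anusvara_variant(label):
    """Replace each eligible Candrabindu; return None if nothing is eligible."""
    out = []
    changed = False
    for i, ch in enumerate(label):
        if ch == CANDRABINDU and after_c_or_cn(label, i):
            out.append(CANDRA_E + ANUSVARA)
            changed = True
        else:
            out.append(ch)
    return "".join(out) if changed else None
```

A Candrabindu that follows a vowel or a matra is left untouched here because the context condition is not met; the other rows of Table 18 cover those cases with their own mappings, which this sketch does not model.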


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क (U+0915) | क़ (U+0915 U+093C)
ख (U+0916) | ख़ (U+0916 U+093C)
ग (U+0917) | ग़ (U+0917 U+093C)
च (U+091A) | च़ (U+091A U+093C)
छ (U+091B) | छ़ (U+091B U+093C)
ज (U+091C) | ज़ (U+091C U+093C)
ड (U+0921) | ड़ (U+0921 U+093C)
ढ (U+0922) | ढ़ (U+0922 U+093C)
फ (U+092B) | फ़ (U+092B U+093C)

Table 21: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
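
The "all constituents must have a confusable" condition can be sketched in Python. This is only an illustration, not part of the LGR: the function name `confusable_scripts` and the dictionary layout are hypothetical, only a few rows of Table 22 are included, and the check works per code point rather than per akshara.

```python
# A few Table 22 rows: Devanagari code point -> {other script: confusable}.
# Illustrative subset only; the table layout is an assumption of this sketch.
CONFUSABLES = {
    "\u0903": {"Gujarati": "\u0A83", "Telugu": "\u0C03", "Kannada": "\u0C83",
               "Malayalam": "\u0D03", "Bengali": "\u0983"},
    "\u0909": {"Bengali": "\u0993"},
    "\u0918": {"Bengali": "\u0998"},
}


def confusable_scripts(label: str) -> set:
    """Return the scripts in which EVERY code point of label has a confusable."""
    if not label:
        return set()
    scripts = None
    for ch in label:
        available = set(CONFUSABLES.get(ch, {}))
        # Keep only the scripts that also cover this code point.
        scripts = available if scripts is None else scripts & available
        if not scripts:
            return set()  # one uncovered code point makes the label distinct
    return scripts
```

A single code point without a confusable in a given script removes that script from the result, which mirrors the "visual distinction" argument above.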

Devanagari confusable | Other script confusable | From script
ः (U+0903) | ઃ (U+0A83) | Gujarati
ः (U+0903) | ః (U+0C03) | Telugu
ः (U+0903) | ಃ (U+0C83) | Kannada
ः (U+0903) | ഃ (U+0D03) | Malayalam
ः (U+0903) | ඃ (U+0D83) | Sinhala
ः (U+0903) | ঃ (U+0983) (see note 17) | Bengali
उ (U+0909) | ও (U+0993) | Bengali
घ (U+0918) | ঘ (U+0998) | Bengali
ठ (U+0920) | ਨ (U+0A28) | Gurmukhi
ठ (U+0920) | ਰ (U+0A30) | Gurmukhi
ड (U+0921) | ਡ (U+0A21) | Gurmukhi
ड (U+0921) | ਤ (U+0A24) | Gurmukhi
ढ (U+0922) | ਢ (U+0A22) | Gurmukhi
त (U+0924) | ਜ (U+0A1C) | Gurmukhi
य (U+092F) | ਧ (U+0A27) | Gurmukhi
ॅ (U+0945) | ঁ (U+0981) | Bengali

Table 22: Devanagari Cross-script confusables

Note 17: The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.

6.4 Variants based on Candrabindu and Candra Vowel Signs followed by Anusvara

This is a case of pairs of similar-looking variants involving a Candrabindu (U+0901). In Devanagari there are two Candra vowel signs, viz. ॅ (U+0945) and ॉ (U+0949), which, when succeeded by an Anusvara (U+0902), create a shape which resembles a Candrabindu (U+0901). This gives rise to pairs which get rendered exactly alike in many fonts. Though some fonts can render them differently, the behavior is not consistent and leaves room for ambiguity. The actual pairs of the variants are as follows:

Variant 1 | Variant 2
ँ (U+0901) | ॅं (U+0945 U+0902)
ाँ (U+093E U+0901) | ॉं (U+0949 U+0902)
अँ (U+0905 U+0901) | ॲं (U+0972 U+0902)
एँ (U+090F U+0901) | ऍं (U+090D U+0902)
आँ (U+0906 U+0901) | ऑं (U+0911 U+0902)

Table 18: Proposed Variants - Set 3

As the first two suggested pairs are composed only of dependent signs, they do not get properly joined in the absence of an independent character, such as a Consonant or a Vowel, before them. For the sake of clarity, both pairs are shown here preceded by the Devanagari Letter Ka (क - U+0915):

Variant Pair 1: कँ (U+0901) and कॅं (U+0945 U+0902)

Variant Pair 2: काँ (U+093E U+0901) and कॉं (U+0949 U+0902)

Ideally, both (U+0945 U+0902) and (U+0949 U+0902) should be rendered with the Anusvara clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.
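
One way to test whether two labels differ only by the pairs in Table 18 is to rewrite every Variant 2 sequence to its Variant 1 counterpart and compare the results. The Python below is a minimal sketch under that assumption; `variant_key` and `are_variants` are hypothetical names, and the sketch deliberately ignores the variant context rule for these pairs.

```python
# Variant 2 -> Variant 1 sequences taken from Table 18 (code points only).
TO_CANDRABINDU_FORM = {
    "\u0945\u0902": "\u0901",        # sign candra E + anusvara -> candrabindu
    "\u0949\u0902": "\u093E\u0901",  # sign candra O + anusvara -> sign AA + candrabindu
    "\u0972\u0902": "\u0905\u0901",  # letter candra A + anusvara -> A + candrabindu
    "\u090D\u0902": "\u090F\u0901",  # letter candra E + anusvara -> E + candrabindu
    "\u0911\u0902": "\u0906\u0901",  # letter candra O + anusvara -> AA + candrabindu
}


def variant_key(label: str) -> str:
    """Rewrite every Variant 2 sequence to its Variant 1 form."""
    for sequence, replacement in TO_CANDRABINDU_FORM.items():
        label = label.replace(sequence, replacement)
    return label


def are_variants(a: str, b: str) -> bool:
    """True if a and b are distinct labels with the same variant key."""
    return a != b and variant_key(a) == variant_key(b)
```

For example, `are_variants("\u0915\u0901", "\u0915\u0945\u0902")` compares the two forms of Variant Pair 1 shown above with Ka.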

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonants, Vowels, Nukta or a Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a consonant or a consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902) as given below:

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
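
The context condition in the rule above can be sketched as a small Python predicate. Note the assumptions: the consonant test uses the Unicode range U+0915 to U+0939 as a stand-in for the consonant category of the LGR repertoire, and `candrabindu_has_variant` is a hypothetical helper name, not something defined by the LGR.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    # Assumption: approximate the LGR consonant set by the Unicode range KA..HA.
    return "\u0915" <= ch <= "\u0939"


def candrabindu_has_variant(label: str, index: int) -> bool:
    """True if label[index] is a Candrabindu that satisfies the context rule,
    i.e. it is preceded by a Consonant or by a Consonant followed by a Nukta."""
    if index == 0 or label[index] != CANDRABINDU:
        return False
    previous = label[index - 1]
    if is_consonant(previous):
        return True
    return previous == NUKTA and index >= 2 and is_consonant(label[index - 2])
```

A Candrabindu after a vowel or a Matra fails the check, so no (U+0945 U+0902) variant is generated at that position.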

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered of blocked nature.

There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
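
The "first chosen wins, the rest are blocked" behaviour can be illustrated with a toy registry in Python. Everything here is hypothetical: the class `VariantBlockingRegistry`, its status strings, and the `demo_key` function (which folds only the first pair of Table 18) are assumptions made for the sketch and do not describe any actual registry system.

```python
class VariantBlockingRegistry:
    """Toy registry: the first label of a variant set is allocated,
    every other label with the same variant key is blocked."""

    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._allocated = {}  # variant key -> label that was allocated first

    def request(self, label: str) -> str:
        key = self._key_fn(label)
        holder = self._allocated.get(key)
        if holder is None:
            self._allocated[key] = label
            return "allocated"
        return "already allocated" if holder == label else "blocked"


def demo_key(label: str) -> str:
    # Assumption: fold only the (U+0945 U+0902) -> (U+0901) pair for the demo.
    return label.replace("\u0945\u0902", "\u0901")
```

Because neither member of a pair is preferred, the outcome depends only on the order of the requests.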

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide a set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are termed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
ं (U+0902) | ਂ (U+0A02)
इ (U+0907) | ਙ (U+0A19)
उ (U+0909) | ਤ (U+0A24)
ग (U+0917) | ਗ (U+0A17)
घ (U+0918) | ਬ (U+0A2C)
ट (U+091F) | ਟ (U+0A1F)
ठ (U+0920) | ਠ (U+0A20)
ढ (U+0922) | ਫ (U+0A2B)
प (U+092A) | ਧ (U+0A27)
भ (U+092D) | ਮ (U+0A2E)
म (U+092E) | ਸ (U+0A38)
व (U+0935) | ਕ (U+0A15)
ह (U+0939) | ਵ (U+0A35)
ऺ (U+093A) | ਂ (U+0A02)
़ (U+093C) | ਼ (U+0A3C)
ि (U+093F) | ਿ (U+0A3F)
ी (U+0940) | ੀ (U+0A40)
ॅ (U+0945) | ੱ (U+0A71)
ॆ (U+0946) | ੇ (U+0A47)
ॆ (U+0946) | ੋ (U+0A4B)
े (U+0947) | ੇ (U+0A47)
े (U+0947) | ੋ (U+0A4B)
ै (U+0948) | ੈ (U+0A48)
ॖ (U+0956) | ੁ (U+0A41)
ॗ (U+0957) | ੂ (U+0A42)
प्टि (U+092A U+094D U+091F U+093F) | ਇ (U+0A07)
प्टी (U+092A U+094D U+091F U+0940) | ਈ (U+0A08)
प्टे (U+092A U+094D U+091F U+0947) | ਏ (U+0A0F)
प्टॆ (U+092A U+094D U+091F U+0946) | ਏ (U+0A0F)
त्त (U+0924 U+094D U+0924) | ਜ (U+0A1C)

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म (U+092E) | ম (U+09AE)
ि (U+093F) | ি (U+09BF)

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B: Cross-script Confusables lists them.
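
Because Table 19 mixes single code points with multi-code-point sequences, a whole-label mapping has to prefer the longest matching entry at each position. The Python below is a minimal sketch of that idea using a subset of Table 19; the name `gurmukhi_variant` and the greedy longest-match strategy are assumptions of this illustration, rows with more than one Gurmukhi target are left out, and the result is not checked against the Gurmukhi LGR.

```python
# Subset of Table 19: Devanagari sequence -> Gurmukhi code point.
DEVA_TO_GURU = {
    "\u0917": "\u0A17",                    # GA
    "\u091F": "\u0A1F",                    # TTA
    "\u0920": "\u0A20",                    # TTHA
    "\u092E": "\u0A38",                    # MA -> Gurmukhi SA
    "\u093F": "\u0A3F",                    # vowel sign I
    "\u0940": "\u0A40",                    # vowel sign II
    "\u092A\u094D\u091F\u093F": "\u0A07",  # PA + virama + TTA + sign I -> letter I
}
_KEYS_LONGEST_FIRST = sorted(DEVA_TO_GURU, key=len, reverse=True)


def gurmukhi_variant(label: str):
    """Map a whole Devanagari label to Gurmukhi, or return None if any
    part of the label has no cross-script counterpart in the table."""
    out = []
    i = 0
    while i < len(label):
        for key in _KEYS_LONGEST_FIRST:
            if label.startswith(key, i):
                out.append(DEVA_TO_GURU[key])
                i += len(key)
                break
        else:
            return None  # this position cannot be mapped: no whole-label variant
    return "".join(out)
```

Returning `None` as soon as one position cannot be mapped reflects the whole-label nature of these variants: a partial resemblance does not produce a variant label.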

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:
a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)

The set V1 consists of these vowels:
a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:
a. ा (U+093E) (Required in Santali language)
b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see note 15)
3. M must be preceded by C or CN (see note 16)
4. X must be preceded by either of V, C, N or M
5. B must be preceded by either of V, C, N or M
6. D must be preceded by either of V, C, N or M
7. V can NOT be preceded by H (details in "Case of V preceded by H")

Notes 15 and 16: where CN is a C followed by an N.
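
Rules 1 to 7 only ever look at the character (or the two characters) immediately before the current one, so they can be checked in a single left-to-right pass. The Python below is a rough sketch of such a checker, not the normative XML rules: the category ranges are Unicode-block approximations chosen for this example (the authoritative sets are in Table 6), the names `category` and `wle_violations` are hypothetical, and Eyelash Reph sequences and the variant-only rules are not modelled.

```python
NUKTA, HALANT = "\u093C", "\u094D"
SIGNS = {"\u0902": "B", "\u0901": "D", "\u0903": "X", HALANT: "H", NUKTA: "N"}
# Sets C1, V1 and M1 from rule 1.
NUKTA_BASES = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B"
                  "\u0906\u0913\u093E\u094B")


def category(ch: str) -> str:
    """Assumed categories based on Unicode ranges, not on Table 6."""
    if ch in SIGNS:
        return SIGNS[ch]
    if "\u0915" <= ch <= "\u0939":
        return "C"
    if "\u0904" <= ch <= "\u0914" or ch == "\u0972":
        return "V"
    if "\u093E" <= ch <= "\u094C":
        return "M"
    return "?"


def wle_violations(label: str) -> list:
    """Return the numbers of the WLE rules (1 to 7) that the label breaks."""
    broken = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else ""
        prev_cat = category(prev) if prev else ""
        # "C or CN": a consonant, or a nukta that itself follows a consonant.
        after_c_or_cn = prev_cat == "C" or (
            prev_cat == "N" and i >= 2 and category(label[i - 2]) == "C")
        if cat == "N" and prev not in NUKTA_BASES:
            broken.append(1)
        elif cat == "H" and not after_c_or_cn:
            broken.append(2)
        elif cat == "M" and not after_c_or_cn:
            broken.append(3)
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            broken.append({"X": 4, "B": 5, "D": 6}[cat])
        elif cat == "V" and prev_cat == "H":
            broken.append(7)
    return broken
```

An empty result means that none of the seven rules was triggered under these approximated categories; it does not by itself establish that a label is valid under the full LGR.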

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of permissible sequences in Table 7 (Sequences), it gets permitted only with the specific sequences.

2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity among two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.


charactersaksharashaveanequivalentconfusableintheotherscriptIfthereisevenonesingle characterakshara which does not have an equivalent visual confusable in other

scriptitessentiallyprovidesavisualdistinctionandhenceanon-confusablestring

Devanagariconfusable Otherscriptconfusable Fromscript

U+0903

ઃU+0A83 Gujarati

U+0903

U+0C03Telugu

U+0903

U+0C83Kannada

U+0903

ഃU+0D03

Malayalam

U+0903

ඃU+0A28

Sinhala

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

58

U+0903

ঃ17

U+0983 Bengali

U+0909

ওU+0993

Bengali

U+0918

ঘU+0998

Bengali

U+0920

U+0A28Gurmukhi

U+0920

U+0A30Gurmukhi

U+0921

U+0A21Gurmukhi

U+0921

U+0A24Gurmukhi

ढU+0922

U+0A22Gurmukhi

U+0924

U+0A1CGurmukhi

U+092F

U+0A27Gurmukhi

U+0945

U+0981

Bengali

Table 22 Devanagari Cross-script confusables 17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in normative part of the Devanagari and Bengali LGRs It was decided that both look different enough not to be included in the normative part and hence have been added in the Appendix as confusables only

Page 39: Proposal for a Devanagari Script Root Zone Label ... · Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel 2 3 Background on Script and Principal Languages Using

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

39

U+0902) should be rendered in a form where the Anusvara is clearly separated from the Candra shape. However, not all fonts follow the same convention, and many of them render the two shapes exactly like a Candrabindu. This gives rise to an ambiguity and hence the variant possibility.

6.4.1 Variant context rule for Candrabindu and Candra-Anusvara variant pair

The variant pair (U+0901) - (U+0945 U+0902) necessitates that, for creating a variant label, every Candrabindu (U+0901) in a label be replaced by the sequence (U+0945 U+0902). It should be noted that the said sequence begins with a Matra sign, i.e. (U+0945). While the Candrabindu (U+0901) can be preceded by any of Consonant, Vowel, Nukta or Matra, the same is not true for (U+0945), which is a Matra. A Matra can only be preceded by a Consonant or a Consonant followed by a Nukta. To handle this, a variant context rule has been added to (U+0945 U+0902), as given below.

Rule: As per Table 18 (Proposed Variants - Set 3), the variant relationship between the pair (U+0901) - (U+0945 U+0902) exists if and only if the Candrabindu (U+0901) is preceded by a Consonant or a Consonant followed by a Nukta.

There is no additional constraint on the preceding character of (U+0945 U+0902), as the Candrabindu can be preceded by any character which can precede the sequence (U+0945 U+0902). However, to express the mapping, the sequence U+0945 U+0902 is formally part of the repertoire and as such must have a context rule that matches the context rule for U+0945, which happens to be the same context rule as for the variant mapping. For the same symmetry reason as discussed in Section 6.1.1 above, the formal definition of the variant mapping (U+0945 U+0902) -> (U+0901) has the same variant context condition as that for the mapping (U+0901) -> (U+0945 U+0902).
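The context condition in the rule above can be sketched in a few lines of Python. This is only an illustration, not the normative XML rule: the function names are hypothetical, and the consonant test assumes the block range U+0915..U+0939, which is a simplification of the actual repertoire.

```python
CANDRABINDU = "\u0901"
NUKTA = "\u093C"

def is_consonant(ch):
    # Assumption: treat U+0915..U+0939 (KA..HA) as the consonant class.
    return "\u0915" <= ch <= "\u0939"

def candrabindu_has_variant(label, i):
    """True if the Candrabindu at index i may map to U+0945 U+0902.

    Implements the stated condition: the Candrabindu must be preceded by
    a Consonant, or by a Consonant followed by a Nukta.
    """
    if label[i] != CANDRABINDU or i == 0:
        return False
    prev = label[i - 1]
    if is_consonant(prev):
        return True
    return prev == NUKTA and i >= 2 and is_consonant(label[i - 2])
```

Under these assumptions, a Candrabindu directly after a consonant (as in U+0939 U+0901 U+0938) qualifies, while one after a Matra or an independent Vowel does not.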

6.5 Variant Disposition

As the variants mentioned in Table 16, Table 17 and Table 18 are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the blocked disposition.

There is no preference among these variants. Whichever label containing either of these variants is chosen first, the other equivalent variant label should be blocked.
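The first-come blocking behaviour described above might be modelled as follows. This is a toy sketch: the class and function names are hypothetical, and toy_variants swaps only the Candrabindu pair without applying any context rule.

```python
def toy_variants(label):
    # Toy variant generator (assumption): swap Candrabindu with the
    # Candra-E + Anusvara sequence in both directions, ignoring context.
    a, b = "\u0901", "\u0945\u0902"
    return {label.replace(a, b), label.replace(b, a)}

class VariantRegistry:
    """Allocates labels and blocks the variants of every allocated label."""

    def __init__(self, variants_of):
        self._variants_of = variants_of
        self._allocated = set()
        self._blocked = set()

    def apply(self, label):
        # A label is refused if it is already taken or blocked as a variant.
        if label in self._allocated or label in self._blocked:
            return False
        self._allocated.add(label)
        self._blocked.update(v for v in self._variants_of(label) if v != label)
        return True
```

Neither member of a variant pair is preferred here: whichever is applied for first is allocated, and the other becomes unavailable.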

6.6 Cross-script Variants

A cross-script variant, also sometimes referred to as a Whole Label variant, is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word 'most' is used here as it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 19 are proposed to be cross-script variants between Devanagari and Gurmukhi. Similarly, Table 20 has the cases proposed to be cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 19 and Table 20 are considered equivalents of each other, semantically or otherwise. They are grouped only on the basis of possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants
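Because a cross-script variant is a whole-label notion, a label can only have one when every constituent has a counterpart in the other script. The following is a minimal sketch of that check. The mapping is a small subset of Table 19 chosen for illustration; the function name is hypothetical, and multi-code-point sequences and one-to-many mappings are deliberately left out.

```python
# Subset of Table 19 (single code points only); illustration, not the full table.
DEVA_TO_GURU = {
    "\u091F": "\u0A1F",  # TTA
    "\u0920": "\u0A20",  # TTHA
    "\u092E": "\u0A38",  # Devanagari MA resembles Gurmukhi SA
    "\u0935": "\u0A15",  # Devanagari VA resembles Gurmukhi KA
    "\u0939": "\u0A35",  # Devanagari HA resembles Gurmukhi VA
    "\u093F": "\u0A3F",  # vowel sign I
    "\u0940": "\u0A40",  # vowel sign II
}

def gurmukhi_variant(label):
    """Return the Gurmukhi whole-label variant, or None if any code point
    of the label has no counterpart in the (subset) mapping."""
    mapped = [DEVA_TO_GURU.get(ch) for ch in label]
    if not label or None in mapped:
        return None
    return "".join(mapped)
```

A single unmapped code point is enough to make the whole label visually distinct, so the function returns None in that case rather than a partial mapping.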

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is U+0931 (ऱ - DEVANAGARI LETTER RRA), H is U+094D (DEVANAGARI SIGN VIRAMA), and C3 is either U+092F (य - DEVANAGARI LETTER YA) or U+0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (required in the Santali language)
   b. ओ (U+0913) (required in the Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (required in the Santali language)
   b. ो (U+094B) (required in the Santali language)

2. H must be preceded by C or CN, where CN is a C followed by an N.

3. M must be preceded by C or CN, where CN is a C followed by an N.

4. X must be preceded by either of V, C, N or M.

5. B must be preceded by either of V, C, N or M.

6. D must be preceded by either of V, C, N or M.

7. V cannot be preceded by H (details in "Case of V preceded by H").
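The seven rules above can be read as checks on each code point and the one or two code points before it. The sketch below is an assumption-laden illustration rather than the LGR's XML rules: the category ranges (C as U+0915..U+0939, V as U+0905..U+0914, M as U+093E..U+094C) approximate the real repertoire, the function names are hypothetical, and code points outside these classes are simply ignored.

```python
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = set("\u0906\u0913")
M1 = set("\u093E\u094B")
SIGNS = {"\u0902": "B", "\u0901": "D", "\u0903": "X", "\u094D": "H", "\u093C": "N"}

def category(ch):
    # Simplified category assignment (assumption, see lead-in).
    if "\u0915" <= ch <= "\u0939":
        return "C"
    if "\u0905" <= ch <= "\u0914":
        return "V"
    if "\u093E" <= ch <= "\u094C":
        return "M"
    return SIGNS.get(ch)

def wle_violations(label):
    """Return (index, rule_number) pairs for every rule 1-7 violation."""
    found = []
    for i, ch in enumerate(label):
        cat = category(ch)
        prev = label[i - 1] if i > 0 else ""
        pcat = category(prev) if prev else None
        # "C or CN": previous is a consonant, or a Nukta that follows one.
        after_c_or_cn = pcat == "C" or (
            pcat == "N" and i >= 2 and category(label[i - 2]) == "C")
        if cat == "N" and prev not in (C1 | V1 | M1):
            found.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            found.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            found.append((i, 3))
        elif cat in ("X", "B", "D") and pcat not in ("V", "C", "N", "M"):
            found.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and pcat == "H":
            found.append((i, 7))
    return found
```

With these assumptions, the sequence U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930 is flagged under rule 7 at the Vowel that follows the Halant, and a Nukta after a consonant outside C1 (for example U+0924 U+093C) is flagged under rule 1.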

Additional rules are used only for variants where a Nukta maps to null or that are overlapped:

• The variant is not defined if followed by a Nukta (see 6.4.1).

• The variant is undefined if it is not followed by V or C (including RRA) or the end of the label (see 6.1.2).

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.

2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, a space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | Basanta Kumar Panda | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | Debajit Sharma | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | httpvishvakannadacom | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr. K P Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters / character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in a given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (see note 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

Note 17: The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


UniversityofNorthBengalIndia Nepali

Member GirishChandraMishra

LanguageTechnologyCentreRavenshawUniversity

India Odia

Member GurpreetSinghLehal

PunjabiUniversityPatiala India Panjabi

Member HarishChowdhary NIXI India HindiMember HempalShrestha NepalEntrepreneursHub

(NEHUB)Nepal NepaliNewar

Member JayPaudyal Consultant India Hindi

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

49

Member JijoPappachan DNDomains India MalayalamMember KCTikayatray OdiaBhasaPratisthan India OdiaMember KalyanVasudeo

KaleFormerlyaffiliatedwithUniversityofPune

India Marathi

Member KuldeepPatnaik Visualizethysoul India OdiaMember MukeshSaini EsselGroup India HindiMember NDeivaSundaram NDSLingsoftSolutionsPvtLtd India TamilMember NehaGupta C-DAC India HindiEnglishMember NirajanParajuli NREN Nepal NepaliMember NishitJain C-DAC India HindiEnglishMember PawanChitrakar Gapsco Nepal NepaliMember PrabhakarPandey C-DAC India HindiMember PrasadPK A-onePublishers India MalayalamMember PrateekPathak ISOCMumbai India DevanagariMember RaiomondDoctor NLPConsultant India EnglishHindi

MarathiGujaratiMember RajibChakraborty SocietyforNaturalLanguage

TechnologyResearchIndia Bangla(Bengali)

Member RajivKumar NIXI India Member SManiam InternationalForumITforTamil Singapore TamilMember SanthoshThottingal Wikimediafoundation India Malayalam

SourashtraTamilMember SarojaBhate UniversityofPune India SanskritMember ShambhuKumar

SinghNationalTranslationMissionMysore

India Maithili

Member ShanmugamR C-DAC India TamilMember ShantaramSWarde

WalawalikarIndependentResearcher India Konkani

Member ShashiPathania PGDofDogriUniversityofJammu

India Dogri

Member ShubhamSaran NIXI India Member Sinnathambi

ShanmugarajahUniversityofColomboSchoolofComputing

SriLanka Tamil

Member SujithKartha Digitalkzcom India MalayalamMember SurajAdhikari MercantileCommunications(and

npccTLD)Nepal Nepali

Member SwarnaPrabhaChainary

GuwahatiUniversity India Bodo

Member UBPavanaja httpvishvakannadacom India KannadaMember UmaMaheshwarG CALTSUnivofHyderabad India TeluguMember UttamShrestha

RanaNPNOG Nepal Nepali

Member VeenaSolomon (freelancer) India MalayalamMember VinayMurarka Consultanthttpsमराभारत India Hindi

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

50

In addition following members externally gave inputs to NBGP for the respectivelanguagesscripts

Name LanguageScriptExpertiseAjitKumar AwadhiBrajLanguageAmarTumyahang LimbuLanguageAmritYonjan TamangLanguageApranaKulkarni HindiMarathiBasilBaa SadriLanguageBasilKiro KhariaLanguageBiswaLimbu LimbuLanguageDevdassManandhar NewarDevendraKumarDevesh BhojpuriLanguageDinbandhuMahto PanchparganiaLanguageDipikaSangmaNarzary BodoLanguageDrKPLekhwani SindhiDrBirendraKumarSoy MundariLanguageDrDineshKumarShrivastav MagahiLanguageDrHarvinderKaur GurmukhiScriptDrLaxmiPrasadKhatiwada NepaliLanguageHariharVaishnav HalbiIndraKumarTamang TamangLanguageJagannathSingh PanchparganiaLanguageNarendraKumarNegi KinnauriLanguagePrateekHarshwal WagdiandDhundhariLanguageRayemOlemDungdung SadriLanguageTejManAngdembe LimbuLanguageUrmilaHarshwal WagdiLanguage

9 References

[MSR]IntegrationPanelMaximalStartingRepertoiremdashMSR-4OverviewandRationale7February2019httpswwwicannorgensystemfilesfilesmsr-4-overview-25jan19-enpdf(Accessedon18thFeb2019)

[EGIDS]ExpandedGradedIntergenerationalDisruptionScalehttpswwwethnologuecomaboutlanguage-status(Accessedon13thNov2017)

[NBGP]Neo-BrahmiGenerationPanel

[RFC7940]DaviesKandAFreytagRepresentingLabelGenerationRulesetsusingXMLRFC7940August2016httpstoolsietforghtmlrfc7940(Accessedon1stDec2017)

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

51

[RCF8228]AFreytagldquoGuidanceonDesigningLabelGenerationRulesets(LGRs)SupportingVariantLabelsAugust2017httpstoolsietforghtmlrfc8228(Accessedon1stDec2017)

[gTLD]genericTopLevelDomain

[ISCII]IndianScriptCodeforInformationInterchangehttpscdacinindexaspxid=mlc_gist_iscii(Accessedon2ndFeb2018)

[GIST]GraphicsIntelligencebasedScriptTechnologieshttpscdacinindexaspxid=gist(Accessedon2ndFeb2018)

[C-DAC]CentreforDevelopmentofAdvancedComputinghttpscdacin(Accessedon2ndFeb2018)

[0]TheUnicodeStandard11httpwwwunicodeorgversionsUnicode110(Accessedon12thDec2017)

[8]TheUnicodeStandard50httpwwwunicodeorgversionsUnicode500(Accessedon12thDec2017)

[9]TheUnicodeStandard51httpwwwunicodeorgversionsUnicode510(Accessedon12thDec2017)

[11]TheUnicodeStandard60httpwwwunicodeorgversionsUnicode600(Accessedon12thDec2017)

[100]DevanāgarīVIPTeamldquoVariantIssuesReportrdquoICANN3rdOct2011httpsarchiveicannorgentopicsnew-gtldsdevanagari-vip-issues-report-03oct11-enpdf(Accessedon10thOct2017)

[101]OmniglotHindihttpswwwomniglotcomwritinghindihtm(Accessedon10thOct2017)

[102]OmniglotMarathihttpswwwomniglotcomwritingmarathihtm(Accessedon10thOct2017)

[103]OmniglotSanskrithttpswwwomniglotcomwritingsanskrithtm(Accessedon10thOct2017)

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

52

[104]OmniglotSindhihttpswwwomniglotcomwritingsindhihtm(Accessedon10thOct2017)

[105]OmniglotKashmirihttpswwwomniglotcomwritingkashmirihtm(Accessedon10thOct2017)

[106]Unicode1000SouthandCentralAsia-I-OfficialScriptsofIndiardquoPage456(R5andR5a)httpwwwunicodeorgversionsUnicode1000ch12pdf(Accessedon13thNov2017)

[107]UnicodeIndicGroupDevanagariEyelashRahttpunicodeorg~emulleriwgp8utcdochtml(Accessedon13thNov2017)

[108]MKRainaHowtoreadandwriteKashmiriinDevanagarihttpwwwkoshurorgpdfLet20Us20Learn20Kashmiripdf(Accessedon12thDec2017)

[109]CentralHindiDirectorate-MinistryofHRD-GovtofIndiaDevanāgarīAlphabetanditsRomanizationhttphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml(Accessedon12thDec2017

[110]OmniglotBodohttpswwwomniglotcomwritingbodohtm(Accessedon12thDec2017)

[111]OmniglotMaithilihttpswwwomniglotcomwritingmaithilihtm(Accessedon12thDec2017)

[112]OmniglotKonkanihttpswwwomniglotcomwritingkonkanihtm(Accessedon20thMay2018)

[113]OmniglotNepalihttpswwwomniglotcomwritingnepalihtm(Accessedon20thMay2018)

[114] NBGP Public comment feedback for Devanagari Gujarati Gurmukhi Script LGRProposalshttpsdocsgooglecomdocumentd1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw(Accessedon18thFeb2019)

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

53

10 Booksarticlesandwebographiesconsulted

Followingisathematicallysortedsetofdocumentsbooksarticlesandwebographies

consultedinthedraftingofthisreport

101 WRITINGSYSTEMS1 DillingerDTheAlphabetAKeytotheHistoryofMankind3rdEditionin2

VolumesHutchisonLondon1968

102 DEVANĀGARĪ1 AgrawalaVS(1966)TheDevanāgarīscriptInIndianSystemsofWriting(Pp12-

16)DelhiPublicationsDivision

2 AgyeyaSacchindanandHiranandVatsyayan1972BhavantiDelhiRajpalandSons

3 BeamesJohn1872-79AComparativeGrammaroftheModernAryanLanguagesof

India3volsLondonTrubnerandCo[ReprintedbyMunshiramManoharlalNew

Delhi1966]

4 BhatiaTejK1987AHistoryoftheHindiGrammaticalTraditionHindi-Hindustani

GrammarGrammariansHistoryandProblemsLeidenNewYorkEJBrill

5 BrightW(1996)TheDevanāgarīscriptInPDanielsandWBright(eds)The

WorldrsquosWritingSystems(Pp384-390)NewYorkOxfordUniversityPress

6 CardonaGeorge1987SanskritInTheWorldsMajorLanguagesBernardComrie

(ed)LondonCroomHelm448-469

7 DwivediRamAwadh1966ACriticalSurveyofHindiLiteratureDelhiMotilal

Banarsidass

8 FaruqiShamsurRahman2001EarlyUrduLiteraryCultureandHistoryDelhi

OxfordUniversityPress

9 GuruKamtaPrasad1919HindiVyakaranVaranasiNagariPrachariniSabha

(1962edition)

10 KachruYamuna1965ATransformationalTreatmentofHindiVerbalSyntax

LondonUniversityofLondonPhDdissertation(Mimeographed)

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

54

11 KachruYamuna1966AnIntroductiontoHindiSyntaxUrbanaUniversityof

IllinoisDepartmentofLinguistics

12 KalyanKaleandAnjaliSoman1986LearningMarathiShriVishakhaPrakashan

Pune

13 McGregorRS(1977)OutlineofHindiGrammar2ndedDelhiOxfordUniversity

Press

14 McGregorRS1972OutlineofHindiGrammarwithExercisesDelhiOxford

UniversityPress

15 McGregorRS1974HindiLiteratureoftheNineteenthandEarlyTwentieth

CenturiesWiesbadenHarrassowitz

16 McGregorRS1984HindiLiteraturefromItsBeginningstotheNineteenth

CenturyWiesbadenHarrassowitz

17 PandeyPK(2007)Phonology-orthographyinterfaceinDevanāgarīforHindi

WrittenLanguageandLiteracy10(2)139-1562007

18 RaiAmrit1984AHouseDividedTheOriginandDevelopmentofHindiHindavi

DelhiOxfordUniversityPress

19 SharadOnkar1969LohiyakeVicarAllahabadLokbharatiPrakashan

20 SinghAK(2007)ProgressofmodificationofBrāhmīalphabetasrevealedbythe

inscriptionsofsixth-eighthcenturiesInPGPatelPPandeyandDRajgor(eds)

TheIndicScriptsPaleographicandLinguisticPerspectives(Pp85-107)New

DelhiDKPrintworld

21 SproatR(2000)AComputationalTheoryofWritingSystemsCambridge

UniversityPress

22 TiwariPanditUdaynarayan1961HindiBhashakaUdgamaurVikas[TheOrigin

andDevelopmentoftheHindiLanguage]PrayagLeaderPress

23 VermaMK1971TheStructureoftheNounPhraseinEnglishandHindiDelhi

MotilalBanarsidass

103 INDICCOMPUTINGSPECIFIC1 IS104018-bitcodeforinformationinterchange1982

2 IS103157-bitcodedcharactersetforinformationinterchange1985

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

55

3 IS123267-bitand8-bitcodedcharactersets-Codeextensiontechniques1987

4 ISO15919Informationanddocumentation-TransliterationofDevanāgarīand

relatedIndicscriptsintoLatincharacters2001

5 ISO2375Procedureforregistrationofescapesequences2003

6 ISO88598-bitsingle-bytecodedgraphiccharactersets-Parts1-131998-2001

7 IDNPOLICYhttpmeitygovinwritereaddatafilesIndia-IDN-Policypdf

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

56

11 AppendixAVisuallyconfusablecharacterssequencesTheTable 21 below shows characters character sequenceswhichmay appear visually

confusingtosomeoftheusersoftheDevanagariscriptHowevertheyarenotconsideredconfusingenoughtobecategorizedasvariants

Confusable1 Confusable2

कU+0915

क़U+0915U+093C

खU+0916

ख़ U+0916U+093C

गU+0917

ग़U+0917U+093C

चU+091A

]U+091AU+093C

छU+091B

^U+091BU+093C

जU+091C

ज़U+091CU+093C

डU+0921

ड़U+0921U+093C

ढU+0922

ढ़U+0922U+093C

फU+092B

फ़U+092BU+093C

Table 21 Visually confusables

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

57

12 AppendixBCross-scriptConfusablesThe Devanagari script has a major set of possible cross-script confusables with the

GurmukhiscriptTheTable22liststhem

InadditiontoGurmukhisomeinstancesofcross-scriptconfusablearefoundwithBengali

GujaratiTeluguKannadaMalayalamandSinhala

None of the combinations listed in Table 22 are considered equivalents of each other

whether semantically or otherwise They are only grouped based on possible visualconfusability

Atfirsttheymaynotlookexactlythesamehoweverinthegivencontexteginabrowser

barasapartofadomainnameorasasinglewordwherethereisnosurroundingtextfromthesamescriptfordistinguishingtheycancreatevisualconfusion

A label canbeconsidered tohaveacross-scriptvariant labelonly if all theconstituent

charactersaksharashaveanequivalentconfusableintheotherscriptIfthereisevenonesingle characterakshara which does not have an equivalent visual confusable in other

scriptitessentiallyprovidesavisualdistinctionandhenceanon-confusablestring

Devanagariconfusable Otherscriptconfusable Fromscript

U+0903

ઃU+0A83 Gujarati

U+0903

U+0C03Telugu

U+0903

U+0C83Kannada

U+0903

ഃU+0D03

Malayalam

U+0903

ඃU+0A28

Sinhala

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

58

U+0903

ঃ17

U+0983 Bengali

U+0909

ওU+0993

Bengali

U+0918

ঘU+0998

Bengali

U+0920

U+0A28Gurmukhi

U+0920

U+0A30Gurmukhi

U+0921

U+0A21Gurmukhi

U+0921

U+0A24Gurmukhi

ढU+0922

U+0A22Gurmukhi

U+0924

U+0A1CGurmukhi

U+092F

U+0A27Gurmukhi

U+0945

U+0981

Bengali

Table 22 Devanagari Cross-script confusables 17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in normative part of the Devanagari and Bengali LGRs It was decided that both look different enough not to be included in the normative part and hence have been added in the Appendix as confusables only

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

Devanagari | Gurmukhi
U+0902 | U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19 Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20 Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant / Virama

N → Nukta

S → Eyelash Reph (C2 H C3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1 N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:

a क (U+0915)

b ख (U+0916)

c ग (U+0917)

d च (U+091A)

e छ (U+091B)

f ज (U+091C)

g ड (U+0921)

h ढ (U+0922)

i फ (U+092B)

The set V1 consists of these vowels:

a आ (U+0906) (Required in Santali language)

b ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a ा (U+093E) (Required in Santali language)

b ो (U+094B) (Required in Santali language)

2 H must be preceded by C or CN (see footnote 15)

3 M must be preceded by C or CN (see footnote 16)

4 X must be preceded by either of V, C, N or M

5 B must be preceded by either of V, C, N or M

6 D must be preceded by either of V, C, N or M

7 V can NOT be preceded by H (details in "Case of V preceded by H")

15 Where CN is a C followed by an N.
16 Where CN is a C followed by an N.
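The seven rules above can be read as checks on the character immediately before each code point. The following is a minimal Python sketch of that reading, not the normative LGR XML. The sets C1, V1 and M1 come from this section; the code point ranges used for the categories C, V and M are assumptions taken from the Unicode Devanagari block, since Table 6 is not reproduced here, and the names `category` and `wle_violations` are hypothetical.

```python
# Sketch of WLE rules 1-7. C1, V1 and M1 are listed in this section.
# ASSUMPTION: the C, V and M ranges below follow the Unicode Devanagari
# block, not the exact repertoire of Table 6.
C1 = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B}
V1 = {0x0906, 0x0913}
M1 = {0x093E, 0x094B}
SIGNS = {0x094D: "H", 0x093C: "N", 0x0902: "B", 0x0901: "D", 0x0903: "X"}
NEEDS_BASE = {"X": 4, "B": 5, "D": 6}  # category -> rule number


def category(ch):
    """Map one character to a WLE symbol, or '?' if it is not classified."""
    cp = ord(ch)
    if cp in SIGNS:
        return SIGNS[cp]
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0905 <= cp <= 0x0914:
        return "V"
    if 0x093E <= cp <= 0x094C:
        return "M"
    return "?"


def follows_c_or_cn(cats, i):
    """True if position i is preceded by C, or by an N that follows a C."""
    if i >= 1 and cats[i - 1] == "C":
        return True
    return i >= 2 and cats[i - 1] == "N" and cats[i - 2] == "C"


def wle_violations(label):
    """Return (index, rule number) pairs for every rule the label breaks."""
    cats = [category(ch) for ch in label]
    found = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i > 0 else None
        if cat == "N":
            if i == 0 or ord(label[i - 1]) not in (C1 | V1 | M1):
                found.append((i, 1))
        elif cat == "H":
            if not follows_c_or_cn(cats, i):
                found.append((i, 2))
        elif cat == "M":
            if not follows_c_or_cn(cats, i):
                found.append((i, 3))
        elif cat in NEEDS_BASE:
            if prev not in {"V", "C", "N", "M"}:
                found.append((i, NEEDS_BASE[cat]))
        elif cat == "V" and prev == "H":
            found.append((i, 7))
    return found
```

Under these assumptions, a label consisting of U+0915 U+093C passes, while U+0924 U+093C is reported against rule 1 because U+0924 is not in C1.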

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1 As U+0931 is added as a part of permissible sequences in Table 7 (Sequences), it gets permitted only with the specific sequences.

2 The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.
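The first reason can be illustrated with a short Python sketch that checks whether every U+0931 in a label occurs inside one of the two Eyelash Reph sequences defined for the symbol S. The function name is hypothetical, and this is only an illustration of the idea; in the LGR itself the restriction follows from U+0931 being listed only within sequences.

```python
import re

# The two Eyelash Reph sequences: U+0931 U+094D followed by U+092F or U+0939.
EYELASH_REPH = re.compile("\u0931\u094D[\u092F\u0939]")


def rra_only_in_eyelash_reph(label):
    """True if U+0931 never appears outside an Eyelash Reph sequence."""
    remainder = EYELASH_REPH.sub("", label)
    return "\u0931" not in remainder
```

A label containing U+0931 U+094D U+092F passes this check, while a bare U+0931 followed by a matra does not.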

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether each of these can follow any character that ends a valid akshar. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni, and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualize thy soul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 11, http://www.unicode.org/versions/Unicode11.0.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_orgin/devnagarithesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1 Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition, in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1 Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2 Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3 Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]

4 Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.

5 Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6 Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed). London: Croom Helm, 448-469.

7 Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8 Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9 Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10 Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).

11 Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12 Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13 McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14 McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15 McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16 McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17 Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156, 2007.

18 Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19 Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20 Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21 Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22 Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23 Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1 IS 10401: 8-bit code for information interchange, 1982

2 IS 10315: 7-bit coded character set for information interchange, 1985

3 IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987

4 ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001

5 ISO 2375: Procedure for registration of escape sequences, 2003

6 ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001

7 IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables
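Although these pairs are not variants, a reviewer may still want to spot labels that differ only by a Nukta on one of the nine listed consonants. The sketch below is purely illustrative and not part of the LGR: it drops a Nukta that follows a Table 21 consonant so that two labels can be compared. The function name `nukta_skeleton` is hypothetical.

```python
NUKTA = "\u093C"
# The nine base consonants listed in Table 21.
TABLE_21_BASES = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")


def nukta_skeleton(label):
    """Drop each Nukta that directly follows a Table 21 consonant."""
    out = []
    for ch in label:
        if ch == NUKTA and out and out[-1] in TABLE_21_BASES:
            continue  # skip the Nukta, keep the base consonant
        out.append(ch)
    return "".join(out)
```

Two labels with the same skeleton differ only in the Table 21 Nukta forms and may deserve a manual look.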

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in a given context (e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even a single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
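The all-or-nothing condition in the last paragraph can be sketched in Python as follows. The mapping is a small illustrative subset of the Devanagari-Bengali rows of Table 22, the function name is hypothetical, and the check works per code point rather than per akshara, which is a simplification.

```python
# Illustrative subset of Table 22 (Devanagari -> Bengali); not the full table.
DEVA_TO_BENGALI = {
    "\u0903": "\u0983",
    "\u0909": "\u0993",
    "\u0918": "\u0998",
}


def cross_script_confusable(label, table):
    """Return the confusable label in the other script, or None.

    None is returned as soon as one character has no counterpart,
    because that character already gives a visual distinction.
    """
    if not label or any(ch not in table for ch in label):
        return None
    return "".join(table[ch] for ch in label)
```

For example, a label made only of U+0909 and U+0918 has a Bengali look-alike under this subset, while adding U+0915, which has no entry, makes the function return None.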

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (see footnote 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
U+093A | U+0A02
U+093C | U+0A3C
ि U+093F | ਿ U+0A3F
ी U+0940 | ੀ U+0A40
U+0945 | U+0A71
U+0946 | U+0A47
U+0946 | U+0A4B
U+0947 | U+0A47
U+0947 | U+0A4B
U+0948 | U+0A48
U+0956 | U+0A41
U+0957 | U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
प्टॆ U+092A U+094D U+091F U+0946 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 19: Proposed Cross-script Devanagari-Gurmukhi Variants

Devanagari | Bengali
म U+092E | ম U+09AE
ि U+093F | ি U+09BF

Table 20: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari Script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each Category mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant / Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is U+0931 (ऱ - DEVANAGARI LETTER RRA), H is U+094D (् - DEVANAGARI SIGN VIRAMA), and C3 is either U+092F (य - DEVANAGARI LETTER YA) or U+0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N must be preceded only by a member of C1, V1 or M1.

   The set C1 consists of these consonants:
   a. क (U+0915)
   b. ख (U+0916)
   c. ग (U+0917)
   d. च (U+091A)
   e. छ (U+091B)
   f. ज (U+091C)
   g. ड (U+0921)
   h. ढ (U+0922)
   i. फ (U+092B)

   The set V1 consists of these vowels:
   a. आ (U+0906) (Required in Santali language)
   b. ओ (U+0913) (Required in Santali language)

   The set M1 consists of these matras:
   a. ा (U+093E) (Required in Santali language)
   b. ो (U+094B) (Required in Santali language)

2. H must be preceded by C or CN (see footnote 15).
3. M must be preceded by C or CN (see footnote 16).
4. X must be preceded by either of V, C, N or M.
5. B must be preceded by either of V, C, N or M.
6. D must be preceded by either of V, C, N or M.
7. V CANNOT be preceded by H (details in "Case of V preceded by H").

Footnote 15: where CN is a C followed by an N.
Footnote 16: where CN is a C followed by an N.
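The seven rules above are all "what may precede this code point" checks, so they can be sketched as a single left-to-right pass. The following Python is only an illustration under stated assumptions: the category ranges are approximated from the Unicode Devanagari block rather than taken from the panel's Table 6 repertoire, and the names `category` and `wle_violations` are hypothetical helpers, not part of the LGR or of any LGR tool. The sets C1, V1 and M1 are copied from rule 1.

```python
# Sets from rule 1 of the WLE rules.
C1 = {0x0915, 0x0916, 0x0917, 0x091A, 0x091B, 0x091C, 0x0921, 0x0922, 0x092B}
V1 = {0x0906, 0x0913}
M1 = {0x093E, 0x094B}


def category(cp: int) -> str:
    """Return the WLE symbol for a code point, or '?' if unclassified.

    Assumption: the ranges below approximate the categories using the
    Unicode Devanagari block; the actual repertoire is narrower.
    """
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0905 <= cp <= 0x0914:
        return "V"
    if 0x093E <= cp <= 0x094C:
        return "M"
    return {0x0902: "B", 0x0901: "D", 0x0903: "X",
            0x094D: "H", 0x093C: "N"}.get(cp, "?")


def wle_violations(label: str) -> list[str]:
    """List the WLE rules (1 to 7) that the label breaks, by position."""
    cps = [ord(ch) for ch in label]
    cats = [category(cp) for cp in cps]
    problems = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i > 0 else None
        prev_cp = cps[i - 1] if i > 0 else None
        # "CN" in rules 2 and 3: an N that itself follows a C.
        prev_is_cn = prev == "N" and i >= 2 and cats[i - 2] == "C"
        if cat == "N" and prev_cp not in C1 | V1 | M1:
            problems.append(f"rule 1 at index {i}")
        elif cat == "H" and not (prev == "C" or prev_is_cn):
            problems.append(f"rule 2 at index {i}")
        elif cat == "M" and not (prev == "C" or prev_is_cn):
            problems.append(f"rule 3 at index {i}")
        elif cat in "XBD" and prev not in ("V", "C", "N", "M"):
            rule = {"X": 4, "B": 5, "D": 6}[cat]
            problems.append(f"rule {rule} at index {i}")
        elif cat == "V" and prev == "H":
            problems.append(f"rule 7 at index {i}")
    return problems
```

A label such as हिंदी (C M B C M) passes every check, while a Nukta after त (U+0924), which is not in C1, is reported under rule 1. An actual LGR tool may flag the same label under a different rule, so only the valid/invalid outcome should be compared.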

Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1).
• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2).

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added as a part of permissible sequences in Table 7 (Sequences), it gets permitted only with the specific sequences.
2. The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations as that of a consonant, no specific handling in terms of context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार aːməchaːr "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of the hyphen, the space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
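The "Mango pickle" example can be checked mechanically. The sketch below is illustrative only: `vowel_after_halant` is a hypothetical helper, and the range U+0905..U+0914 is an assumed approximation of the independent vowels (V) rather than the repertoire's exact set. It reports the positions where a vowel directly follows the Halant (U+094D), which is the situation the prohibition covers.

```python
HALANT = 0x094D


def vowel_after_halant(label: str) -> list[int]:
    """Return the indexes of independent vowels that directly follow a Halant.

    Assumption: independent vowels are approximated as U+0905..U+0914.
    """
    cps = [ord(ch) for ch in label]
    return [i for i in range(1, len(cps))
            if cps[i - 1] == HALANT and 0x0905 <= cps[i] <= 0x0914]


# The example label from the text, with the explicit Halant after MA.
with_h = "\u0906\u092E\u094D\u0905\u091A\u093E\u0930"
# The same two words joined without the Halant.
without_h = with_h.replace("\u094D", "")
```

Here `vowel_after_halant(with_h)` finds the vowel U+0905 at index 3, while the form without the Halant yields no matches, so only the latter would be acceptable under the prohibition described above.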

8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkzcom | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | httpvishvakannadacom | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters / character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables
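Every pair in Table 21 differs only by a trailing Nukta (U+093C), so a reviewer could flag such label pairs with a simple comparison. The following is a minimal sketch, not part of the LGR: `strip_nukta` and `differ_only_by_nukta` are hypothetical helper names, and the code assumes labels are already written as base letter plus U+093C sequences, as in the table, rather than with precomposed Nukta letters.

```python
NUKTA = "\u093C"


def strip_nukta(label: str) -> str:
    """Remove every Nukta so that both members of a Table 21 pair compare equal."""
    return label.replace(NUKTA, "")


def differ_only_by_nukta(a: str, b: str) -> bool:
    """True if the two labels are distinct but identical once Nuktas are removed."""
    return a != b and strip_nukta(a) == strip_nukta(b)
```

Because these pairs are listed as confusables and not as variants, a positive result here would only be a hint for manual review; it does not make one label a variant of the other.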

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing) they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (see footnote 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

Footnote 17: The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
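The "all constituents must have a counterpart" condition stated in this appendix can be sketched as follows. This is an illustration under assumptions, not the panel's procedure: it works per code point rather than per akshara, `fully_confusable` is a hypothetical helper, and the mapping holds only the Gurmukhi rows of Table 22.

```python
# Devanagari code point -> Gurmukhi confusables, copied from the Gurmukhi
# rows of Table 22. Other scripts would need their own mappings.
DEVA_TO_GURMUKHI = {
    0x0920: {0x0A28, 0x0A30},
    0x0921: {0x0A21, 0x0A24},
    0x0922: {0x0A22},
    0x0924: {0x0A1C},
    0x092F: {0x0A27},
}


def fully_confusable(label: str, table: dict[int, set[int]]) -> bool:
    """True only if every code point of the label has a listed counterpart.

    Simplification: the check is per code point, whereas the text speaks
    of characters/aksharas.
    """
    return bool(label) and all(ord(ch) in table for ch in label)
```

A label built only from U+0920 and U+0924 would be reported as fully confusable with Gurmukhi, whereas adding a single letter without a listed counterpart, such as U+0915, makes the whole label visually distinct in the sense described above.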


23 VermaMK1971TheStructureoftheNounPhraseinEnglishandHindiDelhi

MotilalBanarsidass

103 INDICCOMPUTINGSPECIFIC1 IS104018-bitcodeforinformationinterchange1982

2 IS103157-bitcodedcharactersetforinformationinterchange1985

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

55

3 IS123267-bitand8-bitcodedcharactersets-Codeextensiontechniques1987

4 ISO15919Informationanddocumentation-TransliterationofDevanāgarīand

relatedIndicscriptsintoLatincharacters2001

5 ISO2375Procedureforregistrationofescapesequences2003

6 ISO88598-bitsingle-bytecodedgraphiccharactersets-Parts1-131998-2001

7 IDNPOLICYhttpmeitygovinwritereaddatafilesIndia-IDN-Policypdf

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

56

11 AppendixAVisuallyconfusablecharacterssequencesTheTable 21 below shows characters character sequenceswhichmay appear visually

confusingtosomeoftheusersoftheDevanagariscriptHowevertheyarenotconsideredconfusingenoughtobecategorizedasvariants

Confusable1 Confusable2

कU+0915

क़U+0915U+093C

खU+0916

ख़ U+0916U+093C

गU+0917

ग़U+0917U+093C

चU+091A

]U+091AU+093C

छU+091B

^U+091BU+093C

जU+091C

ज़U+091CU+093C

डU+0921

ड़U+0921U+093C

ढU+0922

ढ़U+0922U+093C

फU+092B

फ़U+092BU+093C

Table 21 Visually confusables

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

57

12 AppendixBCross-scriptConfusablesThe Devanagari script has a major set of possible cross-script confusables with the

GurmukhiscriptTheTable22liststhem

InadditiontoGurmukhisomeinstancesofcross-scriptconfusablearefoundwithBengali

GujaratiTeluguKannadaMalayalamandSinhala

None of the combinations listed in Table 22 are considered equivalents of each other

whether semantically or otherwise They are only grouped based on possible visualconfusability

Atfirsttheymaynotlookexactlythesamehoweverinthegivencontexteginabrowser

barasapartofadomainnameorasasinglewordwherethereisnosurroundingtextfromthesamescriptfordistinguishingtheycancreatevisualconfusion

A label canbeconsidered tohaveacross-scriptvariant labelonly if all theconstituent

charactersaksharashaveanequivalentconfusableintheotherscriptIfthereisevenonesingle characterakshara which does not have an equivalent visual confusable in other

scriptitessentiallyprovidesavisualdistinctionandhenceanon-confusablestring

Devanagariconfusable Otherscriptconfusable Fromscript

U+0903

ઃU+0A83 Gujarati

U+0903

U+0C03Telugu

U+0903

U+0C83Kannada

U+0903

ഃU+0D03

Malayalam

U+0903

ඃU+0A28

Sinhala

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

58

U+0903

ঃ17

U+0983 Bengali

U+0909

ওU+0993

Bengali

U+0918

ঘU+0998

Bengali

U+0920

U+0A28Gurmukhi

U+0920

U+0A30Gurmukhi

U+0921

U+0A21Gurmukhi

U+0921

U+0A24Gurmukhi

ढU+0922

U+0A22Gurmukhi

U+0924

U+0A1CGurmukhi

U+092F

U+0A27Gurmukhi

U+0945

U+0981

Bengali

Table 22 Devanagari Cross-script confusables 17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in normative part of the Devanagari and Bengali LGRs It was decided that both look different enough not to be included in the normative part and hence have been added in the Appendix as confusables only

Page 44: Proposal for a Devanagari Script Root Zone Label ... · Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel 2 3 Background on Script and Principal Languages Using

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

44

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables which look similar, but not similar enough to be recommended as cross-script variants. Table 22 (Devanagari Cross-script confusables) in Appendix B (Cross-script Confusables) lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each of the categories mentioned in Table 6 (Code point repertoire):

C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where C2 is U+0931 (ऱ - DEVANAGARI LETTER RRA), H is U+094D (् - DEVANAGARI SIGN VIRAMA) and C3 is either U+092F (य - DEVANAGARI LETTER YA) or U+0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1 N must be preceded only by a member of C1, V1 or M1

The set C1 consists of these consonants:

a क (U+0915)

b ख (U+0916)

c ग (U+0917)

d च (U+091A)

e छ (U+091B)

f ज (U+091C)

g ड (U+0921)

h ढ (U+0922)

i फ (U+092B)

The set V1 consists of these vowels:

a आ (U+0906) (Required in Santali language)

b ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a ा (U+093E) (Required in Santali language)

b ो (U+094B) (Required in Santali language)

2 H must be preceded by C or CN (note 15)

3 M must be preceded by C or CN (note 16)

4 X must be preceded by either of V, C, N or M

5 B must be preceded by either of V, C, N or M

6 D must be preceded by either of V, C, N or M

7 V can NOT be preceded by H (details in "Case of V preceded by H")

Notes 15 and 16: CN is a C followed by an N.
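The seven rules above can be sketched as a small checker. This is only an illustration, not the normative XML rule set: the function names are hypothetical, and the category ranges for C, V and M are rough assumptions taken from the layout of the Unicode Devanagari block rather than from Table 6, so the real repertoire is narrower. Only the sets C1, V1 and M1 are copied from the text.

```python
# C1, V1 and M1 as listed under rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}
# Single-code-point categories: Anusvara, Candrabindu, Visarga, Halant, Nukta.
SIGNS = {0x0902: "B", 0x0901: "D", 0x0903: "X", 0x094D: "H", 0x093C: "N"}


def category(ch):
    """Return the WLE symbol for a character, or None if unclassified.

    The C, V and M ranges are assumptions, not the contents of Table 6.
    """
    cp = ord(ch)
    if 0x0915 <= cp <= 0x0939:
        return "C"
    if 0x0905 <= cp <= 0x0914:
        return "V"
    if 0x093E <= cp <= 0x094C:
        return "M"
    return SIGNS.get(cp)


def wle_violations(label):
    """Return the numbers of the WLE rules (1-7) that the label violates."""
    cats = [category(ch) for ch in label]
    found = []
    for i, cat in enumerate(cats):
        prev = cats[i - 1] if i > 0 else None
        prev_ch = label[i - 1] if i > 0 else ""
        # "C or CN": a consonant, or a Nukta that itself follows a consonant.
        after_c_or_cn = prev == "C" or (
            prev == "N" and i >= 2 and cats[i - 2] == "C"
        )
        if cat == "N" and prev_ch not in (C1 | V1 | M1):
            found.append(1)
        elif cat == "H" and not after_c_or_cn:
            found.append(2)
        elif cat == "M" and not after_c_or_cn:
            found.append(3)
        elif cat in ("X", "B", "D") and prev not in ("V", "C", "N", "M"):
            found.append({"X": 4, "B": 5, "D": 6}[cat])
        elif cat == "V" and prev == "H":
            found.append(7)
    return found


print(wle_violations("\u0939\u093F\u0902\u0926\u0940"))  # []
print(wle_violations("\u0924\u093C"))                    # [1]
```

Checking only the immediately preceding code point (plus one more for the CN case) mirrors how the rules are phrased; a production implementation would instead evaluate the context rules of the accompanying XML file.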


Additional rules are used only for variants where a Nukta maps to a null or that are overlapped:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1 As U+0931 is added as a part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.

2 The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.
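As a rough illustration of the first point, the check below accepts U+0931 only when it opens one of the two Eyelash Reph sequences defined for symbol S (U+0931 U+094D followed by U+092F or U+0939). The function name is hypothetical, and the sketch assumes Table 7 contains no other sequence with U+0931.

```python
import re

# U+0931 U+094D followed by U+092F or U+0939: the two Eyelash Reph sequences.
EYELASH_REPH = re.compile("\u0931\u094D[\u092F\u0939]")


def rra_only_in_sequences(label):
    """True if every U+0931 in the label is part of an Eyelash Reph sequence."""
    # Remove all well-formed sequences; any U+0931 left over is a stray one.
    remainder = EYELASH_REPH.sub("", label)
    return "\u0931" not in remainder


print(rra_only_in_sequences("\u0926\u0931\u094D\u092F\u093E"))  # True
print(rra_only_in_sequences("\u0931\u093E"))                    # False
```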

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether each of these can follow every valid akshar-ending character. Only the case "V preceded by H" needs special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
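The "Mango pickle" example can be examined at the code point level. The sketch below builds the label with the Halant from the code points quoted above, derives the form without it, and locates the vowel that directly follows the Halant. The helper name is hypothetical and the independent-vowel range U+0905..U+0914 is an assumption, not the V set of Table 6.

```python
HALANT = "\u094D"


def halant_vowel_positions(label):
    """Indices of independent vowels (assumed U+0905..U+0914) right after a Halant."""
    return [
        i
        for i in range(1, len(label))
        if label[i - 1] == HALANT and 0x0905 <= ord(label[i]) <= 0x0914
    ]


# Code points of the example label as given in the text.
with_h = "\u0906\u092E\u094D\u0905\u091A\u093E\u0930"
# The same label without the Halant: the form considered sufficient by and large.
without_h = with_h.replace(HALANT, "")

print(halant_vowel_positions(with_h))     # [3]
print(halant_vowel_positions(without_h))  # []
```

The two labels differ by a single code point, which is the perceptive similarity that rule 7 is meant to avoid.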


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN Domains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualize thy soul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)

[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)

[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)

[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)

[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)

[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)

[108] M. K. Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_orgin/devnagari/thesysmbols.html (Accessed on 12th Dec 2017)

[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)

[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)

[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1 Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ

1 Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2 Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3 Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].
4 Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.
5 Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6 Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7 Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8 Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9 Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10 Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).
11 Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12 Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13 McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14 McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15 McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16 McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17 Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2): 139-156.
18 Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19 Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20 Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21 Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22 Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23 Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1 IS 10401: 8-bit code for information interchange. 1982.
2 IS 10315: 7-bit coded character set for information interchange. 1985.
3 IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.
4 ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.
5 ISO 2375: Procedure for registration of escape sequences. 2003.
6 ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.
7 IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables
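Every row of Table 21 pairs a base consonant with the same consonant followed by the Nukta (U+093C). As a small sketch of that structure, the pairs can be generated from the nine base code points listed in the table; the function names are illustrative only.

```python
NUKTA = "\u093C"
# The nine base consonants listed in Table 21.
BASES = "\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B"


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def nukta_pairs():
    """Return (base, base + Nukta) for each consonant in Table 21."""
    return [(base, base + NUKTA) for base in BASES]


for plain, dotted in nukta_pairs():
    print(f"{plain} {code_points(plain)} | {dotted} {code_points(dotted)}")
```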


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in a given context, e.g. in a browser bar, as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (note 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

Note 17: The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
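The all-or-nothing condition described in this appendix (a label is confusable with another script only if every constituent character has a confusable there) can be sketched as follows. The dictionary holds only the Gurmukhi and Bengali code point pairs from Table 22. Because these pairs are informative confusables and not normative variants, the function is purely illustrative, and its name and data layout are assumptions.

```python
# Devanagari code point -> confusable code points, per target script (Table 22).
CONFUSABLES = {
    "Gurmukhi": {
        "\u0920": {"\u0A28", "\u0A30"},
        "\u0921": {"\u0A21", "\u0A24"},
        "\u0922": {"\u0A22"},
        "\u0924": {"\u0A1C"},
        "\u092F": {"\u0A27"},
    },
    "Bengali": {
        "\u0903": {"\u0983"},
        "\u0909": {"\u0993"},
        "\u0918": {"\u0998"},
        "\u0945": {"\u0981"},
    },
}


def fully_confusable(label, script):
    """True only if every character of the label has a confusable in `script`."""
    table = CONFUSABLES.get(script, {})
    # A single character without a counterpart makes the whole label distinct.
    return bool(label) and all(ch in table for ch in label)


print(fully_confusable("\u0920\u0921", "Gurmukhi"))  # True
print(fully_confusable("\u0920\u0915", "Gurmukhi"))  # False
```

The sketch works on individual code points; the text speaks of characters/aksharas, so a fuller treatment would segment the label into aksharas first.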

Page 45: Proposal for a Devanagari Script Root Zone Label ... · Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel 2 3 Background on Script and Principal Languages Using

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

45

a क (U+0915)

b ख (U+0916)

c ग (U+0917)

d च (U+091A)

e छ (U+091B)

f ज (U+091C)

g ड (U+0921)

h ढ (U+0922)

i फ (U+092B)

ThesetV1consistsofthesevowels

a आ (U+0906)(RequiredinSantalilanguage)

b ओ (U+0913)(RequiredinSantalilanguage)

ThesetM1consistsofthesematras

a ा (U+093E)(RequiredinSantalilanguage)

b ो (U+094B)(RequiredinSantalilanguage)

2 HmustbeprecededbyCorCN15

3 MmustbeprecededbyCorCN16

4 XmustbeprecededbyeitherofVCNorM

5 BmustbeprecededbyeitherofVCNorM

6 DmustbeprecededbyeitherofVCNorM

7 VCanNOTbeprecededbyH(detailsinCaseofVprecededbyH)

15 where CN is a C followed by an N 16 where CN is a C followed by an N

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

46

Additional rules are used only for variants where a Nuktamaps to a null or that areoverlapped

bull VariantisnotdefinediffollowedbyaNukta(see641)

bull VariantundefinedifitisnotfollowedbyVorC(includingRRA)orendoflabel(See612)

CaseofEyelashReph

IntheWLErulesthereisnospecificmentionoftheEyelashRephfortworeasons1 AstheU+0931isaddedasapartofpermissiblesequencesinTable7Sequencesit

getspermittedonlywiththespecificsequences

2 The last characters of both the sequences of which the U+0931 is part are

consonants As the Eyelash-Reph can take all the combinations as that of a

consonantnospecifichandlingintermsofcontextruleisrequired

CaseofVprecededbyH

Asanyvalidakshar inDevanagari begins eitherwith aConsonantor aVowel in caseofmulti-words domains it was necessary to check the compatibility of both of these tosucceed any of the validakshar ending character It is to be noted that only the case ldquoVprecededbyHrdquoneedsaspecialdiscussionasgivenbelow

There couldbe cases involvingmulti-worddomainswhereVmayneed tobe allowed tofollow anH egआमअचार aːməchaːrMango pickle (U+0906 U+092EU+094DU+0905U+091AU+093EU+0930)

ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendsinanHandthesecondwordbeginswithaVSomesectionsofthelinguisticcommunityrequiretheexplicitpresenceofHforfullrepresentationofthesoundintendedHoweverbyandlarge

theformofthefirstwordwithoutanHisconsideredenoughforfullrepresentationofthesoundintendedforthefirstword

ThisisauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-

joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoireOtherwiseV isneverrequiredtobeallowedtofollowanHPermittingthismaycreateaperceptivesimilarityamongtwolabels(withandwithoutH)formajorityofthelinguisticcommunity

hencethisisexplicitlyprohibitedbytheNBGP

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

47

If required in future depending on the prevailing requirements by the community the

NBGPmayconsiderrevisitingthisrule

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

48

8 ContributorsNBGPCo-chairsDrUdayaNarayanSinghMrMaheshDKulkarniandDrAjayData

FollowingisthefulllistofNBGPmemberswiththeirLanguageexpertise

Position Name Organization Country Language

ExpertiseCo-Chair AjayData DataXgenTechnologies India HindiEnglishCo-Chair MaheshDKulkarni C-DAC India MarathiHindi

EnglishCo-Chair UdayaNarayana

SinghVisva-BharatiSantiniketanWestBengal

India BengaliMaithiliHindiEnglish

Member AbhijitDutta Wikimedia India BengaliHindiMember AkshatSJoshi

(Editor)C-DAC India HindiMarathi

EnglishMember AnivarAAravind IndicProject India MalayalamMember AnupamAgrawal TataConsultancyService India HindiBengaliMember ArvindBhandari GujaratUniversity India GujaratiMember AshishModi DataXgenTechnologies India HindiMember AtiurRahmanKhan C-DAC India BanglaMember BalKrishnaBal KathmanduUniversity Nepal NepaliMember BalaramPrasain TribhuvanUniversity Nepal NepaliMember BASANTAKUMAR

PANDARegionalInstituteofEducation(NCERT)

India Odia

Member BhimDhojShrestha Consultant Nepal NepaliNewarMember ChitritaChatterjee InternetandMobileAssociationof

India(IAMAI)India Multiplelanguages

representedbymembersofIAMAI

Member DEBAJITSHARMA AnundoramBorooahInstituteofLanguageArtandCulture

India Assamese

Member DevDassManandhar

Consultant Nepal NepaliNewar

Member DhanalakshmiKT NorthernTrust India KannadaMember GaneshMurmu RanchiUniversity India SantaliMember GangadharPanday BabulFilmsSociety India TeluguMember GhanashyamNepal BenaresHinduUniversityamp

UniversityofNorthBengalIndia Nepali

Member GirishChandraMishra

LanguageTechnologyCentreRavenshawUniversity

India Odia

Member GurpreetSinghLehal

PunjabiUniversityPatiala India Panjabi

Member HarishChowdhary NIXI India HindiMember HempalShrestha NepalEntrepreneursHub

(NEHUB)Nepal NepaliNewar

Member JayPaudyal Consultant India Hindi

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

49

Member JijoPappachan DNDomains India MalayalamMember KCTikayatray OdiaBhasaPratisthan India OdiaMember KalyanVasudeo

KaleFormerlyaffiliatedwithUniversityofPune

India Marathi

Member KuldeepPatnaik Visualizethysoul India OdiaMember MukeshSaini EsselGroup India HindiMember NDeivaSundaram NDSLingsoftSolutionsPvtLtd India TamilMember NehaGupta C-DAC India HindiEnglishMember NirajanParajuli NREN Nepal NepaliMember NishitJain C-DAC India HindiEnglishMember PawanChitrakar Gapsco Nepal NepaliMember PrabhakarPandey C-DAC India HindiMember PrasadPK A-onePublishers India MalayalamMember PrateekPathak ISOCMumbai India DevanagariMember RaiomondDoctor NLPConsultant India EnglishHindi

MarathiGujaratiMember RajibChakraborty SocietyforNaturalLanguage

TechnologyResearchIndia Bangla(Bengali)

Member RajivKumar NIXI India Member SManiam InternationalForumITforTamil Singapore TamilMember SanthoshThottingal Wikimediafoundation India Malayalam

SourashtraTamilMember SarojaBhate UniversityofPune India SanskritMember ShambhuKumar

SinghNationalTranslationMissionMysore

India Maithili

Member ShanmugamR C-DAC India TamilMember ShantaramSWarde

WalawalikarIndependentResearcher India Konkani

Member ShashiPathania PGDofDogriUniversityofJammu

India Dogri

Member ShubhamSaran NIXI India Member Sinnathambi

ShanmugarajahUniversityofColomboSchoolofComputing

SriLanka Tamil

Member SujithKartha Digitalkzcom India MalayalamMember SurajAdhikari MercantileCommunications(and

npccTLD)Nepal Nepali

Member SwarnaPrabhaChainary

GuwahatiUniversity India Bodo

Member UBPavanaja httpvishvakannadacom India KannadaMember UmaMaheshwarG CALTSUnivofHyderabad India TeluguMember UttamShrestha

RanaNPNOG Nepal Nepali

Member VeenaSolomon (freelancer) India MalayalamMember VinayMurarka Consultanthttpsमराभारत India Hindi

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

50

In addition following members externally gave inputs to NBGP for the respectivelanguagesscripts

Name LanguageScriptExpertiseAjitKumar AwadhiBrajLanguageAmarTumyahang LimbuLanguageAmritYonjan TamangLanguageApranaKulkarni HindiMarathiBasilBaa SadriLanguageBasilKiro KhariaLanguageBiswaLimbu LimbuLanguageDevdassManandhar NewarDevendraKumarDevesh BhojpuriLanguageDinbandhuMahto PanchparganiaLanguageDipikaSangmaNarzary BodoLanguageDrKPLekhwani SindhiDrBirendraKumarSoy MundariLanguageDrDineshKumarShrivastav MagahiLanguageDrHarvinderKaur GurmukhiScriptDrLaxmiPrasadKhatiwada NepaliLanguageHariharVaishnav HalbiIndraKumarTamang TamangLanguageJagannathSingh PanchparganiaLanguageNarendraKumarNegi KinnauriLanguagePrateekHarshwal WagdiandDhundhariLanguageRayemOlemDungdung SadriLanguageTejManAngdembe LimbuLanguageUrmilaHarshwal WagdiLanguage

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)
[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)
[NBGP] Neo-Brahmi Generation Panel
[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)
[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)
[gTLD] generic Top Level Domain
[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)
[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)
[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)
[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0 (Accessed on 12th Dec 2017)
[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)
[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)
[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)
[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)
[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)
[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)
[103] Omniglot, Sanskrit, https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct 2017)
[104] Omniglot, Sindhi, https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct 2017)
[105] Omniglot, Kashmiri, https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct 2017)
[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov 2017)
[107] Unicode Indic Group, Devanagari Eyelash Ra, http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov 2017)
[108] M K Raina, How to read and write Kashmiri in Devanagari, http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec 2017)
[109] Central Hindi Directorate - Ministry of HRD - Govt of India, Devanāgarī Alphabet and its Romanization, http://hindinideshalaya.nic.in/english/hindi_org/indevnagarithesysmbols.html (Accessed on 12th Dec 2017)
[110] Omniglot, Bodo, https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec 2017)
[111] Omniglot, Maithili, https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec 2017)
[112] Omniglot, Konkani, https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)
[113] Omniglot, Nepali, https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)
[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb 2019)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS
1 Dillinger, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchison, London, 1968.

10.2 DEVANĀGARĪ
1 Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.
2 Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3 Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]
4 Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.
5 Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.
6 Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.
7 Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8 Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9 Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).
10 Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London PhD dissertation (Mimeographed).
11 Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12 Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.
13 McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.
14 McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15 McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16 McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17 Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy 10(2), 139-156.
18 Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19 Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20 Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.
21 Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22 Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23 Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC
1 IS 10401: 8-bit code for information interchange, 1982.
2 IS 10315: 7-bit coded character set for information interchange, 1985.
3 IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.
4 ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.
5 ISO 2375: Procedure for registration of escape sequences, 2003.
6 ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.
7 IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables
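As an illustration of how Table 21 might be used, the following minimal sketch flags two labels as possibly confusable when they differ only by a Nukta (U+093C) on one of the base consonants listed in the table. The function names and the idea of comparing "skeletons" are assumptions made for this example; they are not part of the LGR, which deliberately does not treat these pairs as variants.

```python
import unicodedata

NUKTA = "\u093C"
# Base consonants from Table 21: KA, KHA, GA, CA, CHA, JA, DDA, DDHA, PHA
TABLE_21_BASES = frozenset("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")


def nukta_skeleton(label: str) -> str:
    """Drop a Nukta that follows a Table 21 base consonant (hypothetical helper)."""
    # NFD splits precomposed letters such as U+0958 into base + Nukta.
    decomposed = unicodedata.normalize("NFD", label)
    out = []
    for ch in decomposed:
        if ch == NUKTA and out and out[-1] in TABLE_21_BASES:
            continue  # ignore the Nukta for comparison purposes
        out.append(ch)
    return "".join(out)


def may_be_confused(a: str, b: str) -> bool:
    """True if the labels differ, but only by Nuktas on Table 21 consonants."""
    na = unicodedata.normalize("NFD", a)
    nb = unicodedata.normalize("NFD", b)
    return na != nb and nukta_skeleton(na) == nukta_skeleton(nb)
```

A check of this kind could serve as an advisory warning at registration time; it does not block or allocate labels.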


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them. In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are grouped only on the basis of possible visual confusability. At first they may not look exactly the same; however, in a given context, e.g. in a browser bar, as part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even a single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script
U+0903 | ઃ U+0A83 | Gujarati
U+0903 | U+0C03 | Telugu
U+0903 | U+0C83 | Kannada
U+0903 | ഃ U+0D03 | Malayalam
U+0903 | ඃ U+0D83 | Sinhala
U+0903 | ঃ U+0983 (see footnote 17) | Bengali
U+0909 | ও U+0993 | Bengali
U+0918 | ঘ U+0998 | Bengali
U+0920 | U+0A28 | Gurmukhi
U+0920 | U+0A30 | Gurmukhi
U+0921 | U+0A21 | Gurmukhi
U+0921 | U+0A24 | Gurmukhi
ढ U+0922 | U+0A22 | Gurmukhi
U+0924 | U+0A1C | Gurmukhi
U+092F | U+0A27 | Gurmukhi
U+0945 | U+0981 | Bengali

Table 22: Devanagari cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that the two look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
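The "all constituent characters" condition stated above can be sketched as follows. This is only an illustration: the mapping holds the Gurmukhi rows of Table 22 as they appear in this document, the function name is hypothetical, and the check works on individual code points rather than on full aksharas, which is a simplifying assumption.

```python
# Gurmukhi rows of Table 22, keyed by Devanagari code point.
DEVANAGARI_TO_GURMUKHI = {
    "\u0920": {"\u0A28", "\u0A30"},
    "\u0921": {"\u0A21", "\u0A24"},
    "\u0922": {"\u0A22"},
    "\u0924": {"\u0A1C"},
    "\u092F": {"\u0A27"},
}


def is_cross_script_confusable(label: str, table=DEVANAGARI_TO_GURMUKHI) -> bool:
    """True only if every code point in the label has a confusable in the table.

    A single code point without a counterpart gives the label a visual
    distinction, so the whole label is treated as non-confusable.
    """
    return bool(label) and all(ch in table for ch in label)


print(is_cross_script_confusable("\u0920\u0921"))  # every code point is listed
print(is_cross_script_confusable("\u0920\u0915"))  # U+0915 is not listed
```

Because the check requires every code point to match, adding one unlisted character to a label is enough to make it distinguishable under this rule.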


Additional rules are used only for variants where a Nukta maps to null or that overlap:

• Variant is not defined if followed by a Nukta (see 6.4.1)

• Variant is undefined if it is not followed by V or C (including RRA) or end of label (see 6.1.2)

Case of Eyelash Reph

In the WLE rules there is no specific mention of the Eyelash Reph, for two reasons:

1 As U+0931 is added as part of the permissible sequences in Table 7 (Sequences), it is permitted only within those specific sequences.

2 The last characters of both the sequences of which U+0931 is part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, in the case of multi-word domains it was necessary to check the compatibility of both of these to succeed any valid akshar-ending character. It is to be noted that only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic community; hence it is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
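The prohibition of V following H can be illustrated with a short check. This is a minimal sketch, not the normative rule: it assumes H is the Halant U+094D and approximates the V class by the independent vowels U+0905 to U+0914, whereas the authoritative classes are those defined in the LGR's XML file. The function name is hypothetical.

```python
HALANT = "\u094D"
# Assumption: V approximated by the independent vowels U+0905..U+0914.
VOWELS = frozenset(chr(cp) for cp in range(0x0905, 0x0915))


def has_vowel_after_halant(label: str) -> bool:
    """True if any Halant in the label is immediately followed by a vowel."""
    return any(
        prev == HALANT and cur in VOWELS
        for prev, cur in zip(label, label[1:])
    )


# The example from the text, with and without the Halant after U+092E.
with_h = "\u0906\u092E\u094D\u0905\u091A\u093E\u0930"
without_h = "\u0906\u092E\u0905\u091A\u093E\u0930"

print(has_vowel_after_halant(with_h))     # True: the label would be rejected
print(has_vowel_after_halant(without_h))  # False: the label passes this check
```

Under the rule as described, only the second form of the "Mango pickle" label would be eligible.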


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise.

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant httpsमराभारत | India | Hindi




Page 47: Proposal for a Devanagari Script Root Zone Label ... · Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel 2 3 Background on Script and Principal Languages Using

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

47

If required in future depending on the prevailing requirements by the community the

NBGPmayconsiderrevisitingthisrule

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

48

8 ContributorsNBGPCo-chairsDrUdayaNarayanSinghMrMaheshDKulkarniandDrAjayData

FollowingisthefulllistofNBGPmemberswiththeirLanguageexpertise

Position Name Organization Country Language

ExpertiseCo-Chair AjayData DataXgenTechnologies India HindiEnglishCo-Chair MaheshDKulkarni C-DAC India MarathiHindi

EnglishCo-Chair UdayaNarayana

SinghVisva-BharatiSantiniketanWestBengal

India BengaliMaithiliHindiEnglish

Member AbhijitDutta Wikimedia India BengaliHindiMember AkshatSJoshi

(Editor)C-DAC India HindiMarathi

EnglishMember AnivarAAravind IndicProject India MalayalamMember AnupamAgrawal TataConsultancyService India HindiBengaliMember ArvindBhandari GujaratUniversity India GujaratiMember AshishModi DataXgenTechnologies India HindiMember AtiurRahmanKhan C-DAC India BanglaMember BalKrishnaBal KathmanduUniversity Nepal NepaliMember BalaramPrasain TribhuvanUniversity Nepal NepaliMember BASANTAKUMAR

PANDARegionalInstituteofEducation(NCERT)

India Odia

Member BhimDhojShrestha Consultant Nepal NepaliNewarMember ChitritaChatterjee InternetandMobileAssociationof

India(IAMAI)India Multiplelanguages

representedbymembersofIAMAI

Member DEBAJITSHARMA AnundoramBorooahInstituteofLanguageArtandCulture

India Assamese

Member DevDassManandhar

Consultant Nepal NepaliNewar

Member DhanalakshmiKT NorthernTrust India KannadaMember GaneshMurmu RanchiUniversity India SantaliMember GangadharPanday BabulFilmsSociety India TeluguMember GhanashyamNepal BenaresHinduUniversityamp

UniversityofNorthBengalIndia Nepali

Member GirishChandraMishra

LanguageTechnologyCentreRavenshawUniversity

India Odia

Member GurpreetSinghLehal

PunjabiUniversityPatiala India Panjabi

Member HarishChowdhary NIXI India HindiMember HempalShrestha NepalEntrepreneursHub

(NEHUB)Nepal NepaliNewar

Member JayPaudyal Consultant India Hindi

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

49

Member JijoPappachan DNDomains India MalayalamMember KCTikayatray OdiaBhasaPratisthan India OdiaMember KalyanVasudeo

KaleFormerlyaffiliatedwithUniversityofPune

India Marathi

Member KuldeepPatnaik Visualizethysoul India OdiaMember MukeshSaini EsselGroup India HindiMember NDeivaSundaram NDSLingsoftSolutionsPvtLtd India TamilMember NehaGupta C-DAC India HindiEnglishMember NirajanParajuli NREN Nepal NepaliMember NishitJain C-DAC India HindiEnglishMember PawanChitrakar Gapsco Nepal NepaliMember PrabhakarPandey C-DAC India HindiMember PrasadPK A-onePublishers India MalayalamMember PrateekPathak ISOCMumbai India DevanagariMember RaiomondDoctor NLPConsultant India EnglishHindi

MarathiGujaratiMember RajibChakraborty SocietyforNaturalLanguage

TechnologyResearchIndia Bangla(Bengali)

Member RajivKumar NIXI India Member SManiam InternationalForumITforTamil Singapore TamilMember SanthoshThottingal Wikimediafoundation India Malayalam

SourashtraTamilMember SarojaBhate UniversityofPune India SanskritMember ShambhuKumar

SinghNationalTranslationMissionMysore

India Maithili

Member ShanmugamR C-DAC India TamilMember ShantaramSWarde

WalawalikarIndependentResearcher India Konkani

Member ShashiPathania PGDofDogriUniversityofJammu

India Dogri

Member ShubhamSaran NIXI India Member Sinnathambi

ShanmugarajahUniversityofColomboSchoolofComputing

SriLanka Tamil

Member SujithKartha Digitalkzcom India MalayalamMember SurajAdhikari MercantileCommunications(and

npccTLD)Nepal Nepali

Member SwarnaPrabhaChainary

GuwahatiUniversity India Bodo

Member UBPavanaja httpvishvakannadacom India KannadaMember UmaMaheshwarG CALTSUnivofHyderabad India TeluguMember UttamShrestha

RanaNPNOG Nepal Nepali

Member VeenaSolomon (freelancer) India MalayalamMember VinayMurarka Consultanthttpsमराभारत India Hindi

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

50

In addition following members externally gave inputs to NBGP for the respectivelanguagesscripts

Name LanguageScriptExpertiseAjitKumar AwadhiBrajLanguageAmarTumyahang LimbuLanguageAmritYonjan TamangLanguageApranaKulkarni HindiMarathiBasilBaa SadriLanguageBasilKiro KhariaLanguageBiswaLimbu LimbuLanguageDevdassManandhar NewarDevendraKumarDevesh BhojpuriLanguageDinbandhuMahto PanchparganiaLanguageDipikaSangmaNarzary BodoLanguageDrKPLekhwani SindhiDrBirendraKumarSoy MundariLanguageDrDineshKumarShrivastav MagahiLanguageDrHarvinderKaur GurmukhiScriptDrLaxmiPrasadKhatiwada NepaliLanguageHariharVaishnav HalbiIndraKumarTamang TamangLanguageJagannathSingh PanchparganiaLanguageNarendraKumarNegi KinnauriLanguagePrateekHarshwal WagdiandDhundhariLanguageRayemOlemDungdung SadriLanguageTejManAngdembe LimbuLanguageUrmilaHarshwal WagdiLanguage

9 References

[MSR]IntegrationPanelMaximalStartingRepertoiremdashMSR-4OverviewandRationale7February2019httpswwwicannorgensystemfilesfilesmsr-4-overview-25jan19-enpdf(Accessedon18thFeb2019)

[EGIDS]ExpandedGradedIntergenerationalDisruptionScalehttpswwwethnologuecomaboutlanguage-status(Accessedon13thNov2017)

[NBGP]Neo-BrahmiGenerationPanel

[RFC7940]DaviesKandAFreytagRepresentingLabelGenerationRulesetsusingXMLRFC7940August2016httpstoolsietforghtmlrfc7940(Accessedon1stDec2017)

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

51

[RCF8228]AFreytagldquoGuidanceonDesigningLabelGenerationRulesets(LGRs)SupportingVariantLabelsAugust2017httpstoolsietforghtmlrfc8228(Accessedon1stDec2017)

[gTLD]genericTopLevelDomain

[ISCII]IndianScriptCodeforInformationInterchangehttpscdacinindexaspxid=mlc_gist_iscii(Accessedon2ndFeb2018)

[GIST]GraphicsIntelligencebasedScriptTechnologieshttpscdacinindexaspxid=gist(Accessedon2ndFeb2018)

[C-DAC]CentreforDevelopmentofAdvancedComputinghttpscdacin(Accessedon2ndFeb2018)

[0]TheUnicodeStandard11httpwwwunicodeorgversionsUnicode110(Accessedon12thDec2017)

[8]TheUnicodeStandard50httpwwwunicodeorgversionsUnicode500(Accessedon12thDec2017)

[9]TheUnicodeStandard51httpwwwunicodeorgversionsUnicode510(Accessedon12thDec2017)

[11]TheUnicodeStandard60httpwwwunicodeorgversionsUnicode600(Accessedon12thDec2017)

[100]DevanāgarīVIPTeamldquoVariantIssuesReportrdquoICANN3rdOct2011httpsarchiveicannorgentopicsnew-gtldsdevanagari-vip-issues-report-03oct11-enpdf(Accessedon10thOct2017)

[101]OmniglotHindihttpswwwomniglotcomwritinghindihtm(Accessedon10thOct2017)

[102]OmniglotMarathihttpswwwomniglotcomwritingmarathihtm(Accessedon10thOct2017)

[103]OmniglotSanskrithttpswwwomniglotcomwritingsanskrithtm(Accessedon10thOct2017)

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

52

[104]OmniglotSindhihttpswwwomniglotcomwritingsindhihtm(Accessedon10thOct2017)

[105]OmniglotKashmirihttpswwwomniglotcomwritingkashmirihtm(Accessedon10thOct2017)

[106]Unicode1000SouthandCentralAsia-I-OfficialScriptsofIndiardquoPage456(R5andR5a)httpwwwunicodeorgversionsUnicode1000ch12pdf(Accessedon13thNov2017)

[107]UnicodeIndicGroupDevanagariEyelashRahttpunicodeorg~emulleriwgp8utcdochtml(Accessedon13thNov2017)

[108]MKRainaHowtoreadandwriteKashmiriinDevanagarihttpwwwkoshurorgpdfLet20Us20Learn20Kashmiripdf(Accessedon12thDec2017)

[109]CentralHindiDirectorate-MinistryofHRD-GovtofIndiaDevanāgarīAlphabetanditsRomanizationhttphindinideshalayanicinenglishhindi_orgindevnagarithesysmbolshtml(Accessedon12thDec2017

[110]OmniglotBodohttpswwwomniglotcomwritingbodohtm(Accessedon12thDec2017)

[111]OmniglotMaithilihttpswwwomniglotcomwritingmaithilihtm(Accessedon12thDec2017)

[112]OmniglotKonkanihttpswwwomniglotcomwritingkonkanihtm(Accessedon20thMay2018)

[113]OmniglotNepalihttpswwwomniglotcomwritingnepalihtm(Accessedon20thMay2018)

[114] NBGP Public comment feedback for Devanagari Gujarati Gurmukhi Script LGRProposalshttpsdocsgooglecomdocumentd1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw(Accessedon18thFeb2019)

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

53

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Diringer, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchinson, London, 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966].

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden, New York: E. J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).


11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar. 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.

2. IS 10315: 7-bit coded character set for information interchange, 1985.


3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.

5. ISO 2375: Procedure for registration of escape sequences, 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2
क U+0915 | क़ U+0915 U+093C
ख U+0916 | ख़ U+0916 U+093C
ग U+0917 | ग़ U+0917 U+093C
च U+091A | च़ U+091A U+093C
छ U+091B | छ़ U+091B U+093C
ज U+091C | ज़ U+091C U+093C
ड U+0921 | ड़ U+0921 U+093C
ढ U+0922 | ढ़ U+0922 U+093C
फ U+092B | फ़ U+092B U+093C

Table 21 Visually confusables
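Every row of Table 21 pairs a base consonant with the same consonant followed by the Nukta (U+093C). As a minimal sketch of that pattern, the following Python snippet rebuilds the pairs from the base code points. The function and constant names are illustrative assumptions and are not part of the LGR or its XML file.

```python
NUKTA = "\u093C"  # DEVANAGARI SIGN NUKTA

# Base consonants from the "Confusable 1" column of Table 21.
BASE_CODE_POINTS = [0x0915, 0x0916, 0x0917, 0x091A, 0x091B,
                    0x091C, 0x0921, 0x0922, 0x092B]


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def nukta_pairs():
    """Return (plain, with_nukta) string pairs, one per table row."""
    return [(chr(cp), chr(cp) + NUKTA) for cp in BASE_CODE_POINTS]


if __name__ == "__main__":
    # Print one line per table row: glyph and code points for both columns.
    for plain, dotted in nukta_pairs():
        print(f"{plain} {code_points(plain)} | {dotted} {code_points(dotted)}")
```

The sketch only reproduces the listing; it makes no statement about which of these sequences are permitted by the repertoire or the Whole Label Evaluation rules.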


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are grouped only on the basis of possible visual confusability.

At first they may not look exactly the same; however, in a given context, e.g. in a browser bar as part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.
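The all-or-nothing condition in the preceding paragraph can be sketched as a simple check. The Python snippet below is an illustration only: the mapping is transcribed from the Gurmukhi and Bengali rows of Table 22, the names `CROSS_SCRIPT` and `has_cross_script_lookalike` are hypothetical, and the check works per code point, which simplifies the "character/akshara" wording of the text.

```python
# Devanagari code point -> confusable code points in the other script.
# Transcribed from the Gurmukhi and Bengali rows of Table 22 (illustrative).
CROSS_SCRIPT = {
    "Gurmukhi": {
        0x0920: [0x0A28, 0x0A30],
        0x0921: [0x0A21, 0x0A24],
        0x0922: [0x0A22],
        0x0924: [0x0A1C],
        0x092F: [0x0A27],
    },
    "Bengali": {
        0x0903: [0x0983],
        0x0909: [0x0993],
        0x0918: [0x0998],
        0x0945: [0x0981],
    },
}


def has_cross_script_lookalike(label, script):
    """True only if every code point of the label has a confusable in `script`.

    A single code point without a counterpart is enough to make the
    whole label visually distinct, so the check is all-or-nothing.
    """
    table = CROSS_SCRIPT.get(script, {})
    return len(label) > 0 and all(ord(ch) in table for ch in label)
```

For example, a label made of U+0920 and U+0921 passes the check for Gurmukhi, while replacing either character with U+0915, which has no entry in the table, makes the whole label distinct. Since these pairs are listed in an appendix as confusables only, such a check would be advisory rather than part of the normative variant rules.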

Devanagari confusable | Other script confusable | From script
ः U+0903 | ઃ U+0A83 | Gujarati
ः U+0903 | ః U+0C03 | Telugu
ः U+0903 | ಃ U+0C83 | Kannada
ः U+0903 | ഃ U+0D03 | Malayalam
ः U+0903 | ඃ U+0D83 | Sinhala
ः U+0903 | ঃ U+0983 (see note 17) | Bengali
उ U+0909 | ও U+0993 | Bengali
घ U+0918 | ঘ U+0998 | Bengali
ठ U+0920 | ਨ U+0A28 | Gurmukhi
ठ U+0920 | ਰ U+0A30 | Gurmukhi
ड U+0921 | ਡ U+0A21 | Gurmukhi
ड U+0921 | ਤ U+0A24 | Gurmukhi
ढ U+0922 | ਢ U+0A22 | Gurmukhi
त U+0924 | ਜ U+0A1C | Gurmukhi
य U+092F | ਧ U+0A27 | Gurmukhi
ॅ U+0945 | ঁ U+0981 | Bengali

Table 22 Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that the two look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.


8 Contributors

NBGP Co-chairs: Dr Udaya Narayan Singh, Mr Mahesh D Kulkarni and Dr Ajay Data

Following is the full list of NBGP members with their language expertise:

Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D Kulkarni | C-DAC | India | Marathi, Hindi, English
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S Joshi (Editor) | C-DAC | India | Hindi, Marathi, English
Member | Anivar A Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language, Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University, Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi


Member | Jijo Pappachan | DNDomains | India | Malayalam
Member | K C Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi, English
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi, English
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | PGD of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U B Pavanaja | http://vishvakannada.com | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant, httpsमराभारत | India | Hindi


In addition, the following members externally gave inputs to NBGP for the respective languages/scripts:

Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K P Lekhwani | Sindhi
Dr Birendra Kumar Soy | Mundari Language
Dr Dinesh Kumar Shrivastav | Magahi Language
Dr Harvinder Kaur | Gurmukhi Script
Dr Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4: Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec 2017)


[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0 (Accessed on 12th Dec 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0 (Accessed on 12th Dec 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0 (Accessed on 12th Dec 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0 (Accessed on 12th Dec 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct 2017)

[101] Omniglot, Hindi, https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct 2017)

[102] Omniglot, Marathi, https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct 2017)



Page 50: Proposal for a Devanagari Script Root Zone Label ... · Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel 2 3 Background on Script and Principal Languages Using

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

50

In addition following members externally gave inputs to NBGP for the respectivelanguagesscripts

Name LanguageScriptExpertiseAjitKumar AwadhiBrajLanguageAmarTumyahang LimbuLanguageAmritYonjan TamangLanguageApranaKulkarni HindiMarathiBasilBaa SadriLanguageBasilKiro KhariaLanguageBiswaLimbu LimbuLanguageDevdassManandhar NewarDevendraKumarDevesh BhojpuriLanguageDinbandhuMahto PanchparganiaLanguageDipikaSangmaNarzary BodoLanguageDrKPLekhwani SindhiDrBirendraKumarSoy MundariLanguageDrDineshKumarShrivastav MagahiLanguageDrHarvinderKaur GurmukhiScriptDrLaxmiPrasadKhatiwada NepaliLanguageHariharVaishnav HalbiIndraKumarTamang TamangLanguageJagannathSingh PanchparganiaLanguageNarendraKumarNegi KinnauriLanguagePrateekHarshwal WagdiandDhundhariLanguageRayemOlemDungdung SadriLanguageTejManAngdembe LimbuLanguageUrmilaHarshwal WagdiLanguage

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th Feb. 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov. 2017)

[NBGP] Neo-Brahmi Generation Panel

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets using XML", RFC 7940, August 2016, https://tools.ietf.org/html/rfc7940 (Accessed on 1st Dec. 2017)


[RFC8228] A. Freytag, "Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels", August 2017, https://tools.ietf.org/html/rfc8228 (Accessed on 1st Dec. 2017)

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb. 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb. 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb. 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec. 2017)

[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec. 2017)

[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec. 2017)

[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec. 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct. 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct. 2017)

[101] Omniglot, "Hindi", https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct. 2017)

[102] Omniglot, "Marathi", https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct. 2017)

[103] Omniglot, "Sanskrit", https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct. 2017)


[104] Omniglot, "Sindhi", https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct. 2017)

[105] Omniglot, "Kashmiri", https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct. 2017)

[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov. 2017)

[107] Unicode, Indic Group, "Devanagari Eyelash Ra", http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov. 2017)

[108] M. K. Raina, "How to read and write Kashmiri in Devanagari", http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec. 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, "Devanāgarī Alphabet and its Romanization", http://hindinideshalaya.nic.in/english/hindi_orgin/devnagari/thesysmbols.html (Accessed on 12th Dec. 2017)

[110] Omniglot, "Bodo", https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec. 2017)

[111] Omniglot, "Maithili", https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec. 2017)

[112] Omniglot, "Konkani", https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, "Nepali", https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

[114] NBGP, Public comment feedback for Devanagari, Gujarati, Gurmukhi Script LGR Proposals, https://docs.google.com/document/d/1CLKdJBTNDcC_sFFs5s0a_Bk0zQUER2BIruYuyCNgkAw (Accessed on 18th Feb. 2019)


10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1 Diringer, D. The Alphabet: A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchinson, London, 1968.

10.2 DEVANĀGARĪ

1 Agrawala, V. S. (1966). The Devanāgarī script. In Indian Systems of Writing (Pp. 12-16). Delhi: Publications Division.

2 Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3 Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London: Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966]

4 Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.

5 Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds.), The World's Writing Systems (Pp. 384-390). New York: Oxford University Press.

6 Cardona, George. 1987. Sanskrit. In The World's Major Languages, Bernard Comrie (ed.). London: Croom Helm. 448-469.

7 Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8 Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9 Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha (1962 edition).

10 Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).


11 Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12 Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13 McGregor, R. S. (1977). Outline of Hindi Grammar. 2nd ed. Delhi: Oxford University Press.

14 McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15 McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16 McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17 Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10 (2), 139-156.

18 Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19 Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20 Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (Pp. 85-107). New Delhi: D. K. Printworld.

21 Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22 Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23 Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1 IS 10401: 8-bit code for information interchange. 1982.

2 IS 10315: 7-bit coded character set for information interchange. 1985.


3 IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987.

4 ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001.

5 ISO 2375: Procedure for registration of escape sequences. 2003.

6 ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001.

7 IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some of the users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1      Confusable 2
क U+0915          क़ U+0915 U+093C
ख U+0916          ख़ U+0916 U+093C
ग U+0917          ग़ U+0917 U+093C
च U+091A          च़ U+091A U+093C
छ U+091B          छ़ U+091B U+093C
ज U+091C          ज़ U+091C U+093C
ड U+0921          ड़ U+0921 U+093C
ढ U+0922          ढ़ U+0922 U+093C
फ U+092B          फ़ U+092B U+093C

Table 21: Visually confusables
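Every pair in Table 21 differs only by the nukta sign U+093C following one of nine base consonants. The sketch below shows one way a reviewer's tool could flag two labels that differ only in this respect. It is a minimal illustration and not part of the LGR: the names `TABLE_21_BASES`, `nukta_skeleton` and `differ_only_by_nukta` are hypothetical, and the pairs remain non-normative confusables rather than variants.

```python
import unicodedata

NUKTA = "\u093C"  # DEVANAGARI SIGN NUKTA

# Base consonants from the "Confusable 1" column of Table 21.
TABLE_21_BASES = {
    "\u0915", "\u0916", "\u0917", "\u091A", "\u091B",
    "\u091C", "\u0921", "\u0922", "\u092B",
}


def nukta_skeleton(label: str) -> str:
    """Drop U+093C wherever it directly follows a Table 21 base consonant."""
    out = []
    # NFD turns precomposed letters such as U+0958 into base + U+093C.
    for ch in unicodedata.normalize("NFD", label):
        if ch == NUKTA and out and out[-1] in TABLE_21_BASES:
            continue
        out.append(ch)
    return "".join(out)


def differ_only_by_nukta(a: str, b: str) -> bool:
    """True if a and b are distinct labels whose only difference is a Table 21 nukta."""
    a_nfd = unicodedata.normalize("NFD", a)
    b_nfd = unicodedata.normalize("NFD", b)
    return a_nfd != b_nfd and nukta_skeleton(a_nfd) == nukta_skeleton(b_nfd)
```

Comparing in NFD is a deliberate choice in this sketch: a precomposed nukta letter and its base-plus-nukta sequence are canonically equivalent, so they are treated as the same label rather than as a confusable pair.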


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context, e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing, they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable    Other script confusable      From script
ः U+0903                 ઃ U+0A83                     Gujarati
ः U+0903                 ః U+0C03                     Telugu
ः U+0903                 ಃ U+0C83                     Kannada
ः U+0903                 ഃ U+0D03                     Malayalam
ः U+0903                 ඃ U+0D83                     Sinhala
ः U+0903                 ঃ U+0983 (footnote 17)       Bengali
उ U+0909                 ও U+0993                     Bengali
घ U+0918                 ঘ U+0998                     Bengali
ठ U+0920                 ਨ U+0A28                     Gurmukhi
ठ U+0920                 ਰ U+0A30                     Gurmukhi
ड U+0921                 ਡ U+0A21                     Gurmukhi
ड U+0921                 ਤ U+0A24                     Gurmukhi
ढ U+0922                 ਢ U+0A22                     Gurmukhi
त U+0924                 ਜ U+0A1C                     Gurmukhi
य U+092F                 ਧ U+0A27                     Gurmukhi
ॅ U+0945                 ঁ U+0981                     Bengali

Table 22: Devanagari cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
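The all-or-nothing condition stated in this appendix (every constituent character must have a confusable in the other script) can be sketched as follows. This is only an illustration under stated assumptions: it works per code point rather than per akshara, it transcribes only the Gurmukhi and Bengali rows of Table 22, and the names `TABLE_22` and `cross_script_lookalikes` are hypothetical. The pairs remain non-normative confusables, not variants defined by this LGR.

```python
from itertools import product

# Gurmukhi and Bengali rows of Table 22, keyed by Devanagari code point.
TABLE_22 = {
    "Gurmukhi": {
        "\u0920": ("\u0A28", "\u0A30"),
        "\u0921": ("\u0A21", "\u0A24"),
        "\u0922": ("\u0A22",),
        "\u0924": ("\u0A1C",),
        "\u092F": ("\u0A27",),
    },
    "Bengali": {
        "\u0903": ("\u0983",),
        "\u0909": ("\u0993",),
        "\u0918": ("\u0998",),
        "\u0945": ("\u0981",),
    },
}


def cross_script_lookalikes(label: str, script: str) -> list[str]:
    """Return candidate look-alike strings in `script`, or [] if none exist.

    A single character without a listed confusable makes the whole
    label visually distinguishable, so the result is empty.
    """
    if not label:
        return []
    table = TABLE_22[script]
    options = []
    for ch in label:
        candidates = table.get(ch)
        if not candidates:
            return []
        options.append(candidates)
    # Every combination of per-character confusables is a candidate.
    return ["".join(combo) for combo in product(*options)]
```

For example, a label made of U+0921 followed by U+0924 yields two Gurmukhi candidates, while replacing the second character with U+0915, which has no row in Table 22, yields none.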

Page 54: Proposal for a Devanagari Script Root Zone Label ... · Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel 2 3 Background on Script and Principal Languages Using

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel

54

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R. S. (1977). Outline of Hindi Grammar, 2nd ed. Delhi: Oxford University Press.

14. McGregor, R. S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R. S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R. S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P. K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2), 139-156.

18. Rai, Amrit. 1984. A House Divided: The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A. K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P. G. Patel, P. Pandey and D. Rajgor (eds.), The Indic Scripts: Paleographic and Linguistic Perspectives (pp. 85-107). New Delhi: D. K. Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M. K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange, 1982.

2. IS 10315: 7-bit coded character set for information interchange, 1985.


3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques, 1987.

4. ISO 15919: Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters, 2001.

5. ISO 2375: Procedure for registration of escape sequences, 2003.

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13, 1998-2001.

7. IDN POLICY: http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf


11 Appendix A: Visually confusable characters/sequences

Table 21 below shows characters/character sequences which may appear visually confusing to some users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2

क U+0915 | क़ U+0915 U+093C

ख U+0916 | ख़ U+0916 U+093C

ग U+0917 | ग़ U+0917 U+093C

च U+091A | च़ U+091A U+093C

छ U+091B | छ़ U+091B U+093C

ज U+091C | ज़ U+091C U+093C

ड U+0921 | ड़ U+0921 U+093C

ढ U+0922 | ढ़ U+0922 U+093C

फ U+092B | फ़ U+092B U+093C

Table 21: Visually confusables
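Every pair in Table 21 differs only by the nukta (U+093C) that follows the base letter. As a minimal sketch, a registry-side tool could flag two labels that differ only in this way. The helper names `nukta_skeleton` and `visually_confusable` are hypothetical and not part of the LGR. The sketch assumes labels are supplied with the nukta as a separate code point, as in the sequences listed in the table.

```python
NUKTA = "\u093c"  # DEVANAGARI SIGN NUKTA

# Base letters from Table 21 whose nukta forms are listed as confusable.
TABLE_21_BASES = set("\u0915\u0916\u0917\u091a\u091b\u091c\u0921\u0922\u092b")


def nukta_skeleton(label: str) -> str:
    """Drop a nukta only where it follows a base letter listed in Table 21."""
    out = []
    for ch in label:
        if ch == NUKTA and out and out[-1] in TABLE_21_BASES:
            continue  # skip the nukta, keep the base letter
        out.append(ch)
    return "".join(out)


def visually_confusable(a: str, b: str) -> bool:
    """True if two distinct labels differ only by Table 21 nukta marks."""
    return a != b and nukta_skeleton(a) == nukta_skeleton(b)
```

Such a check is advisory only: as stated above, these pairs are not variants, so both labels remain independently valid.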


12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 22 lists them. In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 22 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in the given context (e.g. in a browser bar as a part of a domain name, or as a single word where there is no surrounding text from the same script for distinguishing), they can create visual confusion.

A label can be considered to have a cross-script variant label only if all the constituent characters/aksharas have an equivalent confusable in the other script. If there is even one single character/akshara which does not have an equivalent visual confusable in the other script, it essentially provides a visual distinction and hence a non-confusable string.

Devanagari confusable | Other script confusable | From script

ः U+0903 | ઃ U+0A83 | Gujarati

ः U+0903 | ః U+0C03 | Telugu

ः U+0903 | ಃ U+0C83 | Kannada

ः U+0903 | ഃ U+0D03 | Malayalam

ः U+0903 | ඃ U+0D83 | Sinhala

ः U+0903 | ঃ U+0983 [17] | Bengali

उ U+0909 | ও U+0993 | Bengali

घ U+0918 | ঘ U+0998 | Bengali

ठ U+0920 | ਨ U+0A28 | Gurmukhi

ठ U+0920 | ਰ U+0A30 | Gurmukhi

ड U+0921 | ਡ U+0A21 | Gurmukhi

ड U+0921 | ਤ U+0A24 | Gurmukhi

ढ U+0922 | ਢ U+0A22 | Gurmukhi

त U+0924 | ਜ U+0A1C | Gurmukhi

य U+092F | ਧ U+0A27 | Gurmukhi

ॅ U+0945 | ঁ U+0981 | Bengali

Table 22: Devanagari Cross-script confusables

17 The Bengali and Devanagari Visarga pair was discussed at length for inclusion in the normative part of the Devanagari and Bengali LGRs. It was decided that both look different enough not to be included in the normative part, and hence they have been added in the Appendix as confusables only.
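The all-or-nothing condition for cross-script confusability described in this appendix can be sketched as a lookup over the Table 22 pairs. This is an illustration only, not the LGR's normative mechanism. The names `CONFUSABLES`, `fully_confusable_with` and `confusable_scripts` are hypothetical, and the check works per code point, which simplifies the "character/akshara" wording of the rule.

```python
# Devanagari code point -> {script: confusable code points}, taken from Table 22.
# The Sinhala entry uses U+0D83, the code point of SINHALA SIGN VISARGAYA.
CONFUSABLES = {
    0x0903: {"Gujarati": {0x0A83}, "Telugu": {0x0C03}, "Kannada": {0x0C83},
             "Malayalam": {0x0D03}, "Sinhala": {0x0D83}, "Bengali": {0x0983}},
    0x0909: {"Bengali": {0x0993}},
    0x0918: {"Bengali": {0x0998}},
    0x0920: {"Gurmukhi": {0x0A28, 0x0A30}},
    0x0921: {"Gurmukhi": {0x0A21, 0x0A24}},
    0x0922: {"Gurmukhi": {0x0A22}},
    0x0924: {"Gurmukhi": {0x0A1C}},
    0x092F: {"Gurmukhi": {0x0A27}},
    0x0945: {"Bengali": {0x0981}},
}


def fully_confusable_with(label: str, script: str) -> bool:
    """True only if every code point of the label has a counterpart in `script`."""
    return bool(label) and all(
        script in CONFUSABLES.get(ord(ch), {}) for ch in label
    )


def confusable_scripts(label: str) -> list:
    """Scripts in which the whole label could have a confusable counterpart."""
    scripts = sorted({s for mapping in CONFUSABLES.values() for s in mapping})
    return [s for s in scripts if fully_confusable_with(label, s)]
```

For example, `confusable_scripts("\u0920\u0921")` (ठड) reports Gurmukhi, because both letters have Gurmukhi look-alikes. `"\u0920\u0915"` (ठक) reports no script, because क has no entry in Table 22 and therefore provides the visual distinction.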
