Towards an Encoding for Surat Ulu

M. Mahali Syarifuddin

[email protected]

February 2, 2021

1. Introduction

Surat Ulu (lit. “letters from upstream”), also known as Kaganga (or Kegenge or Kogongo, depending on the dialect) or by the name of one of its regional variants, is a Brahmi-based script often found in manuscripts from the southwestern part of Sumatra, now part of Indonesia. It is often used to write Malay or its dialects, particularly Rejang, Lembak, Serawai, and Pasemah [1], and is sometimes also used to “write” Arabic [2]. The Unicode Consortium has included the Rejang block, which is actually a regional variant of Surat Ulu, in The Unicode Standard since version 5.1. Yet the other regional variants of the script are still left unencoded.

This document presents the other regional variants of Surat Ulu, compares them with the reference used for the existing Rejang block, and raises issues on how this script should be encoded in the Standard.

2. Character Repertoire

Below is the character repertoire of Surat Ulu.

2.1 Consonants

Like many other Brahmi-based scripts, Surat Ulu is an abugida, meaning that each consonant carries a placeholder vowel that can be changed by diacritics. In Surat Ulu, each regional variant has its own placeholder vowel, influenced by the language written: both Rejang and Lembak have /a/, Serawai has /o/, and Pasemah has /ə/ [3].
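As a rough illustration of the placeholder-vowel behavior described above, the following Python sketch romanizes a sequence of bare consonants for each regional variant. The vowel table restates the values given in the text; the function name `romanize` and the (consonant, vowel) input format are assumptions made only for this example.

```python
# Placeholder (inherent) vowel of each regional variant, as stated in the text.
INHERENT_VOWEL = {
    "Rejang": "a",
    "Lembak": "a",
    "Serawai": "o",
    "Pasemah": "ə",
}


def romanize(syllables, variant):
    """Romanize (consonant, vowel) pairs; vowel=None means no diacritic."""
    inherent = INHERENT_VOWEL[variant]
    return "".join(c + (inherent if v is None else v) for c, v in syllables)


# Three consonants written without any diacritic (hypothetical input format).
bare = [("k", None), ("g", None), ("ng", None)]
for variant in INHERENT_VOWEL:
    print(variant, romanize(bare, variant))
```

Under these assumptions the same three bare letters read differently in each variant, which mirrors the dialect-dependent names of the script.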

[1] Sarwono and Rahayu, Pusat Penulisan, 5.
[2] Sarwono and Rahayu, Pusat Penulisan, 77–80.
[3] Sarwono and Rahayu, Pusat Penulisan, 104 & 118.

L2/21-116

Figure 1. Jaspan’s Rejang character repertoire [4].

[4] Jaspan, Folk Literature, 13. This particular source was used in Everson’s Rejang proposal. Interestingly, Everson made the “stomach” shape of CA (c) angular instead of circular as it is supposed to be, and that shape made it into the Standard. See Everson, “Proposal”, 5. The romanization uses the old orthography, thus Jaspan’s “tja” is “CA”, “dja” is “JA”, “nja” is “NYA”, “ja” is “YA”, and “ndja” is “NYJA”. The dot below “b” and “d” can be ignored.


Figure 2. Jaspan’s consonant concordance of Surat Ulu regional variants [5]. The third column also shows the Kerinci character repertoire.

From the figure above, we can see that there are only very slight differences among the Surat Ulu regional variants, and it looks as if the existing code points can already cover all of them, although the consonants in the figure are incomplete. But 50 years after Jaspan’s publication, Sarwono and Rahayu showed that there are differences among these regional variants.

[5] Jaspan, Folk Literature, 11. Note that this concordance doesn’t feature all the consonants available in Surat Ulu.


[Table: columns Consonants, Rejang, Lembak, Serawai, Pasemah. Rows: KA, GA, NGA, TA, DA, NA, PA, BA, MA, CA, JA, NYA, SA, RA, LA, WA, YA, HA, MBA, NDA, NYJA, NGGA, A, MPA, NTA, NYCA, NGKA, GHA. Each cell shows the glyph shapes attested for that consonant in that variant; the glyph cells are not recoverable from the extracted text.]

Table 1. Sarwono and Rahayu’s consonant repertoire of Surat Ulu regional variants [6].

From the table above, we can see that some regional variants have their own distinct character shapes; for example, the Serawai variant of LETTER NGA (N) is unique to that region. When Sarwono and Rahayu asked informants from the Serawai ethnic group, the informants could recognize only the N character as LETTER NGA and not the others [7].

There are also consonants that are already encoded but that Sarwono and Rahayu found to belong to other regional variants; for example, the encoded LETTER BA belongs to other regional variants.

Sarwono and Rahayu’s research also confirmed Miller’s report that more consonant sounds are needed to encode the full “Central Malay” script, yet Sarwono and Rahayu did not find the characters Miller analyzed as the /ŋs/ and /ʁ/ consonants [8] [9]. Miller also mentioned a “unique” shape of LETTER [?]A found in the Tanjung Tanah manuscript, but it appears similar to one of Serawai’s NGKA shapes [10].

[6] Sarwono and Rahayu, Pusat Penulisan, 113. Note that this repertoire features common characters found in manuscripts.

[7] Sarwono and Rahayu, Pusat Penulisan, 32.
[8] Sarwono and Rahayu, Pusat Penulisan, 5.
[9] Miller, “Indonesian and Philippine Scripts”, 18.
[10] Miller, “Indonesian and Philippine Scripts”, 20.


2.2 Diacritics

Figure 3. Jaspan’s Rejang diacritics [11].

[11] Jaspan, Folk Literature, 8. LETTER KA is used as an example. Encoded names of the diacritics:
1. REJANG VOWEL SIGN U;
2. REJANG VOWEL SIGN I;
3. REJANG VOWEL SIGN E;
4. REJANG VOWEL SIGN O;
5. REJANG VOWEL SIGN AU;
6. REJANG VOWEL SIGN EU;
7. REJANG VOWEL SIGN EA;
8. REJANG CONSONANT SIGN H;
9. REJANG VOWEL SIGN AI;
10. REJANG CONSONANT SIGN R;
11. REJANG CONSONANT SIGN NG;
12. REJANG CONSONANT SIGN N;
13. REJANG VIRAMA;
14. (no diacritic).
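The encoded names in the footnote above can be checked against the Unicode Character Database that ships with Python. The sketch below is only an illustration: it assumes the names are spelled as in The Unicode Standard, and `sign_table` is a hypothetical helper name, not part of any library.

```python
import unicodedata

# Encoded names in Jaspan's numbering (item 14 carries no diacritic).
SIGN_NAMES = [
    "REJANG VOWEL SIGN U",
    "REJANG VOWEL SIGN I",
    "REJANG VOWEL SIGN E",
    "REJANG VOWEL SIGN O",
    "REJANG VOWEL SIGN AU",
    "REJANG VOWEL SIGN EU",
    "REJANG VOWEL SIGN EA",
    "REJANG CONSONANT SIGN H",
    "REJANG VOWEL SIGN AI",
    "REJANG CONSONANT SIGN R",
    "REJANG CONSONANT SIGN NG",
    "REJANG CONSONANT SIGN N",
    "REJANG VIRAMA",
]


def sign_table(names):
    """Return (number, code point label, name) rows for the given names."""
    rows = []
    for number, name in enumerate(names, start=1):
        char = unicodedata.lookup(name)  # raises KeyError for an unknown name
        rows.append((number, f"U+{ord(char):04X}", name))
    return rows


for number, code_point, name in sign_table(SIGN_NAMES):
    print(f"{number:2d}. {code_point} {name}")
```

Note that Jaspan's numbering follows his figure, not code point order, so the printed code points are not expected to be ascending.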


[Table: columns Diacritics, Rejang, Lembak, Serawai, Pasemah. Rows: vowel sign /i/, vowel sign /u/, vowel sign /e/, vowel sign /o/ or /ə/, vowel sign /a/ or consonant sign /h/, consonant sign /n/, consonant sign /ŋ/, consonant sign /r/, vowel sign /aw/, vowel sign /aj/, virama. Each cell shows the diacritic shapes attested in that variant; the glyph cells are not recoverable from the extracted text.]

Table 2. Sarwono and Rahayu’s diacritics repertoire of Surat Ulu regional variants [12].

There are also unique diacritics in some regional variants. For example, Serawai’s VIRAMA has many shapes other than a circular one.

There are also diacritics that are already encoded but that Sarwono and Rahayu found to belong to other regional variants; for example, the encoded VOWEL SIGN I and VOWEL SIGN O belong to other regional variants.

Jaspan showed usage of VOWEL SIGN EU, even though Sarwono and Rahayu did not find it.

[12] Sarwono and Rahayu, Pusat Penulisan, 114. Note that this repertoire features common characters found in manuscripts.


Jaspan differentiates VOWEL SIGN EA and CONSONANT SIGN H because he analyzed the signs as representing the diphthong /əa/ and the final consonant /ʔ/ [13], yet Jaspan’s examples show that both signs represent the final consonant /h/. Sarwono and Rahayu only found what Jaspan analyzed as VOWEL SIGN EA, and did not find a sign representing the final consonant /ʔ/. Is this some sort of unfortunate disunification?

Miller reported a sign that represents the medial consonant /r/, with the same function as JAVANESE CONSONANT SIGN CAKRA and SUNDANESE CONSONANT SIGN PANYAKRA [14], yet no such sign was found by Sarwono and Rahayu.

2.3 CVC behavior

Script/Variant        Word               Visual order
Latin                 duduk (lit. sit)
Rejang and Lembak     dudu?0             DA + U + DA + U + KA + VIRAMA
Serawai and Pasemah   dud?u0             DA + U + DA + KA + U + VIRAMA

Table 3. Example of CVC behavior of Surat Ulu [15].

The way the Serawai and Pasemah variants handle CVC is about the same as in the Batak script, where the vowel marks “are re-ordered when the killer is used to close the syllable by killing the inherent vowel of a final consonant” [16], yet the behavior is “normal” in the Rejang and Lembak variants.
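The two orders in Table 3 can be sketched in Python as follows. This is a minimal illustration, not a proposed encoding model: because only the Rejang variant is encoded, the Rejang code points stand in for the Serawai and Pasemah characters, and the function name `to_visual_order` is hypothetical.

```python
import unicodedata


def ch(name):
    """Look up a Rejang character by the part of its name after 'REJANG '."""
    return unicodedata.lookup("REJANG " + name)


DA, KA = ch("LETTER DA"), ch("LETTER KA")
U, VIRAMA = ch("VOWEL SIGN U"), ch("VIRAMA")
VOWEL_SIGNS = {ch("VOWEL SIGN " + v)
               for v in ("I", "U", "E", "AI", "O", "AU", "EU", "EA")}

# "duduk" in the Rejang and Lembak order: DA + U + DA + U + KA + VIRAMA.
REJANG_ORDER = [DA, U, DA, U, KA, VIRAMA]


def to_visual_order(seq):
    """Move a vowel sign after the virama-closed final consonant that follows it."""
    out = list(seq)
    for i in range(len(out) - 2):
        closes_syllable = out[i + 2] == VIRAMA
        is_consonant = out[i + 1] not in VOWEL_SIGNS and out[i + 1] != VIRAMA
        if out[i] in VOWEL_SIGNS and is_consonant and closes_syllable:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


serawai_order = to_visual_order(REJANG_ORDER)
print(" + ".join(unicodedata.name(c) for c in serawai_order))
```

Only the vowel sign of the closed final syllable moves; the vowel sign of the first, open syllable stays in place, as in the Serawai and Pasemah row of Table 3.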

[13] Interestingly, Everson proposed the latter character as the final consonant /h/. See Everson, “Proposal”, 6.
[14] Miller, “Indonesian and Philippine Scripts”, 18.
[15] Sarwono and Rahayu, Pusat Penulisan, 117.
[16] Everson and Kozok, “Proposal”, 3.


3. Issues

Issues arise when it comes to encoding Surat Ulu, and these need to be solved.

• The name “Rejang” is no longer suitable for referring to the script, because Rejang is actually a subset of Surat Ulu. If possible, it would be better to refer to the script as “Surat Ulu”; just “Ulu” is also okay.

• The ideal model for encoding this script is to encode all unique characters, both those of particular regional variants and those for new sounds, just as for Batak. But as we have seen, the consonants and diacritics are very diverse even within particular regional variants. Further research is needed to decide which character will represent particular consonants/diacritics and regional variants.

• The usage of VOWEL SIGN EU, VOWEL SIGN EA, and CONSONANT SIGN H needs more investigation in actual contexts, in old manuscripts as well as modern texts, so that we can handle these diacritics better.

• Regarding CVC behavior, because different regional variants handle it differently, it should be okay to type the reordered CVC in visual order. This also means that the combination consonant + vowel sign + VIRAMA should also be valid.

• There are also scripts related to Surat Ulu, such as Ogan [17] [18] and Lampung [19]. Further research is needed to decide whether these scripts can be unified or deserve their own blocks.

• Moreover, the Rejang block has 11 empty code points and there is one column in the BMP left unallocated, so in total the BMP can accommodate only 27 empty code points to extend this script [20]. If the characters that need to be encoded exceed 27, a new block needs to be made in the SMP, but this will lead to new problems [21].
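The arithmetic behind the figure of 27 in the last issue can be sketched as below. The sketch counts unassigned code points in the Rejang block (U+A930 to U+A95F) using Python's bundled Unicode data, and it assumes that “one column” means one 16-code-point column of the code charts; both the helper name and that reading are assumptions, not statements from the roadmap.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0xA930, 0xA95F  # Rejang block range
COLUMN_SIZE = 16  # assumption: one code-chart column holds 16 code points


def unassigned_in_block(start, end):
    """Count code points in [start, end] that have no character name."""
    return sum(1 for cp in range(start, end + 1)
               if unicodedata.name(chr(cp), "") == "")


empty_in_block = unassigned_in_block(BLOCK_START, BLOCK_END)
available_in_bmp = empty_in_block + COLUMN_SIZE
print(empty_in_block, available_in_bmp)
```

The result depends on the Unicode version bundled with the Python interpreter, so it should be read as a check on the figures in the text, not as an authoritative count.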

4. Conclusion

Surat Ulu is a diverse script, and that diversity needs to be encoded. There are issues in encoding the script, and they need to be solved through further research so that the script can be encoded properly.

[17] OKU Express, “Kenalkan”.
[18] Gaffar et al., Struktur, 98.
[19] Pandey, “Proposal”.
[20] Everson et al., “Roadmap”.
[21] Anderson et al., “Recommendations”, 6.


References

Anderson, Deborah, Ken Whistler, Rick McGowan, Roozbeh Pournader, Andrew Glass, Laurentiu Iancu, and Lisa Moore. “Recommendations to UTC #148 August 2016 on Script Proposals,” July 30, 2016. https://www.unicode.org/L2/L2016/16216-script-ad-hoc.pdf.

Everson, Michael, and Uli Kozok. “Proposal for Encoding the Batak Script in the UCS,” October 7, 2008. https://www.unicode.org/L2/L2008/08011r-n3320r-batak.pdf.

Everson, Michael, Rick McGowan, Ken Whistler, and V.S. Umamaheswaran. “Roadmap to the BMP,” March 12, 2020. https://unicode.org/roadmaps/bmp.

Everson, Michael. “Proposal for Encoding the Rejang Script in the BMP of the UCS,” April 24, 2006. https://www.unicode.org/L2/L2006/06139-n3096-rejang.pdf.

Gaffar, Zainal Abidin, Muslim Tuwi, Hasbi Yusuf, Chairani D, and Makmun Rusydi. Struktur Bahasa Panesak. Jakarta: Pusat Pembinaan dan Pengembangan Bahasa Departemen Pendidikan dan Kebudayaan, 1985. http://repositori.kemdikbud.go.id/2536/1/Struktur%20Bahasa%20Panesak%20%281985%29.pdf.

Jaspan, M. A. Folk Literature of South Sumatra: Redjang Ka-Ga-Nga Texts. Canberra: The Australian National University, 1964. https://archive.org/details/folkliteratureof00maja.

“Kenalkan Aksara Ogan ke Pelajar.” OKU Express, April 5, 2020. https://okes.co.id/2020/04/05/kenalkan-aksara-ogan-ke-pelajar.

Miller, Christopher. “Indonesian and Philippine Scripts and Extensions Not Yet Encoded or Proposed for Encoding in Unicode,” March 3, 2011. https://www.unicode.org/L2/L2011/11091-miller-script-report.pdf.

Pandey, Anshuman. “Preliminary Proposal to Encode the Lampung Script in Unicode,” March 31, 2016. https://www.unicode.org/L2/L2016/16073-lampung.pdf.

Sarwono, Sarwit, and Ngudining Rahayu. Pusat Penulisan dan Para Penulis Manuskrip Ulu di Bengkulu. Bengkulu: UNIB Press, 2014. http://repository.unib.ac.id/7492/1/Pusat%20penulisan.pdf.
