Globalisation & Computer systems Week 4 writing systems and their implications for globalisation...
Uploaded by griffin-gaines
Globalisation & Computer systems
Week 4: Writing systems and their implications for globalisation
- Character representation
- ASCII
- Extended ASCII
- Code pages
- Practical: code pages in VB
Week 6: Writing systems and their implications for globalisation
- Directionality (Arabic, Hebrew)
- Code space: Chinese
- Context-sensitive characters: Arabic
- Compositionality (Amharic)
Representation
- Bits and bytes
- Characters
- Code points
- Glyphs
- Fonts
- Standardization
Representation
What is a bit?
‘A binary digit’, i.e., either 0 or 1.
What is a byte?
‘The fixed number of bits that can be treated as a unit by the computer hardware’.
A byte can be used to express a character such as “A”.
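As a small illustration (a Python sketch, not part of the original practical), the character “A” is stored as a single byte, and that byte can be printed as its 8 binary digits:

```python
# Show the byte that stores the character "A" and its individual bits.
data = "A".encode("ascii")        # one byte: b'A'
byte_value = data[0]              # the byte read as an integer
bits = format(byte_value, "08b")  # the same byte written as 8 binary digits

print(len(data), byte_value, bits)  # 1 65 01000001
```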
Representation
ASCII: American Standard Code for Information Interchange
A standard character encoding system.
The bytes were originally 7 bits long. Given this, how many bit patterns are there?
Each pattern maps onto a decimal code point, and that code point maps onto a character.
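The chain described above (bit pattern, then decimal code point, then character) can be sketched in Python. The sketch enumerates every 7-bit pattern to count them, then follows one pattern through to its ASCII character; `pattern_to_character` is a hypothetical helper name used only for illustration:

```python
from itertools import product

# Every possible 7-bit pattern, from "0000000" up to "1111111".
patterns = ["".join(bits) for bits in product("01", repeat=7)]
print(len(patterns))  # 128, i.e. 2 ** 7


def pattern_to_character(pattern: str) -> tuple[int, str]:
    """Map a 7-bit pattern to its decimal code point, then to an ASCII character."""
    code_point = int(pattern, 2)                      # bit pattern -> decimal code point
    character = bytes([code_point]).decode("ascii")  # code point -> character
    return code_point, character


print(pattern_to_character("1000001"))  # (65, 'A')
print(pattern_to_character("1100001"))  # (97, 'a')
```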
Representation
Glyphs: the pictures used to represent a given character. The relation is many-to-one:
The character “A” -> AAAAAAAAA (one character, many glyphs)
Fonts: a font is the collection, or ‘picture gallery’, of glyphs.
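The many-to-one relation between glyphs and characters can be pictured as a toy data structure. In this hypothetical Python model, each font is a “gallery” keyed by character, and the strings merely stand in for the actual pictures; real font files are far more complex:

```python
# Toy model (hypothetical, for illustration only): each font is a gallery of
# glyphs, keyed by the character each glyph depicts.
FONTS = {
    "Times New Roman": {"A": "serif capital A"},
    "Arial": {"A": "sans-serif capital A"},
    "Courier New": {"A": "monospaced capital A"},
}


def glyphs_for(character: str) -> list[str]:
    """Collect every glyph (picture) that the fonts offer for one character."""
    return [gallery[character] for gallery in FONTS.values() if character in gallery]


glyphs = glyphs_for("A")
print(len(glyphs), ord("A"))  # 3 65: many glyphs, but one character and one code point
```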
Representation
ASCII: the problem with 7-bit bytes…
What about French: la tête?
What about Greek: κεφαλη?
Solution: extend ASCII to 8-bit bytes, standardized by the ISO (International Organization for Standardization).
Now there are 256 bit patterns.
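A quick Python check makes the problem concrete. The choice of ISO 8859-1 (“Latin-1”) for French and ISO 8859-7 for Greek is an assumption made for this sketch; they are two of the ISO 8-bit encodings, picked as examples:

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text has a 7-bit ASCII code point."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


french = "la tête"
greek = "κεφαλη"

latin1 = french.encode("latin-1")    # ISO 8859-1: "ê" gets a code point above 127
greek_8bit = greek.encode("iso8859_7")  # ISO 8859-7: each Greek letter is one byte

print(fits_in_ascii(french), fits_in_ascii(greek))  # False False
print(latin1[4], 2 ** 8)                            # 234 256
print(all(b > 127 for b in greek_8bit))             # True
```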
Representation
Extended ASCII:
With 8-bit bytes you get 256 bit patterns.
For consistency, the first 128 code points (0 to 127) remain the same as in 7-bit ASCII (ISO-7).
The next 128 (128 to 255) are used for a range of languages.
For each language, you need an interpretation of these 128 code points.
That interpretation, the encoding, is handled by a code page.
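The split between a shared lower half and a language-specific upper half can be checked with Python's built-in codecs for two Windows code pages (cp1250 for Central/Eastern Europe, cp1251 for Cyrillic). Bytes a code page leaves undefined are decoded to the replacement character U+FFFD in this sketch:

```python
def decode_byte(value: int, code_page: str) -> str:
    """Interpret a single byte value through the given code page."""
    return bytes([value]).decode(code_page, errors="replace")


# Code points 0 to 127: both code pages agree with ASCII.
lower_same = all(decode_byte(b, "cp1250") == decode_byte(b, "cp1251") for b in range(128))

# Code points 128 to 255: each code page has its own interpretation.
upper_differs = sum(
    decode_byte(b, "cp1250") != decode_byte(b, "cp1251") for b in range(128, 256)
)

print(lower_same)         # True
print(upper_differs > 0)  # True
```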
Representation
Extended ASCII:
For code point 154:
CP_EASTEUROPE (code page 1250): š
CP_RUSSIAN (code page 1251): љ
What about code point 65 for these two code pages?
Now represent your names, with your own orthographies in mind, using the code pages.
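The same lookup, and the naming exercise, can be sketched in Python using its `cp1250` and `cp1251` codecs. The names “Šárka” and “Љиљана” are hypothetical examples, and `interpret` is an illustrative helper name:

```python
def interpret(code_point: int, code_page: str) -> str:
    """Return the character that a single-byte code page assigns to a code point."""
    return bytes([code_point]).decode(code_page)


for cp in (154, 65):
    print(cp, interpret(cp, "cp1250"), interpret(cp, "cp1251"))
# 154 š љ
# 65 A A

# The exercise: encode a name in the code page that suits its orthography.
for name, code_page in (("Šárka", "cp1250"), ("Љиљана", "cp1251")):
    encoded = name.encode(code_page)  # one byte per letter
    print(name, code_page, list(encoded))
```

Decoding the printed byte values with the *other* code page is a quick way to see the name turn into unrelated characters.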
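Representation
Code pages in VB

    ' Windows character-set identifiers that can be assigned to StdFont.Charset
    Public Enum ValidCharsets
        ANSI_CHARSET = 0
        GREEK_CHARSET = 161
        THAI_CHARSET = 222
    End Enum

    Private Sub Form_Load()
        ' Build a bold 8-point Times New Roman font that uses the Greek character set
        Dim X As New StdFont
        X.Charset = GREEK_CHARSET   ' same value as 161
        X.Bold = True
        X.Size = 8
        X.Name = "Times New Roman"

        ' Apply the font to the form and to its label and text box
        Set frmTest.Font = X
        Set frmTest.Label1.Font = X
        Set frmTest.Text1.Font = X

        ' With the Greek charset, codes 181, 225 and 226 should display
        ' as the characters the Greek code page assigns to them
        frmTest.Label1.Caption = Chr(181) & Chr(225) & Chr(226)
        frmTest.Text1.Text = Chr(181) & Chr(225) & Chr(226)
    End Sub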
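The effect the VB practical relies on can be approximated in Python by reading the same three byte values through different code pages. The mapping from the VB charset constants to code pages (ANSI_CHARSET to 1252, GREEK_CHARSET to 1253, THAI_CHARSET to 874) is an assumption for this sketch; ANSI_CHARSET in particular depends on the system's locale:

```python
# Assumed mapping from the VB charset constants to Windows code pages.
CHARSET_TO_CODE_PAGE = {
    0: "cp1252",    # ANSI_CHARSET (assuming a Western European system)
    161: "cp1253",  # GREEK_CHARSET
    222: "cp874",   # THAI_CHARSET
}

# The same values as Chr(181) & Chr(225) & Chr(226) in the VB code.
caption_bytes = bytes([181, 225, 226])


def render(charset: int) -> str:
    """Decode the caption bytes using the code page assumed for a charset."""
    return caption_bytes.decode(CHARSET_TO_CODE_PAGE[charset])


for charset in CHARSET_TO_CODE_PAGE:
    print(charset, render(charset))
# 0 µáâ
# 161 µαβ   (micro sign, then Greek alpha and beta)
```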
Representation and UNICODE
What about Chinese? There are thousands of characters, so 256 bit patterns are clearly not enough.
Make the bytes bigger: with 16-bit bytes there are 2^16 = 65,536 bit patterns. This is the starting point for UNICODE.
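A Python sketch shows why 8 bits cannot hold Chinese but 16 bits can, using the character 中 as an arbitrary example. UTF-16 (big-endian) is used here to show the character as one 16-bit unit:

```python
zhong = "中"  # a common Chinese character, chosen as an example

print(ord(zhong), hex(ord(zhong)))     # 20013 0x4e2d: far beyond 255
print(zhong.encode("utf-16-be").hex())  # 4e2d: one 16-bit unit (two bytes)
print(2 ** 16)                          # 65536

try:
    zhong.encode("latin-1")  # an 8-bit encoding has no room for it
except UnicodeEncodeError:
    print("not representable in Latin-1")
```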
UNICODE: design principles
Reference: The Unicode Standard, Version 3.0 (2000).
Online: http://www.unicode.org/unicode/uni2book/