ObjectStudio for Unicode: Getting ready for global markets (Alexander Augustin)

Page 1: ObjectStudio for Unicode

Alexander Augustin

Getting ready for global markets

Page 2: Overview

Problem description

History of character sets and Encoding

Goals and approach

Features and technologies

Limitations

Conclusions

Page 3: ObjectStudio 6.9.1

ObjectStudio is an integrated Smalltalk environment for the Windows platform

Access to most common Windows services and database systems, like DLL functions, COM, ODBC, Oracle …

It’s Smalltalk – so almost anything is possible – except easy localization and processing multilingual data.

Page 4: ObjectStudio 6.9.1 in a Unicode World

ObjectStudio (ANSI/OEM)

Operating System (Unicode)

Other programs (Unicode)

Data sources (Unicode) ??

Page 5: Go Multilingual!

Applications in a global market must represent texts and names of Eastern Europe and Asia.

User interfaces must be localizable

Offer capabilities for handling multilingual data

Must be supported by the runtime environment and the development system

Screenshot: Japanese Version of Microsoft Word

Page 6: ObjectStudio 6.9.1

Supports:

ANSI (CP1252) and OEM (CP850)

8-bit characters

Adequate for:

Writing source code

Creating English UIs

Processing English text files

Accessing databases with English texts

Screenshot: ObjectStudio 6.9.1 Environment

Page 7: Overview

Problem description

History of character sets and Encoding

Goals and approach

Features and technologies

Limitations

Conclusions

Page 8: The history of character sets

Punch card – late 18th century

Enhanced by Hollerith (patented 1890)

5 channel punch tape – 19th century

2^5 = 32 codes, not enough for 26 letters + 10 digits

Solution: a shift code used as a prefix to switch between character sets (state shift)

8 channel punch tape – mid 20th century

7 bit US-ASCII + parity

No support for umlauts

VT220 terminal (1983): DEC Multinational Character Set, the forerunner of ISO 8859-1 (Latin-1)

Similar to Microsoft codepage 1252

Many character encodings for many languages

EBCDIC, KOI8, ShiftJIS, …

Page 9: Unicode

Unicode: a standard defined by the Unicode Consortium.

Unicode assigns a unique number (code point) to each character

Version 4.0.0 reserves more than 1,000,000 code points

Several transformation formats for binary representation of Unicode code points

UCS-2 (2 bytes/char), UTF-8 (1-4 bytes/char), UTF-16 (2 or 4 bytes/char)
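The byte counts per character can be checked with a short Python sketch. This is only an illustration, not ObjectStudio code: the helper names `encoded_lengths` and `fits_ucs2` are made up for this example, and since Python has no dedicated UCS-2 codec, UTF-16 stands in for it (the two produce identical bytes for code points up to U+FFFF).

```python
# Byte lengths of single characters in the Unicode transformation formats.
# Assumption: "utf-16-le" stands in for UCS-2, which Python does not offer
# as a separate codec; both agree for code points up to U+FFFF.


def encoded_lengths(ch):
    """Return the number of bytes one character needs per encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le variant writes no byte order mark
    }


def fits_ucs2(ch):
    """UCS-2 uses exactly 2 bytes, so it only reaches code points up to U+FFFF."""
    return ord(ch) <= 0xFFFF


# The last sample is the musical G clef (U+1D11E), outside the 16-bit range.
for ch in ["A", "Ö", "€", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch), "fits UCS-2:", fits_ucs2(ch))
```

The last sample character shows why UTF-16 sometimes needs 4 bytes while UCS-2 cannot represent the character at all.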

Page 10: Unicode

World-wide unification effort for all characters of the world

Supported by all major vendors!

The solution for ObjectStudio!

Page 11: Encoding

Diagram: Character -> Code -> Binary representation

Transforming characters into their binary representation in another encoding

One main problem when accessing external data sources

Distinguish between specialized encodings and Unicode

Page 12: Byte Encodings

Differ in the value that represents a character in the encoding

Do not differ in the binary format of the code (always 1 byte)

Decimal value / hexadecimal binary representation

Encoding       Ö        €
CP1252         214/D6   128/80
CP852          153/99   --
ISO 8859-15    214/D6   164/A4

Diagram: Character -> Code -> Binary representation
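The values for Ö and € in the three byte encodings can be reproduced with Python's standard codecs. This is a minimal sketch for illustration only; the helper name `byte_value` is hypothetical, and the codec names `cp1252`, `cp852` and `iso8859-15` are the Python spellings of the code pages named on the slide.

```python
# Look up the single byte that represents a character in a byte encoding.


def byte_value(ch, encoding):
    """Return the byte value of ch in a one-byte encoding, or None if unmapped."""
    try:
        encoded = ch.encode(encoding)
    except UnicodeEncodeError:
        return None  # the encoding has no code for this character
    if len(encoded) != 1:
        raise ValueError(f"{encoding} is not a one-byte encoding for {ch!r}")
    return encoded[0]


for encoding in ["cp1252", "cp852", "iso8859-15"]:
    cells = []
    for ch in "Ö€":
        value = byte_value(ch, encoding)
        cells.append("--" if value is None else f"{value}/{value:02X}")
    print(encoding, *cells)
```

The same character gets a different code in each encoding, while the binary format stays one byte throughout.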

Page 13: Unicode Encodings

Do not differ in the value (Code Point) that is assigned to a character

Differ in the binary format of the value

Diagram: Character -> Code Point -> Binary representation

Hexadecimal binary representation

Encoding                 Ö (code point 214)   € (code point 8364)
UCS-2 (little-endian)    D6 00                AC 20
UTF-8                    C3 96                E2 82 AC
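For the Unicode encodings the situation is reversed: the code point stays fixed and only the byte layout changes. A small Python sketch, given here only as an illustration, shows this for Ö and €. The helper name `hex_bytes` is hypothetical, and `utf-16-le` is used as a stand-in for UCS-2 little-endian, which is an assumption that holds for these two characters because both lie below U+FFFF.

```python
# Same code point, different binary representation per Unicode encoding.


def hex_bytes(ch, encoding):
    """Return the encoded bytes of ch as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in ch.encode(encoding))


for ch in "Ö€":
    print(
        ch,
        "code point:", ord(ch),
        "| UCS-2 LE:", hex_bytes(ch, "utf-16-le"),
        "| UTF-8:", hex_bytes(ch, "utf-8"),
    )
```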

Page 14: Goals

1. Enable Unicode!

Extend encoding capabilities

Provide native multilingual I/O support

2. Extend external access features

Add Unicode file access

Add Unicode database access

Page 15: Changes

Create a Unicode VM

Make ObjectStudio a native Windows Unicode application

Adapted class library

Make Smalltalk String/Symbol objects 16-bit Unicode strings (UCS-2)

Add encodings

External interfaces and resources

C calls

Unicode file access

Database access (ODBC, OCI)

Page 16: Stream Encoding

Ported from VisualWorks

Use StreamEncoders and CharacterEncoders that "know" the encoding

Can be applied to any kind of stream with a byte-like buffer to encode or decode data

Diagram labels: EncodedStream, Stream, StreamEncoder, Buffer, CharacterEncoder
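The idea of layering an encoder over any byte-oriented stream can be sketched in Python. This is not the ObjectStudio or VisualWorks implementation: the classes below only borrow the names `CharacterEncoder` and `EncodedStream` from the slide, and their methods (`encode`, `decode`, `write`, `up_to_end`) as well as the helper `round_trip` are assumptions made for this illustration.

```python
import io


class CharacterEncoder:
    """Hypothetical stand-in: knows one encoding and converts text <-> bytes."""

    def __init__(self, encoding):
        self.encoding = encoding

    def encode(self, text):
        return text.encode(self.encoding)

    def decode(self, data):
        return data.decode(self.encoding)


class EncodedStream:
    """Hypothetical stand-in: text-level access on top of any byte stream."""

    def __init__(self, byte_stream, encoder):
        self.byte_stream = byte_stream
        self.encoder = encoder

    def write(self, text):
        # Characters are turned into bytes before they reach the buffer.
        self.byte_stream.write(self.encoder.encode(text))

    def up_to_end(self):
        # Read all remaining bytes at once and decode them in one step,
        # so a multi-byte sequence is never split.
        return self.encoder.decode(self.byte_stream.read())


def round_trip(text, encoding):
    """Write text through an encoded stream; return (raw bytes, decoded text)."""
    buffer = io.BytesIO()
    EncodedStream(buffer, CharacterEncoder(encoding)).write(text)
    raw = buffer.getvalue()
    buffer.seek(0)
    decoded = EncodedStream(buffer, CharacterEncoder(encoding)).up_to_end()
    return raw, decoded


print(round_trip("Ö€", "utf-8"))
print(round_trip("Ö€", "cp1252"))
```

Because the wrapper only needs `read` and `write` on bytes, the same two classes would work over a file, a socket, or an in-memory buffer, which is the point the slide makes about "any kind of stream with a byte-like buffer".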

Page 17: Stream Encoding

Diagram labels: CharacterEncoder, StreamEncoder, EncodedStream, Stream, Buffer

Diagram: Character -> Code -> Binary representation

Page 18: StreamEncoding use cases

Accessing external services and storage without UCS-2 support (e.g. ANSI C calls)

Examples

Access to databases without UCS-2 support

Calling ANSI DLL functions without UCS-2 support

String transfer via TCP/IP

Access to text files with foreign encodings

Page 19: Text file access

Read/write access to any kind of text file

UTF-8, UTF-16, UCS-2 little-endian, …

CP1252 (Windows ANSI)

CP850 (Windows OEM)

And many more

Using EncodedStreams and NewFileStreams

Example: read UTF-8 encoded file

    | fileStream encoder encodedStream result |
    "Open the file as a binary stream"
    fileStream := NewFileStream file: 'example.txt' mode: #binary onError: [ self error: 'could not open file' ].
    "Wrap it in an encoded stream that decodes UTF-8"
    encoder := StreamEncoder new: #utf8.
    encodedStream := EncodedStream on: fileStream encodedBy: encoder.
    "Read the whole content, then close the stream"
    result := encodedStream upToEnd.
    encodedStream close
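Reading a file in one encoding and writing it in another follows the same pattern in most environments. The Python sketch below is an illustration under stated assumptions, not ObjectStudio code: the function names `transcode_file` and `demo` and the file names are hypothetical, and the sample converts a CP850 (Windows OEM) file to UTF-8.

```python
import os
import tempfile


def transcode_file(src_path, src_encoding, dst_path, dst_encoding):
    """Read src_path as src_encoding and write the same text as dst_encoding."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
    return text


def demo():
    """Create a small CP850 file, transcode it, return (text, UTF-8 bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        oem_path = os.path.join(tmp, "oem.txt")
        utf8_path = os.path.join(tmp, "utf8.txt")
        with open(oem_path, "wb") as f:
            f.write(b"K\x94then")  # "Köthen" in CP850: ö is byte 0x94
        text = transcode_file(oem_path, "cp850", utf8_path, "utf-8")
        with open(utf8_path, "rb") as f:
            return text, f.read()


print(demo())
```

The text itself does not change during the conversion; only the bytes on disk differ between the two files.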

Page 20: External Database Access

Supported Unicode database interfaces

ODBC

OCI (ORACLE Call Interface)

Features

Native access to Unicode data sources

No application modifications needed

Requirements

ODBC: Version 3.5

OCI: OCI Client Version 9.0.1 (9i) or higher

Page 21: Limitations

Source files continue to be OEM encoded

Store Unicode text data in text files or external databases

UI sources can't contain Unicode strings

Use external files/databases to store Unicode data for localizing UIs

Some localization support is planned

Implicit conversions between Strings and ByteArrays cannot be supported

Use encoded streams or #asByteArrayEncoding:
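Why an implicit conversion cannot work once strings are Unicode can be shown with a Python analogy, since Python also separates text from bytes. This is a sketch for illustration only: `as_byte_array` and `as_string` are hypothetical helpers loosely modelled on the idea behind `#asByteArrayEncoding:`, not the ObjectStudio methods themselves.

```python
# The same text maps to different bytes depending on the encoding,
# so a conversion without a named encoding would have to guess.


def as_byte_array(text, encoding):
    """Explicit text -> bytes conversion; the caller must name the encoding."""
    return text.encode(encoding)


def as_string(data, encoding):
    """Explicit bytes -> text conversion; the caller must name the encoding."""
    return data.decode(encoding)


def implicit_mix_fails():
    """Return True if combining text and bytes without an encoding is rejected."""
    try:
        "Ö" + b"\xd6"
    except TypeError:
        return True
    return False


print(as_byte_array("Ö", "cp1252"))  # one byte
print(as_byte_array("Ö", "utf-8"))   # two bytes
print(as_string(b"\x99", "cp850"))   # the OEM byte for the same letter
print(implicit_mix_fails())
```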

Page 22: Limitations

Image files are not compatible

Compile class files and create new images

Page 23: Conclusion

ObjectStudio (Unicode)

Operating System (Unicode)

Other programs (Unicode)

Data sources (Unicode)

Page 24: Availability

ObjectStudio 7.0 for Unicode is available on the new CINCOM Smalltalk CD, together with VisualWorks 7.3

Page 25: Contact Information

Email: [email protected]

We provide project support to internationalize your ObjectStudio application

Georg Heeg eK
Baroper Str. 337
D-44227 Dortmund
Tel: +49-231-97599-0
Fax: +49-231-97599-20

Georg Heeg AG
Seestr. 131
CH-8027 Zürich
Tel: +41-848-433424

Georg Heeg eK
Mühlenstr. 19
D-06366 Köthen
Tel: +49-3496-214 328
Fax: +49-3496-214 712

Email: [email protected]
http://www.heeg.de

Page 26

2004 Cincom Systems, Inc. All Rights Reserved

Developed in the U.S.A.

CINCOM and The World’s Most Experienced Software Company are trademarks or registered trademarks of Cincom Systems, Inc.

All other trademarks belong to their respective companies.