Unicode System: Outside Communication for ABAP Programmers
Dr. Christian Hansen, Server Technology Internationalization, SAP AG
-
2003 SAP AG, Unicode Outside Communication, Christian Hansen 2
Contents
Introduction
  About Code Pages
  Communication: The Ideal Picture
  Communication: The Reality
Part I: RFC
  Unicode ↔ Unicode
  Unicode ↔ single code page system
  Unicode ↔ MDMP system
Part II: File transfer
  Writing and reading files on the application server
  Writing and reading files on the front end
Part III: Common mistakes
Exercises
-
About Code Pages: Conventional Code Pages
Disadvantages of old standard code pages:
  Each covers only a subset of all characters used
  Incompatibilities between different code pages
  Only restricted data exchange possible
  Too many of them
[Figure: cloud of vendor-specific and standard code pages (Canon, Kyocera, Apple, HP, IBM, Microsoft; ISO code pages, EBCDIC 697/0277 and 697/0500, 1250, 1251, 1252, 1254, 1256, 1257, ASCII, BIG-5, SJIS)]
SAP: Languages: 41, Characters: 22,378, Code Pages: 390
-
Solution: Unicode, one Code Page for all Scripts
[Figure: languages covered by Unicode: English, German, Turkish, Danish, Dutch, Finnish, French, Italian, Norwegian, Portuguese, Spanish, Swedish, Croatian, Czech, Hungarian, Polish, Rumanian, Slovakian, Slovene, Russian, Ukrainian, Greek, Hebrew, Thai, Korean, Japanese, Chinese, Taiwanese, Icelandic]
And more languages can be supported easily without the need for new code pages or other new methods.
-
Solution: Unicode characters
[Figure: areas of the Unicode code space: ASCII, General Scripts, Symbols, CJK Ideographs, Hangul, Compatibility, Surrogate Area]
65,000 characters
Additional 1,000,000 characters
-
Representation of Unicode Characters
Character   Unicode scalar value   UTF-16 big endian   UTF-16 little endian   UTF-8
a           U+0061                 00 61               61 00                  61
ä           U+00E4                 00 E4               E4 00                  C3 A4
α           U+03B1                 03 B1               B1 03                  CE B1
㑹          U+3479                 34 79               79 34                  E3 91 B9
UTF-16: Unicode Transformation Format, 16 bit encoding
  Fixed length, 1 character = 2 bytes (surrogate pairs = 2 + 2 bytes)
  Platform-dependent byte order (big/little endian)
  2 byte alignment restriction
UTF-8: Unicode Transformation Format, 8 bit encoding
  Variable length, 1 character = 1...4 bytes
  Platform independent
  No alignment restriction
  7 bit US ASCII compatible
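The byte values on this slide can be reproduced with any Unicode-aware runtime. The following is a minimal Python sketch, not ABAP; the helper names `hex_bytes` and `representations` are made up for this illustration.

```python
# Characters from the slide, given by their Unicode scalar values.
CHARS = ["\u0061", "\u00e4", "\u03b1", "\u3479"]


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex, e.g. 'C3 A4'."""
    return " ".join(f"{b:02X}" for b in data)


def representations(ch: str) -> dict:
    """Return the scalar value and the three byte representations of ch."""
    return {
        "scalar": f"U+{ord(ch):04X}",
        "utf-16be": hex_bytes(ch.encode("utf-16-be")),
        "utf-16le": hex_bytes(ch.encode("utf-16-le")),
        "utf-8": hex_bytes(ch.encode("utf-8")),
    }


TABLE = [representations(ch) for ch in CHARS]

if __name__ == "__main__":
    for row in TABLE:
        print(row)
```

A character outside the first 65,000 code points, such as U+1F600, comes back from `representations` as two 16-bit units in UTF-16 (a surrogate pair) and four bytes in UTF-8.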
-
Communication: The Ideal Picture
The ideal picture: only Unicode components
[Figure: R/3 Enterprise, 3rd party, mySAP BW, R/3 Enterprise, files, and the Internet, all exchanging Unicode data]
  Conversions are done algorithmically (1:1 relation)
  No data misinterpretation
  No data loss
  All business-relevant characters available at the same time
  ...
-
Communication: Reality
The reality: Unicode and non-Unicode components
[Figure: R/3 4.6C, 3rd party (EBCDIC), mySAP BW (ISO8859-1), R/3 Enterprise, files (ISO8859-1, SJIS), and Internet pages declaring charset=iso-8859-1, charset=windows-1257, charset=utf-8, or charset=Shift_JIS, surrounded by many code pages (BIG-5, SJIS, ISO code pages, 1251, 1252, 697/0277, 697/0500)]
  Conversions between incompatible code pages everywhere
  Only common subset exchangeable
  Special rules have to be obeyed to make communication possible
  ...
-
RFC Unicode ↔ Unicode
[Figure: R/3 Enterprise ↔ R/3 Enterprise]
In case of a Unicode ↔ Unicode combination, RFC passes all character data without code page conversion, or merely with adaptation of the endianness.
  UTF-16 big endian = SAP code page 4102
  UTF-16 little endian = SAP code page 4103
Information about the destination is maintained in SM59 → Special Options → Character Width in Target System:
  1 byte = non-Unicode
  2 byte = Unicode
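Adapting the endianness means swapping the two bytes of every 16-bit code unit and nothing else. The following Python sketch illustrates that idea only. It is not the RFC implementation, and the functions `swap_endianness` and `transfer` are hypothetical names; only the code page numbers 4102 and 4103 come from the slide.

```python
# SAP code page numbers as given on the slide.
SAP_UTF16_CODEPAGES = {4102: "utf-16-be", 4103: "utf-16-le"}


def swap_endianness(utf16_bytes: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit."""
    if len(utf16_bytes) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(utf16_bytes))
    swapped[0::2] = utf16_bytes[1::2]
    swapped[1::2] = utf16_bytes[0::2]
    return bytes(swapped)


def transfer(data: bytes, source_cp: int, target_cp: int) -> bytes:
    """Pass UTF-16 data through unchanged, or with a byte swap only."""
    if source_cp not in SAP_UTF16_CODEPAGES or target_cp not in SAP_UTF16_CODEPAGES:
        raise ValueError("both sides must be Unicode (4102 or 4103)")
    return data if source_cp == target_cp else swap_endianness(data)


if __name__ == "__main__":
    big_endian = "Grüße α".encode(SAP_UTF16_CODEPAGES[4102])
    print(transfer(big_endian, 4102, 4103).hex(" "))
```

Because no character is ever mapped to a different code page, the swap is fully reversible, which is why this combination cannot lose data.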
-
RFC Unicode ↔ non-Unicode single code page
[Figure: R/3 4.6C (ISO8859-1) ↔ R/3 Enterprise]
In case of a Unicode ↔ non-Unicode single code page combination, RFC passes all character data with code page conversion between Unicode and the old code page.
As Unicode is a true superset of any old standard code page, not all Unicode characters can be transferred to the non-Unicode system. Characters missing from the target code page arrive as #.
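The loss can be made visible with a small Python sketch. It assumes that the receiver uses ISO8859-1 and that every character missing from that code page is replaced by `#`, as the slide suggests; `send_to_single_code_page` is a hypothetical helper, not an SAP API.

```python
def send_to_single_code_page(text: str, codec: str):
    """Return the text as the receiver sees it, plus the characters that were lost."""
    received = []
    lost = []
    for ch in text:
        try:
            ch.encode(codec)          # is the character part of the code page?
            received.append(ch)
        except UnicodeEncodeError:
            received.append("#")      # assumed substitution character
            lost.append(ch)
    return "".join(received), lost


RECEIVED, LOST = send_to_single_code_page("Größe αβγ", "iso-8859-1")

if __name__ == "__main__":
    print(RECEIVED)  # German letters survive, Greek letters do not
    print(LOST)
```

The substitution cannot be undone on the receiving side: once three different Greek letters have all become `#`, the original text is gone.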
-
RFC Unicode ↔ non-Unicode MDMP
[Figure: R/3 4.6C (ISO8859-1, SJIS) ↔ R/3 Enterprise]
In case of a Unicode ↔ non-Unicode MDMP combination, RFC passes all character data with code page conversion between Unicode and the different old code pages.
Which of the MDMP code pages is chosen depends on the language:
[Figure: data sent under languages DE and JA; text converted under a non-matching language appears as #]
-
RFC Unicode ↔ non-Unicode MDMP
Excursion: Difference between flat and deep data types
  Flat: C, N, D, T, X, I, F, P, and any structure consisting only of these fields
  Deep: STRING, XSTRING, table types, object references, and any structure containing one of these types
Deep data types are transferred using a UTF-8 encoded XML format (XRFC).
-
RFC Unicode ↔ non-Unicode MDMP
Excursion: Difference between flat and deep data types
Detailed conversion paths:
  Unicode system → non-Unicode system
    Deep data: Unicode → XML (UTF-8) → target code page
    Flat data: Unicode → target code page
  Unicode system ← non-Unicode system
    Deep data: Unicode ← XML (UTF-8) ← source code page
    Flat data: Unicode ← source code page
The source or target code page is a non-Unicode compatible code page.
-
RFC Unicode ↔ non-Unicode MDMP
Deriving code pages a): Data without language key
Example: Flat data, logon language German
[Figure: Logon = DE on both systems; code page taken from SY-LANGU of the source system; unconvertible characters appear as #]
Source system | Data type | Source code page             | Intermediate format * | Target code page
Unicode       | Flat      | Unicode                      |                       | Logon language source system **
Unicode       | Deep      | Unicode                      | UTF-8 based XML       | Logon language target system
non-Unicode   | Flat      | Logon language source system |                       | Unicode
non-Unicode   | Deep      | Logon language source system | UTF-8 based XML       | Unicode
* XML / non-Unicode compatible code page
** You may switch to the logon language of the target system using RFC bit option 0x200 at SM59 → Special Options → RFC Bit Options
-
RFC Unicode ↔ non-Unicode MDMP
Deriving code pages b): Data (flat) with language key
Flat structures containing a language key (domain SPRAS, DDIC data type LANG) and a maintained text language flag receive special handling:
Automatic language → code page assignment is done during RFC for each row, independent of the logon language.
This enables sending and receiving tables from MDMP systems (different code pages for each row):
[Figure: Logon = DE / Lang key = DE; Logon = DE / Lang key = JA]
  Maintain the language → code page assignment with SM59
  Maintain the text language flag with SE11
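The difference between deriving the code page from the logon language and deriving it per row from the language key can be sketched as follows. This is an illustrative Python model, not the RFC layer: the mapping `LANGUAGE_CODEC`, the `#` substitution, and the functions `convert` and `send_table` are assumptions made for the example.

```python
# Assumed language -> code page assignment of an MDMP system (illustrative only).
LANGUAGE_CODEC = {"DE": "iso-8859-1", "JA": "shift_jis"}


def convert(text: str, codec: str) -> bytes:
    """Convert Unicode text; characters missing from codec become '#'."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            out += b"#"
    return bytes(out)


def send_table(rows, logon_language: str, use_language_key: bool):
    """Convert each (language key, text) row for a non-Unicode receiver."""
    sent = []
    for language_key, text in rows:
        language = language_key if use_language_key else logon_language
        sent.append((language_key, convert(text, LANGUAGE_CODEC[language])))
    return sent


ROWS = [("DE", "Größe"), ("JA", "日本語")]
BY_ROW = send_table(ROWS, logon_language="DE", use_language_key=True)
BY_LOGON = send_table(ROWS, logon_language="DE", use_language_key=False)

if __name__ == "__main__":
    print(BY_ROW)    # Japanese row converted with the Japanese code page
    print(BY_LOGON)  # Japanese row converted with the German code page
```

With the language key honoured, the Japanese row survives the transfer. With the logon language alone, the same row is converted through the German code page and degenerates to `###`.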
-
Maintain RFC destination SM59: MDMP settings
-
SE11: Maintain text language
-
File transfer: Application server
Pattern for writing/reading files on the application server:
  OPEN DATASET ... IN ... MODE
  TRANSFER / READ
  CLOSE DATASET
Modes:
  BINARY MODE: Uninterpreted sequence of bytes.
  TEXT MODE ENCODING UTF-8 / NON-UNICODE / DEFAULT: Pure unstructured text data. DEFAULT equals UTF-8 in Unicode systems and NON-UNICODE in non-Unicode systems.
  LEGACY TEXT/BINARY MODE: Produces a format compatible with non-Unicode systems. Text data is always written in NON-UNICODE format. Non-character-like structures are allowed. The only difference between TEXT and BINARY is that in case of TEXT an EOF (END OF FILE) marker is added.
-
File transfer: Application server
Code page selection NON-UNICODE:
If a Unicode ↔ non-Unicode conversion is necessary during data transfer, the non-Unicode code page is derived from the current system language SY-LANGU, which may be changed by using SET LOCALE LANGUAGE.
Advantages and disadvantages for data exchange:
  BINARY: Not a good exchange format in itself. Use this for writing/reading prepared data of a well-known format (e.g. XML / UTF-8 as XSTRING), or for write/read on the same application server.
  TEXT MODE: UTF-8 is a good exchange format. Structures may not be transferred as a whole, only single fields.
  LEGACY MODES: Only for reading or writing non-Unicode data. Structure and code page information is considered.
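The trade-off between a UTF-8 text file and a non-Unicode text file can be tried out outside ABAP. The following Python sketch is only a rough analogue of writing single fields in text mode: it does not use OPEN DATASET, the helper names are hypothetical, and Python replaces unencodable characters with `?` where the SAP converter would use its own substitute.

```python
import os
import tempfile


def write_fields(path: str, fields: list, codec: str) -> None:
    """Write each field as one line; unencodable characters become '?'."""
    with open(path, "w", encoding=codec, errors="replace", newline="\n") as f:
        for field in fields:
            f.write(field + "\n")


def read_fields(path: str, codec: str) -> list:
    """Read the file back, one field per line."""
    with open(path, "r", encoding=codec, newline="\n") as f:
        return [line.rstrip("\n") for line in f]


def roundtrip(fields: list, codec: str) -> list:
    """Write and re-read the fields through a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fields.txt")
        write_fields(path, fields, codec)
        return read_fields(path, codec)


FIELDS = ["Größe", "日本語"]
UTF8_RESULT = roundtrip(FIELDS, "utf-8")         # nothing is lost
LATIN1_RESULT = roundtrip(FIELDS, "iso-8859-1")  # the Japanese field is lost

if __name__ == "__main__":
    print(UTF8_RESULT)
    print(LATIN1_RESULT)
```

The sketch writes field by field on purpose, mirroring the restriction that structures cannot be transferred as a whole in text mode.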
-
File transfer: Application server
Example 1: BINARY MODE
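[Figure: R/3 Enterprise and a non-Unicode R/3 system (ISO8859-1, SJIS; code pages 1100, 8000) exchange a file using BINARY MODE and LEGACY BINARY MODE; the non-Unicode code page follows SY-LANGU]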
-
File transfer: Application server
Example 2: TEXT MODE UTF-8
[Figure: R/3 Enterprise and a non-Unicode R/3 system (ISO8859-1, SJIS) exchange a file that both sides write and read in TEXT MODE UTF-8; the non-Unicode code page follows SY-LANGU]
Criteria shown: full charset supported (no data loss in the file); structured data as a whole (alternative: write field by field)
-
File transfer: Application server
Example 3: TEXT MODE NON-UNICODE
[Figure: R/3 Enterprise and a non-Unicode R/3 system (ISO8859-1, SJIS; code pages 1100, 8000) exchange a file that both sides write and read in TEXT MODE NON-UNICODE; the code page follows SY-LANGU on each side]
Criteria shown: full charset supported (no data loss in the file); structured data as a whole (alternative: write field by field)
-
File transfer: Application server
Example 4: TEXT MODE DEFAULT
[Figure: R/3 Enterprise and a non-Unicode R/3 system (ISO8859-1, SJIS; code pages 1100, 8000) exchange a file using TEXT MODE DEFAULT, TEXT MODE NON-UNICODE, and TEXT MODE UTF-8; the non-Unicode code page follows SY-LANGU]
-
File transfer: Application server
Example 5: LEGACY TEXT/BINARY MODE
[Figure: R/3 Enterprise and a non-Unicode R/3 system (ISO8859-1, SJIS; code pages 1100, 8000) exchange a file that both sides write and read in LEGACY TEXT/BINARY MODE; the code page follows SY-LANGU on each side]
Criteria shown: full charset supported (no data loss in the file); structured data
-
File transfer: Using XML
Using XML as transport format
Use CALL TRANSFORMATION with target data type XSTRING to create a UTF-8 based XML representation of your data.
  Structure information (no layout / alignment problems)
  UTF-8 based (no data loss)
  Transport in binary form
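The three properties listed here (named fields, UTF-8, binary transport) can be illustrated with a short Python sketch. It is an analogy built on Python's standard XML library, not CALL TRANSFORMATION, and the element name `record` and both function names are invented for the example.

```python
import xml.etree.ElementTree as ET


def struct_to_xml_bytes(record: dict) -> bytes:
    """Serialize a flat record to UTF-8 encoded XML (returned as bytes)."""
    root = ET.Element("record")
    for name, value in record.items():
        # Field names must be valid XML element names in this sketch.
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="utf-8")


def xml_bytes_to_struct(data: bytes) -> dict:
    """Rebuild the record from the XML bytes; fields are found by name."""
    root = ET.fromstring(data)
    return {child.tag: child.text or "" for child in root}


RECORD = {"NAME": "日本語", "CITY": "Köln"}
PAYLOAD = struct_to_xml_bytes(RECORD)      # write this in binary mode
RESTORED = xml_bytes_to_struct(PAYLOAD)    # read in binary mode, then parse

if __name__ == "__main__":
    print(PAYLOAD)
    print(RESTORED)
```

Since the receiver locates each field by its element name, differing field lengths or alignment between the two systems no longer matter, and the payload is never subject to a code page conversion because it travels as bytes.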
-
File transfer: Application server
Example 6: UTF-8 based XML + BINARY MODE
[Figure: R/3 Enterprise and a non-Unicode R/3 system (ISO8859-1, SJIS) exchange a file; the writer uses CALL TRANSFORMATION + BINARY MODE, the reader uses BINARY MODE + CALL TRANSFORMATION; the non-Unicode code page follows SY-LANGU]
Criteria shown: full charset supported (no data loss in the file); structured data
-
File transfer: Frontend
File transfer at the front end with GUI_UPLOAD / GUI_DOWNLOAD
The function modules GUI_UPLOAD / GUI_DOWNLOAD convert data into a textual representation. Structures are allowed.
Determination of the outside code page:
  Front end code page matching the current system code page (SY-LANGU, SET LOCALE LANGUAGE)
  Declared explicitly with the optional parameter CODEPAGE (starting with release 6.20 SP 21)
It is planned to provide in cl_gui_frontend_services=>file_open_dialog / file_save_dialog the possibility to select from different front end code pages (e.g. in the Unicode system you may select old standard code pages rather than using the standard front end code page UTF-8 or, later, UTF-16).
-
Overview: RFC and File transfer
RFC and file transfer from a Unicode system's perspective
-
Common mistakes: overview
Things you should never do!
  Type hiding
  Missing language key
  Wrong length assumptions
  Sending data that is not in the receiver's code page
  ...
-
Common mistakes: Type hiding: binary data
Don't hide types (1)
If you conceal the true types from the system, the system cannot do anything for you. As a consequence, data may, for example, be subject to unwanted code page conversions.
Example: Transporting binary data in character containers
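What an unwanted conversion does to binary data can be shown with three bytes. The Python sketch below assumes a sender code page ISO8859-1 and a receiver that stores text as UTF-8; the function names are hypothetical and the conversion is a stand-in for whatever the transport layer applies to character fields.

```python
# A binary value, e.g. an RGB colour, that is NOT text.
RGB = bytes([0x00, 0xFF, 0x00])


def send_as_character_container(raw: bytes, source_codec: str, target_codec: str) -> bytes:
    """The system believes the bytes are characters and converts the code page."""
    return raw.decode(source_codec).encode(target_codec)


def send_as_binary(raw: bytes) -> bytes:
    """Declared as binary (type X / XSTRING): passed through untouched."""
    return raw


HIDDEN = send_as_character_container(RGB, "iso-8859-1", "utf-8")
DECLARED = send_as_binary(RGB)

if __name__ == "__main__":
    print(HIDDEN.hex(" "))    # byte FF was reinterpreted as a character
    print(DECLARED.hex(" "))
```

The byte `FF` is read as the character ÿ and re-encoded, so the receiver gets four bytes instead of three and the value is no longer the colour that was sent.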
-
Common mistakes: Type hiding: character-like data
Don't hide types (2)
Even sending a pure character-like structure in a character container conceals important information, namely the field boundaries, from the system.
Example: Transporting character-like data in character containers
-
Common mistakes: Type hiding: character-like data
Workaround if the container approach cannot be changed
Use CL_NLS_STRUC_CONTAINER to correct the implicit layout:
[Figure: a structure with the fields NAME and RGB value (00FF00) is packed into a data container with struc_to_cont, transferred by RFC between the Unicode system and the non-Unicode system, and unpacked with cont_to_struc; both directions are shown]
-
Common mistakes: Missing language key
Always use language keys
In principle, you must not send any data without a language key if the data contains non-7-bit-ASCII characters. Otherwise the data will be corrupted.
Example: Sending non-Latin-1 data without language key by RFC with German logon
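The corruption in this example comes from interpreting bytes with the wrong code page. In the Python sketch below, Japanese text stored as SJIS bytes is received once with the code page of the German logon language and once with the code page named by a language key. The mapping `LANGUAGE_CODEC` and the function `receive` are assumptions for illustration.

```python
# Assumed language -> code page assignment (illustrative only).
LANGUAGE_CODEC = {"DE": "iso-8859-1", "JA": "shift_jis"}


def receive(raw: bytes, language: str) -> str:
    """Interpret incoming bytes with the code page derived from the language."""
    return raw.decode(LANGUAGE_CODEC[language])


# Japanese text as it is stored in a non-Unicode MDMP system.
STORED = "日本語".encode("shift_jis")

WITHOUT_KEY = receive(STORED, "DE")  # only the logon language is known
WITH_KEY = receive(STORED, "JA")     # the row carries its language key

if __name__ == "__main__":
    print(repr(WITHOUT_KEY))  # six unrelated Latin-1 characters
    print(WITH_KEY)
```

No error is raised in the first case, which is what makes the mistake dangerous: the receiver ends up with six plausible-looking but meaningless characters.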
-
Common mistakes: Wrong length assumptions
Problems with length assumptions
String lengths are not invariant under code page conversions. This may lead to different problems:
In a Unicode system, a character field of a certain length can hold more characters than the same character field in a non-Unicode system. Sending such data will result in data loss.
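A concrete case: assume a field of length 10 that holds 10 characters in the Unicode system but only 10 bytes in a non-Unicode system running SJIS, where each Japanese character occupies 2 bytes. The Python sketch below models the truncation; `fit_into_field` is a hypothetical helper, not an SAP function.

```python
def fit_into_field(text: str, codec: str, field_bytes: int) -> str:
    """Encode text and keep only the whole characters that fit into field_bytes."""
    out = bytearray()
    for ch in text:
        encoded = ch.encode(codec)
        if len(out) + len(encoded) > field_bytes:
            break  # the rest of the text does not fit and is cut off
        out += encoded
    return out.decode(codec)


# 10 characters: fills a 10-character field in the Unicode system exactly.
TEXT = "こんにちは世界です。"
ARRIVED = fit_into_field(TEXT, "shift_jis", 10)

if __name__ == "__main__":
    print(len(TEXT), len(TEXT.encode("shift_jis")))  # characters vs. bytes
    print(ARRIVED)
```

Half of the text is lost on arrival, although the field has nominally the same length on both sides.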
-
Common mistakes: Wrong length assumptions
Problems with length assumptions (continued)
Breaking a string into a table of fixed line size and sending the table from a non-Unicode to a Unicode system does not work: the information about the occupied length is lost, and subsequent reassembling into a string will insert unwanted spaces.
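The effect can be reproduced with a small model. The Python sketch below assumes a line size of 10 (10 bytes on the non-Unicode side, 10 characters on the Unicode side), SJIS as the sender's code page, and a text made only of double-byte characters so that rows break on character boundaries. All function names are hypothetical.

```python
LINE_SIZE = 10  # assumed: 10 bytes in the non-Unicode system, 10 characters in Unicode


def split_non_unicode(text: str, codec: str) -> list:
    """Sender: break the encoded text into rows of LINE_SIZE bytes."""
    raw = text.encode(codec)
    rows = [raw[i:i + LINE_SIZE] for i in range(0, len(raw), LINE_SIZE)]
    return [row.ljust(LINE_SIZE, b" ") for row in rows]


def convert_rows(rows: list, codec: str) -> list:
    """Transfer: each row becomes a field of LINE_SIZE characters, space-padded."""
    return [row.decode(codec).ljust(LINE_SIZE) for row in rows]


def reassemble(rows: list) -> str:
    """Receiver: concatenate the fixed-length rows again."""
    return "".join(rows)


TEXT = "こんにちは世界です。"
RESULT = reassemble(convert_rows(split_non_unicode(TEXT, "shift_jis"), "shift_jis"))

if __name__ == "__main__":
    print(repr(RESULT))
```

Each full 10-byte row holds only 5 characters after conversion, so the receiver cannot tell that the row was completely occupied and the padding ends up in the middle of the text.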
-
Common mistakes: Data not in the receiver's code page
Data not in the receiver's code page
In general, you must not send data from a source system to a target system if the characters sent are not in the target system's code page. In particular, don't send characters that exist only in Unicode to an old-fashioned non-Unicode system:
Try to send a white smiling face (☺), a black smiling face (☻), or some beamed eighth notes (♫)! (→ #)
-
Exercises
Send single code page and MDMP data via RFC
  Type hiding and missing language keys: TECHED_UNICODE_EXERCISE_11/12/13/14 and 15
  Wrong length assumptions: TECHED_UNICODE_EXERCISE_16/18
  Data not in the receiver's code page: TECHED_UNICODE_EXERCISE_17
Transfer data via file on the application server
  Writing files: TECHED_UNICODE_EXERCISE_19
  Reading files: TECHED_UNICODE_EXERCISE_20
Transfer data via file on the front end
  Writing files: TECHED_UNICODE_EXERCISE_21
  Reading files: TECHED_UNICODE_EXERCISE_22
-
Further Information
Service Marketplace:
  Technical information: http://service.sap.com/Unicode@SAP
  Customer contact: mail [email protected]
Further presentations: http://service.sap.com/Unicode@SAP → Unicode Technology → Media Library:
  Unicode Enabling ABAP Programs or ABAP Conversion SAP Tutor
  Unicode Support in SAP Web Application Server
-
Q&A
Questions?
-
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
Microsoft, WINDOWS, NT, EXCEL, Word, PowerPoint and SQL Server are registered trademarks of Microsoft Corporation.
IBM, DB2, DB2 Universal Database, OS/2, Parallel Sysplex, MVS/ESA, AIX, S/390, AS/400, OS/390, OS/400, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere, Netfinity, Tivoli, Informix and Informix Dynamic ServerTM are trademarks of IBM Corporation in USA and/or other countries.
ORACLE is a registered trademark of ORACLE Corporation.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, the Citrix logo, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, MultiWin and other Citrix product names referenced herein are trademarks of Citrix Systems, Inc.
HTML, DHTML, XML, XHTML are trademarks or registered trademarks of W3C, World Wide Web Consortium, Massachusetts Institute of Technology.
JAVA is a registered trademark of Sun Microsystems, Inc.
JAVASCRIPT is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape.
MarketSet and Enterprise Buyer are jointly owned trademarks of SAP AG and Commerce One.
SAP, SAP Logo, R/2, R/3, mySAP, mySAP.com and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world.
All other product and service names mentioned are trademarks of their respective companies.
Copyright 2003 SAP AG. All Rights Reserved