Siebel CRM Unicode Conversion – The DBA Perspective Brian Hitchcock OCP 8, 8i, 9i DBA Sun...
-
Upload
eric-bryant -
Category
Documents
-
view
235 -
download
0
Transcript of Siebel CRM Unicode Conversion – The DBA Perspective Brian Hitchcock OCP 8, 8i, 9i DBA Sun...
Siebel CRM Unicode Conversion – The DBA
PerspectiveBrian Hitchcock
OCP 8, 8i, 9i DBA
Sun Microsystems
[email protected] Technical Services DBA
Brian Hitchcock September 15, 2004 Page 1
www.brianhitchcock.net
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 2
www.brianhitchcock.net
CRM Unicode Conversion
Three separate presentations– 1) The overall conversion process
What we had, what we wanted, how to get there Issues that come up during conversion
– 2) Multi-byte data in the existing CRM db What’s the issue, how did it happen A general method to find and fix this problem
– 3) The actual conversion What really happened Issues that came up and how they were resolved
Focus on DBA issues, not Siebel application
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 3
www.brianhitchcock.net
How Did I Get Involved?
Sleeping in a meeting… Heard someone say
– “We told the users to stop entering Japanese into the CRM system but we aren’t sure they stopped”
Woke up, said– “I’ve done that before…”– See “Case of the Missing Kanji”
Don’t wake up in meetings…
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 4
www.brianhitchcock.net
What’s The Issue?
Existing Siebel CRM system– Oracle 8.1.7.4– Single-byte character set (WE8ISO8859P1)
Interface systems– Multi-byte character set(s) (UTF8)– Handle data between single,multi-byte apps
Want to convert to Unicode– Siebel, database, interfaces all should be UTF8– Eliminate interface systems
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 5
www.brianhitchcock.net
What we had
Siebel CRM
Oracle Db
Custdb Apac
Users
Tcustdb Apac
Custdb Emea
Custdb Amer
Tcustdb Emea
Amer
Emea
Apac UTF8
WE8ISO8859P1
UTF8
UTF8
UTF8
WE8ISO8859P1
8859P1
8859P1
Ordering System
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 6
www.brianhitchcock.net
What we wanted
Siebel CRM
Oracle Db
Custdb Apac
Users
Custdb Emea
Custdb Amer
Amer
Emea
Apac
WE8ISO8859P1
UTF8
UTF8
AL32UTF8
UTF8
UTF8
Ordering System
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 7
www.brianhitchcock.net
What We Wanted
All data in one database– All languages– Unicode
Eliminate interface systems– Reduce support costs
Support increased CRM functionality– All data in one place– Supports new business functionality
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 8
www.brianhitchcock.net
Would you like fries with that?
Unicode conversion includes– Oracle db
Convert to AL32UTF8 character set
Required by Siebel for Unicode Upgrade to 9.2.0.4
Required to get AL32UTF8 character set
– Remove Tcustdb databases Modify triggers that link source db to Tcustdb
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 9
www.brianhitchcock.net
And A Shake?
And, while you’re at it…– Application GUI
Retrieve different data, multi-byte, local language
– Clients Upgrade to Oracle 9.2.0.4 (SQL*Plus)
Lots of changes all at once– Testing– How to know impact of each change?
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 10
www.brianhitchcock.net
Converting to Unicode
It’s easy – right?– Siebel CRM
make some configuration changes
– Oracle database Export from single-byte database Import into new db created with UTF8 char set
– Testing– Done
This is the ‘management’ view
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 11
www.brianhitchcock.net
What Is Unicode?
International standard Collection of characters
– Covers most of the world’s languages Chinese poetry?
– All characters have unique byte-code
Application developers– Support Unicode– No need to worry about specific languages
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 12
www.brianhitchcock.net
You Make This Stuff Up!
What follows can be found in – Oracle9i Database Globalization Support Guide– Release 2 (9.2)– Part Number A96529-01
Or, you can trust me… Character sets, Unicode
– Consist of set of characters– Encoding of the characters to byte-codes
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 13
www.brianhitchcock.net
Single Byte Encoding Schemes
7-bit encoding schemes– Single-byte 7-bit up to 128 characters – normally support just one language– US7ASCII
8-bit encoding schemes– Single-byte 8-bit up to 256 characters– often support a group of related languages– WE8ISO8859P1
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 14
www.brianhitchcock.net
8859P1 Character set
Oracle Character Set WE8ISO8859P1 Hex 0x41 is A
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 15
www.brianhitchcock.net
Multi-byte Encoding Schemes
Fixed-width– each character occupies a fixed number of bytes– Faster text processing– AL16UTF8
Variable-width– one or more bytes to represent a single character– Saves disk space (typically lots of disk space)– UTF8, AL32UTF8
Shift-sensitive variable-width– use control codes to differentiate single-byte multi-byte
characters with the same code values
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 16
www.brianhitchcock.net
UTF8 Byte Storage
Different characters occupy 1, 2, 3 or 4 bytes
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 17
www.brianhitchcock.net
AL32UTF8
UTF8– Supports Unicode 3.0 since 8.1.7.4– Up to 3 bytes per character– Supplemental characters
Pairs of 3 byte character codes
AL32UTF8– Supports Unicode 3.1 (latest version?), since 9i– Up to 4 bytes per character
Supplemental characters
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 18
www.brianhitchcock.net
Confused?
Unicode, a set of characters Character set, encoded set of characters Encoding scheme, UTF-8, ISO standard for
variable width encoding of Unicode character set
UTF8, Oracle implementation of UTF-8 If you’re not confused, you aren’t paying
attention!
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 19
www.brianhitchcock.net
Changing Character Set
You can simply alter the database (right?) Only works if
– new character set is strict superset of existing character set
– For all characters in existing character set All exist in new character set All have exact same code in new character set
Example– WE8MSWIN1252 (superset, includes euro)– WE8ISO8859P (subset)
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 20
www.brianhitchcock.net
Complexities
Even for the same character– Different encoding in different character set
Example– Latin (Western European) character á– E1 in WE8ISO8859P1– C391 in UTF8
If existing character not in new char set– ? (replacement character) displayed
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 21
www.brianhitchcock.net
Cure
Create new database– Using new character set
Extract data from old database Insert data into new database Export/import is most often used
– Could use other methods Extract data to flat files SQL*Loader
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 22
www.brianhitchcock.net
Database Conversion
Serial– Upgrade source, export, drop schemas, import
Parallel– Create target– Export source– Import to target
Chose Parallel– Source still available after target in use
User tablespace issue for example
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 23
www.brianhitchcock.net
Impact of Unicode
Table columns must be widened Existing column
– Holds up to 20 Latin characters– WE8ISO8859P1, each Latin character 1 byte– VARCHAR2(20)
New column– UTF8– Each Latin character occupies 2 bytes– Need VARCHAR2(40)
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 24
www.brianhitchcock.net
Impact of Unicode
Worst case– UTF8 can have up to 4 bytes per character– For all existing character columns– Need to expand by 4x
Disk space– CHAR – 4x disk space– VARCHAR2 – 1x to 4x
Depends on specific characters inserted
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 25
www.brianhitchcock.net
Impact of Unicode
Tables– Columns must be wider– Each character can be up to 4 bytes
Triggers, PL/SQL code– Modify to handle multi-byte data
End-user front-end (browser)– Reconfigure to
Display multi-byte data, accept multi-byte data
All app components must handle Unicode
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 26
www.brianhitchcock.net
User Impact
VARCHAR2, AL32UTF8– 4000 byte limit
How many characters can I enter?– Latin, 2000– Japanese, 4000/3
If moving from Japanese character set 2 bytes per character Max characters reduced by 1/3
– Supplemental characters, 1000 Characters like ‘treble clef’
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 27
www.brianhitchcock.net
Disk Space
How much multi-byte data do you have?– We found all of ours– Typically, 5-10%– See 2) Multi-byte data in the existing CRM db
Compute disk space requirement– If you have 5% multi-byte character data– Need maximum of 20% more disk space
Will you add more multi-byte data?– Once you have converted to Unicode…
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 28
www.brianhitchcock.net
Expanding Columns
Need to expand lots of columns– Individual SQL statements– Lots of SQL to generate
How to make Oracle do this for us?– Export existing database– New database has init.ora parameter
NLS_LENGTH_SEMANTICS = CHAR– Import into new database
All character columns widened as tables created VARCHAR(10) becomes VARCHAR(40)
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 29
www.brianhitchcock.net
Character Semantics – 9i
Change column data types– VARCHAR2(10 byte)– VARCAHR2(10 char)– Requires SQL statement for each column
NLS_LENGTH_SEMANTICS– Init.ora parameter– What happens if init.ora changed?– BYTE or CHAR– All character columns created with byte or char– Handles PL/SQL code as well
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 30
www.brianhitchcock.net
The Siebel Process
Create target database Shutdown app Upgrade Oracle client Source db character set Run migrate.sh script Full export source Import to target db Modify target db
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 31
www.brianhitchcock.net
Create target database
Oracle 9.2.0.4 Character set AL32UTF8 Character semantics CHAR Tablespace names same as source db
– 15% more space than source db
Locally managed, uniform 130k Auto UNDO, tablespace
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 32
www.brianhitchcock.net
Shutdown app
Shutdown various app servers Shutdown source db Cold backup Upgrade source db to 9.2.0.4
– Migrate 8.1.7.4 to 9.2.0.4
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 33
www.brianhitchcock.net
Upgrade Oracle client
Upgrade Oracle client software to 9.2.0.4– For all machines that have SQL*Plus– Upgrade to 9.2.0.4– Install 9.2.0.4
Client install only
– Tar up 9.2.0.4 client ORACLE_HOME– ftp, untar on machines that need SQL*Plus
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 34
www.brianhitchcock.net
Source db character set
Fix any user tablespace issues– Import won’t fix them for you
Change source db character set– WE8MSWIN1252
Siebel requirement Contains euro symbol Is a strict superset of WE8ISO8859P1
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 35
www.brianhitchcock.net
Run migrate.sh script
Siebel supplied script– Generates various scripts
Expand.ksh
Widen columns for Unicode Impexp06.ksh
Import individual tables for large dbs
We use full export/import instead
Run sun_expand.sql– Widen columns in tables outside Siebel schemas
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 36
www.brianhitchcock.net
Export Source, Import Target
Full export of source db– Source db is now 9.2.0.4
NLS_LANG
AMERICAN_AMERICA.AL32UTF8
Import into target db– Target db created as 9.2.0.4
NLS_LANG
AMERICAN_AMERICA.AL32UTF8
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 37
www.brianhitchcock.net
The conversion setup
Source Db
Target Db
export
import
Source Db
WE8ISO8859P1
WE8MSWIN1252
WE8MSWIN1252
WE8MSWIN1252
AL32UTF8
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 38
www.brianhitchcock.net
Modify target db
Run impexp06.ksh– Handles sequences etc.
Run check_schema.sql– Find columns that didn’t get widened
Various changes on Siebel App side Verify db links to Custdb databases
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 39
www.brianhitchcock.net
Conversion Complete?
Siebel process is done Fix any data issues
– Multi-byte character data in source db– Convert properly to AL32UTF8
Testing Unicode changes– GUI changes– Performance
Unicode processing Users accessing different data
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 40
www.brianhitchcock.net
Multi-byte Data In Source Db?
Source db is WE8ISO8859P1– Single-byte character set– Doesn’t support multi-byte characters
That’s the official story The reality is somewhat different
What, if any multi-byte data is in source db?– How to determine correct character set?– How to find, how to fix?– Japanese, Russian, others?
DCSIT Technical Services DBA
Brian Hitchcock September 15, 2004 Page 41
www.brianhitchcock.net
CRM Unicode Conversion
Three separate presentations– 1) The overall conversion process
What we had, what we wanted, how to get there Issues that come up during conversion
– 2) Multi-byte data in the existing CRM db What’s the issue, how did it happen A general method to find and fix this problem
– 3) The actual conversion What really happened Issues that came up and how they were resolved
Focus on DBA issues, not Siebel application