Siebel CRM Unicode Conversion – The DBA Perspective

Brian Hitchcock
OCP 8, 8i, 9i DBA
Sun Microsystems
brian.hitchcock@sun.com
brhora@aol.com
DCSIT Technical Services DBA
September 15, 2004
www.brianhitchcock.net

CRM Unicode Conversion

Three separate presentations
– 1) The overall conversion process
    What we had, what we wanted, how to get there
    Issues that come up during conversion
– 2) Multi-byte data in the existing CRM db
    What’s the issue, how did it happen
    A general method to find and fix this problem
– 3) The actual conversion
    What really happened
    Issues that came up and how they were resolved

Focus on DBA issues, not Siebel application

How Did I Get Involved?

Sleeping in a meeting…

Heard someone say
– “We told the users to stop entering Japanese into the CRM system but we aren’t sure they stopped”

Woke up, said
– “I’ve done that before…”
– See “Case of the Missing Kanji”

Don’t wake up in meetings…

What’s The Issue?

Existing Siebel CRM system
– Oracle 8.1.7.4
– Single-byte character set (WE8ISO8859P1)

Interface systems
– Multi-byte character set(s) (UTF8)
– Handle data between single-byte and multi-byte apps

Want to convert to Unicode
– Siebel, database, interfaces all should be UTF8
– Eliminate interface systems

What we had

[Diagram. Components shown: Users; regions Amer, Emea, Apac; Siebel CRM; Oracle Db; Custdb Amer, Custdb Emea, Custdb Apac; Tcustdb Emea, Tcustdb Apac; Ordering System. Character-set labels shown: WE8ISO8859P1, 8859P1, UTF8.]

What we wanted

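[Diagram. Components shown: Users; regions Amer, Emea, Apac; Siebel CRM; Oracle Db; Custdb Amer, Custdb Emea, Custdb Apac; Ordering System. Character-set labels shown: AL32UTF8, UTF8, WE8ISO8859P1. The Tcustdb databases no longer appear.]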

What We Wanted

All data in one database
– All languages
– Unicode

Eliminate interface systems
– Reduce support costs

Support increased CRM functionality
– All data in one place
– Supports new business functionality

Would you like fries with that?

Unicode conversion includes
– Oracle db
    Convert to AL32UTF8 character set
      Required by Siebel for Unicode
    Upgrade to 9.2.0.4
      Required to get AL32UTF8 character set
– Remove Tcustdb databases
    Modify triggers that link source db to Tcustdb

And A Shake?

And, while you’re at it…
– Application GUI
    Retrieve different data, multi-byte, local language
– Clients
    Upgrade to Oracle 9.2.0.4 (SQL*Plus)

Lots of changes all at once
– Testing
– How to know impact of each change?

Converting to Unicode

It’s easy – right?
– Siebel CRM
    Make some configuration changes
– Oracle database
    Export from single-byte database
    Import into new db created with UTF8 char set
– Testing
– Done

This is the ‘management’ view

What Is Unicode?

International standard

Collection of characters
– Covers most of the world’s languages
    Chinese poetry?
– Every character has a unique code point (the bytes actually stored depend on the encoding)

Application developers
– Support Unicode
– No need to worry about specific languages

You Make This Stuff Up!

What follows can be found in
– Oracle9i Database Globalization Support Guide
– Release 2 (9.2)
– Part Number A96529-01

Or, you can trust me…

Character sets, Unicode
– Consist of set of characters
– Encoding of the characters to byte-codes

Single Byte Encoding Schemes

7-bit encoding schemes
– Single-byte 7-bit, up to 128 characters
– Normally support just one language
– US7ASCII

8-bit encoding schemes
– Single-byte 8-bit, up to 256 characters
– Often support a group of related languages
– WE8ISO8859P1

8859P1 Character set

Oracle Character Set WE8ISO8859P1
Hex 0x41 is A
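
The single-byte idea can be checked with a few lines of Python. This is only a sketch: it assumes Python's latin-1 codec (ISO 8859-1) is a reasonable stand-in for Oracle's WE8ISO8859P1, and the function name is made up for illustration.

```python
# Assumption: Python's "latin-1" codec stands in for Oracle's WE8ISO8859P1.
def decode_single_byte(code: int, charset: str = "latin-1") -> str:
    """Return the character that one byte value maps to in a single-byte character set."""
    return bytes([code]).decode(charset)

print(decode_single_byte(0x41))                  # A
print(f"U+{ord(decode_single_byte(0xE1)):04X}")  # U+00E1, the letter a-acute
print(len("\u00e1".encode("latin-1")))           # 1 (one byte per character)
```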

Multi-byte Encoding Schemes

Fixed-width
– Each character occupies a fixed number of bytes
– Faster text processing
– AL16UTF16

Variable-width
– One or more bytes to represent a single character
– Saves disk space (typically lots of disk space)
– UTF8, AL32UTF8

Shift-sensitive variable-width
– Use control codes to differentiate between single-byte and multi-byte characters with the same code values

UTF8 Byte Storage

Different characters occupy 1, 2, 3 or 4 bytes
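
A short Python sketch makes the 1 to 4 byte range concrete. It uses Python's standard utf-8 codec, which is assumed here to behave like Oracle's AL32UTF8 (not the older Oracle UTF8 character set); the sample characters are arbitrary choices.

```python
# Bytes needed per character in standard UTF-8.
samples = {
    "A": "ASCII letter",
    "\u00e1": "accented Latin letter (a-acute)",
    "\u65e5": "Japanese kanji",
    "\U0001D11E": "supplementary character (treble clef)",
}

def utf8_width(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {utf8_width(ch)} byte(s)")
```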

AL32UTF8

UTF8
– Supports Unicode 3.0 since 8.1.7.4
– Up to 3 bytes per character
– Supplementary characters
    Stored as pairs of 3-byte character codes

AL32UTF8
– Supports Unicode 3.1 (latest version?), since 9i
– Up to 4 bytes per character
    Supplementary characters stored as a single 4-byte code
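
The difference for supplementary characters can be sketched in Python. The assumption here is that Oracle's UTF8 character set stores such a character as its two UTF-16 surrogates, each encoded in 3 bytes (the scheme usually called CESU-8), while AL32UTF8 matches standard UTF-8. The helper name is hypothetical.

```python
def surrogate_pair_bytes(ch: str) -> bytes:
    """Encode a supplementary character as two 3-byte sequences, one per UTF-16 surrogate.

    Assumption: this mimics how Oracle's older UTF8 character set stores such characters.
    """
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("not a supplementary character")
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)    # high (leading) surrogate
    low = 0xDC00 + (cp & 0x3FF)   # low (trailing) surrogate
    # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
    return b"".join(chr(s).encode("utf-8", "surrogatepass") for s in (high, low))

treble_clef = "\U0001D11E"
print(surrogate_pair_bytes(treble_clef).hex())  # 6 bytes: pair of 3-byte codes
print(treble_clef.encode("utf-8").hex())        # 4 bytes: standard UTF-8
```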

Confused?

Unicode, a set of characters

Character set, encoded set of characters

Encoding scheme, UTF-8, ISO standard for variable width encoding of Unicode character set

UTF8, Oracle implementation of UTF-8

If you’re not confused, you aren’t paying attention!

Changing Character Set

You can simply alter the database (right?)

Only works if
– New character set is strict superset of existing character set
– For all characters in existing character set
    All exist in new character set
    All have exact same code in new character set

Example
– WE8MSWIN1252 (superset, includes euro)
– WE8ISO8859P1 (subset)
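
The strict-superset rule can be expressed as a small check. This is a sketch under assumptions: Python codec names stand in for Oracle character sets (ascii for US7ASCII, latin-1 for WE8ISO8859P1), Python's codec tables may not match Oracle's definitions byte for byte, and the check only makes sense when the old set is single-byte.

```python
def is_strict_superset(old: str, new: str) -> bool:
    """True if every byte the old single-byte set defines maps to the same character in the new set."""
    for code in range(256):
        raw = bytes([code])
        try:
            old_char = raw.decode(old)
        except UnicodeDecodeError:
            continue  # byte not defined in the old set, nothing to preserve
        try:
            if raw.decode(new) != old_char:
                return False  # same byte, different character
        except UnicodeDecodeError:
            return False  # byte is not valid on its own in the new set
    return True

print(is_strict_superset("ascii", "latin-1"))  # True
print(is_strict_superset("latin-1", "utf-8"))  # False
```

The second result is why a single-byte database cannot simply be altered to a Unicode character set: bytes above 0x7F mean something different there.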

Complexities

Even for the same character
– Different encoding in different character set

Example
– Latin (Western European) character á
– E1 in WE8ISO8859P1
– C3A1 in UTF8

If existing character not in new char set
– ? (replacement character) displayed
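
Both points can be reproduced in Python, again assuming latin-1 and utf-8 as stand-ins for WE8ISO8859P1 and a Unicode character set. The `convert` helper is illustrative only; Oracle's own conversion is done by the database and client libraries.

```python
def convert(text: str, charset: str) -> bytes:
    """Encode text in the target character set, substituting '?' for anything it cannot hold."""
    return text.encode(charset, errors="replace")

a_acute = "\u00e1"
print(convert(a_acute, "latin-1").hex().upper())  # E1
print(convert(a_acute, "utf-8").hex().upper())    # C3A1

euro = "\u20ac"  # not part of ISO 8859-1
print(convert(euro, "latin-1"))                   # b'?'
```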

Cure

Create new database
– Using new character set

Extract data from old database

Insert data into new database

Export/import is most often used
– Could use other methods
    Extract data to flat files
    SQL*Loader

Database Conversion

Serial
– Upgrade source, export, drop schemas, import

Parallel
– Create target
– Export source
– Import to target

Chose Parallel
– Source still available after target in use
    User tablespace issue for example

Impact of Unicode

Table columns must be widened

Existing column
– Holds up to 20 Latin characters
– WE8ISO8859P1, each Latin character 1 byte
– VARCHAR2(20)

New column
– UTF8
– Each accented Latin character (such as á) occupies 2 bytes; plain ASCII letters stay at 1 byte
– Worst case for 20 Latin characters, need VARCHAR2(40)
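
A minimal sketch of why the column must be widened, assuming byte-length semantics and using Python's latin-1 and utf-8 codecs in place of the Oracle character sets. The `fits` helper is hypothetical.

```python
def fits(value: str, byte_limit: int, charset: str) -> bool:
    """True if value fits a column limited to byte_limit bytes in the given character set."""
    return len(value.encode(charset)) <= byte_limit

name = "\u00e1" * 20  # twenty accented characters

print(fits(name, 20, "latin-1"))  # True:  20 bytes in the single-byte set
print(fits(name, 20, "utf-8"))    # False: 40 bytes after conversion
print(fits(name, 40, "utf-8"))    # True:  the widened column holds it
```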

Impact of Unicode

Worst case
– AL32UTF8 can have up to 4 bytes per character
– For all existing character columns
– Need to expand by 4x

Disk space
– CHAR – 4x disk space
– VARCHAR2 – 1x to 4x
    Depends on specific characters inserted

Impact of Unicode

Tables
– Columns must be wider
– Each character can be up to 4 bytes

Triggers, PL/SQL code
– Modify to handle multi-byte data

End-user front-end (browser)
– Reconfigure to
    Display multi-byte data, accept multi-byte data

All app components must handle Unicode

User Impact

VARCHAR2, AL32UTF8
– 4000 byte limit

How many characters can I enter?
– Latin, 2000 (accented characters at 2 bytes each; plain ASCII still fits 4000)
– Japanese, 4000/3 (about 1333)
    If moving from Japanese character set
    2 bytes per character
    Max characters reduced by 1/3
– Supplementary characters, 1000
    Characters like ‘treble clef’
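
The character counts follow from dividing the 4000-byte limit by the bytes each character needs. A small sketch, assuming standard UTF-8 widths apply to AL32UTF8; the function and constant names are made up.

```python
VARCHAR2_BYTE_LIMIT = 4000

def max_chars(sample: str, limit: int = VARCHAR2_BYTE_LIMIT) -> int:
    """How many copies of one sample character fit in limit bytes of UTF-8."""
    return limit // len(sample.encode("utf-8"))

print(max_chars("A"))           # 4000 (plain ASCII)
print(max_chars("\u00e1"))      # 2000 (accented Latin)
print(max_chars("\u65e5"))      # 1333 (Japanese)
print(max_chars("\U0001D11E"))  # 1000 (supplementary, treble clef)
```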

Disk Space

How much multi-byte data do you have?
– We found all of ours
– Typically, 5-10%
– See 2) Multi-byte data in the existing CRM db

Compute disk space requirement
– If you have 5% multi-byte character data
– Need maximum of 20% more disk space

Will you add more multi-byte data?
– Once you have converted to Unicode…
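
One hedged way to reason about the estimate: if some percentage of the character data grows by a given factor and the rest stays the same size, the extra space is that percentage times (factor - 1). The function below is an illustrative back-of-envelope calculation, not the method used in the project, and it assumes the affected data currently occupies one byte per character.

```python
def extra_space_percent(multibyte_percent: int, expansion_factor: int) -> int:
    """Extra disk space, as a percentage of current size, if only the multi-byte share expands."""
    return multibyte_percent * (expansion_factor - 1)

print(extra_space_percent(5, 4))   # 15
print(extra_space_percent(10, 4))  # 30
```

With 5% multi-byte data and the 4x worst case this comes to 15%, which sits inside the 20% ceiling quoted on the slide.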

Expanding Columns

Need to expand lots of columns
– Individual SQL statements
– Lots of SQL to generate

How to make Oracle do this for us?
– Export existing database
– New database has init.ora parameter
    NLS_LENGTH_SEMANTICS = CHAR
– Import into new database
    All character columns widened as tables created
    VARCHAR2(10) effectively becomes VARCHAR2(40) (10 characters, up to 40 bytes)

Character Semantics – 9i

Change column data types
– VARCHAR2(10 byte)
– VARCHAR2(10 char)
– Requires SQL statement for each column

NLS_LENGTH_SEMANTICS
– Init.ora parameter
– What happens if init.ora changed?
– BYTE or CHAR
– All character columns created with byte or char
– Handles PL/SQL code as well
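
For the per-column route, the statements are usually generated rather than typed. The sketch below builds one ALTER TABLE per column in Python. The table and column names are hypothetical; in practice the list would presumably come from the data dictionary (for example DBA_TAB_COLUMNS), and this is not the Siebel-supplied script.

```python
def widen_statements(columns):
    """Build one ALTER TABLE statement per (table, column, length) tuple, using CHAR semantics."""
    return [
        f"ALTER TABLE {table} MODIFY ({column} VARCHAR2({length} CHAR));"
        for table, column, length in columns
    ]

# Hypothetical table and column names, for illustration only.
example_columns = [("CONTACT", "LAST_NAME", 50), ("CONTACT", "CITY", 30)]

for stmt in widen_statements(example_columns):
    print(stmt)
```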

The Siebel Process

Create target database
Shutdown app
Upgrade Oracle client
Source db character set
Run migrate.sh script
Full export source
Import to target db
Modify target db

Create target database

Oracle 9.2.0.4

Character set AL32UTF8

Character semantics CHAR

Tablespace names same as source db
– 15% more space than source db

Locally managed, uniform 130k

Auto UNDO, tablespace

Shutdown app

Shutdown various app servers

Shutdown source db

Cold backup

Upgrade source db to 9.2.0.4
– Migrate 8.1.7.4 to 9.2.0.4

Upgrade Oracle client

Upgrade Oracle client software to 9.2.0.4
– For all machines that have SQL*Plus
– Upgrade to 9.2.0.4
– Install 9.2.0.4
    Client install only
– Tar up 9.2.0.4 client ORACLE_HOME
– ftp, untar on machines that need SQL*Plus

Source db character set

Fix any user tablespace issues
– Import won’t fix them for you

Change source db character set
– WE8MSWIN1252
    Siebel requirement
    Contains euro symbol
    Is a strict superset of WE8ISO8859P1

Run migrate.sh script

Siebel supplied script
– Generates various scripts
    Expand.ksh
      Widen columns for Unicode
    Impexp06.ksh
      Import individual tables for large dbs
      We use full export/import instead

Run sun_expand.sql
– Widen columns in tables outside Siebel schemas

Export Source, Import Target

Full export of source db
– Source db is now 9.2.0.4
    NLS_LANG AMERICAN_AMERICA.AL32UTF8

Import into target db
– Target db created as 9.2.0.4
    NLS_LANG AMERICAN_AMERICA.AL32UTF8

The conversion setup

[Diagram. Boxes shown: Source Db, Source Db, export, import, Target Db. Character-set labels shown: WE8ISO8859P1, WE8MSWIN1252 (three times), AL32UTF8.]

Modify target db

Run impexp06.ksh
– Handles sequences etc.

Run check_schema.sql
– Find columns that didn’t get widened

Various changes on Siebel App side

Verify db links to Custdb databases

Conversion Complete?

Siebel process is done

Fix any data issues
– Multi-byte character data in source db
– Convert properly to AL32UTF8

Testing Unicode changes
– GUI changes
– Performance
    Unicode processing
    Users accessing different data

Multi-byte Data In Source Db?

Source db is WE8ISO8859P1
– Single-byte character set
– Doesn’t support multi-byte characters

That’s the official story

The reality is somewhat different

What, if any, multi-byte data is in source db?
– How to determine correct character set?
– How to find, how to fix?
– Japanese, Russian, others?
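
As a rough idea of what "finding" such data can look like, the sketch below flags raw column bytes that contain values above 0x7F and also happen to form valid UTF-8. This is an assumption-laden heuristic, not the method from the second presentation: it only detects UTF-8, other multi-byte encodings (Shift-JIS, for example) would need their own checks, and the function name is made up.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """True if raw contains bytes above 0x7F and the whole value is valid UTF-8."""
    if all(b < 0x80 for b in raw):
        return False  # plain ASCII, nothing to investigate
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # high bytes present, but not a UTF-8 sequence
    return True

# Genuine single-byte data: "Jose" with an accented e, stored as ISO 8859-1.
print(looks_like_utf8("Jos\u00e9".encode("latin-1")))   # False
# Japanese text whose UTF-8 bytes were stored in the single-byte database.
print(looks_like_utf8("\u65e5\u672c".encode("utf-8")))  # True
```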

CRM Unicode Conversion

Three separate presentations
– 1) The overall conversion process
    What we had, what we wanted, how to get there
    Issues that come up during conversion
– 2) Multi-byte data in the existing CRM db
    What’s the issue, how did it happen
    A general method to find and fix this problem
– 3) The actual conversion
    What really happened
    Issues that came up and how they were resolved

Focus on DBA issues, not Siebel application