Oracle-9i for DataWarehousing

Features for Data Warehousing. Presented by Bill Barrow, O-DB Administration Consulting Ltd. (www.odbac.com). September 22nd 2004, ECOUG Breakfast Meeting.

Transcript of Oracle-9i for DataWarehousing

Page 1: Oracle-9i for DataWarehousing

Features for Data Warehousing

Presented by Bill Barrow
O-DB Administration Consulting Ltd.
www.odbac.com
September 22nd 2004 – ECOUG Breakfast Meeting

Page 2: Oracle-9i for DataWarehousing

Data Warehousing Challenges

Understanding User Requirements
Finding the data
  – Internal data sources
  – External data sources
Cleaning the Data
Loading and Transforming
Querying
Managing the Warehouse

Page 3: Oracle-9i for DataWarehousing

Understanding User Requirements

What questions do they want answered from the system?
Who is sponsoring the warehouse: top management (a warehouse) or a department head (a data mart)?
Do they know what a data warehouse is?
Do they really need one?
What budget has been allocated or will be allocated once requirements are fully documented?
Sign-off on warehouse design, budget, goals etc.
  • Flexibility costs $$$. The most flexible grain of storage in a sales-oriented data warehouse is the individual line item of an invoice.
  • Once the lowest business grain is built into the warehouse, practically any question can be answered now and in the future.
  • This grain costs big $$$ in storage, processing power and system management.

Page 4: Oracle-9i for DataWarehousing

Finding Source Data

1. Internal sources such as sales, shipping, inventory and customer management systems
2. External sources such as competitive market analyses, weather and seasonal information, world events etc.

Page 5: Oracle-9i for DataWarehousing

Cleaning Data

How clean are the sources?
What format(s) are the source data available in?
  – DB or text?
  – ASCII or EBCDIC?
  – Delimited or fixed length?
What tools (if any) can cross-reference data in multiple systems?
Do you build or buy cleaning tools?

Page 6: Oracle-9i for DataWarehousing

Managing the Warehouse

Data has to be loaded (upserts or complete refreshes)
Aggregates have to be built and maintained
Backups must be performed
Indexes must be built and rebuilt
Statistics have to be gathered
Let these things guide your hardware budget.
  – Refrain from buying the big hardware first (buy smaller test beds against which you benchmark)
  – Tailor your hardware purchases to the needs of your batch maintenance windows. Parallelise everything to help fit into batch window times.
  – Parallel everything requires BIG hardware. Think of an 8-processor SMP as a small system!
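As a rough illustration of two of the routine tasks listed on this slide, the sketch below gathers statistics and rebuilds an index using parallel execution. The schema DW, the table PRODUCT_SALES, the index PRODUCT_SALES_DATES_BX, the sample percentage and the degree of 8 are assumptions made for the example; none of them come from the presentation.

```sql
-- Gather optimizer statistics on a fact table with 8 parallel slaves.
-- Schema, table name, sample size and degree are illustrative only.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'DW',
    tabname          => 'PRODUCT_SALES',
    estimate_percent => 10,      -- sample 10% of the rows
    degree           => 8,       -- parallel degree for the gather
    cascade          => TRUE);   -- also gather statistics on the table's indexes
END;
/

-- Rebuild an index in parallel, then reset its degree so that
-- ordinary queries are not parallelised through the index by accident.
ALTER INDEX dw.product_sales_dates_bx REBUILD PARALLEL 8 NOLOGGING;
ALTER INDEX dw.product_sales_dates_bx NOPARALLEL;
```

Timing statements like these on a small test bed is one way to estimate how much hardware the batch window really needs.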

Page 7: Oracle-9i for DataWarehousing

Part I - Loading and Transformation

Part I of this presentation will focus on External Tables and Table Functions
These features are fully available in 9iR2 (9.2.0.5 or later with security patches applied)
This discussion assumes that:
  • The warehouse design was approved and implemented
  • Data sources are available as text files
  • Data has already been cleaned (this is not part of the ETL process)

Page 8: Oracle-9i for DataWarehousing

Loading Data – External Tables

If SQL*Loader is a viable option for loading raw, untransformed data into the database, then external tables could be considered as a replacement
The External Table facility utilises a SQL*Loader data cartridge or access driver (as part of the DB) to present external data as an Oracle table.

Page 9: Oracle-9i for DataWarehousing

Loading Data – External Tables

External tables do not consume any storage within the database (except for minimal storage in the dictionary)
They do not have any row ids, and cannot be indexed
They can be queried using parallel query
Once created, they behave like normal tables except that no DML can be performed against them (Read Only).
You can create synonyms, views and grants etc.
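The points on this slide can be sketched as follows. The directory object name ncs_upload and the table INVOICE_DETAIL_ET appear later in the presentation; the OS path, the users dw_owner and report_user, and the view and synonym names are assumptions made for illustration.

```sql
-- As a DBA: the directory object tells Oracle where the flat files live.
-- The path and the grantee are hypothetical.
CREATE OR REPLACE DIRECTORY ncs_upload AS '/data/ncs/upload';
GRANT READ, WRITE ON DIRECTORY ncs_upload TO dw_owner;  -- WRITE is for the log/bad/discard files

-- As the table owner: an external table can sit under views, synonyms and grants.
CREATE OR REPLACE VIEW invoice_detail_v AS
  SELECT transaction_type, item_no, quantity, gross_price
  FROM   invoice_detail_et;

CREATE SYNONYM invoice_detail FOR invoice_detail_et;
GRANT SELECT ON invoice_detail_et TO report_user;

-- It can be scanned with parallel query like any other table.
SELECT /*+ PARALLEL(e, 4) */ COUNT(*)
FROM   invoice_detail_et e;

-- DML is rejected because the table is read only; a statement such as
--   DELETE FROM invoice_detail_et;
-- raises an error instead of touching the file.
```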

Page 10: Oracle-9i for DataWarehousing

Loading Data – External Tables

Creating for fixed record length data:

-- Notes on the ACCESS PARAMETERS clause. They are kept out of the clause itself
-- because the ORACLE_LOADER driver accepts comments only before the first parameter.
--   RECORDS FIXED 114 : two characters more than the actual fixed record length
--   LOGFILE           : name of the log file (same directory as the data file)
--   DISCARDFILE       : name of the discard file
--   BADFILE           : name of the bad records file
--   FIELDS LTRIM      : tells Oracle to LTRIM all fields
--   ITEM_NO ... RTRIM : tells Oracle to RTRIM this field
--   GROSS_PRICE       : could include decimal points and sign
CREATE TABLE INVOICE_DETAIL_ET (     -- The table definition as it is seen in Oracle
   TRANSACTION_TYPE  CHAR(2)
  ,ITEM_NO           VARCHAR2(9)     -- Numeric data converted to VARCHAR2
  ,QUANTITY          NUMBER
  ,GROSS_PRICE       NUMBER
  ,TOTAL_ALLOWANCE   NUMBER
  ,INVOICE_KEY       NUMBER(6,0)     -- Numeric data as a number
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER                 -- Other types will be available or you can write your own
  DEFAULT DIRECTORY ncs_upload       -- Utilises a directory object in Oracle to locate the file
  ACCESS PARAMETERS (
    RECORDS FIXED 114
    LOGFILE 'ncs_invoice_detail.log'
    DISCARDFILE 'ncs_invoice_detail.dsc'
    BADFILE 'ncs_invoice_detail.bad'
    FIELDS LTRIM (
       TRANSACTION_TYPE (3:4)
      ,ITEM_NO          (5:13) CHAR RTRIM
      ,QUANTITY         (14:20)
      ,GROSS_PRICE      (21:27)
      ,TOTAL_ALLOWANCE  (46:53)
      ,INVOICE_KEY      (102:107) UNSIGNED INTEGER EXTERNAL(6)
    )
  )
  LOCATION ('invoice_detail.tab')    -- The actual name of the data file
)
REJECT LIMIT UNLIMITED;              -- Just keep on loading

Page 11: Oracle-9i for DataWarehousing

Loading Data – External Tables

Sample data for previously defined table:

Note that the data ends at 112 characters, but the record definition was for 114 characters. This is for the CR/LF, which must be included in the record length definition. In UNIX, the record length would have been 113.

         1         2         3         4         5         6         7         8         9        10        11
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234
IDSA1186 001200000069580 +0000000000000000 +000000000000 |000001<<2>>
IDSA1401 005000000018900 +0000000000000000 +000000000000 |000001<<2>>
IDSA1448 001200000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA1451 001200000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA1455 001200000069580 +0000000000000000 +000000000000 |000001<<2>>
IDSA3528 002400000018920 +0000000000000000 +000000000000 |000001<<2>>
IDSA3671 001200000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA5324 001200000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA5327 002400000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA5328 001200000036250 +0000000000000000 +000000000000 |000001<<2>>

Page 12: Oracle-9i for DataWarehousing

Loading Data – External Tables

Creating for delimited record data:
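The statement shown on this slide did not survive in the transcript. What follows is a minimal sketch of what a delimited definition can look like, assuming a comma-separated file with optional double-quote enclosures. The table name, file names and column list are assumptions for illustration and are not the presenter's original example.

```sql
-- Hypothetical delimited-file counterpart to the fixed-length definition.
CREATE TABLE invoice_detail_csv_et (
   transaction_type  CHAR(2)
  ,item_no           VARCHAR2(9)
  ,quantity          NUMBER
  ,gross_price       NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ncs_upload
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE 'invoice_detail_csv.bad'
    LOGFILE 'invoice_detail_csv.log'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
    (
       transaction_type
      ,item_no
      ,quantity
      ,gross_price
    )
  )
  LOCATION ('invoice_detail.csv')
)
REJECT LIMIT UNLIMITED;
```

With delimited records there is no record length to declare, so the CR/LF versus LF difference discussed for fixed-length files does not affect the definition.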

Page 13: Oracle-9i for DataWarehousing

Loading Data: External Tables

Advantages:
  – The Load process is "skipped"
  – No need for temporary staging tables and their management before transformation of raw data into dimensional structures
  – Parallel Query (and lack of indexing) is a natural fit for large amounts of data that must be sequentially processed during transformation for warehouse refreshes
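A sketch of what "skipping" the load can look like in practice: the external table is read in parallel and its rows go straight into a warehouse table, with a light transformation on the way. The target table invoice_detail_fact and its five-column layout are assumptions for illustration; INVOICE_DETAIL_ET is the external table defined earlier, and 'SA' is the transaction type visible in the sample data.

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- Direct-path, parallel insert straight from the flat file; no staging table.
-- invoice_detail_fact is a hypothetical target with five matching columns.
INSERT /*+ APPEND PARALLEL(f, 4) */ INTO invoice_detail_fact f
SELECT /*+ PARALLEL(e, 4) */
       e.invoice_key
      ,e.transaction_type
      ,TRIM(e.item_no)
      ,e.quantity
      ,e.gross_price - NVL(e.total_allowance, 0)   -- simple net price derivation
FROM   invoice_detail_et e
WHERE  e.transaction_type = 'SA';

COMMIT;   -- a direct-path insert must be committed before the table is queried again
```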

Page 14: Oracle-9i for DataWarehousing

Transforming Data – Table Functions

Table functions are executed in SELECT statements
They return a table of records matching the column design of the target heap table
They utilise pipes and parallelism
They are written in PL/SQL or any supported DB language (encapsulated)

Page 15: Oracle-9i for DataWarehousing

Transforming Data – Table Functions

Calling a table function (upsert):

ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ PARALLEL (f,10) APPEND */   -- the insert will be done using 10 PQ processes
INTO FACT f                            -- the query will be done using another 10 PQ procs
USING TABLE(tabfnc.fact_load(CURSOR(SELECT /*+ parallel(ext_tab,10) */ * FROM ext_tab))) e
ON (    f.key1 = e.key1
    AND f.key2 = e.key2)
WHEN MATCHED THEN
  UPDATE SET
     f.col1 = e.col1 + f.col1
    ,f.col2 = e.col2 * 1 + (SELECT ... FROM ... WHERE ...)
    ...
WHEN NOT MATCHED THEN
  INSERT VALUES (e.col1, e.col2, e.col3 ...);

The above statement will spawn 20 PQO processes, more if any sorting or queries occur within the table function! One user, 20+ processes.

Or more simply:

SELECT * FROM TABLE(tabfnc.fact_load(CURSOR(SELECT * FROM ext_tab WHERE ...)));

The result of the above statement looks like normal SQL*Plus output when querying any other table.

Page 16: Oracle-9i for DataWarehousing

Transforming Data – Table Functions

The guts of a table function
  – The target table's record type definition
  – A table object based on the record type
  – A reference cursor returning the record type, defined in a package specifically for this purpose only.
  – A function (or packaged function) that is designed as a table function and returns the table object, pipelined and in parallel
  – PL/SQL code that transforms the raw data into the target table's data within the function.

Page 17: Oracle-9i for DataWarehousing

Transforming Data – Table Functions

CREATE OR REPLACE FUNCTION sales_fnc(ih ref_pkg.invoice_head_rc)
  RETURN EST_TABLE   -- A table object of type estrecord
  PIPELINED
  PARALLEL_ENABLE (PARTITION ih BY ANY)
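The slide shows only the function header. A minimal sketch of how the body might be completed is given below; it relies on the ESTRecord and EST_TABLE types and the REF_PKG package defined on the following slides. The column names read from INVOICE_HEAD_ET, and the choice to leave the line-item attributes NULL, are assumptions, since the transcript does not show that table's definition or the original function body.

```sql
-- Sketch only. The INVOICE_HEAD_ET column names used below are assumed;
-- the slides do not show that table's definition.
CREATE OR REPLACE FUNCTION sales_fnc (ih ref_pkg.invoice_head_rc)
  RETURN est_table
  PIPELINED
  PARALLEL_ENABLE (PARTITION ih BY ANY)
IS
  in_rec  invoice_head_et%ROWTYPE;   -- one raw row fetched from the input cursor
BEGIN
  LOOP
    FETCH ih INTO in_rec;
    EXIT WHEN ih%NOTFOUND;

    -- Transform the raw row and hand it straight to the consumer.
    PIPE ROW (ESTRecord(
       in_rec.invoice_no                 -- INVOICE_NO
      ,in_rec.route_id                   -- ROUTE_ID
      ,in_rec.outlet_id                  -- OUTLET_ID
      ,NULL                              -- ITEM_ID (assumed absent from the header row)
      ,NULL                              -- UNIT_CODE
      ,NULL                              -- UNIT_PRICE
      ,NULL                              -- UNIT_DISCOUNT
      ,NULL                              -- UNIT_QUANTITY
      ,UPPER(in_rec.transaction_type)    -- TRANSACTION_TYPE
      ,in_rec.pay_method_code            -- PAY_METHOD_CODE
      ,in_rec.amount_tendered            -- AMOUNT_TENDERED
      ,in_rec.invoice_time               -- INVOICE_TIME
      ,TRUNC(in_rec.invoice_date)        -- INVOICE_DATE (assumed to be a DATE column)
    ));
  END LOOP;
  CLOSE ih;
  RETURN;   -- a pipelined function returns no value; the rows went out via PIPE ROW
END sales_fnc;
/
```

It would be called in the same way as the earlier example, for instance `SELECT * FROM TABLE(sales_fnc(CURSOR(SELECT * FROM invoice_head_et)));`. PARTITION ih BY ANY lets Oracle split the input rows across the parallel slaves in any way it likes, which is only appropriate when each row can be transformed independently of the others.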

Page 18: Oracle-9i for DataWarehousing

The Record Type Creation

CREATE OR REPLACE TYPE ESTRecord AS OBJECT
(
   INVOICE_NO        NUMBER(8)
  ,ROUTE_ID          NUMBER(3)
  ,OUTLET_ID         NUMBER(5)
  ,ITEM_ID           VARCHAR2(6)
  ,UNIT_CODE         VARCHAR2(3)
  ,UNIT_PRICE        NUMBER(13,4)
  ,UNIT_DISCOUNT     NUMBER(13,4)
  ,UNIT_QUANTITY     NUMBER(4)
  ,TRANSACTION_TYPE  VARCHAR2(3)
  ,PAY_METHOD_CODE   VARCHAR2(2)
  ,AMOUNT_TENDERED   NUMBER(13,4)
  ,INVOICE_TIME      VARCHAR2(15)
  ,INVOICE_DATE      DATE
);
/
CREATE OR REPLACE TYPE EST_TABLE AS TABLE OF ESTRECORD;
/

Page 19: Oracle-9i for DataWarehousing

Reference Cursor

CREATE OR REPLACE PACKAGE REF_PKG AS
  TYPE INVOICE_HEAD_RC
    IS REF CURSOR
    RETURN INVOICE_HEAD_ET%ROWTYPE;
END;
/

Page 20: Oracle-9i for DataWarehousing

Load and Transform - Conclusion

The combination of External Tables and Table Functions provides a powerful platform for loading large amounts of data into the Oracle database
Parallel everything makes it possible to achieve parallelism without complex UNIX scripting
This is a portable solution (across any Oracle supported platform).
The lack of native, flexible scripting shells on some OSs can be circumvented by these 9i facilities.
Parallel processing on right-sized machines will help ensure the batch window is met and the warehouse is usable.

Page 21: Oracle-9i for DataWarehousing

Part II - Querying the Warehouse

This discussion assumes that a STAR SCHEMA has been implemented as the warehouse schema
Oracle 9i's star transformation optimisation is superior to all that came before.
To properly utilise and benefit from Star Transformation query plans, precise DB setup is a must.

Page 22: Oracle-9i for DataWarehousing

A Star Schema

Product Sales (Fact): Sales Value, Gross Profit, Contribution, Total Cost, Raw Materials Cost, Dates Key (FK) (IE), Route Key (FK) (IE), Product Key (FK) (IE), Customer Key (FK) (IE), Boxes Sold
Dates Dimension: Dates Key, Year, Qtr, Month, Day, Date
Route Dimension: Route Key, Region, Country, Territory, Route, Sub Area
Product Dimension: Product Key, Group, Class, Sub Class, Brand, Name, Weight, Manufactured By
Customer Dimension: Customer Key, Class, Sub Class, Type, Sub type, Chain, Name, Address

Page 23: Oracle-9i for DataWarehousing

A Typical Warehouse Query
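The query shown on this slide is not in the transcript. As a stand-in, here is a sketch of the general shape a star query takes against the schema on the previous slide: the fact table joined to several dimensions, filtered on dimension attributes, and aggregated. Every table and column name below (product_sales, dates_dim, route_dim, product_dim and their columns) and every literal value is an assumption derived from the schema diagram, not the presenter's query.

```sql
-- Hypothetical star query: filters sit on the dimensions, measures come from the fact.
SELECT d.year
      ,r.region
      ,p.brand
      ,SUM(f.sales_value) AS sales_value
      ,SUM(f.boxes_sold)  AS boxes_sold
FROM   product_sales f
      ,dates_dim     d
      ,route_dim     r
      ,product_dim   p
WHERE  f.dates_key   = d.dates_key
AND    f.route_key   = r.route_key
AND    f.product_key = p.product_key
AND    d.year        IN (2003, 2004)
AND    r.region      = 'North'
AND    p.brand       = 'Brand A'
GROUP  BY d.year, r.region, p.brand
ORDER  BY d.year, r.region, p.brand;
```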

Page 24: Oracle-9i for DataWarehousing

Query Plans – Non Star

A non-star transformation database configuration will result in the following plan:

Page 25: Oracle-9i for DataWarehousing

Query Plan – Star Transformation
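The plan itself was an image and is not in the transcript. One way to produce a comparable plan on a 9iR2 system is sketched below; it assumes a PLAN_TABLE exists in the current schema and reuses hypothetical table and column names based on the schema slide.

```sql
-- Allow the optimizer to consider star transformation for this session.
ALTER SESSION SET star_transformation_enabled = TRUE;

-- Table and column names are illustrative, derived from the schema slide.
EXPLAIN PLAN FOR
SELECT d.year, SUM(f.sales_value)
FROM   product_sales f, dates_dim d, product_dim p
WHERE  f.dates_key   = d.dates_key
AND    f.product_key = p.product_key
AND    d.year        = 2004
AND    p.brand       = 'Brand A'
GROUP  BY d.year;

-- DBMS_XPLAN is available from 9iR2 and formats the most recent plan.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

When the transformation is chosen, the plan typically contains bitmap steps such as BITMAP KEY ITERATION, BITMAP MERGE and BITMAP AND against the fact table's bitmap indexes, rather than a plain full scan and hash joins.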

Page 26: Oracle-9i for DataWarehousing

Another Typical Warehouse Query

Page 27: Oracle-9i for DataWarehousing

The Non Star Transformation Query Plan

Page 28: Oracle-9i for DataWarehousing

The Star Transformation Query Plan

Page 29: Oracle-9i for DataWarehousing

Characteristics of the Star Transformation Plan

Page 30: Oracle-9i for DataWarehousing

Star Transformations Further Optimised with Parallelism

Page 31: Oracle-9i for DataWarehousing

Parallel Enabled Plan

The join order was changed when parallel degree on the fact was set to 2. (SELECT /*+ parallel (product_sales,2)*/ YEAR, …)
  – Time to return 5 years of data (2000 – 2004, 5.7M rows in the fact) is 27 seconds.
  – Execution Cost is 14,425
When set to 4:
  – Time to return 5 years of data (2000 – 2004, 5.7M rows in the fact) is 24 secs.
  – Execution Cost is 7,234
Increasing parallelism beyond 4 (to 8 and 16) met with no further performance gains.
Indeed, Oracle's automatic parallel query tuning parameters allowed only up to 8 parallel query processes to be created, ignoring any further request for increased parallelism.
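To see why a requested degree was capped, it can help to look at the instance's parallel execution settings and at what the session actually received. The sketch below uses standard dynamic performance views; which of these parameters was responsible for the limit of 8 in the presenter's test is not stated in the slides, so the list is only a reasonable starting point.

```sql
-- Instance settings that bound how many PQ slaves a statement can get.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('parallel_automatic_tuning'
               ,'parallel_max_servers'
               ,'parallel_min_servers'
               ,'parallel_threads_per_cpu'
               ,'cpu_count');

-- Run in the same session right after the parallel query:
-- shows what the last statement was actually given.
SELECT statistic, last_query, session_total
FROM   v$pq_sesstat;
```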

Page 32: Oracle-9i for DataWarehousing

8i DB Config for Star Transformation

For Oracle 8i
1. always_anti_join = hash
2. always_semi_join = hash
3. bitmap_merge_area_size = 16M (or larger). This is per user process (PQ or otherwise), so it must be set with caution. However, a larger size encourages the use of bitmap joins over other types.
4. create_bitmap_area_size = 16M (or larger). See notes for (3)
5. hash_area_size = 16M (or larger). See notes for (3)
6. sort_area_size = 16M (or larger). See notes for (3)
7. hash_join_enabled = true
8. compatible = 8.1.7
9. optimizer_features_enable = 8.1.7
10. star_transformation_enabled = true

Page 33: Oracle-9i for DataWarehousing

9i DB Config for Star Transformation

For Oracle 9i
1. bitmap_merge_area_size = 16M (or larger). This is per user process (PQ or otherwise), so it must be set with caution. However, a larger size encourages the use of bitmap joins over other types.
2. create_bitmap_area_size = 16M (or larger). See notes for (1)
3. hash_area_size = 16M (or larger). See notes for (1)
4. hash_join_enabled = true
5. sort_area_size = 16M (or larger). See notes for (1)
6. optimizer_features_enable = 9.2.0
7. compatible = 9.2 (assuming 9i R2)
8. star_transformation_enabled = temp_disable
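Some of these settings can be tried at session level before they are committed to the instance parameter file. The sketch below is an assumption about how one might experiment, not part of the presentation. It uses 16M expressed in bytes, and it switches the session to manual work-area sizing, which only matters if the instance runs with automatic PGA management.

```sql
-- Session-level experiment with the 9i settings (values are 16M in bytes).
ALTER SESSION SET star_transformation_enabled = TEMP_DISABLE;
ALTER SESSION SET workarea_size_policy = MANUAL;  -- otherwise the *_area_size values may be ignored
ALTER SESSION SET hash_area_size = 16777216;
ALTER SESSION SET sort_area_size = 16777216;

-- Check the current values and which parameters can be changed per session.
SELECT name, value, isses_modifiable
FROM   v$parameter
WHERE  name IN ('star_transformation_enabled'
               ,'bitmap_merge_area_size'
               ,'create_bitmap_area_size'
               ,'hash_area_size'
               ,'sort_area_size'
               ,'hash_join_enabled'
               ,'optimizer_features_enable'
               ,'compatible');
```

Parameters reported with ISSES_MODIFIABLE = FALSE have to be set in the parameter file and need an instance restart to take effect.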

Page 34: Oracle-9i for DataWarehousing

Schema Creation Guidelines

Index Creation Guidelines
  – Create separate (non-concatenated) bitmap indexes on each of the fact table's foreign keys.
  – Drop the b-tree PK index on the fact for querying, create it for loading.
  – Create a separate bitmap index on each of the non-key columns of the dimension tables.
  – Create b-tree indexes for primary key and alternate key fields in the dimensions.
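The index guidelines can be sketched as follows for the Product Sales star. Table, column and index names are assumptions derived from the schema slide, and the fact table is assumed to be non-partitioned (on a partitioned table the bitmap indexes would need the LOCAL keyword).

```sql
-- One single-column bitmap index per foreign key on the fact table.
CREATE BITMAP INDEX product_sales_dates_bx    ON product_sales (dates_key)    NOLOGGING;
CREATE BITMAP INDEX product_sales_route_bx    ON product_sales (route_key)    NOLOGGING;
CREATE BITMAP INDEX product_sales_product_bx  ON product_sales (product_key)  NOLOGGING;
CREATE BITMAP INDEX product_sales_customer_bx ON product_sales (customer_key) NOLOGGING;

-- Separate bitmap indexes on the non-key dimension attributes used in filters.
CREATE BITMAP INDEX product_dim_brand_bx ON product_dim (brand);
CREATE BITMAP INDEX product_dim_class_bx ON product_dim (class);

-- B-tree index on the dimension's primary key, created through the constraint.
ALTER TABLE product_dim
  ADD CONSTRAINT product_dim_pk PRIMARY KEY (product_key);
```

Keeping the fact indexes single-column is what allows the optimizer to combine any subset of them with bitmap AND operations for whichever dimensions a query happens to filter on.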

Page 35: Oracle-9i for DataWarehousing

Table Schema Guidelines

Table Creation Guidelines
  – ALTER TABLE tab_name MINIMIZE RECORDS_PER_BLOCK;
  – Analyse the fact, dimension and materialised views.
  – Create bitmap indexes on all materialised view columns also.
  – Enable NOVALIDATE all foreign key constraints in the fact to its dimensions.
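A sketch of these table guidelines applied to the fact table follows. The names are assumptions based on the schema slide. MINIMIZE RECORDS_PER_BLOCK is typically run once representative rows are loaded and before the bitmap indexes are created, since it works from the rows already present in the table.

```sql
-- Cap rows per block at the current maximum so bitmap indexes stay compact.
ALTER TABLE product_sales MINIMIZE RECORDS_PER_BLOCK;

-- Foreign key from the fact to a dimension, enabled without checking existing rows.
ALTER TABLE product_sales
  ADD CONSTRAINT product_sales_dates_fk
  FOREIGN KEY (dates_key) REFERENCES dates_dim (dates_key)
  ENABLE NOVALIDATE;

-- Analyse the fact table and its indexes (repeat for dimensions and materialised views).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'PRODUCT_SALES',
    cascade => TRUE);
END;
/
```

ENABLE NOVALIDATE enforces the constraint for new rows while skipping the expensive check of rows already loaded, which suits a warehouse where the ETL process is trusted to have cleaned the data.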

Page 36: Oracle-9i for DataWarehousing

The END

Q & A