Oracle 9i Features for Data Warehousing
Presented by Bill Barrow, O-DB Administration Consulting Ltd.
www.odbac.com
September 22nd, 2004 – ECOUG Breakfast Meeting
Data Warehousing Challenges
• Understanding User Requirements
• Finding the data
  – Internal data sources
  – External data sources
• Cleaning the Data
• Loading and Transforming
• Querying
• Managing the Warehouse
Understanding User Requirements
• What questions do they want answered from the system?
• Who is sponsoring the warehouse: top management (a warehouse) or a department head (a data mart)?
• Do they know what a data warehouse is?
• Do they really need one?
• What budget has been allocated, or will be allocated once requirements are fully documented?
• Sign-off on warehouse design, budget, goals etc.
• Flexibility costs $$$. The most flexible grain of storage in a sales-oriented data warehouse is the individual line item of an invoice.
• Once the lowest business grain is built into the warehouse, practically any question can be answered, now and in the future.
• This grain costs big $$$ in storage, processing power and system management.
Finding Source Data
1. Internal sources such as sales, shipping, inventory and customer management systems
2. External sources such as competitive market analyses, weather and seasonal information, world events etc.
Cleaning Data
• How clean are the sources?
• What format(s) are the source data available in?
  − DB or text?
  − ASCII or EBCDIC?
  − Delimited or fixed length?
• What tools (if any) can cross-reference data in multiple systems?
• Do you build or buy cleaning tools?
Managing the Warehouse
• Data has to be loaded (upserts or complete refreshes)
• Aggregates have to be built and maintained
• Backups must be performed
• Indexes must be built and rebuilt
• Statistics have to be gathered
Let these things guide your hardware budget.
− Refrain from buying the big hardware first (buy smaller test beds against which you benchmark).
− Tailor your hardware purchases to the needs of your batch maintenance windows. Parallelise everything to help fit into batch window times.
− Parallel everything requires BIG hardware. Think of an 8-processor SMP as a small system!
Part I – Loading and Transformation
Part I of this presentation will focus on External Tables and Table Functions.
These features are fully available in 9iR2 (9.2.0.5 or later, with security patches applied).
This topic of discussion assumes that:
• The warehouse design was approved and implemented
• Data sources are available as text files
• Data has already been cleaned (cleaning is not part of this ETL process)
Loading Data – External Tables
If SQL*Loader is a viable option for loading raw, untransformed data into the database, then external tables could be considered as a replacement.
The external table facility utilises a SQL*Loader data cartridge, or access driver (part of the DB), to present external data as an Oracle table.
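The access driver locates its files through an Oracle directory object, such as the ncs_upload object used in the fixed-length example. A minimal sketch of creating one follows; the OS path and the grantee ETL_USER are hypothetical, and the statements must be run by a user with the privilege to create directories.

```sql
-- Hypothetical OS path: point the directory object at the upload area
CREATE OR REPLACE DIRECTORY ncs_upload AS '/data/ncs/upload';

-- READ lets the access driver read the data files;
-- WRITE lets it create the log, bad and discard files in the same place.
-- ETL_USER is a hypothetical loading account.
GRANT READ, WRITE ON DIRECTORY ncs_upload TO etl_user;
```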
Loading Data – External Tables
• External tables do not consume any storage within the database (except for minimal storage in the dictionary).
• They do not have any rowids, and cannot be indexed.
• They can be queried using parallel query.
• Once created, they behave like normal tables, except that no DML can be performed against them (read only).
• You can create synonyms, views, grants etc. on them.
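The properties above can be sketched against the INVOICE_DETAIL_ET table from the fixed-length example. The view, synonym and grantee names below are hypothetical; the point is that ordinary DDL and grants work, while DML does not.

```sql
-- Default degree of parallelism for scans of the external file(s)
ALTER TABLE invoice_detail_et PARALLEL 4;

-- Views, synonyms and grants behave as for a normal table
-- (invoice_detail_v, inv_detail and report_user are hypothetical names)
CREATE OR REPLACE VIEW invoice_detail_v AS
  SELECT invoice_key, item_no, quantity,
         gross_price - total_allowance AS net_price
  FROM   invoice_detail_et;

CREATE SYNONYM inv_detail FOR invoice_detail_et;
GRANT SELECT ON invoice_detail_et TO report_user;

-- Read only: a statement such as the following is rejected with an error
-- DELETE FROM invoice_detail_et WHERE quantity = 0;
```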
Loading Data – External Tables
Creating for fixed record length data:
CREATE TABLE INVOICE_DETAIL_ET (          -- The table definition as it is seen in Oracle
   TRANSACTION_TYPE  CHAR(2)
  ,ITEM_NO           VARCHAR2(9)          -- Numeric data converted to VARCHAR2
  ,QUANTITY          NUMBER
  ,GROSS_PRICE       NUMBER
  ,TOTAL_ALLOWANCE   NUMBER
  ,INVOICE_KEY       NUMBER(6,0))         -- Numeric data as a number
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER                      -- Other types will be available, or you can write your own
  DEFAULT DIRECTORY ncs_upload            -- Utilises a directory object in Oracle to locate the file
  -- The access parameters are handed to the access driver as-is, so the notes
  -- on them are kept here rather than after each parameter:
  --   RECORDS FIXED 114 : two characters more than the actual fixed record length (CR/LF)
  --   BADFILE, DISCARDFILE, LOGFILE : written to the same directory as the data file
  --   FIELDS LTRIM : LTRIM all fields; ITEM_NO specifies RTRIM for that field
  --   GROSS_PRICE : could include decimal points and sign
  ACCESS PARAMETERS (
    RECORDS FIXED 114
    BADFILE 'ncs_invoice_detail.bad'
    DISCARDFILE 'ncs_invoice_detail.dsc'
    LOGFILE 'ncs_invoice_detail.log'
    FIELDS LTRIM (
       TRANSACTION_TYPE (3:4)
      ,ITEM_NO          (5:13) CHAR RTRIM
      ,QUANTITY         (14:20)
      ,GROSS_PRICE      (21:27)
      ,TOTAL_ALLOWANCE  (46:53)
      ,INVOICE_KEY      (102:107) UNSIGNED INTEGER EXTERNAL(6)))
  LOCATION ('invoice_detail.tab'))        -- The actual name of the data file
REJECT LIMIT UNLIMITED;                   -- Just keep on loading
Loading Data – External Tables
Sample data for the previously defined table:
Note that the data ends at 112 characters, but the record definition was for 114 characters. The extra two characters are for the CR/LF, which must be included in the record length definition. In UNIX (LF only), the record length would have been 113.
         1         2         3         4         5         6         7         8         9        10        11
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234
IDSA1186 001200000069580 +0000000000000000 +000000000000 |000001<<2>>
IDSA1401 005000000018900 +0000000000000000 +000000000000 |000001<<2>>
IDSA1448 001200000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA1451 001200000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA1455 001200000069580 +0000000000000000 +000000000000 |000001<<2>>
IDSA3528 002400000018920 +0000000000000000 +000000000000 |000001<<2>>
IDSA3671 001200000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA5324 001200000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA5327 002400000036250 +0000000000000000 +000000000000 |000001<<2>>
IDSA5328 001200000036250 +0000000000000000 +000000000000 |000001<<2>>
Loading Data – External Tables
Creating for delimited record data:
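A minimal sketch of a delimited definition, written as a hypothetical counterpart of INVOICE_DETAIL_ET: it assumes a comma-separated file named invoice_detail.csv in the same ncs_upload directory, with fields optionally enclosed in double quotes. The table and file names are assumptions. As in the fixed-length case, no comments are placed inside ACCESS PARAMETERS.

```sql
-- Hypothetical delimited external table (names and file layout assumed)
CREATE TABLE invoice_detail_csv_et (
   transaction_type  CHAR(2)
  ,item_no           VARCHAR2(9)
  ,quantity          NUMBER
  ,gross_price       NUMBER
  ,total_allowance   NUMBER
  ,invoice_key       NUMBER(6,0))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ncs_upload
  -- One record per line; fields split on commas, quotes optional,
  -- whitespace trimmed on both sides, short records padded with NULLs
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE 'invoice_detail_csv.bad'
    LOGFILE 'invoice_detail_csv.log'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LRTRIM
    MISSING FIELD VALUES ARE NULL
    ( transaction_type
     ,item_no
     ,quantity
     ,gross_price
     ,total_allowance
     ,invoice_key))
  LOCATION ('invoice_detail.csv'))
REJECT LIMIT UNLIMITED;
```

Unlike the fixed-length version, no positions or record length are needed, so the CR/LF versus LF question does not arise in the definition itself.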
Loading Data: External Tables
Advantages:
– The load process is “skipped”
– No need for temporary staging tables, and their management, before transformation of raw data into dimensional structures
– Parallel query (and lack of indexing) is a natural fit for large amounts of data that must be sequentially processed during transformation for warehouse refreshes
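To illustrate skipping the staging table, here is a minimal sketch that reads INVOICE_DETAIL_ET in parallel and writes straight into a heap table with a direct-path insert. The target table INVOICE_DETAIL, its column list, the transformation and the filter are hypothetical.

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- INVOICE_DETAIL and its columns are hypothetical
INSERT /*+ APPEND PARALLEL(t, 4) */ INTO invoice_detail t
       (invoice_key, item_no, quantity, net_price)
SELECT /*+ PARALLEL(et, 4) */
       et.invoice_key,
       et.item_no,
       et.quantity,
       et.gross_price - et.total_allowance   -- example in-flight transformation
FROM   invoice_detail_et et
WHERE  et.quantity > 0;                       -- example filter

-- A direct-path (APPEND) insert must be committed before this session
-- can read or modify INVOICE_DETAIL again.
COMMIT;
```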
Transforming Data – Table Functions
• Table functions are executed in SELECT statements.
• They return a table of records matching the column design of the target heap table.
• They utilise pipes and parallelism.
• They are written in PL/SQL or any supported DB language (encapsulated).
Transforming Data – Table Functions
Calling a table function (upsert):
ALTER SESSION ENABLE PARALLEL DML;
MERGE /*+ PARALLEL (f,10) APPEND */   -- the merge into FACT will use 10 PQ processes (the hint must name the alias F)
INTO FACT f
USING TABLE(tabfnc.fact_load(CURSOR(SELECT /*+ PARALLEL (ext_tab,10) */ * FROM ext_tab))) e   -- the query will use another 10 PQ processes
ON (f.key1 = e.key1
AND f.key2 = e.key2)
WHEN MATCHED THEN
UPDATE SET
 f.col1 = e.col1 + f.col1,
 f.col2 = e.col2 * 1 + (SELECT ... FROM ... WHERE ...)
 ...
WHEN NOT MATCHED THEN
INSERT VALUES (e.col1, e.col2, e.col3 ...);
The above statement will spawn 20 PQ processes, and more if any sorting or queries occur within the table function! One user, 20+ processes.
Or, more simply:
SELECT * FROM TABLE(tabfnc.fact_load(CURSOR(SELECT * FROM ext_tab WHERE ...)));
The result of the above statement looks like normal SQL*Plus output when querying any other table.
Transforming Data – Table Functions
The guts of a table function
– The target table’s record type definition
– A table object based on the record type
– A reference cursor type, defined in a package created solely for this purpose, that returns the source rows (here INVOICE_HEAD_ET%ROWTYPE)
– A function (or packaged function) that is designed as a table function and returns the table object, pipelined and in parallel
– PL/SQL code within the function that transforms the raw data into the target table’s data
Transforming Data – Table Functions
CREATE OR REPLACE FUNCTION sales_fnc(ih ref_pkg.invoice_head_rc)
RETURN EST_TABLE -- A table object of type estrecord
PIPELINED
PARALLEL_ENABLE (PARTITION ih BY ANY)
The Record Type Creation
CREATE OR REPLACE TYPE ESTRecord AS OBJECT
( INVOICE_NO        NUMBER(8)
 ,ROUTE_ID          NUMBER(3)
 ,OUTLET_ID         NUMBER(5)
 ,ITEM_ID           VARCHAR2(6)
 ,UNIT_CODE         VARCHAR2(3)
 ,UNIT_PRICE        NUMBER(13,4)
 ,UNIT_DISCOUNT     NUMBER(13,4)
 ,UNIT_QUANTITY     NUMBER(4)
 ,TRANSACTION_TYPE  VARCHAR2(3)
 ,PAY_METHOD_CODE   VARCHAR2(2)
 ,AMOUNT_TENDERED   NUMBER(13,4)
 ,INVOICE_TIME      VARCHAR2(15)
 ,INVOICE_DATE      DATE);
/
CREATE OR REPLACE TYPE EST_TABLE AS TABLE OF ESTRECORD;
/
Reference Cursor
CREATE OR REPLACE PACKAGE REF_PKG AS
  TYPE INVOICE_HEAD_RC
    IS REF CURSOR
    RETURN INVOICE_HEAD_ET%ROWTYPE;
END;
/
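The function header shown earlier can be completed into a working pipelined table function. This is a minimal sketch, not the presenter's code: it assumes INVOICE_HEAD_ET exposes columns with the same names as ESTRecord's attributes (the external table's definition is not shown), and the trimming and defaulting stand in for whatever real transformations are needed.

```sql
CREATE OR REPLACE FUNCTION sales_fnc(ih ref_pkg.invoice_head_rc)
  RETURN est_table                          -- a table object of type ESTRecord
  PIPELINED
  PARALLEL_ENABLE (PARTITION ih BY ANY)     -- PQ slaves may each take any subset of rows
IS
  r invoice_head_et%ROWTYPE;                -- one raw row from the external table
BEGIN
  LOOP
    FETCH ih INTO r;
    EXIT WHEN ih%NOTFOUND;
    -- Transform the raw row and pipe it straight back to the caller.
    -- Column names of INVOICE_HEAD_ET are assumed to match ESTRecord.
    PIPE ROW (estrecord(
      r.invoice_no,
      r.route_id,
      r.outlet_id,
      RTRIM(r.item_id),                     -- example clean-up
      UPPER(r.unit_code),                   -- example clean-up
      r.unit_price,
      NVL(r.unit_discount, 0),              -- example default
      r.unit_quantity,
      r.transaction_type,
      r.pay_method_code,
      r.amount_tendered,
      r.invoice_time,
      r.invoice_date));
  END LOOP;
  CLOSE ih;
  RETURN;                                   -- a pipelined function returns no value
END sales_fnc;
/
```

It would be called in the same way as the earlier example, e.g. SELECT * FROM TABLE(sales_fnc(CURSOR(SELECT * FROM invoice_head_et))); because rows are piped as they are produced, the caller can start consuming them before the whole file has been read.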
Load and Transform – Conclusion
• The combination of External Tables and Table Functions provides a powerful platform for loading large amounts of data into the Oracle database.
• Parallel everything makes it possible to achieve parallelism without complex UNIX scripting.
• This is a portable solution (across any Oracle-supported platform).
• The lack of native, flexible scripting shells on some OSs can be circumvented by these 9i facilities.
• Parallel processing on right-sized machines will help ensure that the batch window is met and the warehouse is usable.
Part II – Querying the Warehouse
• This discussion assumes that a STAR SCHEMA has been implemented as the warehouse schema.
• Oracle 9i’s star transformation optimisation is superior to that of earlier releases.
• To properly utilise and benefit from Star Transformation query plans, precise DB setup is a must.
A Star Schema
[Diagram: the Product Sales fact table joined to four dimensions]
• Product Sales (Fact): Sales Value, Gross Profit, Contribution, Total Cost, Raw Materials Cost, Dates Key (FK) (IE), Route Key (FK) (IE), Product Key (FK) (IE), Customer Key (FK) (IE), Boxes Sold
• Dates Dimension: Dates Key, Year, Qtr, Month, Day, Date
• Route Dimension: Route Key, Region, Country, Territory, Route, Sub Area
• Product Dimension: Product Key, Group, Class, Sub Class, Brand, Name, Weight, Manufactured By
• Customer Dimension: Customer Key, Class, Sub Class, Type, Sub type, Chain, Name, Address
A Typical Warehouse Query
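A query of this kind against the star schema above might look like the following sketch. The physical table and column names (PRODUCT_SALES, DATES_DIM, ROUTE_DIM, PRODUCT_DIM and their columns) and the filter values are assumptions. EXPLAIN PLAN with DBMS_XPLAN.DISPLAY (available from 9i Release 2, and relying on a PLAN_TABLE created by utlxplan.sql) shows which plan the optimizer chose; when star transformation is used, expect bitmap operations such as BITMAP KEY ITERATION and BITMAP AND against the fact table's bitmap indexes.

```sql
-- Hypothetical physical names for the star schema diagram
EXPLAIN PLAN FOR
SELECT d.year, r.region, p.brand,
       SUM(f.sales_value) AS sales_value
FROM   product_sales f,
       dates_dim     d,
       route_dim     r,
       product_dim   p
WHERE  f.dates_key   = d.dates_key
AND    f.route_key   = r.route_key
AND    f.product_key = p.product_key
AND    d.year BETWEEN 2000 AND 2004       -- example filters on the dimensions
AND    r.region = 'EAST'
AND    p.brand  = 'BRAND X'
GROUP  BY d.year, r.region, p.brand;

-- Format the plan just written to PLAN_TABLE
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```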
Query Plans – Non Star
A non-star transformation database configuration will result in the following plan:
Query Plan – Star Transformation
Another Typical Warehouse Query
The Non Star Transformation Query Plan
The Star Transformation Query Plan
Characteristics of the Star Transformation Plan
Star Transformations Further Optimised with Parallelism
Parallel Enabled Plan
The join order was changed when the parallel degree on the fact was set to 2 (SELECT /*+ parallel (product_sales,2) */ YEAR, …).
– Time to return 5 years of data (2000–2004, 5.7M rows in the fact) is 27 seconds.
– Execution cost is 14,425.
When set to 4:
– Time to return 5 years of data (2000–2004, 5.7M rows in the fact) is 24 seconds.
– Execution cost is 7,234.
Increasing parallelism beyond 4 (to 8 and 16) met with no further performance gains. Indeed, Oracle’s automatic parallel query tuning parameters allowed only up to 8 parallel query processes to be created, ignoring any further request for increased parallelism.
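The slides do not say which setting imposed the limit of 8 parallel query processes. A query like the following, assuming SELECT access to V$PARAMETER, lists the 9i instance parameters that bound how many PQ slaves can be started, which is a reasonable first place to look.

```sql
-- Instance settings that cap or adapt the number of PQ slaves (9i names)
SELECT name, value
FROM   v$parameter
WHERE  name IN ('parallel_automatic_tuning',
                'parallel_max_servers',
                'parallel_adaptive_multi_user',
                'parallel_threads_per_cpu',
                'cpu_count')
ORDER  BY name;
```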
8i DB Config for Star Transformation
For Oracle 8i:
1. always_anti_join = hash
2. always_semi_join = hash
3. bitmap_merge_area_size = 16M (or larger). This is per user process (PQ or otherwise), so it must be set with caution. However, a larger size encourages the use of bitmap joins over other types.
4. create_bitmap_area_size = 16M (or larger). See notes for (3).
5. hash_area_size = 16M (or larger). See notes for (3).
6. sort_area_size = 16M (or larger). See notes for (3).
7. hash_join_enabled = true
8. compatible = 8.1.7
9. optimizer_features_enable = 8.1.7
10. star_transformation_enabled = true
9i DB Config for Star Transformation
For Oracle 9i:
1. bitmap_merge_area_size = 16M (or larger). This is per user process (PQ or otherwise), so it must be set with caution. However, a larger size encourages the use of bitmap joins over other types.
2. create_bitmap_area_size = 16M (or larger). See notes for (1).
3. hash_area_size = 16M (or larger). See notes for (1).
4. hash_join_enabled = true
5. sort_area_size = 16M (or larger). See notes for (1).
6. optimizer_features_enable = 9.2.0
7. compatible = 9.2 (assuming 9i R2)
8. star_transformation_enabled = temp_disable
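Before changing the instance, the 9i settings can be trialled in a single session. This is a sketch under stated assumptions: only session-modifiable parameters are shown (bitmap_merge_area_size and create_bitmap_area_size cannot be changed with ALTER SESSION and belong in the parameter file), and the 16M values are written in bytes.

```sql
ALTER SESSION SET star_transformation_enabled = TEMP_DISABLE;
ALTER SESSION SET hash_join_enabled           = TRUE;

-- If workarea_size_policy is AUTO (it is when pga_aggregate_target is set),
-- dedicated-server sessions ignore the *_area_size values below,
-- so switch the session to MANUAL for the test.
ALTER SESSION SET workarea_size_policy        = MANUAL;
ALTER SESSION SET hash_area_size              = 16777216;   -- 16M, per process
ALTER SESSION SET sort_area_size              = 16777216;   -- 16M, per process
```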
Schema Creation Guidelines
Index Creation Guidelines
– Create separate (non-concatenated) bitmap indexes on each of the fact table’s foreign keys.
– Drop the b-tree PK index on the fact for querying; create it for loading.
– Create a separate bitmap index on each of the non-key columns of the dimension tables.
– Create b-tree indexes for primary key and alternate key fields in the dimensions.
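The index guidelines can be sketched as DDL against the star schema. The physical names (PRODUCT_SALES, PRODUCT_DIM, the constraint and index names) are hypothetical; the fact's PK is handled by disabling and enabling its constraint, which assumes the PK index was created by the constraint itself.

```sql
-- Fact: one single-column bitmap index per foreign key (not concatenated)
CREATE BITMAP INDEX ps_dates_bix    ON product_sales (dates_key)    NOLOGGING;
CREATE BITMAP INDEX ps_route_bix    ON product_sales (route_key)    NOLOGGING;
CREATE BITMAP INDEX ps_product_bix  ON product_sales (product_key)  NOLOGGING;
CREATE BITMAP INDEX ps_customer_bix ON product_sales (customer_key) NOLOGGING;

-- Fact PK: enable it (building its b-tree index) before a load ...
ALTER TABLE product_sales ENABLE CONSTRAINT product_sales_pk;
-- ... and disable it again before querying; this drops the unique index
-- that was created to enforce it (unless KEEP INDEX is specified).
ALTER TABLE product_sales DISABLE CONSTRAINT product_sales_pk;

-- Dimensions: b-tree through the primary key, bitmap on each non-key column
ALTER TABLE product_dim ADD CONSTRAINT product_dim_pk PRIMARY KEY (product_key);
CREATE BITMAP INDEX product_dim_brand_bix ON product_dim (brand);
CREATE BITMAP INDEX product_dim_mfr_bix   ON product_dim (manufactured_by);
```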
Table Schema Guidelines
Table Creation Guidelines
– ALTER TABLE tab_name MINIMIZE RECORDS_PER_BLOCK;
– Analyse the fact tables, dimension tables and materialised views.
– Create bitmap indexes on all materialised view columns also.
– Enable NOVALIDATE all foreign key constraints from the fact to its dimensions.
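A sketch of the table guidelines for one fact table and one of its dimensions follows; all object names are hypothetical. MINIMIZE RECORDS_PER_BLOCK records the highest number of rows in any block so that bitmap indexes can map fewer rowids per block; it is rejected once a bitmap index exists on the table, so run it after loading representative data and before creating the bitmap indexes.

```sql
-- Before any bitmap index exists on the table
ALTER TABLE product_sales MINIMIZE RECORDS_PER_BLOCK;

-- Gather statistics on the fact (repeat for dimensions and materialised views)
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'PRODUCT_SALES',
    degree  => 4,        -- gather in parallel
    cascade => TRUE);    -- include the table's indexes
END;
/

-- Declare the fact-to-dimension FK without checking existing rows;
-- DATES_DIM must already have its primary key on DATES_KEY.
ALTER TABLE product_sales
  ADD CONSTRAINT ps_dates_fk FOREIGN KEY (dates_key)
  REFERENCES dates_dim (dates_key)
  ENABLE NOVALIDATE;
```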
The END
Q & A