
Oracle Business Intelligence Applications
Version 7.9.6.x Performance Recommendations

An Oracle Technical Note, 7th Edition
April 2011

Copyright 2011, Oracle. All rights reserved.

Contents

Introduction
Hardware recommendations for implementing Oracle BI Applications
Storage Considerations for Oracle Business Analytics Warehouse
    Introduction
    Shared Storage Impact Benchmarks
    Conclusion
Source Tier
Oracle BI Enterprise Edition (OBIEE) / ETL Tier
    Review of OBIEE/ETL Tier components
    Deployment considerations for the ETL components
Target Tier
    Oracle RDBMS
Oracle Business Analytics Warehouse configuration
    Database configuration parameters
    ETL impact on amount of generated REDO Logs
    Oracle RDBMS System Statistics
    Parallel Query configuration
    Oracle Business Analytics Warehouse Tablespaces
Oracle BI Applications Best Practices for Oracle Exadata
    Handling BI Applications Indexes in Exadata Warehouse Environment
    Gather Table Statistics for BI Applications Tables
    Oracle Business Analytics Warehouse Storage Settings in Exadata
    Parallel Query Use in BI Applications on Exadata
    Compression Implementation for Oracle Business Analytics Warehouse in Exadata
    Exadata Smart Flash Cache
    Database Parameter File for Analytics Warehouse on Exadata
Informatica configuration for better performance
    Informatica PowerCenter 8.6 32-bit vs. 64-bit
    Informatica Session Logs
    Informatica Lookups
    Disabling Lookup Cache for very large Lookups
    Joining Staging Tables to Lookup Tables in Informatica Lookups
    Informatica Custom Relational Connections for long running mappings
    Informatica Session Parameters
        Commit Interval
        DTM Buffer Size
        Additional Concurrent Pipelines for Lookup Cache Creation
        Default Buffer Block Size
    Informatica Load: Bulk vs. Normal
    Informatica Bulk Load: Table Fragmentation
    Use of NULL Ports in Informatica Mappings
    Informatica Parallel Sessions Load on ETL tier
    Informatica Load Balancing Implementation
Bitmap Indexes usage for better queries performance
    Introduction
    DAC properties for handling bitmap indexes during ETL
    Bitmap Indexes handling strategies
    Disabling Indexes with DISTINCT_KEYS = 0 or 1
    Monitoring and Disabling Unused Indexes
    Handling Query Indexes during Initial ETL
Partitioning guidelines for Large Fact tables
    Introduction
    Convert to partitioned tables
        Identify a partitioning key and decide on a partitioning interval
        Create a partitioned table in Data Warehouse
        Configure Informatica to support partitioned tables
        Configure DAC to support partitioned tables
        Unit test the changes for converted partitioned tables in DAC
    Interval Partitioning
Informatica Workflows Session partitioning
    Workflow Session Partitioning for Parallel Writer Updates
Table Compression implementation guidelines
Guidelines for Oracle optimizer hints usage in ETL mappings
    Hash Joins versus Nested Loops in Oracle RDBMS
    Suggested hints for Oracle Business Intelligence Applications 7.9.6
    Using Oracle Optimizer Dynamic Sampling for big staging tables
Custom Indexes in Oracle EBS for incremental loads performance
    Introduction
    Custom OBIEE indexes in EBS 11i and R12 systems
    Custom EBS indexes in EBS 11i source systems
    Oracle EBS tables with high transactional load
    Custom EBS indexes on CREATION_DATE in EBS 11i source systems
Custom Aggregates for Better Query Performance
    Introduction
    Database Configuration Requirements for using MVs
    Custom Materialized View Guidelines
    Integrate MV Refresh in DAC Execution Plan
Wide tables with over 255 columns performance
    Introduction
    Wide tables structure optimization
Oracle BI Applications High Availability
    Introduction
    High Availability with Oracle Data Guard and Physical Standby Database
Oracle BI Applications ETL Performance Benchmarks
    Oracle BI Applications 7.9.6.1, Siebel CRM 8.0 Adapter
    Oracle BI Applications 7.9.6.1, Oracle EBS R12 Projects Adapter
    Oracle BI Applications 7.9.6.1, Oracle EBS 11i10 Enterprise Sales Adapter
    Oracle BI Applications 7.9.6.1, Oracle EBS 11i10 Supply Chain Adapter
Conclusion


    INTRODUCTION

Oracle Business Intelligence (BI) Applications Version 7.9.6 delivers a number of adapters to various business applications on the Oracle database platform. Version 7.9.6.1 is also certified with other major data warehousing platforms. Each Oracle BI Applications implementation requires careful planning to ensure the best performance both during ETL and during web query or dashboard execution.

This article discusses performance topics for Oracle BI Applications 7.9.6 and higher using the Informatica PowerCenter 8.6 ETL platform.

Note: The document is intended for experienced Oracle BI Administrators, DBAs and Applications implementers. It covers advanced performance tuning techniques in Informatica and Oracle RDBMS, so all recommendations must be carefully verified in a test environment before being applied to a production instance. Customers are encouraged to engage Oracle Expert Services to review their configurations prior to implementing the recommendations in their BI Applications environments.

    HARDWARE RECOMMENDATIONS FOR IMPLEMENTING ORACLE BI APPLICATIONS

Depending on source data volumes, Oracle BI Applications Version 7.9.6 implementations can be categorized as small, medium and large. The table below summarizes hardware recommendations for the Oracle BI Applications tiers by source data volume range.

Source Data Volume    SMALL:                   MEDIUM:                  LARGE:
                      Up to 200Gb              200Gb to 1Tb             1Tb and higher

Target Tier
# CPU cores           8                        16                       32*
Physical RAM          16Gb                     32Gb                     64Gb*
Storage Space         Up to 400Gb              400Gb - 2Tb              2Tb and higher
Storage System        Local (PATA, SATA,       Local (PATA, SATA,       High performance SCSI or
                      iSCSI), preferred        iSCSI). Recommended      network attached storage.
                      RAID configuration       two or more I/O          Hardware RAID controller
                                               controllers              with multiple I/O channels

Oracle BI Enterprise Edition / ETL Tier
# CPU cores           4 - 8                    8 - 16                   16**
Physical RAM          8Gb                      8 - 16Gb                 16Gb**
Storage Space         100Gb local              200Gb local              400Gb local

* Consider implementing Oracle RAC with multiple nodes to accommodate large numbers of concurrent users accessing web reports and dashboards.

** Consider installing two or more servers on the ETL tier and implementing Informatica Load Balancing across all ETL tier servers.

Important: It is recommended to set up all Oracle BI Applications tiers in the same local area network. Installing any of these three tiers over a Wide Area Network (WAN) may cause timeouts during ETL mapping execution on the ETL tier.

    Storage Considerations for Oracle Business Analytics Warehouse

    Introduction

Oracle BI Applications ETL execution plans are optimized to maximize hardware utilization on the ETL and target tiers and reduce ETL runtime. A well-optimized infrastructure typically consumes more CPU and memory on the ETL tier and generates rather heavy storage I/O load on the target tier during an ETL execution. The storage could easily become a major bottleneck as the result of such actions as:

- Setting excessive parallel query processes (refer to the Parallel Query Configuration section for more details)
- Running multiple I/O intensive applications, such as databases, on a shared storage
- Choosing sub-optimal storage for running BI Applications tiers

Oracle positions its Exadata solution as fast and efficient hardware for addressing I/O bottlenecks in large volume environments. The internal benchmarks for running Oracle BI Applications on Exadata will be published soon.

    Shared Storage Impact Benchmarks

Sharing storage among heavy I/O processes could easily degrade ETL performance and result in extended ETL runtime. The following benchmarks measured the impact of sharing the same NetApp filer storage between two target databases concurrently loading data in two parallel ETL executions.

Configuration description:

- Linux servers #1 and #2 have the following configurations:
  o 2 quad-core 1.8 GHz Intel Xeon CPUs
  o 32 GB RAM
- Shared NetApp filer volumes, volume1 and volume2, are mounted as EXT3 file systems:
  o Server #1 uses volume1
  o Server #2 uses volume2

Execution test description:

- Set the record block size for I/O operations to 32k, the recommended db block size in a target database.
- Execute a parallel load using eight child processes to imitate the average workload during an ETL run.
- Run the following test scenarios:
  o Test #1: execute the parallel load above on NFS volume1 using Linux server #1; keep Linux server #2 idle.
  o Test #2: execute the parallel load above on both NFS volume1 and volume2 using Linux servers #1 and #2.

The following benchmarks describe performance measurements in KB/sec:

- Initial Write: write a new file.
- Rewrite: re-write in an existing file.
- Read: read an existing file.
- Re-Read: re-read an existing file.
- Random Read: read a file with accesses made to random locations in the file.
- Random Write: write a file with accesses made to random locations in the file.
- Mixed workload: read and write a file with accesses made to random locations in the file.
- Reverse Read: read a file backwards.
- Record Rewrite: write and re-write the same record in a file.
- Strided Read: read a file with a strided access behavior, for example: read at offset zero for a length of 4 Kbytes, seek 200 Kbytes, read for a length of 4 Kbytes, seek 200 Kbytes, and so on.

The test summary:

Test Type         Test #1               Test #2
Initial write     46087.10 KB/sec       30039.90 KB/sec
Rewrite           70104.05 KB/sec       30106.25 KB/sec
Read              3134220.53 KB/sec     2078320.83 KB/sec
Re-read           3223637.78 KB/sec     3038416.45 KB/sec
Reverse Read      1754192.17 KB/sec     1765427.92 KB/sec
Stride read       1783300.46 KB/sec     1795288.49 KB/sec
Random read       1724525.63 KB/sec     1755344.27 KB/sec
Mixed workload    2704878.70 KB/sec     2456869.82 KB/sec
Random write      68053.60 KB/sec       25367.06 KB/sec
Pwrite            45778.21 KB/sec       23794.34 KB/sec
Pread             2837808.30 KB/sec     2578445.19 KB/sec
Total Time        110 min               216 min

Initial Write, Rewrite, Read, Random Write, and Pwrite (buffered write operation) were impacted the most, while Reverse Read, Stride Read, Random Read, Mixed Workload and Pread (buffered read operation) were impacted the least by the concurrent load.

Read operations do not require specific RAID sync-up operations, so read requests are less dependent on the number of concurrent threads.

Conclusion

Make sure you carefully plan for storage deployment, configuration and usage in your Oracle BI Applications environment. Avoid sharing the same RAID controller(s) across multiple databases. Set up periodic monitoring of your I/O system during both ETL and end user query loads to catch any potential bottlenecks.

    Source Tier

Oracle BI Applications data loads may cause additional overhead of up to fifteen percent of CPU and memory on a source tier. There might be a bigger impact on the I/O subsystem, especially during full ETL loads. Using several I/O controllers or a hardware RAID controller with multiple I/O channels on the source side would help to minimize the impact on Business Applications during ETL runs and speed up data extraction into a target data warehouse.

    Oracle BI Enterprise Edition (OBIEE) / ETL Tier

    Review of OBIEE/ETL Tier components

    The Oracle BIEE/ETL Tier is composed of the following parts:

    - Oracle Business Intelligence Server 10.1.3.4

    - Informatica PowerCenter 8.6 Client

    - Informatica PowerCenter 8.6 Server

    - Data Warehouse Administration Console (DAC) client 10.1.3.4.1

    - Data Warehouse Administration Console server 10.1.3.4.1

    - Informatica BI Applications Repository (usually stored in a target database)

    - DAC BI Applications Repository (usually stored in a target database)

    Deployment considerations for the ETL components

- The Informatica server and DAC server should be installed on a dedicated machine for best performance.
- The Informatica server and DAC server cannot be installed separately on different servers.
- The Informatica client and DAC client can be located on an ETL Administration client machine, or on a Windows server running the Informatica and DAC servers.
- The Informatica and DAC repositories can be deployed as separate schemas in the same database as the Oracle Business Analytics Warehouse, if the target database platform is Oracle, IBM DB2 or Microsoft SQL Server.
- The Informatica server and DAC server host machine should be physically located near the source data machine to improve network performance.

    Target Tier

    Oracle RDBMS

Oracle recommends deploying Oracle Business Analytics Warehouse on 64-bit Oracle RDBMS, running under a 64-bit Operating System (OS). If a 64-bit OS is not available, then consider implementing Very Large Memory (VLM) on Unix / Linux and Address Windowing Extensions (AWE) on Windows 32-bit platforms. VLM/AWE implementations increase the database address space to allow for more database buffers or a larger indirect data buffer window. Refer to Oracle Metalink for VLM / AWE implementation details for your platform.

Note: You cannot use the sga_target or db_cache_size parameters if you enable VLM / AWE by setting 'use_indirect_data_buffers = true'. You have to manually size all SGA memory components and use db_block_buffers instead of db_cache_size to specify your data cache.

    ORACLE BUSINESS ANALYTICS WAREHOUSE CONFIGURATION

    Database configuration parameters

Oracle Business Intelligence Applications version 7.9.6 is certified with Oracle RDBMS 10g and 11g. Since Oracle BI Applications extensively use bitmap indexes, partitioned tables, and other database features in both ETL and front-end query logic, it is important that Oracle BI Applications customers install the latest database releases on their Data Warehouse tiers:

- Oracle 10g customers should use Oracle 10.2.0.4 or higher.
- Oracle 11g customers should use Oracle 11.1.0.7 or higher.

Important: Oracle 10.2.0.1 customers must upgrade their Oracle Business Analytics Warehouses to the latest Patchset.

Oracle BI Applications include template init.ora files with recommended and required parameters, located in the \dwrep\Documentation\ directory:

- init10gR2.ora - init.ora template for Oracle RDBMS 10g
- init11g.ora - init.ora template for Oracle RDBMS 11g
- init11gR2.ora - init.ora template for Oracle RDBMS 11gR2

Review the appropriate init.ora template file and follow its guidelines to configure target database parameters specific to your data warehouse tier hardware.

Note: The init.ora template for Exadata / 11gR2 is provided in the Exadata section of this document.

    ETL impact on amount of generated REDO Logs

Initial ETL may cause higher than usual generation of REDO logs when loading large data volumes into a data warehouse database. If your target database is configured to run in ARCHIVELOG mode, you can consider two options:

1. Switch the database to NOARCHIVELOG mode, execute the initial ETL, take a cold backup and switch the database back to ARCHIVELOG mode.
2. Allocate 10-15% of additional space to accommodate archived REDO logs during the initial ETL.

Below is a calculation of the amount of REDO generated in an internal initial ETL run:

redo log file sequence:
  start : 641  (11 Jan 21:10)
  end   : 1624 (12 Jan 10:03)
total # of redo logs : 983
log file size        : 52428800
redo generated       : 983 * 52428800 = 51537510400 (48 GB)

Data loaded in the warehouse:

SQL> select sum(bytes)/1024/1024/1024 Gb from dba_segments
     where owner='DWH' and segment_type='TABLE';

        GB
----------
    280.49
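To verify the amount of REDO generated during your own ETL window, you can sum the archived log sizes for the run; a minimal sketch, assuming ARCHIVELOG mode and an illustrative time window:

SQL> select count(*) logs, sum(blocks * block_size)/1024/1024/1024 redo_gb
     from v$archived_log
     where first_time between to_date('11-JAN-2011 21:10', 'DD-MON-YYYY HH24:MI')
                          and to_date('12-JAN-2011 10:03', 'DD-MON-YYYY HH24:MI');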

    Oracle RDBMS System Statistics

Oracle introduced workload system statistics in Oracle 9i to gather important information about the system, such as single and multiple block read times, CPU speed, and various system throughputs. The Optimizer takes system statistics into account when it computes the cost of query execution plans. Failure to gather workload statistics may result in sub-optimal execution plans for queries and excessive temporary space consumption, and ultimately impact BI Applications performance.

Oracle BI Applications customers are required to gather workload statistics on both source and target Oracle databases prior to running the initial ETL.

Oracle recommends two options to gather system statistics:

- Run the dbms_stats.gather_system_stats('start') procedure at the beginning of the workload window, then the dbms_stats.gather_system_stats('stop') procedure at the end of the workload window.
- Run dbms_stats.gather_system_stats('interval', interval=>N), where N is the number of minutes after which statistics gathering will be stopped automatically.

Important: Execute dbms_stats.gather_system_stats when the database is not idle. Oracle computes the desired system statistics only when the database is under significant workload. Usually half an hour is sufficient to generate valid statistics values.
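Both options can be scripted as follows; a minimal sketch (the 30-minute interval is illustrative), to be executed while a representative ETL or query workload is running:

SQL> exec dbms_stats.gather_system_stats('start');
     -- ... let the workload run for about half an hour ...
SQL> exec dbms_stats.gather_system_stats('stop');

     -- alternatively, stop gathering automatically after N minutes:
SQL> exec dbms_stats.gather_system_stats('interval', interval => 30);

     -- verify the gathered values:
SQL> select sname, pname, pval1 from sys.aux_stats$;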

    Parallel Query configuration

The Data Warehouse Administration Console (DAC) leverages the Oracle Parallel Query option for computing statistics and building indexes on target tables. By default DAC creates indexes with the 'PARALLEL' clause and computes statistics with a pre-calculated degree of parallelism. Refer to the init.ora template files, located in \dwrep\Documentation, for details on setting the following parameters:

parallel_max_servers
parallel_min_servers
parallel_threads_per_cpu

Important: Parallel execution does not scale indefinitely. It could easily lead to increased resource contention, create I/O bottlenecks, and increase response time when the resources are shared by many concurrent transactions.

Since DAC creates indexes and computes statistics on target tables in parallel, both within a single table and across multiple tables, parallel execution may cause performance problems if the values of parallel_max_servers and parallel_threads_per_cpu are too high. The system load from parallel operations can be observed by executing the following query:

SQL> select name, value from v$sysstat where name like 'Parallel%';

Reduce the parallel_threads_per_cpu and parallel_max_servers values if the system is overloaded.


    Oracle Business Analytics Warehouse Tablespaces

By default, DAC deploys all data warehouse entities into two tablespaces: all tables into a DATA tablespace, and all indexes into an INDEX tablespace. Depending on your hardware configuration on the target tier, you can improve performance by rearranging your data warehouse tablespaces.

The following table summarizes space allocation estimates in a data warehouse by its data volume range:

Target Data Volume       SMALL:           MEDIUM:           LARGE:
                         Up to 400Gb      400Gb to 2Tb      2Tb and higher
Temporary Tablespace     40 - 60Gb        60 - 150Gb        150 - 250Gb
DATA Tablespace          350Gb            350Gb - 1.8Tb     > 1.8Tb
INDEX Tablespace         50Gb             50 - 200Gb        > 200Gb

Important: Make sure you use Locally Managed tablespaces with the AUTOALLOCATE clause. DO NOT use UNIFORM extent size, as it may cause excessive space consumption and result in slower query performance. Use the standard (primary) block size for your warehouse tablespaces. DO NOT build your warehouse on non-standard block size tablespaces.

Note that the INDEX tablespace may grow if you enable more query indexes in your data warehouse. During incremental loads, by default DAC drops and rebuilds indexes, so you should separate all indexes into a dedicated tablespace and, if you have multiple RAID / I/O controllers, move the INDEX tablespace to a separate controller.

You may also consider isolating staging tables (_FS) and target fact tables (_F) on different controllers. Such a configuration would help to speed up Target Load (SIL) mappings for fact tables by balancing I/O load across multiple RAID controllers.
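A minimal sketch of the tablespace guidance above (datafile paths and sizes are illustrative placeholders):

SQL> CREATE TABLESPACE BIA_DATA
     DATAFILE '/u01/oradata/dwh/bia_data01.dbf' SIZE 10G AUTOEXTEND ON
     EXTENT MANAGEMENT LOCAL AUTOALLOCATE
     SEGMENT SPACE MANAGEMENT AUTO;

SQL> CREATE TABLESPACE BIA_INDEX
     DATAFILE '/u02/oradata/dwh/bia_index01.dbf' SIZE 4G AUTOEXTEND ON
     EXTENT MANAGEMENT LOCAL AUTOALLOCATE
     SEGMENT SPACE MANAGEMENT AUTO;

Placing the INDEX tablespace datafile on a different controller (here /u02) follows the I/O separation advice above.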

    ORACLE BI APPLICATIONS BEST PRACTICES FOR ORACLE EXADATA

    Handling BI Applications Indexes in Exadata Warehouse Environment

Oracle Business Analytic Applications Suite uses two types of indexes:

- ETL indexes, for optimizing ETL performance and ensuring data integrity
- Query indexes, mostly bitmaps, for end user star queries

Exadata Storage Indexes functionality cannot be considered an unconditional replacement for BI Applications indexes. You can rely on storage indexes only in those cases where BI Applications query indexes deliver inferior performance, and only after you have run comprehensive tests to ensure no regressions for all other queries without the query indexes.

Do not drop any ETL indexes, as you may not only impact your ETL performance but also compromise data integrity in your warehouse.

The best practices for handling BI Applications indexes in an Exadata warehouse:

- Turn on index usage monitoring to identify any unused indexes and drop / disable them in your environment. Refer to the corresponding section in this document for more details.
- Consider pinning the critical target tables in Exadata Smart Flash Cache.
- Consider building custom aggregates to pre-aggregate more data and simplify queries.
- Drop selected query indexes and disable them in DAC, to rely on Exadata Storage Indexes / full table scans, only after running comprehensive benchmarks and ensuring no impact on any other queries' performance.

    Gather Table Statistics for BI Applications Tables

Out of the box, the Data Warehouse Admin Console (DAC) uses the FOR INDEXED COLUMNS syntax for computing BI Applications table statistics. It does not gather statistics for non-indexed columns participating in end user query joins. If you choose to drop some indexes in an Exadata environment, there will be more critical columns with NULL statistics. As a result, the Optimizer may choose sub-optimal execution plans and deliver slower performance.

You should consider switching to the FOR ALL COLUMNS SIZE AUTO syntax in the DBMS_STATS.GATHER_TABLE_STATS call in DAC:

1. Navigate to your /CustomSQLs directory and open the customsql.xml file for editing.
2. Replace FOR INDEXED COLUMNS with FOR ALL COLUMNS SIZE AUTO in the DBMS_STATS.GATHER_TABLE_STATS call in the corresponding section.
3. Save the changes.

The next time you run an ETL, DAC will compute statistics on all columns of BI Applications tables.
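For reference, a minimal sketch of an equivalent manual statistics call after the change (the schema, table name and parameters other than method_opt are illustrative; DAC issues its own calls from customsql.xml):

SQL> begin
       dbms_stats.gather_table_stats(
         ownname    => 'DWH',
         tabname    => 'W_GL_BALANCE_F',
         method_opt => 'FOR ALL COLUMNS SIZE AUTO',
         cascade    => true);
     end;
     /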

    Oracle Business Analytics Warehouse Storage Settings in Exadata

- The recommended database block size (db_block_size parameter) is 8K. You may consider using a 16K block size as well, primarily to achieve a better compression rate, as Oracle applies compression at the block level. Refer to the init.ora template in the section below.
- Make sure you use locally managed tablespaces with the AUTOALLOCATE option. DO NOT use UNIFORM extent size for your warehouse tablespaces.
- Use your primary database block size of 8K (or 16K) for your warehouse tablespaces. It is NOT recommended to use non-standard block size tablespaces for deploying a production warehouse.
- Use a large 8Mb extent size for partitioned fact tables and large non-partitioned segments, such as dimensions, hierarchies, etc. Setting cell_partition_large_extents = TRUE ensures all partitioned tables are created with an INITIAL extent size of 8Mb. You have to manually specify INITIAL and NEXT extent sizes of 8Mb for non-partitioned segments, as shown in the sketch after this list.
- Set deferred_segment_creation = TRUE to defer segment creation until the first record is inserted. Refer to the init.ora section below.
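A minimal sketch of the manual 8Mb extent sizing mentioned above for a non-partitioned segment (the table, column and tablespace names are illustrative placeholders):

SQL> CREATE TABLE W_EXAMPLE_D (
       ROW_WID NUMBER(10)
     )
     TABLESPACE BIA_DATA
     STORAGE (INITIAL 8M NEXT 8M);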

    Parallel Query Use in BI Applications on Exadata

All BI Applications tables are created without any degree of parallelism in the BI Applications schema. Since DAC manages parallel jobs, such as Informatica mappings or index creation, during an ETL, the use of Parallel Query in ETL mappings could generate more I/O overhead and cause performance regressions for ETL jobs.

Exadata hardware provides much better scalability for I/O resources, so you can consider turning on Parallel Query for slow queries by setting the PARALLEL attribute on large tables participating in the queries. For example:

SQL> ALTER TABLE W_GL_BALANCE_F PARALLEL;


    Make sure you benchmark the query performance prior to implementing the changes in your Production environment.
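If the benchmarks do show regressions, the attribute can be reverted:

SQL> ALTER TABLE W_GL_BALANCE_F NOPARALLEL;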

Compression Implementation for Oracle Business Analytics Warehouse in Exadata

Table compression can significantly reduce a segment's size and improve query performance in an Exadata environment. However, depending on the nature of DML operations in ETL mappings, it may result in slower mapping performance and larger consumed space. The following guidelines will help to ensure a successful compression implementation in your Exadata environment:

- Consider implementing compression after running an initial ETL. The initial ETL plan contains several mappings with heavy updates, which could impact your ETL performance.
- Implement large fact table partitioning and compress inactive historic partitions only. Make sure that the active ones remain uncompressed.
- Choose either Basic or Advanced compression for your compression candidates.
- Periodically review the allocated space for a compressed segment, and check such stats as num_rows, blocks and avg_row_len in the user_tables view (see the sketch below). For example, the following compressed segment needs to be re-compressed, as it consumes too many blocks:

  Num_rows      Avg_row_len   Blocks      Compression
  541823382     181           13837818    ENABLED

A simple calculation, (num_rows * avg_row_len / 8K block size) plus ~25% block overhead, gives ~15M blocks for the equivalent uncompressed segment. Since the segment already consumes almost that many blocks, it should be re-compressed to reduce its footprint and improve query performance.
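A minimal sketch of the space check and the re-compression step (the table and partition names are illustrative; for partitioned tables, check user_tab_partitions rather than user_tables):

SQL> select table_name, num_rows, avg_row_len, blocks, compression
     from user_tables
     where table_name = 'W_RESPONSE_F';

SQL> ALTER TABLE W_RESPONSE_F MOVE PARTITION PART_2009 COMPRESS;

     -- a MOVE marks local index partitions UNUSABLE; rebuild them afterwards:
SQL> ALTER TABLE W_RESPONSE_F MODIFY PARTITION PART_2009
     REBUILD UNUSABLE LOCAL INDEXES;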

    Refer to Table Compression Implementation Guidelines section in this document for additional information on

    compression for BI Applications Warehouse.

    Exadata Smart Flash Cache

The use of Exadata Smart Flash Cache in Oracle Business Analytics Warehouse can significantly improve end user query performance. You can consider pinning the most frequently used dimensions which impact your query performance. To manually pin a table in Exadata Smart Flash Cache, use the following syntax:

SQL> ALTER TABLE W_PARTY_D STORAGE (CELL_FLASH_CACHE KEEP);

The Exadata Storage Server will cache data from the W_PARTY_D table more aggressively and will try to keep its data longer than cached data from other tables.

Important: Use manual Flash Cache pinning only for the most critical, commonly used tables.
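To release a pinned table back to the default caching policy:

SQL> ALTER TABLE W_PARTY_D STORAGE (CELL_FLASH_CACHE DEFAULT);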

    Database Parameter File for Analytics Warehouse on Exadata

    Use the template file below for your init.ora parameter file for Business Analytics Warehouse on Oracle Exadata.

    ###########################################################################

    # Oracle BI Applications - init.ora template

    # This file contains a listing of init.ora parameters for 11.2 / Exadata

    ###########################################################################

db_name = <database name>
control_files = <path>/ctrl01.dbf, <path>/ctrl02.dbf


    db_block_size = 8192 # or 16384 (for better compression)

    db_block_checking = FALSE

    db_block_checksum = TYPICAL

    cell_partition_large_extents = TRUE

    deferred_segment_creation = TRUE

user_dump_dest = <path>/admin/<db_name>/udump
background_dump_dest = <path>/admin/<db_name>/bdump
core_dump_dest = <path>/admin/<db_name>/cdump

    max_dump_file_size = 20480

    processes = 1000

    sessions = 2000

    db_files = 1024

    session_max_open_files = 100

    dml_locks = 1000

    cursor_sharing = EXACT

    cursor_space_for_time = FALSE

    session_cached_cursors = 500

    open_cursors = 1000

    db_writer_processes = 2

    aq_tm_processes = 1

    job_queue_processes = 2

    timed_statistics = true

    statistics_level = typical

    sga_max_size = 45G

    sga_target = 40G

    shared_pool_size = 2G

    shared_pool_reserved_size = 100M

    workarea_size_policy = AUTO

    pre_page_sga = FALSE

    pga_aggregate_target = 16G

    log_checkpoint_timeout = 3600

    log_checkpoints_to_alert = TRUE

    log_buffer = 10485760

    undo_management = AUTO

    undo_tablespace = UNDOTS1

    undo_retention = 90000

    parallel_adaptive_multi_user = FALSE

    parallel_max_servers = 128

    parallel_min_servers = 32

    # ------------------- MANDATORY OPTIMIZER PARAMETERS ----------------------

    star_transformation_enabled = TRUE

    query_rewrite_enabled = TRUE


    query_rewrite_integrity = TRUSTED

    _b_tree_bitmap_plans = FALSE

    _optimizer_autostats_job = FALSE

    INFORMATICA CONFIGURATION FOR BETTER PERFORMANCE

    Informatica PowerCenter 8.6 32-bit vs. 64-bit

A 32-bit OS can address only 2^32 bytes, or four gigabytes of RAM, and allows a maximum of two gigabytes for any single application. Oracle BI Applications ETL mappings use complex Informatica transformations, such as lookups cached in memory, and their performance is heavily impacted by data from incremental extracts and high watermark warehousing volumes. Additionally, BI Applications ETL execution plans employ parallel mapping execution. So a 32-bit ETL tier can quickly exhaust the available memory and end up with very expensive I/O paging and swapping operations, causing rather dramatic regression in ETL performance.

In contrast, Informatica 64-bit takes advantage of more physical RAM to perform complex transformations in memory, eliminating costly disk I/O operations. Informatica PowerCenter 8.6 provides true 64-bit performance and the ability to scale, because no intermediate staging or hashing files on disk are required for processing.

Internal BI Applications ETL benchmarks of Informatica 8.6 32-bit vs. 64-bit showed at least two times better throughput for the 64-bit configuration. Oracle Business Intelligence Applications customers are therefore strongly encouraged to use the Informatica 8.6 64-bit version for Medium and Large environments.

    Informatica Session Logs

Oracle BI Applications 7.9.6 uses Informatica PowerCenter 8.6, which has improved log reports. Each session log provides detailed information about transformations as well as a summary of a mapping execution, including the detailed percentage run time, idle time, etc.

Below is an example of the execution summary from an Informatica session log:

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [Sq_W_CUSTOMER_LOC_USE_DS] has completed.
    Total Run Time = [559.812502] secs
    Total Idle Time = [348.453112] secs
    Busy Percentage = [37.755389]
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [Sq_W_CUSTOMER_LOC_USE_DS] has completed.
    Total Run Time = [559.843748] secs
    Total Idle Time = [322.109055] secs
    Busy Percentage = [42.464472]
Thread work time breakdown:
    Fil_W_CUSTOMER_LOC_USE_D: 2.105263 percent
    Exp_W_CUSTOMER_LOC_USE_D_Update_Flg: 10.526316 percent
    Lkp_W_CUSTOMER_LOC_USE_D: 13.684211 percent
    mplt_Get_Etl_Proc_Wid.EXP_Constant_for_Lookup: 1.052632 percent
    mplt_Get_Etl_Proc_Wid.Exp_Get_Integration_Id: 2.105263 percent
    mplt_Get_Etl_Proc_Wid.Exp_Decide_Etl_Proc_Wid: 3.157895 percent
    mplt_Get_Etl_Proc_Wid.LKP_ETL_PROC_WID: 20.000000 percent
    mplt_SIL_CustomerLocationUseDimension.Exp_Scd2_Dates: 44.210526 percent
    mplt_SIL_CustomerLocationUseDimension.Exp_W_CUSTOMER_LOC_USE_D_Transform: 3.157895 percent
Thread [WRITER_1_*_1] created for [the write stage] of partition point [W_CUSTOMER_LOC_USE_D] has completed.
    Total Run Time = [561.171875] secs
    Total Idle Time = [0.000000] secs
    Busy Percentage = [100.000000]

Busy Percentage for a single thread cannot be considered an absolute measure of performance for a whole mapping; all thread statistics must be reviewed together. Informatica computes it for a single thread in a mapping as follows:

Busy Percentage = (Total Run Time - Total Idle Time) / Total Run Time
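Applied to the READER thread in the log above: (559.812502 - 348.453112) / 559.812502 = 0.3776, which matches the reported Busy Percentage of 37.755389.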

If the log shows a high Busy Percentage (> 70 - 80%) for the READER thread, you may need to review the mapping's Reader Source Qualifier query for performance bottlenecks.

If it shows a high Busy Percentage (> 60 - 70%) for the TRANSF thread, review the detailed transformation execution summary and identify the most expensive transformation. In the example above, the transformation mplt_SIL_CustomerLocationUseDimension.Exp_Scd2_Dates consumes 44.2% of all TRANSF runtime, so it may be considered a candidate for investigation.

A high Busy Percentage for the WRITER thread is not necessarily a performance bottleneck. Depending on the processed data volumes, you may want to turn off Bulk mode. Refer to the section Informatica Load: Bulk vs. Normal for more details.

The log above shows that the mapping is most probably well balanced between the Reader and Transformation threads and keeps the Writer busy with inserts.

    Informatica Lookups

Too many Informatica Lookups in an Informatica mapping may cause a significant performance slowdown. Review the guidelines below for handling Informatica Lookups in Oracle Business Intelligence Applications mappings:

- Inspect Informatica session logs for the number of lookups, including each lookup's percentage runtime.
- Check the Lookup table row count and Lookup cache row count numbers for each Lookup Transformation. If the Lookup table row count is too high, Informatica will cache a smaller subset in its Lookup cache. Such a lookup could cause significant performance overhead on the ETL tier.
- If functional logic permits, consider reducing a large lookup row count by adding more constraining predicates to the lookup query's WHERE clause.
- If a Reader Source Qualifier query is not a bottleneck in a slow mapping, and the mapping is overloaded with lookups, consider pushing lookups with row counts of less than two million into the Reader SQL as OUTER JOINs.
- If you identify a very large lookup with a row count of more than 15-20 million, consider pushing it down as an OUTER JOIN into the mapping's Reader query. Such an update would slow down the Reader SQL execution, but it might improve the overall mapping performance.

Important: Some lookups could be reusable within a mapping or across multiple mappings, so they cannot be constrained or pushed down into Reader queries. Consult Oracle Development prior to re-writing Oracle Business Intelligence Applications mappings.

Make sure you test the changes to avoid functional regressions before implementing optimizations in your production environment.

    Disabling Lookup Cache for very large Lookups

Informatica uses a Lookup cache to store lookup data on the ETL tier in flat files (.dat and .idx). The Integration Service builds the cache in memory when it processes the first row of data in a cached Lookup Transformation. If the lookup data is small, it can be stored in memory and the transformation processes rows very fast. But if the lookup data is very large (typically over 20M rows), the lookup cannot fit into the allocated memory, and the data has to be paged in and out many times during a single session. As a result, such lookup transformations adversely affect the overall mapping performance. Additionally, Informatica takes more time to build such large lookups.

If constraining a large lookup is not possible, then consider disabling the lookup cache. Connect to Informatica Workflow Manager, open the session properties, and find the desired transformation in the Transformations folder on the Mapping tab. Then uncheck the Lookup Cache Enabled property and save the session.

Disabling the lookup cache for heavy lookups will help to avoid excessive paging on the ETL tier. When the lookup cache is disabled, the Integration Service issues a select statement against the lookup source database to retrieve lookup values for each row from the Reader thread, and does not store any data in flat files on the ETL tier. The issued lookup query uses bind variables, so it is parsed only once in the lookup source database.

Disabling the lookup cache may work faster for very large lookups under the following conditions:

- The lookup query must use an index access path; otherwise data retrieval would be very expensive on the source lookup database tier. Remember that Informatica fires the lookup query for every record from its Reader thread.
- Consider creating an index on all columns used in the lookup query. The Oracle Optimizer could then choose an INDEX FAST FULL SCAN to retrieve the lookup values from index blocks rather than scanning the whole table (see the sketch below).
- Check the explain plan for the lookup query to ensure an index access path.

Make sure you test the modified mapping with the selected disabled lookups in a test environment and benchmark its performance prior to implementing the change in the production system.
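A hedged sketch of such a covering index for an uncached lookup on W_PARTY_D (the index name and column list are illustrative; match them to your lookup's SELECT and WHERE columns):

SQL> CREATE INDEX W_PARTY_D_LKP_X ON W_PARTY_D
     (INTEGRATION_ID, DATASOURCE_NUM_ID, ROW_WID, GEO_WID);

     -- confirm the index access path for the lookup predicate:
SQL> EXPLAIN PLAN FOR
     SELECT ROW_WID, GEO_WID FROM W_PARTY_D
     WHERE INTEGRATION_ID = :1 AND DATASOURCE_NUM_ID = :2;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);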

    Joining Staging Tables to Lookup Tables in Informatica Lookups

If you identify bottlenecks with lookups having very large row counts, you can consider constraining them by updating the lookup queries to join to a staging table used in the mapping. As a result, Informatica will execute the lookup query and cache far fewer rows, and speed up row processing on its Transformation thread.

For example, the original query for Lkp_W_PARTY_D_With_Geo_Wid:

SELECT DISTINCT
    W_PARTY_D.ROW_WID as ROW_WID,
    W_PARTY_D.GEO_WID as GEO_WID,
    W_PARTY_D.INTEGRATION_ID as INTEGRATION_ID,
    W_PARTY_D.DATASOURCE_NUM_ID as DATASOURCE_NUM_ID,
    W_PARTY_D.EFFECTIVE_FROM_DT as EFFECTIVE_FROM_DT,
    W_PARTY_D.EFFECTIVE_TO_DT as EFFECTIVE_TO_DT
FROM W_PARTY_D

can be modified to:

SELECT DISTINCT
    W_PARTY_D.ROW_WID as ROW_WID,
    W_PARTY_D.GEO_WID as GEO_WID,
    W_PARTY_D.INTEGRATION_ID as INTEGRATION_ID,
    W_PARTY_D.DATASOURCE_NUM_ID as DATASOURCE_NUM_ID,
    W_PARTY_D.EFFECTIVE_FROM_DT as EFFECTIVE_FROM_DT,
    W_PARTY_D.EFFECTIVE_TO_DT as EFFECTIVE_TO_DT
FROM W_PARTY_D, W_RESPONSE_FS
WHERE W_PARTY_D.INTEGRATION_ID = W_RESPONSE_FS.PARTY_ID
  AND W_PARTY_D.DATASOURCE_NUM_ID = W_RESPONSE_FS.DATASOURCE_NUM_ID

This change reduced the lookup row count from over 22M to 180K and helped to improve the mapping performance. The approach can be applied selectively to both initial and incremental mappings after thorough benchmarks.

    Informatica Custom Relational Connections for long running mappings

If you plan to summarize very large volumes of data (usually over 100 million records), you can speed up the large data ETL mappings by turning off automated PGA structure allocation and setting the SORT and HASH areas manually for the selected sessions.

To speed up such ETL mappings, set sort_area_size and hash_area_size to higher values. If you have limited system memory, you can increase only sort_area_size, as sorting operations for aggregate mappings are more memory intensive. Hash joins involving bigger tables can still perform better with a smaller hash_area_size.

Follow the steps below to create a new Relational Connection with custom session parameters in Informatica:

1. Open Informatica Workflow Manager and navigate to Connections -> Relational -> New.
2. Define a new Target connection 'DataWarehouse_Manual_PGA'.
3. Use the same values as in the DataWarehouse connection.
4. Click on Connection Environment SQL and insert the following commands:

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 1000000000;
alter session set hash_area_size = 2000000000;

Repeat the same steps to define another custom Relational connection to your Oracle source database.

Each mapping that is a candidate to use the custom Relational connections should meet the requirements below:

- The mapping doesn't use heavy transformations on the ETL tier
- The Reader query joins very large tables
- The Reader query execution plan uses HASH JOINs

Connect to Informatica Workflow Manager and complete the following steps for each identified mapping:

1. Open the session in Task Developer.
2. Click on the Mapping tab.
3. Select Connections in the left pane.
4. Select the defined custom value for the Source or Target connection.
5. Save the changes.

    Informatica Session Parameters

There are four major properties, defined in Informatica Workflow Manager for each session, that impact Informatica mapping performance.

    Commit Interval

The target-based commit interval determines the commit points at which the Integration Service commits data writes to the target database. The larger the commit interval, the better the overall mapping performance. However, too large a commit interval may cause database logs to fill and result in session failure.

Oracle BI Applications Informatica mappings have a default setting of 10,000. The recommended range for commit intervals is from 10,000 up to 200,000.

    DTM Buffer Size

The DTM Buffer Size specifies the amount of memory the Integration Service uses for DTM buffer memory. Informatica uses DTM buffer memory to create the internal data structures and buffer blocks used to bring data into and out of the Integration Service.

Additional Concurrent Pipelines for Lookup Cache Creation

The Additional Concurrent Pipelines for Lookup Cache Creation parameter defines the concurrency for lookup cache creation. Oracle BI Applications Informatica mappings have a default setting of 0. You can reduce lookup cache build time by enabling parallel lookup cache creation, setting the value to greater than one.

Important: Make sure you carefully analyze long running mapping bottlenecks before turning on lookup cache build concurrency in your production environment. Oracle BI Applications execution plans already take advantage of parallel workflow execution, so enabling concurrent lookup cache creation may result in additional overhead on a target database and longer execution time. Consider turning on lookup cache creation concurrency when you have one or two long running mappings which are overloaded with lookups.

    Default Buffer Block Size

The buffer block size specifies the amount of buffer memory used to move a block of data from the source to the target. Oracle BI Applications Informatica mappings have a default setting of 128,000. Avoid using the Auto value for Default Buffer Block Size, as it may cause performance regressions for your sessions.

Internal tests showed better performance for both initial and incremental ETL with Default Buffer Block Size set to 512,000 (512K). You can run the following SQL against your Informatica repository schema to update the Buffer Block Size to 512K for all mappings:

SQL> update opb_cfg_attr set attr_value = '512000'
     where attr_value = '128000' and attr_id = 5;
SQL> commit;

Important: Make sure you test the changes in your development repository and benchmark ETL performance before making changes to your production environment.

    Informatica Load: Bulk vs. Normal

The Informatica writer thread may become a bottleneck in some mappings that use Bulk mode to load very large volumes (> 200M rows) into a data warehouse.

Analysis of a trace file from a Writer database session shows that Informatica uses direct path inserts to load data in Bulk mode. The database session performs two direct path writes to insert each new portion of data, and each time Oracle scans for 12 contiguous blocks in the target table to perform a new write transaction. As the table grows larger, it takes longer and longer to scan the segment for chunks of 12 contiguous blocks. So even though it bypasses the database block cache, the Informatica Writer thread may slow down the mapping's overall performance.

To determine whether a mapping which loads very large data volumes in Bulk mode slows down because of the writer thread, open its Informatica session log and compute the time to write the same set of blocks (usually 10,000) at the beginning and at the end of the log. If you observe a significant increase in the writer execution time at the end of the log, consider either increasing the commit size for the mapping or changing the session load mode from Bulk to Normal in Informatica Workflow Manager, and test the mapping with the updated setting.

    Informatica Bulk Load: Table Fragmentation

Informatica Bulk Load for very large volumes may not only slow down the mapping performance but also cause significant table fragmentation.

Internal tests showed that the commit size for Normal load did not affect the number of allocated extents for one million rows in the W_RESPONSE_F fact table used in the benchmarks. For Bulk load, however, the number of extents increased rather significantly as the commit size went down. The commit size also affected the mapping performance for both Normal and Bulk load; the drop in throughput was more significant for the latter. The table below shows the number of extents (ext) and throughput in rows per second (rps) for each tested scenario.

Informatica    1M commit      100K commit    10K commit     1K commit      10 rows commit
Load type
Normal mode    80 ext /       80 ext /       80 ext /       80 ext /       80 ext /
               34K rps        33K rps        30K rps        27K rps        14K rps
Bulk mode      80 ext /       190 ext /      200 ext /      960 ext /      > 5K ext (out of
               55.5K rps      55.5K rps      37K rps        8K rps         space) / 600 rps

Important: To ensure bulk load performance and avoid or minimize target table fragmentation, make sure you set a larger commit size in Informatica mappings.
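To track table fragmentation after a test load, a minimal sketch to run as the warehouse schema owner (the segment name is illustrative):

SQL> select segment_name, count(*) extents, sum(bytes)/1024/1024 mb
     from user_extents
     where segment_name = 'W_RESPONSE_F'
     group by segment_name;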

    Use of NULL Ports in Informatica Mappings

The use of connected or disconnected ports with hard-coded NULL values in Informatica mappings can be yet another reason for slow ETL mapping performance. The internal study showed that, depending on the number of NULL ports, such a mapping's performance can drop by a factor of two or more. The performance gap widens as more ports are used in a mapping. The session CPU time grows nearly proportionally to the number of connected ports, and so does the row width processed by Informatica. As soon as a certain threshold of ports is reached, the internal Informatica session processing for wide mappings becomes even more complex, and its execution slows down dramatically. The internal tests demonstrated that Informatica treats NULL and non-NULL values equally and allocates critical resources to processing NULL ports. It also includes NULL values in the INSERT statements executed by the WRITER thread on the data warehouse tier.

    To ensure effective performance of Informatica mappings:

    - Avoid using NULL ports in Informatica transformations.

    - Try to keep the total number of ports no greater than 50 per mapping.

- Review slow mappings for NULL ports or any other potentially redundant ports that could be eliminated.

    Informatica Parallel Sessions Load on ETL tier

    Informatica mappings with complex transformations and heavy lookups typically consume larger amounts of memory

    during ETL execution. While processing large data volumes and executing in parallel, such mappings may easily

overload the ETL server and cause very heavy memory swapping and paging. As a result, the overall ETL execution would take much longer to complete. To avoid such potential bottlenecks:

- Consider implementing the Informatica 64-bit version on your ETL tier.
- Ensure you have enough physical memory on your ETL tier server. Refer to the Hardware Recommendations section for more details.
- Keep in mind that too many Informatica sessions running in parallel may overload either the source or the target database.
- Set a smaller number of connections to the Informatica Integration Service in DAC. Navigate to DAC's Setup screen -> Informatica Servers tab -> Maximum Sessions in the lower pane for both Informatica and Repository connections. The recommended range is from 5 to 10 sessions.
- Benchmark your ETL performance in your test environment prior to implementing the change in the production system.


    Informatica Load Balancing Implementation

To improve performance on the ETL tier, consider implementing Informatica Load Balancing to distribute the Informatica load across multiple ETL tiers and speed up mapping execution. You can register one or more Informatica servers and the Informatica Repository Server in DAC and specify the number of workflows that can be executed in parallel. The DAC server automatically load balances across the servers and does not run more sessions than the value specified for each of them.

To implement Informatica Load Balancing in DAC, perform the following steps.

    1. Register additional Informatica Server(s) in DAC. Refer to the section Registering Informatica Servers in the DAC

    Client in the publication Oracle Business Intelligence Applications Installation Guide for Informatica PowerCenter

Users, Version 7.9.6.

    2. Configure the database connection information in Informatica Workflow Manager. Refer to the section Process of

    Configuring the Informatica Repository in Workflow Manager in the publication Oracle Business Intelligence

    Applications Installation Guide for Informatica PowerCenter Users, Version 7.9.6.

Important: Deploying multiple Informatica domains and repository services on different server nodes would cause additional maintenance overhead. Any repository updates or configuration changes performed on one node must be replicated across all participating nodes in the multiple-domain configuration.

To minimize the overhead of Informatica repository maintenance, consider the load balancing implementation below:

- Configure a single Informatica domain and deploy a single PowerCenter Repository service in it.
- Create Informatica services on each Informatica node and subscribe them to the single domain.

BITMAP INDEX USAGE FOR BETTER QUERY PERFORMANCE

    Introduction

Oracle Business Intelligence Applications Version 7.9.0 introduced the use of the Bitmap Index feature of the Oracle RDBMS. In comparison with B-Tree indexes, bitmap indexes provide significant performance improvements for data warehouse star queries. The internal benchmarks showed performance gains when B-Tree indexes on foreign keys and attributes were replaced with bitmap indexes.

Although bitmap indexes improve star query response time, their use may cause ETL performance degradation in both Oracle 10g and 11g. Dropping all bitmap indexes on a large table prior to an ETL run, and then recreating them after the ETL completes, may be quite expensive and time consuming. This is especially the case when there are a large number of such indexes, or when little change is expected in the number of records updated or inserted into a table during each ETL run. Conversely, the quality of the existing bitmap indexes may degrade as more updates, deletes, and inserts are performed with the indexes in place, making such indexes less effective unless they are rebuilt.

This section reviews DAC's index processing behavior and provides recommendations for handling bitmap indexes during ETL runs.

    DAC properties for handling bitmap indexes during ETL

DAC handles the same indexes differently for initial and incremental ETL runs. Prior to an initial load into a data warehouse, there are no indexes created on the tables except for the unique B-Tree indexes that preserve data integrity. During the initial ETL run, DAC will create ETL indexes on a loaded table, which are required for faster execution of subsequent mappings. For an incremental ETL run, DAC's index handling varies based on a combination of several DAC properties and individual index usage settings.


The following parameters, available in DAC 10.1.3.4.1, control how indexes are handled during ETL runs:

Parameter: Drop/Create Indices
Type: Execution Plan
Values: Y | N
Default: Y
Effect: DAC will drop all indexes on a target table truncated before a load, and then re-create them after loading the table. It is used mostly in small execution plans.
   Initial ETL:
   - Y - all indexes, irrespective of any other settings, will be dropped and created
   - N - no indexes will be dropped during an initial ETL
   Incremental ETL:
   - Y - indexes with Always Drop & Create (Bitmap) will be dropped during an incremental ETL
   - N - no indexes will be dropped during an incremental ETL
   DB2/390 customers may want to set it to N. The recommended default value for other platforms is Y, unless you are executing a micro ETL, in which case it would be too expensive to drop and create all indexes, so the value should be changed to N.
   Important: When set to N, this parameter overrides all other index-level properties.

Parameter: Always Drop & Create Bitmap
Type: Index
Values: Y | N
Default: N/A
Effect: An index-specific property, applicable to bitmap indexes only.
   - Y - a bitmap index will be dropped prior to an ETL run.
   - N - a bitmap index will not be dropped in an incremental ETL run only.
   The index property Always Drop & Create Bitmap does not override the Drop/Create Indices execution plan property if the latter is set to 'N'. If an index is inactivated in DAC, the index will not be dropped and recreated during subsequent ETL runs. The property applies to the Oracle data warehouse platform only.

Parameter: Always Drop & Create
Type: Index
Values: Y | N
Default: N/A
Effect: An index-specific property, applicable to all indexes.
   - Y - an index will be dropped prior to an ETL run.
   - N - an index will not be dropped in an incremental ETL run only.
   The index property Always Drop & Create does not override the Drop/Create Indices execution plan property if the latter is set to 'N'. If an index is inactivated in DAC, the index will not be dropped and recreated during subsequent ETL runs.

Parameter: Index Usage
Type: Index
Values: ETL | QUERY
Default: N/A
Effect:
   - ETL - an index is required to improve the performance of subsequent ETL mappings. DAC drops ETL indexes on a table if it truncates the table before the load, or if you set Drop/Create Indices, Always Drop & Create Bitmap, or Always Drop & Create to True. DAC will re-create the dropped ETL indexes after loading the table, since the indexes will be used to speed up subsequent mappings.
   - QUERY - an index is required to improve web query performance.

Parameter: Verify And Create Non-Existing Indices
Type: System
Values: True | False
Default: False
Effect:
   - True - the DAC server will verify that all indexes defined in the DAC repository are created in the target database.
   - False - DAC will not run any reconciliation checks between its repository and the target database.
   This parameter is useful when the current execution plan has Drop/Create Indices set to True, and new indexes have been created in the DAC repository since the last ETL run.

Parameter: Num Parallel Indexes per Table
Type: Physical Data Source
Values: Number
Default: 1
Effect: Specifies the maximum number of indexes that the DAC server will create in parallel for a single table.

Bitmap Index Handling Strategies

Review the following recommendations for effective bitmap index management in your environment.

    1. Disable redundant bitmap indexes in DAC.

Pre-packaged Oracle BI Applications releases include bitmap indexes that are enabled in the DAC metadata repository and, therefore, created and maintained as part of ETL runs, even though the indexed columns might not be used in filtering conditions in the Oracle BI Server repository.

Reducing the number of redundant bitmap indexes is an essential step for improving initial and incremental loads, especially for dimension and lookup tables. To identify all enabled BITMAP indexes on a table in the DAC metadata repository:

- Log in to your repository through the DAC user interface, click the Design button under the top menu, select your custom container in the pull-down menu, and select the Indices tab in the right pane.
- Click the Query sub-tab.
- Enter the table name, check the Is Bitmap box in the query row, and click Go.

To identify the list of exposed columns included in filtering conditions in the RPD repository, connect to the BI Server Administration Tool and generate the list of dependencies for each column using the Query Repository and Related To features.

To disable the identified redundant indexes in DAC and drop them in the data warehouse (see the sketch after this list):

- Check the Inactive checkbox against the indexes that should be permanently dropped in the target schema.
- Rebuild the DAC execution plan.
- Connect to your target database schema and drop the disabled indexes.
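A minimal sketch of the last step, generating the DROP statements from the dictionary; the index names in the IN list are hypothetical placeholders for the indexes you inactivated in DAC:

SQL> select 'DROP INDEX ' || index_name || ';'
       from user_indexes
      where index_name in ('W_PARTY_D_M10', 'W_PARTY_D_M12');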

    2. Decide whether to drop or keep bitmap indexes during incremental loads.

Analyze the total time spent building indexes and computing statistics during an incremental run. You can connect to your DAC repository and execute the following queries:

SQL> alter session set nls_date_format='DD-MON-YYYY:HH24:MI:SS';

-- Identify your ETL run and use its ROW_WID ('<ETL_RUN_WID>') in the subsequent queries:
select ROW_WID, NAME ETL_RUN,
       EXTRACT(DAY FROM (END_TS - START_TS) DAY TO SECOND) || ' days ' ||
       EXTRACT(HOUR FROM (END_TS - START_TS) DAY TO SECOND) || ' hrs ' ||
       EXTRACT(MINUTE FROM (END_TS - START_TS) DAY TO SECOND) || ' min ' ||
       EXTRACT(SECOND FROM (END_TS - START_TS) DAY TO SECOND) || ' sec' PLAN_RUN_TIME
from W_ETL_DEFN_RUN
order by START_TS DESC;

-- Identify your custom execution plan container WID ('<APP_WID>'):
SELECT DISTINCT app.row_wid
  FROM w_etl_defn_run run, w_etl_app app, w_etl_defn_prm prm
 WHERE prm.etl_defn_wid = run.etl_defn_wid
   AND prm.app_wid = app.row_wid
   AND run.row_wid = '<ETL_RUN_WID>';

-- Index build time:
SELECT ref_idx.tbl_name table_name,
       ref_idx.idx_name,
       sdtl.start_ts start_time,
       sdtl.end_ts end_time,
       EXTRACT(DAY FROM (sdtl.end_ts - sdtl.start_ts) DAY TO SECOND) || ' days ' ||
       EXTRACT(HOUR FROM (sdtl.end_ts - sdtl.start_ts) DAY TO SECOND) || ' hrs ' ||
       EXTRACT(MINUTE FROM (sdtl.end_ts - sdtl.start_ts) DAY TO SECOND) || ' min ' ||
       EXTRACT(SECOND FROM (sdtl.end_ts - sdtl.start_ts) DAY TO SECOND) || ' sec' idx_bld_time
  FROM w_etl_defn_run def,
       w_etl_run_step stp,
       w_etl_run_sdtl sdtl,
       (SELECT ind_ref.obj_wid, ind.name idx_name, tbl.name tbl_name
          FROM w_etl_index ind, w_etl_obj_ref ind_ref, w_etl_obj_ref tbl_ref,
               w_etl_table tbl, w_etl_app app
         WHERE ind_ref.obj_type = 'W_ETL_INDEX'
           AND ind_ref.soft_del_flg = 'N'
           AND ind_ref.app_wid = '<APP_WID>'
           AND ind_ref.obj_wid = ind.row_wid
           AND tbl_ref.obj_type = 'W_ETL_TABLE'
           AND tbl_ref.soft_del_flg = 'N'
           AND tbl_ref.app_wid = '<APP_WID>'
           AND tbl_ref.obj_wid = tbl.row_wid
           AND tbl_ref.obj_ref_wid = ind.table_wid
           AND ind.app_wid = app.row_wid
           AND ind.inactive_flg = 'N') ref_idx
 WHERE def.row_wid = stp.run_wid
   AND def.row_wid = '<ETL_RUN_WID>'
   AND sdtl.run_step_wid = stp.row_wid
   AND sdtl.type_cd = 'Create Index'
   AND sdtl.index_wid = ref_idx.obj_wid
   -- AND ref_idx.tbl_name = 'W_OPTY_D'
 ORDER BY sdtl.end_ts - sdtl.start_ts DESC;

-- Table stats computing time:
select TBL.NAME TABLE_NAME,
       STP.STEP_NAME,
       EXTRACT(DAY FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) || ' days ' ||
       EXTRACT(HOUR FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) || ' hrs ' ||
       EXTRACT(MINUTE FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) || ' min ' ||
       EXTRACT(SECOND FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) || ' sec' TBL_STATS_TIME
  from W_ETL_DEFN_RUN DEF,
       W_ETL_RUN_STEP STP,
       W_ETL_RUN_SDTL SDTL,
       W_ETL_TABLE TBL
 where DEF.ROW_WID = STP.RUN_WID
   and DEF.ROW_WID = '<ETL_RUN_WID>'
   and SDTL.RUN_STEP_WID = STP.ROW_WID
   and SDTL.TYPE_CD = 'Analyze Table'
   and SDTL.TABLE_WID = TBL.ROW_WID
 order by SDTL.END_TS - SDTL.START_TS desc;

-- Informatica jobs for the selected ETL run:
select SDTL.NAME SESSION_NAME,
       SDTL.SUCESS_ROWS,
       STP.FAILED_ROWS,
       SDTL.READ_THRUPUT,
       SDTL.WRITE_THRUPUT,
       EXTRACT(DAY FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) || ' days ' ||
       EXTRACT(HOUR FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) || ' hrs ' ||
       EXTRACT(MINUTE FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) || ' min ' ||
       EXTRACT(SECOND FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) || ' sec' INFA_RUN_TIME
  from W_ETL_DEFN_RUN DEF,
       W_ETL_RUN_STEP STP,
       W_ETL_RUN_SDTL SDTL
 where DEF.ROW_WID = STP.RUN_WID
   and DEF.ROW_WID = '<ETL_RUN_WID>'
   and SDTL.RUN_STEP_WID = STP.ROW_WID
   and SDTL.TYPE_CD = 'Informatica'
 order by SDTL.END_TS - SDTL.START_TS desc;

If the report shows significant amounts of time spent rebuilding indexes and computing statistics, and the cumulative incremental load time does not fit into your load window, you can consider two options:

Option 1: Range partition the large fact tables if they show up in the report. Refer to the partitioning sections for more details.

Option 2: If the incremental volumes are low, leave bitmap indexes in place on the reported tables for the next incremental run and then compare the load times. Refer to the next chapter for the implementation.

Option 2 is not recommended for fact tables (%_F). It may be used for large dimension tables that cannot be partitioned effectively by range.

Important: Bitmap indexes present on target tables during inserts, updates, or deletes can significantly increase SQL DML execution time. The same SQL would complete much faster if the indexes were dropped prior to the query execution. On the other hand, it takes additional time to rebuild the dropped bitmap indexes and compute the required statistics. You should measure the cumulative time to run a specific task plus the time to rebuild indexes and compute the required database statistics before deciding whether to drop or keep bitmap indexes in place during incremental loads; a sketch of such a measurement follows below.
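A sketch of that measurement, summing the 'Create Index' and 'Analyze Table' step times for one ETL run against the same DAC repository tables used above ('<ETL_RUN_WID>' is a placeholder for your run's ROW_WID):

select SDTL.TYPE_CD,
       sum( EXTRACT(DAY    FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) * 86400
          + EXTRACT(HOUR   FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) * 3600
          + EXTRACT(MINUTE FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) * 60
          + EXTRACT(SECOND FROM (SDTL.END_TS - SDTL.START_TS) DAY TO SECOND) ) TOTAL_SEC
  from W_ETL_DEFN_RUN DEF, W_ETL_RUN_STEP STP, W_ETL_RUN_SDTL SDTL
 where DEF.ROW_WID = STP.RUN_WID
   and DEF.ROW_WID = '<ETL_RUN_WID>'
   and SDTL.RUN_STEP_WID = STP.ROW_WID
   and SDTL.TYPE_CD in ('Create Index', 'Analyze Table')
 group by SDTL.TYPE_CD;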

    3. Configure DAC not to drop selected bitmap indexes during incremental loads.

If your benchmarks show that it is less time consuming to leave bitmap indexes in place on large dimension tables during incremental loads, and the incremental volumes are relatively small, then you can consider keeping the selected indexes in place during incremental loads.

Since the DAC system property Drop and Create Bitmap Indexes Always overrides the index property Always Drop & Create, the system property defines how DAC will handle all bitmap indexes for all containers in the data warehouse schema. To work around this limitation:


- Log in to your repository through the DAC user interface, click the Design button under the top menu, and select the Indices tab in the right pane.
- Click the Query sub-tab and get the list of all indexes defined on the target table.
- Check both the Always Drop & Create and Inactive checkboxes against the indexes that should not be dropped during incremental runs.

Important: You must uncheck the Inactive checkbox for these indexes before the next initial load; otherwise, DAC will not create them after the initial load completes. Since the Inactive property is used both for truly inactive indexes and for indexes "hidden from incremental load", the property Always Drop & Create can be used for convenience to distinguish between the two categories.

If you choose to keep some bitmap indexes in place during incremental runs, consider creating those indexes with the storage parameter PCTFREE set to 50 or higher, as in the sketch below. Oracle RDBMS packs bitmap indexes into data blocks much more tightly than B*Tree indexes, so when updates, inserts, or deletes occur on table columns with the indexes in place, the bitmap indexes' quality degrades. The higher PCTFREE value will mitigate the impact to some degree.
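A minimal sketch of such an index; the index, table, and column names here are hypothetical, so substitute your own objects and storage clauses:

SQL> create bitmap index W_PARTY_D_M1 on W_PARTY_D (ACTIVE_FLG)
     pctfree 60 nologging;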

    4. Additional considerations for handling bitmap indexes during incremental loads.

- All bitmap indexes should be dropped for transaction fact tables with over 20 million records that receive a large volume of data updates and inserts, such as over 0.5 to 1 percent of total records, during an incremental run.
- For large tables with a small number of bitmap indexes, consider dropping and recreating the bitmap indexes, since the time to rebuild them will be short.
- For large tables with few data updates, the indexes can be left in place during incremental runs without significant performance degradation.

    Disabling Indexes with DISTINCT_KEYS = 0 or 1

Oracle BI Applications delivers a number of indexes to optimize both ETL and end-user query performance. Depending on your data and its distribution, there may be some indexes on columns with just one distinct value. Such indexes will not be used in any queries, so they can be safely dropped from your data warehouse schema and disabled in the DAC repository.

The following script helps to identify all such indexes, disable them in the DAC repository, and drop them in the database. You have to either connect as a DBA user or implement additional grants, since the script requires access to two database schemas:

ACCEPT DAC_OWNER PROMPT 'Enter DAC Repository schema name: '
ACCEPT DWH_OWNER PROMPT 'Enter Data Warehouse schema name: '

SELECT row_wid FROM "&&DAC_OWNER".w_etl_app;
ACCEPT APP_ID PROMPT 'Enter your DAC container from the list above: '

UPDATE "&&DAC_OWNER".w_etl_index
   SET inactive_flg = 'Y'
 WHERE row_wid IN
   (SELECT ind_ref.obj_wid
      FROM "&&DAC_OWNER".w_etl_index ind,
           "&&DAC_OWNER".w_etl_obj_ref ind_ref,
           "&&DAC_OWNER".w_etl_obj_ref tbl_ref,
           "&&DAC_OWNER".w_etl_table tbl,
           "&&DAC_OWNER".w_etl_app app,
           all_indexes all_ind
     WHERE ind_ref.obj_type = 'W_ETL_INDEX'
       AND ind_ref.soft_del_flg = 'N'
       AND ind_ref.app_wid = '&&APP_ID'
       AND ind_ref.obj_wid = ind.row_wid
       AND tbl_ref.obj_type = 'W_ETL_TABLE'
       AND tbl_ref.soft_del_flg = 'N'
       AND tbl_ref.app_wid = '&&APP_ID'
       AND tbl_ref.obj_wid = tbl.row_wid
       AND tbl_ref.obj_ref_wid = ind.table_wid
       AND ind.app_wid = app.row_wid
       AND ind.inactive_flg = 'N'
       AND all_ind.index_name = ind.name
       AND all_ind.table_name = tbl.name
       AND all_ind.distinct_keys = 1
       -- AND ind.type_cd = 'Query'
       AND all_ind.owner = '&&DWH_OWNER');
COMMIT;

-- Drop the indexes in the schema:
spool drop_dist_indexes.sql
SELECT 'DROP INDEX ' || owner || '.' || index_name || ' ;'
  FROM all_indexes
 WHERE distinct_keys <= 1  -- indexes with 0 or 1 distinct keys
   AND owner = '&&DWH_OWNER';
spool off


BEGIN
   execute immediate 'CREATE %1 INDEX %2 ON %3 ( %4 ) NOLOGGING';
   execute immediate 'ALTER INDEX %2 MONITORING USAGE';
END;

BEGIN
   execute immediate 'CREATE %1 INDEX %2 ON %3 ( %4 ) NOLOGGING';
   execute immediate 'ALTER INDEX %2 MONITORING USAGE';
END;

BEGIN
   execute immediate 'CREATE %1 INDEX %2 ON %3 ( %4 ) NOLOGGING PARALLEL';
   execute immediate 'ALTER INDEX %2 MONITORING USAGE';
END;

    6. If you implement index monitoring for the first time after completing ETLs, execute the following PL/SQL block

    to enable monitoring for all indexes:

DECLARE
   CURSOR c1 IS
      SELECT index_name
        FROM user_indexes
       WHERE index_name NOT IN
             (SELECT index_name FROM v$object_usage WHERE monitoring = 'YES');
BEGIN
   FOR rec IN c1 LOOP
      EXECUTE IMMEDIATE 'alter index ' || rec.index_name || ' monitoring usage';
   END LOOP;
END;
/

To query the unused indexes in your data warehouse, execute the following SQL:

    SELECT DISTINCT index_name FROM myobj_usage WHERE used = 'NO';

Important!!! There are two known cases when the optimizer uses indexes but DOES NOT mark them as used with Index Usage Monitoring turned on:

- DML operations against a parent table (such as DELETE or UPDATE), associated with a child table via the child table's foreign key (FK) and the FK normal index on the child table, do use the child table's FK index, but Oracle does not report it as used in v$object_usage. Note that BITMAP indexes are correctly flagged as used in the same scenario and reported in v$object_usage.

- The optimizer may use extended statistics for computing correct table selectivity, using composite indexes, and yet not report them in v$object_usage. This case may not be critical for the BI Analytics warehouse, since it does not use composite BITMAP indexes, while composite NORMAL indexes are used on surrogate keys (unique indexes) and critical columns used in ETL or OBIEE queries.

Make sure you carefully review the reported unused indexes prior to dropping them in the database and disabling them in the DAC repository.

After identifying redundant indexes, disabling them in DAC, and dropping them in your data warehouse, follow the steps below to turn off index monitoring:

    1. Restore /bifoundation/dac/CustomSQLs/CustomSQL.xml from its backup copy.

    2. Reset "Script before every ETL" System parameter in DAC

    3. Execute the following PL/SQL block to disable index monitoring:

DECLARE
   CURSOR c1 IS
      SELECT index_name
        FROM user_indexes
       WHERE index_name IN
             (SELECT index_name FROM v$object_usage WHERE monitoring = 'YES');
BEGIN
   FOR rec IN c1 LOOP
      EXECUTE IMMEDIATE 'alter index ' || rec.index_name || ' nomonitoring usage';
   END LOOP;
END;
/

Important!!! Make sure you monitor index usage for an extended period of at least one to two months before deciding which additional indexes can be disabled in DAC and dropped in your target schema.

    Handling Query Indexes during Initial ETL

Oracle BI Applications delivers a number of query indexes, which are not used during ETL but are required for better OBIEE query performance. Most query indexes are created as BITMAP indexes in the Oracle database. The creation of such a large number of query indexes can extend both initial and incremental ETL windows. This section discusses several options for reducing index maintenance, such as disabling unused query indexes, or partitioning large fact tables and maintaining local query indexes on the latest range partitions.

You can consider disabling ALL query indexes to reduce your ETL runtime in the following scenarios:

1. Disable query indexes -> run an initial ETL -> enable query indexes -> run an incremental ETL -> run OBIEE reports
2. Disable query indexes -> run an incremental ETL -> enable query indexes -> run another incremental ETL -> run OBIEE reports

To summarize, you can disable query indexes only for the pattern 1st ETL -> 2nd ETL -> OBIEE reports. You cannot use this option for the 1st ETL -> OBIEE -> 2nd ETL sequence.

Important: If you plan to implement partitioning for your warehouse tables and want to take advantage of the conversion scripts in the next section, then you need to have the query indexes created on the target tables prior to implementing partitioning.

    Identify and preserve all activated query indexes PRIOR to executing the first ETL run:

CREATE TABLE psr_initial_query_idx AS
SELECT ind_ref.obj_wid, ind.name idx_name, tbl.name tbl_name
  FROM w_etl_index ind,
       w_etl_obj_ref ind_ref,
       w_etl_obj_ref tbl_ref,
       w_etl_table tbl,
       w_etl_app app
 WHERE ind_ref.obj_type = 'W_ETL_INDEX'
   AND ind_ref.soft_del_flg = 'N'
   AND ind_ref.app_wid = :APP_ID
   AND ind_ref.obj_wid = ind.row_wid
   AND tbl_ref.obj_type = 'W_ETL_TABLE'
   AND tbl_ref.soft_del_flg = 'N'
   AND tbl_ref.app_wid = :APP_ID
   AND tbl_ref.obj_wid = tbl.row_wid
   AND tbl_ref.obj_ref_wid = ind.table_wid
   AND ind.app_wid = app.row_wid
   AND ind.inactive_flg = 'N'
   AND ind.isunique = 'N'
   AND ind.type_cd = 'Query'
   AND (ind.drp_crt_always_flg = 'Y' OR ind.drp_crt_bitmap_flg = 'Y');

    Where APP_ID can be identified from:

    SELECT row_wid FROM w_etl_app;

    Disable the identified query indexes PRIOR to starting the first ETL run:

SQL> UPDATE w_etl_index SET inactive_flg = 'Y' WHERE row_wid IN (SELECT obj_wid FROM psr_initial_query_idx);
SQL> COMMIT;

Execute your first ETL run.

    Enable all preserved indexes PRIOR to starting the second ETL run:

SQL> UPDATE w_etl_index SET inactive_flg = 'N' WHERE row_wid IN (SELECT obj_wid FROM psr_initial_query_idx);
SQL> COMMIT;

Execute your second ETL run. DAC will recreate all disabled query indexes.

    PARTITIONING GUIDELINES FOR LARGE FACT TABLES

    Introduction

Taking advantage of range and composite range-range partitioning for fact tables will not only reduce index and statistics maintenance time during ETL, but also improve web query performance. Since the majority of inserts and updates impact the last partition(s), you would need to disable local indexes only on the few impacted partitions, and then rebuild the disabled indexes after the load and compute statistics on the updated partitions only. Online reports and dashboards should also render results faster, since the optimizer builds more efficient execution plans using partition elimination logic.

Large fact tables with more than 20 million rows are good candidates for partitioning. To build an optimal partitioned table with reasonable data distribution, you can consider partitioning by month, quarter, year, and so on. You can either identify and partition target fact tables before the initial run, or convert the populated tables into partitioned objects after the full load.

To implement support for partitioned tables in the Oracle Business Analytics Data Warehouse, you need to update DAC metadata and manually convert the candidates into partitioned tables in the target database.

Follow the steps below to implement fact table partitioning in your data warehouse schema and DAC repository. Please note that some steps apply to composite range-range partitioning only.


    Convert to partitioned tables

Perform the following steps to convert a regular table into a range-partitioned table.

    Identify a partitioning key and decide on a partitioning interval

Choosing the correct partitioning key is the most important factor for effective partitioning, since it defines how many partitions will be involved in web queries or ETL updates. Review the following guidelines for selecting a column as the partitioning key:

- Identify eligible columns of type DATE for implementing range partitioning.
- Connect to the Oracle BI Server repository and check the usage of, and dependencies on, each column in the logical and presentation layers.
- Analyze the summarized data distribution in the target table by each potential partitioning key candidate and the data volumes per time range: month, quarter, or year (see the sketch after this list).
- Based on the compiled data, decide on the appropriate partitioning key and partitioning range for your future partitioned table.
- The recommended partitioning range for most implementations is a month, though you can consider a quarter or a year for your partitioning ranges.
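A minimal sketch of such a distribution analysis; W_ORDER_F and ORDER_DT are hypothetical table and column names standing in for your fact table and candidate DATE column:

SQL> select to_char(ORDER_DT, 'YYYY-MM') period, count(*) row_cnt
       from W_ORDER_F
      group by to_char(ORDER_DT, 'YYYY-MM')
      order by 1;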

The proposed partitioning guidelines assume that the majority of the incremental ETL data volume (~90%) consists of new records, which end up in the latest one or two partitions. Depending on the chosen range granularity, you may consider rebuilding local indexes for the most impacted latest partitions:

    - Monthly range: you are advised to maintain two