5-Dwh-capacity Planing & Load


Page 1: 5-Dwh-capacity Planing & Load

CAPACITY PLANNING, TUNING AND TESTING

Capacity planning: The capacity plan for a DWH is defined within the technical blueprint stage of the delivery process. For each user or group of users, you need to know the following:

•The number of users in the group

•Whether they use ad hoc queries frequently

•Whether they use ad hoc queries occasionally at unknown intervals

•Whether they use ad hoc queries occasionally at regular and predictable times

•The average size of query they tend to run

•The maximum size of query they tend to run

•The elapsed login time per day

•The peak time of daily usage

•The number of queries they run per peak hour

•The number of queries they run per day
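The profile items above can be recorded per user group and rolled up into a rough peak-hour load. The following is a minimal sketch, not a prescribed method: the field names, the sample groups, and the choice to treat query counts as per-user figures are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserGroupProfile:
    """Usage profile for one group of users (field names are illustrative)."""
    name: str
    users: int
    adhoc_usage: str             # "frequent", "occasional-unpredictable" or "occasional-regular"
    avg_query_mb: float          # average size of query they tend to run
    max_query_mb: float          # maximum size of query they tend to run
    login_hours_per_day: float   # elapsed login time per day
    peak_hour: int               # hour of day (0-23) of peak usage
    queries_per_peak_hour: int   # assumed to be a per-user figure
    queries_per_day: int         # assumed to be a per-user figure


def peak_hour_load(groups):
    """Return {hour: MB scanned in that hour}, assuming average-size queries."""
    load = {}
    for g in groups:
        mb = g.users * g.queries_per_peak_hour * g.avg_query_mb
        load[g.peak_hour] = load.get(g.peak_hour, 0.0) + mb
    return load


# Hypothetical groups, invented for the example.
groups = [
    UserGroupProfile("finance", 20, "occasional-regular", 500.0, 4000.0, 6.0, 10, 3, 12),
    UserGroupProfile("marketing", 50, "frequent", 200.0, 10000.0, 4.0, 10, 2, 8),
    UserGroupProfile("executives", 5, "occasional-unpredictable", 100.0, 1000.0, 1.0, 16, 1, 2),
]

load = peak_hour_load(groups)
busiest_hour = max(load, key=load.get)
print(busiest_hour, load[busiest_hour])
```

Groups whose peaks coincide add up, which is why the peak time of daily usage matters as much as the query sizes.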


ESTIMATING THE LOAD:

There are a number of different elements that need to be considered, but the decisions all come down to how much CPU, how much memory and how much disk you will need.

Do not allow cost or budget considerations to affect capacity estimates.

INITIAL CONFIGURATION:

All you can do is estimate the configuration based on the known requirements. This is why the business requirements phase is so important.

HOW MUCH CPU BANDWIDTH?

The load can be divided into two distinct phases:

•Daily processing: user query processing

•Overnight processing: data transformation and load, aggregation and index creation, backup


Daily processing:

The first thing to do is estimate the size of the largest likely common query. Having established the likely period that will be queried, you will know the volume of data that will be involved.

Given a measure of the volume of data that will be accessed, say F megabytes, the scan time is T = F / S, where T is the time in seconds to perform a full table scan of the period in question and S is the scan rate (in megabytes per second, so that the units agree).
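As a worked illustration of T = F / S, the sketch below computes the scan time for a given volume and, rearranged, the scan rate needed to meet a target time. The function names and the sample figures are assumptions, not measurements.

```python
def full_scan_seconds(volume_mb: float, scan_rate_mb_per_s: float) -> float:
    """T = F / S: seconds needed to scan volume_mb at the given scan rate."""
    if scan_rate_mb_per_s <= 0:
        raise ValueError("scan rate must be positive")
    return volume_mb / scan_rate_mb_per_s


def required_scan_rate(volume_mb: float, target_seconds: float) -> float:
    """S = F / T: scan rate needed to finish the scan within target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target time must be positive")
    return volume_mb / target_seconds


# Illustrative figures only: 60,000 MB for the queried period.
scan_time = full_scan_seconds(60_000, 200)     # scanning at 200 MB/s
needed_rate = required_scan_rate(60_000, 120)  # to finish within 2 minutes
print(scan_time, needed_rate)
```

The second form is often the more useful one for sizing: fix the acceptable response time first, then work back to the scan rate the hardware must deliver.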

Overnight processing:

How much memory?

First, the database has its own memory requirements. Secondly, each user connected to the system will use an amount of memory. Finally, the operating system will require an amount of memory.
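The three memory components can be added up as in the sketch below. It assumes every connected user needs roughly the same amount of memory, which is a simplification. The parameter names and figures are illustrative.

```python
def total_memory_mb(database_mb: float, concurrent_users: int,
                    per_user_mb: float, os_mb: float) -> float:
    """Database memory + memory for all connected users + operating system memory."""
    return database_mb + concurrent_users * per_user_mb + os_mb


# Illustrative figures only.
memory_needed = total_memory_mb(
    database_mb=16_384,   # database requirements
    concurrent_users=75,  # users connected at the same time
    per_user_mb=32,       # assumed uniform per-user footprint
    os_mb=2_048,          # operating system requirements
)
print(memory_needed)
```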


How much disk?

The disk requirements can be broken down into the following categories:

•Database requirements: administration, fact and dimension data, aggregations

•Non-database requirements: operating system requirements, other software requirements, DWH requirements, user requirements


Database sizing: Do not attempt to size the database until the database schema is complete.

Allow for period variations when sizing the partitions.

You also need to size the aggregations and allow space for indexes on the aggregations.
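A rough sizing calculation along these lines might look like the sketch below. The variation factor, the index-to-aggregation ratio and all figures are assumptions chosen for the example. Administration space and fact-table indexes are left out for brevity and would need to be added in a real estimate.

```python
def partition_size_mb(rows_per_period: int, bytes_per_row: int,
                      variation_factor: float) -> float:
    """Size one partition with headroom for period variations (e.g. 1.25 = +25%)."""
    return rows_per_period * bytes_per_row * variation_factor / 1_000_000


def database_size_mb(fact_partition_mb: float, periods_online: int,
                     dimension_mb: float, aggregation_mb: float,
                     aggregation_index_ratio: float) -> float:
    """Fact partitions + dimension data + aggregations + indexes on the aggregations."""
    fact_mb = fact_partition_mb * periods_online
    aggregation_index_mb = aggregation_mb * aggregation_index_ratio
    return fact_mb + dimension_mb + aggregation_mb + aggregation_index_mb


# Illustrative figures only: 10 million 100-byte rows per month, 24 months online.
partition_mb = partition_size_mb(10_000_000, 100, variation_factor=1.25)
database_mb = database_size_mb(partition_mb, periods_online=24,
                               dimension_mb=500, aggregation_mb=6_000,
                               aggregation_index_ratio=0.25)
print(partition_mb, database_mb)
```

The inputs (row counts, row widths, number of aggregations) only become reliable once the schema is complete, which is why sizing should wait until then.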


TUNING THE DWH

Tuning the DWH is more difficult than tuning an OLTP environment because of the ad hoc and unpredictable nature of the load.

Assessing performance:

Before you can tune the DWH you must have some objective measures of performance to work with:

•Average query response times

•Scan rates

•I/O throughput rates

•Time used per query

•Memory usage per process

Ensure that everyone, from the DWH designers to the users and senior management, has reasonable performance expectations.


TUNING THE DATA LOAD:

The data load is a crucial part of overnight processing. Integrity checking is needed.

Limit the number of integrity checks applied on the DWH to the absolute minimum required. Where possible, integrity checks should be performed on the source systems.

Spread the load source files and the load destination files to avoid any I/O bottlenecks.

TUNING QUERIES:

The DWH has two types of queries:

1. Fixed queries: store the expected execution plans for known queries.

2. Ad hoc queries: for each user or group of users you need to know the following:

•Number of users in the group

•Whether they use ad hoc queries occasionally at unknown intervals

•Whether they use ad hoc queries occasionally at regular and predictable times

•The average size of query they tend to run

•The maximum size of query they tend to run

•The elapsed login time per day

•The peak time of daily usage

•The number of queries they run per peak hour

•Whether they require drill-down access to the base data


You need to turn an unpredictable query mix into a predictable query mix

Get queries to run against aggregations rather than against the base data itself

To measure usage profiles for each index and aggregation, you need answers to the following questions:

•How many different queries run against an aggregation table?

•How often does each of these queries run?

•What indexes on an object are used most frequently?

•What queries use each index, and how would they be affected if the index did not exist?
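Most of these questions can be answered by counting over a query log. The sketch below assumes a hypothetical log in which each entry names the query, the aggregation table it read and the indexes it used. The table, index and query names are invented. The last question (how a query would be affected without an index) needs a plan comparison and is not covered here.

```python
from collections import Counter, defaultdict

# Hypothetical query log: (query id, aggregation table read, indexes used).
query_log = [
    ("q1", "sales_by_month", ["sbm_month_idx"]),
    ("q1", "sales_by_month", ["sbm_month_idx"]),
    ("q2", "sales_by_month", ["sbm_month_idx", "sbm_region_idx"]),
    ("q3", "sales_by_region", ["sbr_region_idx"]),
    ("q1", "sales_by_month", ["sbm_month_idx"]),
]


def usage_profile(log):
    """Summarise which queries hit each aggregation and which indexes they use."""
    runs_per_query = defaultdict(Counter)  # table -> Counter(query id -> number of runs)
    index_uses = Counter()                 # index -> number of times it was used
    queries_per_index = defaultdict(set)   # index -> set of query ids that use it
    for query_id, table, indexes in log:
        runs_per_query[table][query_id] += 1
        for index in indexes:
            index_uses[index] += 1
            queries_per_index[index].add(query_id)
    return runs_per_query, index_uses, queries_per_index


runs_per_query, index_uses, queries_per_index = usage_profile(query_log)

print(len(runs_per_query["sales_by_month"]))  # different queries against the aggregation
print(runs_per_query["sales_by_month"])       # how often each of them runs
print(index_uses.most_common(1))              # most frequently used index
print(queries_per_index["sbm_region_idx"])    # queries that depend on one index
```

An index or aggregation that turns up rarely in such a profile is a candidate for removal. One used by many frequent queries is worth its maintenance cost.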


The query manager should grab and maintain the following data:

•Query syntax

•Query execution plan

•CPU resource used

•Memory resource used

•I/O resource used

•Query elapsed time

•How frequently the query is run
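One possible shape for this data is a record per distinct query, updated on every run. The sketch below is an assumption about how such a record could look and is not the interface of any particular query manager. The class names, field names and units are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class QueryRecord:
    """Accumulated statistics for one distinct query (illustrative fields)."""
    syntax: str
    execution_plan: str
    cpu_seconds: float = 0.0      # total CPU used across all runs
    peak_memory_mb: float = 0.0   # highest memory use seen in any run
    io_mb: float = 0.0            # total I/O across all runs
    elapsed_seconds: float = 0.0  # total elapsed time across all runs
    run_count: int = 0            # how frequently the query is run

    @property
    def avg_elapsed_seconds(self) -> float:
        return self.elapsed_seconds / self.run_count if self.run_count else 0.0


class QueryStats:
    """Keeps one QueryRecord per query text."""

    def __init__(self) -> None:
        self.records: Dict[str, QueryRecord] = {}

    def record_run(self, syntax, execution_plan, cpu_seconds,
                   memory_mb, io_mb, elapsed_seconds) -> QueryRecord:
        rec = self.records.setdefault(syntax, QueryRecord(syntax, execution_plan))
        rec.execution_plan = execution_plan  # keep the most recent plan
        rec.cpu_seconds += cpu_seconds
        rec.peak_memory_mb = max(rec.peak_memory_mb, memory_mb)
        rec.io_mb += io_mb
        rec.elapsed_seconds += elapsed_seconds
        rec.run_count += 1
        return rec


stats = QueryStats()
sql = "SELECT region, SUM(amount) FROM sales_by_month GROUP BY region"
stats.record_run(sql, "FULL SCAN sales_by_month", 2.0, 64.0, 100.0, 4.0)
rec = stats.record_run(sql, "FULL SCAN sales_by_month", 3.0, 128.0, 150.0, 6.0)
print(rec.run_count, rec.avg_elapsed_seconds, rec.peak_memory_mb)
```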


TESTING THE DWH:

There are three basic levels of testing:

•Unit testing: each development unit is tested on its own.

•Integration testing: the separate development units that make up a component of the DWH application are tested to ensure that they work together.

•System testing: the whole data warehouse application is tested together.

•Sufficient testing needs to be performed to establish that queries scale with the data.

•Full-scale testing requires a comprehensive test plan.


DEVELOPING THE TEST PLAN

Test schedule:

As a rule of thumb, we suggest that you apply your normal metrics for estimating the amount of time required for testing.

Double the amount of time you would normally allow for testing.

Data load:

Where is the test data going to come from? Will the data be generated? If the data has to be generated, a number of issues need to be considered, such as:

•How will the data be generated?

•Where will the data be generated?

•How will the generated data be loaded?

•Will the data be correctly skewed?

If the test data is generated, ensure the data has the correct balance and skew. Make sure the ratio of fact to dimension is correct, and so on.
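A minimal sketch of generating skewed fact rows is shown below. It assumes a single hypothetical product dimension and a Zipf-like distribution in which a few products account for most rows. The skew value, row counts and column layout are illustrative, and real test data should mirror the skew observed in the source systems.

```python
import random
from collections import Counter


def generate_fact_rows(n_rows, n_products, skew=1.2, seed=42):
    """Generate (product_id, amount) rows with a Zipf-like skew across products."""
    rng = random.Random(seed)  # fixed seed so the test data is reproducible
    product_ids = list(range(1, n_products + 1))
    # Product k gets weight 1 / k**skew, so low ids dominate the fact table.
    weights = [1.0 / (k ** skew) for k in product_ids]
    keys = rng.choices(product_ids, weights=weights, k=n_rows)
    return [(key, round(rng.uniform(1.0, 500.0), 2)) for key in keys]


N_PRODUCTS = 100  # rows in the hypothetical product dimension
rows = generate_fact_rows(n_rows=10_000, n_products=N_PRODUCTS)

rows_per_product = Counter(product_id for product_id, _ in rows)
fact_to_dimension_ratio = len(rows) / N_PRODUCTS

print(fact_to_dimension_ratio)
print(rows_per_product.most_common(3))  # the heaviest products
```

Uniformly random keys would hide exactly the problems that skew causes, such as unbalanced partitions and uneven parallel work.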


TESTING BACKUP RECOVERY:

This is one of the key tests. Each of the following scenarios needs to be catered for:

•Instance failure

•Media failure

•Loss or damage of a tablespace or data file

•Loss or damage of a table

•Loss or damage of a redo log file

•Loss or damage of an archive log file

•Loss or damage of a control file

•Failure during data movement

•Any other scenarios

Schedule the recovery tests last, because a failed test can cause considerable delays.

Every backup test should be verified by performing a recovery using the backed-up data.


TESTING THE OPERATIONAL ENVIRONMENT

This is another key test. There are a number of aspects that need to be tested:

•Security: difficult to test unless you have clearly documented what is not allowed.

•Disk configuration: should be tested thoroughly to identify any potential I/O bottlenecks.

•Scheduler: needs to be thoroughly tested during the system testing.

•Management tools: the tools that are going to be used to operate the DWH are:

•Event manager

•System manager

•Configuration manager

•Backup and recovery manager

•Database manager


•Event manager: check that it actually does track and report the events that are required, such as:

•Running out of space on certain key disks

•A process dying

•A process using excessive resource

•A process running with errors

•Disks exhibiting I/O bottlenecks

•Hardware failures

Many of the tools, such as the system manager and the backup recovery manager, are best tested by their use during the system tests.

•Database management


TESTING THE DATABASE:

Testing the database can be broken down into three separate sets of tests:

•Testing the database manager and monitoring tools:

These tasks should be carried out by the DBAs who will be running the live system.

•Testing database features:

Features such as the following need specific attention:

•Querying in parallel

•Creating indexes in parallel

•Loading data in parallel


•Testing database performance:

Take the time to test the most complex and awkward queries that the business is likely to ask against different index and aggregation strategies.

Testing the application:

Testing the load, warehouse and query managers is just traditional testing of developed code. The main thing is to test that all the managers integrate correctly, and to ensure that the end-to-end load, index, aggregate and query process works as expected.

Logistics of testing:

One of the key questions to answer about system testing is: how big a system should you test on? The aim of system testing is to test all of the following areas:


•DWH application code

•Day-to-day operational procedures

•Overnight processing

•Backup and recovery strategy

•Query performance

•Management and monitoring tools

•Scheduling software

Ensure that the DWH design scales as the data scales.

A passive monitoring tool works by taking snapshots of statistics at various stages.

An active monitoring tool gathers data continuously.
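The difference between the two styles can be sketched as follows. This is a toy model under stated assumptions: the counters are a plain dictionary standing in for database statistics, and the class and counter names are invented for illustration.

```python
class PassiveMonitor:
    """Takes snapshots of cumulative counters at chosen stages and compares them."""

    def __init__(self, read_counters):
        self._read_counters = read_counters  # callable returning the current counters
        self._snapshots = {}

    def snapshot(self, label):
        self._snapshots[label] = dict(self._read_counters())

    def delta(self, start_label, end_label):
        start, end = self._snapshots[start_label], self._snapshots[end_label]
        return {name: end[name] - start[name] for name in end}


class ActiveMonitor:
    """Records a sample every time it is notified, i.e. gathers data continuously."""

    def __init__(self):
        self.samples = []

    def on_sample(self, counters):
        self.samples.append(dict(counters))


# Simulated statistics for a load job running in three steps.
counters = {"physical_reads": 0, "rows_scanned": 0}
passive = PassiveMonitor(lambda: counters)
active = ActiveMonitor()

passive.snapshot("before_load")
for reads, rows in [(100, 1_000), (250, 4_000), (50, 500)]:
    counters["physical_reads"] += reads
    counters["rows_scanned"] += rows
    active.on_sample(counters)  # the active tool sees every step
passive.snapshot("after_load")

load_delta = passive.delta("before_load", "after_load")
print(load_delta)
print(len(active.samples))
```

The passive tool only knows the totals between its two snapshots, whereas the active tool holds the full history at the cost of collecting and storing far more data.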

Test everything twice.