5-DWH-Capacity Planning & Load
CAPACITY PLANNING, TUNING AND TESTING
Capacity planning: The capacity plan for a DWH is defined within the technical blueprint stage of the delivery process. For each user or group of users, you need to know the following:
•The number of users in the group
•Whether they use ad hoc queries frequently
•Whether they use ad hoc queries occasionally at unknown intervals
•Whether they use ad hoc queries occasionally at regular and predictable times
•The average size of query they tend to run
•The maximum size of query they tend to run
•The elapsed login time per day
•The peak time of daily usage
•The number of queries they run per peak hour
•The number of queries they run per day
ESTIMATING THE LOAD:
There are a number of different elements that need to be considered, but the decisions all come down to how much CPU, how much memory and how much disk you will need.
Do not allow cost or budget considerations to affect capacity estimates.
INITIAL CONFIGURATION:
All you can do is estimate the configuration based on the known requirements. This is why the business requirements phase is so important.
HOW MUCH CPU BANDWIDTH ?
The load can be divided into two distinct phases
• Daily processing: user query processing
• Overnight processing: data transformation and load, aggregation and index creation, backup
Daily processing:
The first thing to do is estimate the size of the largest likely common query. Having established the likely period that will be queried, you will know the volume of data that will be involved.
Given a measure of the volume of data, say F megabytes, that will be accessed: T = F/S (where T is the time in seconds to perform a full table scan of the period in question, and S is the scan rate in megabytes per second).
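The T = F/S estimate above can be sketched as a short calculation. This is an illustrative example only: the data volume and scan rate below are assumed figures, not measurements from any particular system.

```python
# Sketch of the scan-time estimate T = F / S from the text.
# F = data volume in megabytes, S = scan rate in MB per second.

def scan_time_seconds(data_mb: float, scan_rate_mb_per_sec: float) -> float:
    """Time T (seconds) to full-table-scan F megabytes at S MB/s."""
    return data_mb / scan_rate_mb_per_sec

# Assumed example: the queried period holds 50 GB of fact data and the
# hardware delivers an aggregate scan rate of 200 MB/s.
F = 50 * 1024   # megabytes
S = 200.0       # MB/s (assumption)
T = scan_time_seconds(F, S)
print(f"Full scan of the period: {T:.0f} s ({T / 60:.1f} minutes)")
```

Running the same calculation against several candidate configurations gives a first-cut comparison of how much CPU and I/O bandwidth each would need to meet the target response time.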
Overnight processing:
How much memory?
First, there are the database requirements; secondly, each user connected to the system will use an amount of memory; finally, the operating system will require an amount of memory.
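The three memory components above can be added up in a simple sizing sketch. All figures here are illustrative assumptions, not recommendations for any specific platform.

```python
# Sketch of the memory estimate: database memory + per-user memory + OS memory.
# Every figure below is an assumption for illustration.

def total_memory_mb(db_mb: float, users: int, per_user_mb: float, os_mb: float) -> float:
    """Sum the three components described in the text, in megabytes."""
    return db_mb + users * per_user_mb + os_mb

# Assumed example: 4 GB for the database caches, 200 connected users at
# roughly 10 MB each, and 1 GB for the operating system.
estimate = total_memory_mb(db_mb=4096, users=200, per_user_mb=10, os_mb=1024)
print(f"Estimated memory requirement: {estimate / 1024:.1f} GB")
```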
How much disk?
The disk requirements can be broken down into following categories:
•Database requirements
  •Administration
  •Fact and dimension data
  •Aggregations
•Non-database requirements
  •Operating system requirements
  •Other software requirements
  •DWH requirements
  •User requirements
Database sizing: do not attempt to size the database until the database schema is complete.
Allow for periodic variations when sizing the partitions.
You also need to size the aggregations and allow space for indexes on the aggregations.
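A disk estimate is then the sum of the categories listed above, plus an allowance for indexes on the fact, dimension and aggregation data. The sizes and the index overhead factor below are assumptions chosen only to show the shape of the calculation.

```python
# Sketch of a disk estimate built from the categories in the text.
# All sizes (MB) and the index overhead factor are illustrative assumptions.

disk_mb = {
    # Database requirements
    "administration": 2_000,
    "fact_and_dimension_data": 500_000,
    "aggregations": 150_000,
    # Non-database requirements
    "operating_system": 10_000,
    "other_software": 20_000,
    "dwh_software": 5_000,
    "user_space": 50_000,
}

index_overhead = 0.5  # assumed: indexes add ~50% on top of data + aggregations
indexed = (disk_mb["fact_and_dimension_data"] + disk_mb["aggregations"]) * index_overhead
total = sum(disk_mb.values()) + indexed
print(f"Total disk estimate: {total / 1024:.0f} GB")
```

Recomputing the estimate per partition, with the period-to-period variation mentioned above, shows why sizing should wait until the schema (and hence the fact row width) is final.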
TUNING THE DWH:
Tuning the DWH is more difficult than tuning an OLTP environment because of the ad hoc and unpredictable nature of the load.
Assessing performance:
Before you can tune the DWH you must have some objective measure of performance to work with.
•Average query response times
•Scan rates
•I/O throughput rates
•Time used per query
•Memory usage per process
Ensure that everyone, from the DWH designers to the users and senior management, has reasonable performance expectations.
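The measures listed above can be derived from raw per-query timings. A minimal sketch, using invented sample data:

```python
# Sketch: turning raw query timings into two of the objective measures
# listed above (average response time and I/O throughput).
# The sample timings and byte counts are invented for illustration.

timings_sec = [1.2, 0.8, 5.5, 2.1, 0.9]            # elapsed time per query
bytes_read = [120e6, 80e6, 900e6, 200e6, 95e6]     # bytes scanned per query

avg_response = sum(timings_sec) / len(timings_sec)
io_throughput = sum(bytes_read) / sum(timings_sec)  # bytes per second overall
print(f"Average response: {avg_response:.2f} s")
print(f"I/O throughput: {io_throughput / 1e6:.0f} MB/s")
```

Collecting these numbers before tuning gives the objective baseline the text asks for; the same calculation repeated after each change shows whether the change helped.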
TUNING THE DATA LOAD:
The data load is a crucial part of overnight processing, and integrity checking is needed.
Limit the number of integrity checks applied on the DWH to the absolute minimum required; where possible, integrity checks should be performed on the source systems.
Spread the load source files and the load destination files across disks to avoid any I/O bottlenecks.
TUNING QUERIES:
The DWH workload consists of two types of queries:
1. Fixed queries: store the expected execution plans for known queries.
2. Ad hoc queries: for each user or group of users you need to know the following:
•Number of users in the group
•Whether they use ad hoc queries occasionally at unknown intervals
•Whether they use ad hoc queries occasionally at regular and predictable times
•The average size of query they tend to run
•The maximum size of query they tend to run
•The elapsed login time per day
•The peak time of daily usage
•The number of queries they run per peak hour
•Whether they require drill-down access to the base data
You need to turn an unpredictable query mix into a predictable one by getting queries to run against aggregations rather than against the base data itself.
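Routing queries to aggregations is often called aggregate navigation: a query can be served by any summary table whose grain is at least as fine as the grain the query asks for. The table names and grain lattice below are invented for illustration.

```python
# Sketch of aggregate navigation: pick the coarsest (cheapest) table that
# can still answer a query at the requested grain. Names are hypothetical.

GRAIN_ORDER = ["day", "week", "month"]  # fine -> coarse

aggregations = {"sales_by_day": "day", "sales_by_month": "month"}
BASE_TABLE = "sales_fact"

def route(query_grain: str) -> str:
    """Return the coarsest aggregation usable for the query, else the base table."""
    best, best_rank = BASE_TABLE, -1
    for table, grain in aggregations.items():
        rank = GRAIN_ORDER.index(grain)
        # Usable only if the table's grain is at least as fine as the query's.
        if rank <= GRAIN_ORDER.index(query_grain) and rank > best_rank:
            best, best_rank = table, rank
    return best

print(route("month"))  # served from sales_by_month, not the base fact table
print(route("week"))   # only the daily summary is fine enough: sales_by_day
```

The payoff is exactly the predictability the text describes: most queries resolve to a small, known set of summary tables whose scan times you can measure in advance.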
To measure usage profiles for each index and aggregation, need answers to following questions:
•How many different queries run against an aggregation table?
•How often does each of these queries run?
•What indexes on an object are used most frequently?
•What queries use each index, and how would they be affected if the index did not exist?
The query manager should capture and maintain the following data:
•Query syntax
•Query execution plan
•CPU resource used
•Memory resource used
•I/O resource used
•Query elapsed time
•How frequently the query is run
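The capture described above can be sketched as one record per query execution plus a running frequency count. The field names and sample values are invented; a real query manager would pull these figures from the database's own statistics.

```python
# Sketch of the query manager's statistics capture: one record per execution
# holding the items listed in the text, plus a per-query run counter.

from dataclasses import dataclass
from collections import Counter

@dataclass
class QueryStats:
    syntax: str
    execution_plan: str
    cpu_sec: float
    memory_mb: float
    io_mb: float
    elapsed_sec: float

history: list[QueryStats] = []
run_counts: Counter[str] = Counter()  # how frequently each query is run

def record(stats: QueryStats) -> None:
    history.append(stats)
    run_counts[stats.syntax] += 1

# Invented sample query, recorded twice.
q = "SELECT region, SUM(amount) FROM sales_by_month GROUP BY region"
record(QueryStats(q, "FULL SCAN sales_by_month", cpu_sec=0.4,
                  memory_mb=32, io_mb=180, elapsed_sec=1.1))
record(QueryStats(q, "FULL SCAN sales_by_month", cpu_sec=0.5,
                  memory_mb=32, io_mb=180, elapsed_sec=1.3))
print(run_counts[q])
```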
TESTING THE DWH:
There are three basic levels of testing:
•Unit testing: each development unit is tested on its own.
•Integration testing: the separate development units that make up a component of the DWH application are tested to ensure that they work together.
•System testing:the whole data warehouse application is tested together.
Sufficient testing needs to be performed to establish that queries scale with the data.
Full-scale testing requires a comprehensive test plan.
DEVELOPING THE TEST PLAN
Test schedule:
As a rule of thumb, do not simply apply your normal metrics for estimating the amount of time required for testing.
Double the amount of time you would normally allow for testing.
Data load:
Where is the test data going to come from? Will the data be generated? If the data has to be generated, a number of issues need to be considered, such as:
•How will the data be generated?
•Where will the data be generated?
•How will the generated data be loaded?
•Will the data be correctly skewed?
If the test data is generated, ensure the data has the correct balance and skew. Make sure the ratio of fact to dimension data is correct, and so on.
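A minimal sketch of generating skewed test data: here a Zipf-like weighting makes a few dimension keys dominate the fact rows, which is the kind of realistic skew the text warns about. The sizes, the 50:1 fact-to-dimension ratio, and the distribution are illustrative assumptions.

```python
# Sketch: generating fact rows with a deliberate skew so that a few
# "large customers" dominate, rather than a flat uniform spread.
# All sizes and the skew model are assumptions for illustration.

import random

random.seed(42)
NUM_CUSTOMERS = 1000      # dimension size
NUM_FACT_ROWS = 50_000    # fact size: assumed 50:1 fact-to-dimension ratio

# Zipf-like weights: customer k gets weight 1/k, so low keys dominate.
weights = [1 / k for k in range(1, NUM_CUSTOMERS + 1)]
customer_keys = random.choices(range(1, NUM_CUSTOMERS + 1),
                               weights=weights, k=NUM_FACT_ROWS)

top_10_share = sum(1 for c in customer_keys if c <= 10) / NUM_FACT_ROWS
print(f"Share of fact rows held by top 10 customers: {top_10_share:.0%}")
```

Uniformly random test data would make every partition and index look equally loaded, so queries that behave well in test could still hit hot spots in production; the skew is what makes the test representative.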
TESTING BACKUP RECOVERY:
This is one of the key tests. Each of the following scenarios needs to be catered for:
•Instance failure
•Media failure
•Loss or damage of a tablespace or data file
•Loss or damage of a table
•Loss or damage of a redo log file
•Loss or damage of an archive log file
•Loss or damage of a control file
•Failure during data movement
•Any other scenarios
Schedule the recovery test last, because a failed test can cause considerable delays.
Every backup test should be verified by performing recovery using the backed up data.
TESTING THE OPERATIONAL ENVIRONMENT
This is another key test. There are a number of aspects that need to be tested:
•Security: difficult to test unless you have clearly documented what is not allowed.
•Disk configuration: should be tested thoroughly to identify any potential I/O bottlenecks.
•Scheduler: needs to be thoroughly tested during system testing.
•Management tools: the tools that are going to be used to operate the DWH are:
•Event manager
•System manager
•Configuration manager
•Back up and recovery manager
•Database manager
•Event manager: test that it tracks and reports the required events, such as:
•Running out of space on certain key disks
•A process dying
•A process using excessive resource
•A process running with errors
•Disks exhibiting I/O bottlenecks
•Hardware failures
Many of the tools, such as the system manager and the backup recovery manager, are best tested by their use during the system tests.
TESTING THE DATABASE:
Testing the database can be broken down into three separate sets of tests:
•Testing the database manager and monitoring tools:
These tasks should be carried out by the DBAs who will be running the live system.
•Testing database features:
Features such as the following need specific attention:
•Querying in parallel
•Index creation in parallel
•Data loading in parallel
•Testing database performance:
Take the time to test the most complex and awkward queries that the business is likely to ask against different index and aggregation strategies.
Testing the application:
Testing the load, warehouse and query managers is just traditional testing of developed code. The main thing is to test that all the managers integrate correctly, and to ensure that the end-to-end load, index, aggregate and query process works as expected.
Logistics of testing:
One of the key questions to answer about the system test is: how big a system should you test on? The aim of system testing is to test all of the following areas:
•DWH application code
•Day-to-day operational procedures
•Overnight processing
•Backup and recovery strategy
•Query performance
•Management and monitoring tools
•Scheduling software
Ensure that the DWH design scales as the data scales.
A passive monitoring tool works by taking snapshots of statistics at various stages.
An active monitoring tool gathers data continuously.
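The passive/active distinction can be shown in a tiny sketch: the passive monitor only sees the statistic at snapshot times, while the active monitor records every observation. The "statistic" here is just a counter, standing in for a real system metric.

```python
# Sketch contrasting the two monitoring styles described above:
# passive = snapshots at discrete stages, active = continuous gathering.

import itertools

counter = itertools.count(step=5)  # stand-in for a growing system statistic

class PassiveMonitor:
    """Takes snapshots on demand; misses everything between snapshots."""
    def __init__(self):
        self.snapshots = []
    def snapshot(self, value):
        self.snapshots.append(value)

passive = PassiveMonitor()
active_log = []  # active monitor: records every observation

for tick, value in zip(range(10), counter):
    active_log.append(value)   # continuous gathering
    if tick % 5 == 0:          # passive: sample only every 5th tick
        passive.snapshot(value)

print(len(active_log), len(passive.snapshots))
```

The trade-off follows directly: the active monitor gives a complete history but adds continuous overhead to the system under test, while the passive monitor is cheap but can miss short-lived spikes between snapshots.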
Test everything twice.