Turn up the volume
apps.teradata.com/tdmo/v08n04/pdf/AR5821.pdf


WHY TERADATA

The widespread adoption of parallel processing has been a boon for data warehousing. It has allowed flexible analysis of increasingly granular data so businesses can optimize the organization’s strategy and enhance its operational execution.

The demand on a data warehouse will continue to expand and evolve, requiring support for a diverse body of analytics and applications that go beyond what has traditionally been thought of as data warehousing. These include operational applications to monitor and provide snapshot reporting on key business processes, sales and service applications to facilitate direct support for customer-facing employees, and self-service Web portal applications to directly support customers and suppliers.

These applications demand short, consistent response times. The addition of simple and effective database tuning capabilities to the massive throughput power of parallel processing provides tremendous performance and efficiency gains and expands the types of applications, business processes and number of users that the data warehouse can support. The ultimate results for businesses are improved competitiveness in the market and increased return on investment (ROI) for the data warehouse.

Perfect pitch

Parallel processing and data warehousing work well together. When implemented on suitable hardware architecture, a parallel approach can be an order of magnitude faster than a serial approach. To illustrate this concept, consider that analyzing 100 million transactions is quicker if 100 tasks each analyze 1 million transactions in parallel than if one task analyzes all 100 million.
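The divide-and-conquer idea can be sketched in a few lines of Python. This is a toy illustration of partition-parallel aggregation, not a model of the Teradata engine; the data, task count and workload are invented:

```python
# Sketch of the partition-and-parallelize idea behind a parallel scan:
# split the rows into partitions, analyze each partition as its own task,
# then combine the partial results.
from concurrent.futures import ThreadPoolExecutor

def analyze(partition):
    """Stand-in for per-task work: total the amounts in one partition."""
    return sum(partition)

def parallel_total(transactions, n_tasks=10):
    # Split the transaction list into n_tasks roughly equal partitions.
    chunk = (len(transactions) + n_tasks - 1) // n_tasks
    partitions = [transactions[i:i + chunk]
                  for i in range(0, len(transactions), chunk)]
    # Each worker analyzes one partition; the partial totals are combined.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        return sum(pool.map(analyze, partitions))

sales = list(range(1, 101))       # 100 toy "transactions"
print(parallel_total(sales))      # same answer as a serial sum: 5050
```

The serial and parallel paths return the same total; the payoff of the parallel path is that each task scans only its own slice of the data.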

This tremendous capacity for analytical throughput empowered by parallel processing can minimize the need for database tuning, or at least change the focus of tuning techniques for many queries in a data warehouse workload. However, large performance and efficiency gains can be achieved with a few select tuning techniques.

Parallel processing enables flexible analysis of the transactions within a reasonable response time. But what if many business queries always ask for financial totals by month? Does it make sense to compute those subtotals for every query as each is run? Or what if a service agent in the inbound call center needs to research a particular customer’s transactions while on the phone with the individual? Can parallel processing alone provide the level of performance needed? And what might happen as more history is retained or the business expands and the transaction count jumps to 200 million? Will doubling the runtimes be acceptable?

Predictable scalability is also enabled through parallel processing. To continue with the example above, increasing the hardware resources to support 200 parallel process tasks can double the throughput and cut the runtime in half. This is a great attribute, but it comes with a significant price tag for the additional hardware. And it would not meet the response time requirements of the customer service agent. Nor is it a solution if the number of queries scanning this table dramatically increases.

Parallel processing combined with database tuning can optimize your data warehouse.
by Keith Rimmereid
Photography by Inmagine

PAGE 1 | Teradata Magazine | December 2008 | ©2008 Teradata Corporation | AR-5821

Four techniques

The alternative to expanding the hardware platform is to apply tuning techniques that reduce throughput requirements and increase the performance of the parallel system. The following techniques increase the efficiency of the parallel system by reducing the amount of data that needs to be processed for many queries. System and/or people resources are necessary to gain the full benefit of each tuning technique. Also, workload management facilities can be used to prioritize system resources to the appropriate queries in order to better meet user expectations. (See table.)

1. TABLE PARTITIONING

A table partition is a small portion or a subset of a table. Parallel databases physically partition tables across the hardware CPU and storage components to support parallel processing and increase throughput by allowing the partitions to be processed concurrently, greatly increasing data scan throughput. This is often referred to as horizontal partitioning.

Additional levels of partitioning within the horizontal partitions can often be defined to reduce the amount of data that needs to be processed. This will reduce query runtime. As an example, a sales transaction table contains a row for each sales transaction that occurred over the last 24 months. The table includes the item sold, its price and quantity, the sales date, the sales region, customer information and other relevant data. If two additional levels of partitioning are defined, the first could be for the month when the transaction occurred and the second could be for the sales region to which the customer is assigned. This additional partitioning greatly reduces the amount of data required to be processed for queries seeking information for a given month or two.

When comparing a particular month’s sales results with the same month’s results from the previous year, partitioning by month will increase the efficiency of the query by about 12 times: the database will need to process transactions for only two months out of 24. For the same comparison scoped to a single sales region, the efficiency gain is even greater, reaching a 48-fold reduction if four sales regions are defined. (See figure.)
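The pruning arithmetic above can be checked with a small sketch. The partition keys, month labels and region codes below are hypothetical; only the counting logic matters:

```python
# 24 monthly partitions, each further split by 4 sales regions,
# gives 96 (month, region) partitions in all.
months = [f"2008-{m:02d}" for m in range(1, 13)] + \
         [f"2007-{m:02d}" for m in range(1, 13)]
regions = ["N", "S", "E", "W"]
partitions = [(m, r) for m in months for r in regions]   # 96 partitions

def scanned(pred):
    """Count the partitions a query touches after pruning on its predicate."""
    return sum(1 for p in partitions if pred(p))

# Year-over-year comparison of one month: 8 of 96 partitions = 1/12 of the data.
yoy = scanned(lambda p: p[0] in ("2008-06", "2007-06"))
# Same comparison restricted to one region: 2 of 96 = 1/48 of the data.
yoy_region = scanned(lambda p: p[0] in ("2008-06", "2007-06") and p[1] == "N")
print(len(partitions) // yoy, len(partitions) // yoy_region)   # 12 48
```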

Figure: Database partitioning improves efficiency. Multiple levels of partitioning dramatically reduce the required data to be scanned (from selective to very selective scanning), thus shortening analysis runtime.

Table: pros and cons of parallel processing database techniques

Technique                          | Pro                                                                                                 | Con
Table partitioning                 | Reduced runtime for users via reduced system resource consumption                                   | Effort to define, monitor and maintain
Secondary indexes                  | Excellent for supporting quick response time for simple queries                                     | Effort to define and maintain; space consumption; potential reduction of data update throughput
Database-maintained summary tables | Quick response time for simple queries and reduced runtimes for repetitive reporting of key metrics | Effort to define and maintain; space consumption; potential reduction of data update throughput
Workload management and scheduling | More consistent end-user experience                                                                 | Effort to define, monitor and maintain

PAGE 2 | Teradata Magazine | December 2008 | ©2008 Teradata Corporation | AR-5821

2. SECONDARY INDEXES

Secondary indexes are additional database-maintained structures that offer greater data selectivity than the partitioning strategy. These indexes are useful in supporting response-time-sensitive queries. For the sales transaction table described, a secondary index could be defined for Customer. Then, if a service agent needs to review a customer’s transactions, the request to the database goes directly to the Customer index and accesses only the sales transactions for that customer. Broadly employing parallelism in place of a selective index will fail to meet the response time expectations and limit the ability to support high levels of concurrency. Secondary indexes not only provide very quick response times, but also enable high levels of concurrency, because the resource requirement for each query is small.
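The customer lookup path can be sketched with a toy index: a dict mapping customer id to row positions, so a lookup touches only that customer’s rows instead of scanning the whole table. The rows and customer ids are invented for illustration:

```python
# A toy base table of (customer, sale_date, amount) rows.
rows = [
    ("C1", "2008-11-02", 19.99),
    ("C2", "2008-11-02", 5.00),
    ("C1", "2008-11-03", 42.50),
    ("C3", "2008-11-04", 7.25),
]

# "Secondary index": customer id -> positions of that customer's rows.
customer_index = {}
for pos, (cust, _, _) in enumerate(rows):
    customer_index.setdefault(cust, []).append(pos)

def transactions_for(cust):
    # Go straight to the index entry; no full-table scan.
    return [rows[pos] for pos in customer_index.get(cust, [])]

print(transactions_for("C1"))   # only the two C1 rows are touched
```

Because each lookup reads only a handful of rows, many such queries can run concurrently without contending for scan bandwidth, which is the concurrency point made above.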

3. DATABASE-MAINTAINED SUMMARY TABLES

Many business queries have two common requirements: summary totals of usual business metrics (such as total sales by region) and joins of the same tables (such as customer with transactions). When these summaries and joins follow highly repetitive patterns, database-maintained summary tables provide performance and efficiency gains. Once defined, the database builds and automatically maintains a structure that contains the specified columns from the joined tables and the counts and totals that comprise the business metric. Individual queries are then satisfied from the database-maintained summary table, thus eliminating repetitive joining and aggregation.
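A minimal sketch of the idea, assuming each insert into a (hypothetical) base table also refreshes a per-region running count and total, so summary queries never re-aggregate the detail rows:

```python
# Base table of detail rows, plus a summary maintained alongside every insert.
base_table = []
sales_by_region = {}            # region -> [count, total]

def insert_sale(region, amount):
    base_table.append((region, amount))
    # The "database-maintained" part: the summary is updated on insert,
    # not recomputed at query time.
    c_t = sales_by_region.setdefault(region, [0, 0.0])
    c_t[0] += 1
    c_t[1] += amount

for region, amount in [("East", 100.0), ("West", 40.0), ("East", 60.0)]:
    insert_sale(region, amount)

# A "total sales by region" query reads the tiny summary, not the base table.
print(sales_by_region["East"])   # [2, 160.0]
```

The trade-off mirrors the table above: queries get precomputed answers, but every data update pays a small extra maintenance cost.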

4. WORKLOAD MANAGEMENT AND SCHEDULING

Workload management may not be thought of as a tuning facility, but its goals are the same: to meet user expectations of runtimes and ensure efficient system utilization. Queries are submitted to the data warehouse in bursts and irregular patterns, which create many resource peaks and valleys. The mix of queries will include simple lookups, repetitive reporting and resource-hungry data mining processes. The roles of workload management facilities are to smooth the utilization peaks and valleys and to ensure the system resources are assigned to queries based on appropriate priorities so response time requirements of the user community are sufficiently met.
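The prioritization idea can be sketched with a simple priority queue. The priority numbers and query names are invented, and a real workload manager weighs far more than a single number, but the dispatch order illustrates the principle:

```python
# Dispatch queued queries by priority: short, response-time-sensitive
# lookups run before resource-hungry batch work.
import heapq

queue = []
for priority, query in [(3, "data-mining batch"),
                        (1, "call-center customer lookup"),
                        (2, "daily sales dashboard")]:
    heapq.heappush(queue, (priority, query))   # lower number = higher priority

order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(order)
# ['call-center customer lookup', 'daily sales dashboard', 'data-mining batch']
```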

Dynamic composition

As a data warehouse evolves and grows, the workload composition typically changes. Ad hoc exploration and experimentation on a particular subject lead to standard repetitive report suites and dashboards. Each new subject area deployed drives the delivery of additional repetitive analysis.

The repetitive and predictable characteristics of the workload offer new opportunities to take advantage of tuning features while efficiently supporting the continued ad hoc exploration. It is a natural evolution to begin with minimal tuning and then, as it makes sense, to selectively use the tuning techniques that provide the best balance of effort and reward.

Intelligently using a good balance of the various database tuning techniques (parallelism, partitioning and indexing) will enable the data warehouse to support a broad range of analytic and operational applications at the lowest possible cost.

Keith Rimmereid is a senior consultant with Teradata Data Warehousing Sales Support in San Diego.

WHY TERADATA

Why the Teradata system for an easily tuned environment?

As the only popular data warehouse database built on a natively parallel architecture, the Teradata system brings great balance between the throughput needed for massive complex queries and the selectivity needed for response-time-sensitive queries. Straightforward tuning options focus the massive parallelism on narrow slices of the data or selectively use units of parallelism for single- or few-row lookups:

> Fine-grained parallelism is always on and automatically activated, and it does not need to be configured, tuned or managed.

> Cost-based optimizer is sophisticated and responds on its own to changes in the physical database design to create the best-performing query execution plans.

> Multi-level partitioning is easy to implement, greatly improves performance and minimizes resource consumption by reducing the amount of data that needs to be considered for a wide variety of queries.

> Indexing options are easy to set up and maintenance-free, and they are automatically used to improve query performance.

> Join indexes and aggregate join indexes eliminate the need for repetitive on-the-fly joins and aggregations, speeding query performance and reducing resource consumption.

> Teradata Active System Management dynamically allocates system resources according to priority rules, and it smooths the peaks and valleys of resource demand.

> Teradata Professional Services Performance Center of Excellence provides customized performance management assessments along with best-practices training to enable organizations to maintain optimum database and application execution.

—K.R.

PAGE 3 | Teradata Magazine | December 2008 | ©2008 Teradata Corporation | AR-5821