Turn up the volume - Teradataapps.teradata.com/tdmo/v08n04/pdf/AR5821.pdf · database tuning...
Transcript of Turn up the volume - Teradataapps.teradata.com/tdmo/v08n04/pdf/AR5821.pdf · database tuning...
WHY TERADATA
The widespread adoption of parallel
processing has been a boon for data
warehousing. It has allowed flexible analysis
of increasingly more granular data so busi-
nesses can optimize the organization’s strat-
egy and enhance its operational execution.
The demand on a data warehouse will
continue to expand and evolve, requiring
support for a diverse body of analytics and
applications that go beyond what has been
traditionally thought of as data warehous-
ing. These include operational applications
to monitor and provide snapshot reporting
on key business processes, sales and service
applications to facilitate direct support for
customer-facing employees, and self-service
Web portal applications to
directly support customers
and suppliers.
These applications
demand short, consistent
response times. The addi-
tion of simple and effective
database tuning capabilities
to the massive throughput
power of parallel process-
ing provides tremendous
performance and efficiency
gains and expands the types
of applications, business
processes and number
of users that the data
warehouse can support.
The ultimate results for
businesses are improved
competitiveness in the mar-
ket and increased return on
investment (ROI) for the
data warehouse.
Perfect pitchParallel processing and data
warehousing work well to-
gether. When implemented
on suitable hardware architecture, a parallel
approach can be an order of magnitude
faster than a serial approach. To illustrate
this concept, consider that analyzing 100
million transactions is quicker if 100 tasks
each analyze 1 million transactions in paral-
lel than if one task analyzes all 100 million.
This tremendous capacity for analytical
throughput empowered by parallel processing
can minimize the need for database tuning,
or at least change the focus of tuning tech-
niques for many queries in a data warehouse
workload. However, large performance and
efficiency gains can be achieved with a few
select tuning techniques.
Parallel processing enables flexible analysis
of the transactions within a reasonable re-
sponse time. But what if many business que-
ries always ask for financial totals by month?
Does it make sense to compute those subto-
tals for every query as each is run? Or what
if a service agent in the inbound call center
needs to research a particular customer’s
transactions while on the phone with the
individual? Can parallel processing alone
provide the level of performance needed?
And what might happen as more history
is retained or the business expands and
the transaction count jumps to 200 million?
Will doubling the runtimes be acceptable?
Predictable scalability is also enabled
through parallel processing. To continue
with the example above, increasing the
hardware resources to support 200 parallel
process tasks can double the throughput
and cut the runtime in half. This is a great
Turn up the volumeParallel processing combined with database tuning can optimize your data warehouse. by Keith Rimmereid
Photo
grap
hy by Inm
agine
PAGE 1 | Teradata Magazine | December 2008 | ©2008 Teradata Corporation | AR-5821
attribute, but it comes with a significant
price tag for the additional hardware.
And it would not meet the response time
requirements of the customer service agent.
Nor is it a solution if the number of queries
scanning this table dramatically increases.
Four techniquesThe alternative to expanding the hardware
platform is to apply tuning techniques that
reduce throughput requirements and increase
the performance of the parallel system. The
following techniques increase the efficiency
of the parallel system by reducing the amount
of data that needs to be processed for many
queries. System and/or people resources are
necessary to gain the full benefit of each tun-
ing technique. Also, workload management
facilities can be used to prioritize system
resources to the appropriate queries in order
to better meet user expectations. (See table.)
1. TAblE PARTiTioninGA table partition is a small portion or a subset
of a table. Parallel databases physically parti-
tion tables across the hardware
CPU and storage components to
support parallel processing and
increase throughput by allowing
the partitions to be processed
concurrently, greatly increasing
data scan throughput. This
is often referred to as horizon-
tal partitioning.
Additional levels of parti-
tioning within the horizontal
partitions can often be defined
to reduce the amount of data
that needs to be processed. This
will reduce query runtime. As an
example, a sales transaction table
contains a row for each sales
transaction that occurred over
the last 24 months. The table in-
cludes the item sold, its price and
quantity, the sales date, the sales
region, customer information
and other relevant data. If two
additional levels of partitioning are defined,
the first could be for the month when the
transaction occurred and the second could
be for the sales region to which the customer
is assigned. This additional partitioning
greatly reduces the amount of data required
to be processed for queries seeking informa-
tion for a given month or two.
When comparing a particular month’s sales
results with the same month’s results from
the previous year, partitioning by month will
increase the efficiency of the query by about
12 times—the database will need
to process transactions for only two
months out of 24. For a query scop-
ing the transactions for a sales re-
gion in a given month, the efficiency
gain is even greater, reaching a 48-
fold reduction if four sales districts
are defined. (See figure, left.)
2. SEConDARy inDExESSecondary indexes are additional
database-maintained structures
that offer greater data selectiv-
ity than the partitioning strategy.
These indexes are useful in sup-
porting response-time-sensitive
queries. For the sales transaction
table described, a secondary index
could be defined for Customer.
Then, if a service agent needs to
review a customer’s transactions,
the request to the database goes
directly to the Customer index and
Database partitioning improves efficiency
Multiple levels of partitioning dramatically reduce the required data to be scanned, thus shortening analysis runtime.
Selectivescanning
Very selectivescanning
Technique Pro Con
Table partitioningReduced runtime for users via reduced system resource consumption
Effort to define, monitor and maintain
Secondary indexesExcellent for supporting quick response time for simple queries
Effort to define and maintain; space consumption; potential reduction of data update throughput
Database-maintained summary tables
Quick response time for simple queries and reduced runtimes for repetitive reporting of key metrics
Effort to define and maintain; space consumption; potential reduction of data update throughput
Workload management and scheduling
More consistent end-user experience
Effort to define, monitor and maintain
Table: pros and cons of parallel processing database techniques
PAGE 2 | Teradata Magazine | December 2008 | ©2008 Teradata Corporation | AR-5821
accesses only the sales transactions for that
customer. Broadly employing parallelism
in place of a selective index will fail to meet
the response time expectations and limit the
ability to support high levels of concurrency.
Secondary indexes not only provide very
quick response times, but also enable high
levels of concurrency, because the resource
requirement for each query is small.
3. DATAbASE-mAinTAinED SummARy TAblESMany business queries have two common
requirements: summary totals of usual
business metrics (such as total sales by
region) and joining the same tables (such
as customer with transactions). When com-
mon database-maintained summaries and
joins have highly repetitive patterns, they
provide performance and efficiency gains.
Once defined, the database builds and
automatically maintains a structure that
contains the specified columns from the
joined tables and the counts and totals that
comprise the business metric. Individual
queries are then satisfied from the database-
maintained summary table, thus eliminat-
ing repetitive joining and aggregation.
4. WoRkloAD mAnAGEmEnT AnD SChEDulinGWorkload management may not be thought
of as a tuning facility, but its goals are the
same: to meet user expectations of runtimes
and ensure efficient system utilization.
Queries are submitted to the data warehouse
in bursts and irregular patterns, which create
many resource peaks and valleys. The mix of
queries will include simple lookups, repetitive
reporting and resource-hungry data mining
processes. The roles of workload management
facilities are to smooth the utilization peaks
and valleys and to ensure the system resources
are assigned to queries based on appropriate
priorities so response time requirements of
the user community are sufficiently met.
Dynamic composition As a data warehouse evolves and grows,
the workload composition typically
changes. Ad hoc exploration and experi-
mentation of a particular subject leads
to standard repetitive report suites and
dashboards. Each new subject area
deployed drives the delivery of additional
repetitive analysis.
The repetitive and predictable character-
istics of the workload offer new opportuni-
ties to take advantage of tuning features
while efficiently supporting the continued
ad hoc exploration. It is a natural evolution
to begin with minimal tuning and then, as
it makes sense, to selectively use the tuning
techniques that provide the best balance of
effort and reward.
Intelligently using a good balance of
various database tuning techniques—
parallelism, partitioning and indexing—
will enable the data warehouse to support
a broad range of analytic and operational
applications at the lowest possible cost. T
Keith Rimmereid is a senior consultant with
Teradata Data Warehousing Sales Support in
San Diego.
WHY TERADATA
> Fine-grained parallelism is always on and automatically
activated, and it does not need to be configured,
tuned or managed.
> Cost-based optimizer is sophisticated and responds on
its own to changes in the physical database design to
create the best-performing query execution plans.
> Multi-level partitioning is easy to implement, greatly
improves performance and minimizes resource con-
sumption by reducing the amount of data that needs
to be considered for a wide variety of queries.
> Indexing options are easy to set up and
maintenance-free, and they are automatically
used to improve query performance.
> Join indexes and aggregate join indexes eliminate the
need for repetitive on-the-fly joins and aggrega-
tions, speeding query performance and reducing
resource consumption.
> Teradata Active System Management dynamically allo-
cates system resources according to priority rules, and
it smooths the peaks and valleys of resource demand.
> Teradata Professional Services Performance Center of
Excellence provides customized performance manage-
ment assessments along with best-practices training
to enable organizations to maintain optimum data-
base and application execution.
—K.R.
Why the Teradata system for an easily tuned environment?
A s the only popular data warehouse database built on a natively parallel architecture, the Teradata system brings great
balance between the throughput needed for massive complex queries and the selectivity needed for response-time-
sensitive queries. Straightforward tuning options focus the massive parallelism on narrow slices of the data or selectively
use units of parallelism for single- or few-row lookups:
PAGE 3 | Teradata Magazine | December 2008 | ©2008 Teradata Corporation | AR-5821